Nadine Wolff

Published on: 24.05.2018

Visualizing Internal Linking with Gephi – Part 1: Preparing the Data

Internal linking is an important topic for search engine optimization and website usability. On the one hand, search engine crawlers must be able to discover all relevant pages, recognize topic clusters, and understand how these are prioritized. On the other hand, users should be able to navigate the site quickly and easily and find the page they are looking for without detours. But how do you check whether your internal linking concept is working?

Fig.1 Visualizing internal linking with Gephi

Visualizing Internal Linking: Step by Step

With Gephi, the internal linking of a website can be examined and visualized. Each landing page is represented as a single point, and the connections between points show how it relates to other pages: which pages it links to and which pages link to it. Colors can additionally be used to identify topic clusters within the website.

Gephi works solely with the imported dataset. Everything else is calculated from that data, which keeps subjective judgment out of the result. The tool is suitable for both agencies and companies, as new insights can always be gained with Gephi.
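To make the idea of the dataset more concrete, here is a minimal Python sketch of such a link list. It is only an illustration, not something Gephi requires: the paths and the helper name link_maps are made up for this example. Each pair stands for one link from a source page to a target page, and from these pairs alone it can be worked out which pages a page links to and which pages link to it.

```python
from collections import defaultdict

# Hypothetical edge list: each pair is (linking page, linked page).
edges = [
    ("/", "/seo/"),
    ("/", "/blog/"),
    ("/seo/", "/blog/"),
    ("/blog/", "/"),
]


def link_maps(edges):
    """Build two lookups from an edge list: outgoing and incoming links per page."""
    outlinks = defaultdict(set)
    inlinks = defaultdict(set)
    for source, target in edges:
        outlinks[source].add(target)
        inlinks[target].add(source)
    return outlinks, inlinks


outlinks, inlinks = link_maps(edges)
print(sorted(inlinks["/blog/"]))   # pages that link to /blog/
print(sorted(outlinks["/"]))       # pages the homepage links to
```

Gephi derives its picture from exactly this kind of source and target pairing, which is why the rest of this article is about producing a clean list of such pairs.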

We explain step by step how to visualize internal linking and derive measures from it.

How do I install Gephi?

Gephi requires Java 1.8 or higher to be installed.

The software download can be started directly from the manufacturer's website. The software is available for Windows, Mac OS X, and Linux. The installation should be executed with administrator rights and should proceed without problems. Easy enough.

When opening the program, the following error message may appear, regardless of whether Java was installed before or after Gephi.

Fig.2 Gephi Error: Cannot find Java 1.8 or higher

The error occurs because Gephi cannot find the path to the Java folder, so the path must be entered manually in the gephi.conf file. Open this file with Notepad or a similar editor. It is usually located under the path:

C:\Program Files\Gephi-0.9.2\etc (Or wherever Gephi is installed on the computer)

and contains the default entry #jdkhome="/path/to/jdk" on line 11.

Fig.3 Content of gephi.conf File - Faulty

This entry must be replaced with jdkhome="C:\Program Files (x86)\Java\jre1.8.0_161", using the path to the Java folder on your own computer. It is important to remove the # at the beginning of the line.

Fig.4 Content of gephi.conf File - Correct

Afterwards, Gephi can be opened without problems.
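If the file has to be changed on several computers, the same edit can be scripted. The following Python sketch only illustrates the change described above: the function name set_jdkhome and the sample comment line are assumptions made for this example, and the Java path must be adapted to the local installation.

```python
def set_jdkhome(conf_text, java_path):
    """Return conf_text with the jdkhome entry uncommented and set to java_path."""
    new_lines = []
    for line in conf_text.splitlines():
        # Matches the commented default (#jdkhome=...) as well as an active entry.
        if line.lstrip("# \t").startswith("jdkhome="):
            line = f'jdkhome="{java_path}"'
        new_lines.append(line)
    return "\n".join(new_lines) + "\n"


# Hypothetical excerpt of a gephi.conf file with the default entry.
sample = '# default location of JDK/JRE\n#jdkhome="/path/to/jdk"\n'
fixed = set_jdkhome(sample, r"C:\Program Files (x86)\Java\jre1.8.0_161")
print(fixed)
```

To apply this to the real file, read gephi.conf into a string, pass it through the function, and write the result back. Because the file sits in the program folder, this will most likely require administrator rights.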

How do I create the data table for Gephi?

Unfortunately, Gephi cannot determine the internal linking of a website on its own. In most cases, an additional tool is needed to crawl the domain and record the internal links. Unless the site is as large as Amazon, Wikipedia, or Rakuten, Screaming Frog should suffice. For larger sites, Audisto, DeepCrawl, or a similar tool is the better choice.

For a domain with fewer than 500 URLs, the free version of Screaming Frog can be used. Beyond that, the paid version is required, which at £149 per year (approximately €170) is cheaper than most competitors.

In the following, we use Screaming Frog.

Crawling with Screaming Frog

For internal linking, we only need the HTML pages of the domain. Other files such as JavaScript, images, or CSS can be excluded before the crawl under Configuration > Spider. This is only possible in the paid version, but it saves a bit of work later.

Fig.5 Crawl Settings in Screaming Frog

The crawl can then be started by entering the correct URL of the homepage. Depending on the domain, this can take a few minutes to several hours. Once completed, all internal links can be exported. It's best to start the download under Bulk Export > Response Code > Success (2xx) inlinks.

This exports only the internal links whose target returns a 2xx status code (typically 200 OK), which directly excludes redirects (3xx) and error pages (4xx). This table now only needs to be prepared for Gephi before the domain can be visualized and appropriate measures derived.

Preparing Data for Gephi with Excel

The file from Screaming Frog contains a lot of data that is not needed for Gephi. For this reason, the file needs to be prepared. Any spreadsheet program such as Excel or OpenOffice Calc can be used for this. We prefer Excel and will only address this program and its corresponding functions in the following steps. Ideally, proceed according to the following scheme:

  1. Remove Columns D – I: These columns contain information that is not needed in Gephi.

  2. Filter Column A for everything that does not contain “AHREF” and remove these rows: This leaves only the relevant links. (This step can be skipped if the Spider configuration was used.)

  3. Remove Column A: This column is no longer needed after step 2

  4. Remove Duplicates from Columns A and B: If a page links to another page multiple times, Gephi still recognizes this as just one link. To reduce the file accordingly, these duplicates should be removed. Select both columns and then go to Data > Data Tools > Remove Duplicates.

  5. Remove Self-Links: Many pages link to themselves (through the main navigation, logo, footer, or an incorrectly set breadcrumb). These links are also included in the data table, but are not useful for the later visualization. To remove them, use the following formula in Column C: =IF(A1=B1,1,0) (example for row 1 of the document). Apply this formula to all rows. Then filter the column for “1” and remove all these rows. Afterwards, Column C can be removed again.

  6. Remove External Links: The table often contains links to external domains. To identify these, sort Column B alphabetically. Alternatively, filter the column for anything that does not contain the hostname of your own domain. These rows should definitely be removed.

  7. Remove Image and PDF Links: Images and PDFs are sometimes linked within a text. These are best filtered out in Column B. Enter the common file endings (.pdf, .png, .jpg, .gif) into the filter search one after the other and remove the matching rows.

  8. Sort Out Non-Relevant URLs: Depending on the domain, this step can take more or less time, but it is all the more important, because otherwise the graph in Gephi fills up with unnecessary data and becomes less clear. It's best to open a second Excel file and paste Columns A and B of the original document one below the other into Column A of the new document. Then remove all duplicates from this new column and sort it. Go through the list and note all pages that are irrelevant for internal linking (privacy policy, T&Cs, imprint, paginated pages, search, etc.) and/or provide no added value for the search engine (pages set to noindex). These must be removed from both Column A and Column B of the original document. Either search for and remove each irrelevant page individually within the columns, or do it more simply via a VLOOKUP.

  9. Rename Column B to “Target”: Gephi expects the link target in a column named Target. If Column B is not renamed, the tool cannot process the file.

  10. Remove Hostname: Since we are only looking at internal linking and the rows now contain only internal links, the hostname should be removed. This simplifies the visualization later in Gephi. Press CTRL + H, search for the hostname (example for our domain: https://www.internetwarriors.de), leave the replacement field empty, and click Replace All. (Important: The slash after the hostname must not be removed, otherwise the homepage will disappear.)

  11. Combine Columns: Gephi needs the data as a CSV file for the import, so the Excel file must be prepared accordingly. A concatenation formula helps with this. Enter the formula =A1&","&B1 in Column C (example for row 1) and apply it to all rows. The formula then has to be replaced by its values. To do this, select the entire Column C, press CTRL + C, right-click, and select 'Paste Values' under Paste Options. The column now contains the values instead of the formula. Columns A and B can now be removed, so that the new values become the new Column A.

  12. Export as CSV: Now the file only needs to be exported as a CSV file. To do this, go to File > Save As and choose the file type CSV (MS-DOS).
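For those who prefer a script to manual spreadsheet work, the steps above can be approximated in Python. This is a minimal sketch under assumptions, not the procedure used in this article: it assumes each exported row has already been reduced to three values (link type, source URL, destination URL), and the names prepare_edges, to_csv, and excluded_paths as well as the sample paths are made up for this example.

```python
import csv
import io

HOST = "https://www.internetwarriors.de"         # hostname from step 10
FILE_ENDINGS = (".pdf", ".png", ".jpg", ".gif")  # file endings from step 7


def prepare_edges(rows, excluded_paths=()):
    """Turn (type, source, destination) rows into unique (source, target) paths."""
    edges = []
    seen = set()
    for link_type, source, target in rows:
        if link_type != "AHREF":                           # step 2: hyperlinks only
            continue
        if not (source.startswith(HOST + "/") and target.startswith(HOST + "/")):
            continue                                       # step 6: drop external links
        if target.lower().endswith(FILE_ENDINGS):          # step 7: drop images and PDFs
            continue
        # Step 10: strip the hostname but keep the leading slash.
        source, target = source[len(HOST):], target[len(HOST):]
        if source == target:                               # step 5: drop self-links
            continue
        if source in excluded_paths or target in excluded_paths:
            continue                                       # step 8: drop irrelevant pages
        if (source, target) not in seen:                   # step 4: drop duplicates
            seen.add((source, target))
            edges.append((source, target))
    return edges


def to_csv(edges):
    """Serialize the edges as comma-separated text with Source and Target headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])                  # step 9: column names
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical sample rows covering the cases from the list above.
rows = [
    ("AHREF", HOST + "/", HOST + "/seo/"),
    ("AHREF", HOST + "/", HOST + "/seo/"),               # duplicate
    ("AHREF", HOST + "/seo/", HOST + "/seo/"),           # self-link
    ("IMG", HOST + "/", HOST + "/logo.png"),             # not a hyperlink
    ("AHREF", HOST + "/seo/", "https://example.com/"),   # external link
    ("AHREF", HOST + "/seo/", HOST + "/guide.pdf"),      # PDF
    ("AHREF", HOST + "/seo/", HOST + "/imprint/"),       # irrelevant page
    ("AHREF", HOST + "/seo/", HOST + "/"),
]
edges = prepare_edges(rows, excluded_paths={"/imprint/"})
print(to_csv(edges))
```

The order of the checks differs slightly from the list, which does not change the result. The returned text can be written to a .csv file and corresponds to the single combined column produced in steps 11 and 12.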

In the second part, we will take a closer look at Gephi's features and at how the visualization is created in detail.

Nadine Wolff

As a long-time expert in SEO (and web analytics), Nadine Wolff has been working with internetwarriors since 2015. She leads the SEO & Web Analytics team and is passionate about all the (sometimes quirky) innovations from Google and the other major search engines. In the SEO field, Nadine has published articles in Website Boosting and looks forward to professional workshops and sustainable organic exchanges.

Address

Bülowstraße 66

Aufgang D3

10783 Berlin
