Nadine Wolff

published on: 19.07.2018

The Perfect Sitemap

Sitemaps play an important role in search engine optimization: they help search engines discover and index your pages, which can increase the visibility of your website.

In this article, we show you how to build a sitemap and which pages and media content need to be listed with which relevant attributes. This article focuses solely on the most important sitemap when it comes to SEO: the XML sitemap.


What is a sitemap?

A sitemap is a file that provides search engines with detailed information about all the pages and content of your website. Sitemaps use organization, navigation, and labeling systems to help Google and other search engines crawl and rank your website correctly.

Generally, your website benefits from a sitemap, as it provides search engines with a roadmap for crawling and indexing all the pages listed in it. There are several types of sitemaps; HTML and XML sitemaps are the ones relevant to SEO. The term HTML sitemap essentially refers to the website's navigation.

Overall, there are a number of sitemap types:

  • Visual sitemaps

  • Alphabetical sitemaps

  • XML sitemaps

  • HTML sitemaps

  • Mobile sitemaps

  • News sitemaps

  • Video sitemaps

XML Sitemaps

The search engines receive a targeted overview of the structural setup of the website through the sitemap. The sitemap must be created in a "machine-readable" format. In this context, we speak of an XML sitemap.

XML stands for "Extensible Markup Language". Using an XML sitemap offers many advantages, and website operators in particular can benefit greatly from integrating one into their website.

  • One advantage is that an XML sitemap helps search engine crawlers locate and index new URLs on your website. On very large websites and on websites whose content is added or updated regularly, such as e-commerce sites, it is difficult for the crawler to find the current pages and index them. An XML sitemap allows for more precise guidance of the bot.

  • If the pages of your website are not well linked internally and do not reference each other, crawlers may not reach them by following links. An XML sitemap lets you inform Google and other search engines about these pages so that they are not overlooked.

  • A sitemap can signal to crawlers which pages you consider most important, so that the bots know which pages to focus on. This is done using the "priority" tag, which takes a value between 0.0 and 1.0. Example: <priority>0.8</priority>

It is recommended to work with a sitemap index file that references so-called "sub-sitemaps". For very large sites, it is advisable to create a separate sitemap for each category or main area of the site. Separate XML sitemaps should also be created for images, videos, and PDF files.

There are two reasons for this split: a single sitemap must not exceed the maximum size (50 MB uncompressed or 50,000 URLs), and separate sitemaps give you better control over the indexed pages in the Google Search Console. It is therefore very important to submit the XML sitemap to the Google Search Console.

Example of a sitemap index file and the sub-sitemaps it references:

  • https://www.example-url.de/sitemap-index.xml

  • https://www.example-url.de/sitemaps-pages.xml

  • https://www.example-url.de/sitemaps-posts.xml

  • https://www.example-url.de/sitemaps-images.xml

  • https://www.example-url.de/sitemaps-videos.xml

  • https://www.example-url.de/sitemaps-media.xml
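
As a rough illustration, an index file like the one above could be generated with a few lines of Python. This is a minimal sketch using only the standard library: the function name build_sitemap_index is made up for this example, and the element names (sitemapindex, sitemap, loc, lastmod) follow the public sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return a sitemap index (as a string) that lists the given sub-sitemaps.

    `build_sitemap_index` is a hypothetical helper for this article,
    not part of any library.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            # Optional: date of the last change of the sub-sitemap (YYYY-MM-DD).
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap_index([
        "https://www.example-url.de/sitemaps-pages.xml",
        "https://www.example-url.de/sitemaps-posts.xml",
        "https://www.example-url.de/sitemaps-images.xml",
    ]))
```

The result would be saved as sitemap-index.xml at the top level of the site, while the listed sub-sitemaps carry the actual URLs.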

URL XML Sitemap

It is important to store the sitemap index file at the top level of the URL structure.

A complete sitemap includes these details for each URL:

  • Loc: The URL of the page

  • Lastmod: The date of the last change of the URL

  • Changefreq: The frequency with which the content of the page is changed (optional, as Google does not attach much importance to this indication)

  • Priority: The weighting of the individual page in relation to the other URLs of the website (optional; a hint for crawlers rather than a directive)

Figure: Representation of an XML sitemap with a single URL
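
Since the original screenshot only exists as an image, here is a minimal sketch of how such an entry could be produced in Python. The helper name build_url_sitemap and the sample values are assumptions made for this example; the element names (urlset, url, loc, lastmod, changefreq, priority) come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_url_sitemap(entries):
    """Return a URL sitemap (as a string) for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional. `build_url_sitemap` is a hypothetical helper.
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Sample values chosen purely for illustration.
    print(build_url_sitemap([{
        "loc": "https://www.example-url.de/",
        "lastmod": "2018-07-19",
        "changefreq": "weekly",
        "priority": 0.8,
    }]))
```

Leaving out an optional key simply omits the corresponding tag, which is preferable to filling it with a made-up value.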

The information on the last change of the page, the change frequency, and the priority has no direct influence on the website's ranking. It is intended to make the Googlebot's work more efficient.

However, if unrealistic values are used, e.g., if the priority of all existing pages is set to 1 or the change date of all pages is always the same day, this can harm the quality of a sitemap. Google then ignores the information or, in the worst case, may not use the sitemap at all.
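
The two warning signs mentioned here can be checked before a sitemap is published. The following is only a sketch under our own assumptions: the function name find_suspicious_patterns and the exact rules are hypothetical, and Google does not publish the criteria it actually applies.

```python
def find_suspicious_patterns(entries):
    """Return warnings for sitemap entries whose values look unrealistic.

    `entries` is a list of dicts with optional "priority" and "lastmod" keys.
    The two rules below mirror the examples in the text; they are an
    assumption, not Google's actual logic.
    """
    warnings = []
    if len(entries) < 2:
        return warnings  # a single URL cannot form a pattern

    priorities = [entry.get("priority") for entry in entries]
    lastmods = [entry.get("lastmod") for entry in entries]

    # Rule 1: every URL claims the maximum priority.
    if all(p is not None and float(p) == 1.0 for p in priorities):
        warnings.append("all URLs have priority 1.0")

    # Rule 2: every URL reports exactly the same modification date.
    if None not in lastmods and len(set(lastmods)) == 1:
        warnings.append("all URLs share the same lastmod value")

    return warnings
```

A check like this could run as part of the sitemap generation and stop the export if it returns any warnings.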

According to Google, the most important information in the sitemap is the "lastmod" tag, as it allows the bot to recognize whether the most current version of a URL has been indexed or whether the URL needs to be re-crawled.

Further information on creating a URL sitemap can be found here.

Image XML Sitemap

Additionally, a dedicated image sitemap should be created. It passes image-specific information to search engines, which helps the images to be found in the search engines' image search. Such an image sitemap should list the image URL (mandatory), as well as the title and caption.

Here is a screenshot of a sample image sitemap:

Figure: Example of an image sitemap
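
In place of the screenshot, the structure can be sketched in Python. The helper build_image_sitemap and the sample data are assumptions for illustration; the image namespace and the image:image and image:loc tags come from Google's image sitemap extension. The title and caption tags are included because this article describes them, so it is worth checking the current Google documentation on whether they are still evaluated.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Make the serializer emit xmlns="..." and xmlns:image="..." on the root.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)

MAX_IMAGES_PER_URL = 1000  # limit stated later in this article


def build_image_sitemap(pages):
    """Return an image sitemap for {page_url: [image dicts]}.

    Each image dict needs "loc"; "title" and "caption" are optional.
    `build_image_sitemap` is a hypothetical helper.
    """
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, images in pages.items():
        if len(images) > MAX_IMAGES_PER_URL:
            raise ValueError(f"too many images for {page_url}")
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image in images:
            node = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(node, f"{{{IMAGE_NS}}}loc").text = image["loc"]
            for tag in ("title", "caption"):
                if image.get(tag):
                    ET.SubElement(node, f"{{{IMAGE_NS}}}{tag}").text = image[tag]
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Each page URL appears once, with all of its images nested below it.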

Video XML Sitemap

For the videos embedded on the website, a video sitemap should also be created, which transfers specific information about the videos to the search engines. A video sitemap should at least include information about the URL where the video is embedded, as well as the title, description of the video, and the URL of the video itself or the YouTube link.

Figure: Example of a video sitemap
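
A video sitemap can be sketched in the same way. The helper build_video_sitemap is hypothetical; the tag names come from Google's video sitemap extension. Note one assumption beyond this article's list: the sketch also expects a thumbnail URL, because Google's documentation lists video:thumbnail_loc as a required tag.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Make the serializer emit xmlns="..." and xmlns:video="..." on the root.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)


def build_video_sitemap(videos):
    """Return a video sitemap for a list of video dicts.

    Required keys: "page_url", "thumbnail_loc", "title", "description",
    plus "content_loc" (the video file) or "player_loc" (e.g. an embed URL).
    `build_video_sitemap` is a hypothetical helper.
    """
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for video in videos:
        if not (video.get("content_loc") or video.get("player_loc")):
            raise ValueError("video needs content_loc or player_loc")
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        # The page on which the video is embedded.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = video["page_url"]
        node = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for tag in ("thumbnail_loc", "title", "description"):
            ET.SubElement(node, f"{{{VIDEO_NS}}}{tag}").text = video[tag]
        for tag in ("content_loc", "player_loc"):
            if video.get(tag):
                ET.SubElement(node, f"{{{VIDEO_NS}}}{tag}").text = video[tag]
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For a video hosted on YouTube, the embed URL would typically go into player_loc instead of content_loc.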

Further information on a video sitemap is available here.

You can submit multiple sitemaps in the Google Search Console without any problems. It is therefore advisable to create separate URL, image, and video sitemaps for each main category. This makes error analysis significantly easier if not all URLs, images, or videos are indexed by Google.

Sitemap Generator – multiple ways to the sitemap

Many content management systems offer specialized plugins that automatically create sitemaps and update them regularly. For WordPress, for instance, the Yoast SEO plugin is a good option. There is now also a Yoast extension for TYPO3, which takes a lot of work off your hands.

Additionally, there are many sitemap generators that can produce the various sitemap types. For example, the crawling tool Screaming Frog has a built-in sitemap generator.

Figure: Generate sitemaps with Screaming Frog

Screaming Frog lets you choose specific configurations. For instance, you can exclude the "Last Modified" attribute during creation. The disadvantage, however, is that the sitemap must be updated manually. There are also sitemap generators that offer an automatic update. Unfortunately, these are not always free.

Sitemap in the robots.txt and in the Google Search Console

It is important that the index sitemap is listed in the robots.txt. The robots.txt file gives instructions to the search engines' crawlers. By listing the path for the .xml file, the crawler finds the sitemap faster and can process it quickly.

Example of a robots.txt file:

User-agent: *
Sitemap: https://www.example-url.de/sitemap-index.xml
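
To verify that the sitemap entry in a robots.txt is actually readable, the file can be parsed with Python's standard library. This is a small sketch: the function name sitemaps_from_robots is made up for this example, while RobotFileParser and its site_maps() method (available since Python 3.8) are part of urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this article, with each directive on its own line.
ROBOTS_TXT = """\
User-agent: *
Sitemap: https://www.example-url.de/sitemap-index.xml
"""


def sitemaps_from_robots(robots_txt):
    """Return the list of sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```

An empty result means that crawlers will not learn about the sitemap from the robots.txt, for example because the Sitemap directive is not on a line of its own.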

The main purpose of an index sitemap is to monitor the indexed pages, videos, and images. The sitemap must therefore be submitted in the Google Search Console. After some time, you will receive information on how many pages have been submitted (i.e., are present in the sitemap) and how many of them are in the Google index.

In the new Google Search Console, it is now also very easy to investigate which URLs have been submitted but were not considered by Google during indexing. This allows for more detailed checks and the definition of further measures.


Figure: Contents of the sitemap in the Google Search Console

Here are some facts about the sitemap:

  • The size of a sitemap can be up to 50 MB (or 50,000 URLs) according to official Google information. It can take a while to download such a large file. While Google is very critical of website loading time and sees this as a ranking factor, the loading time of the XML sitemap does not matter as long as there is no timeout.

  • In the image sitemap, no more than 1000 images may be specified per URL.

  • It is crucial to ensure that sitemaps are updated automatically. Many common content management systems have extensions that can generate sitemaps and keep them up to date daily.

  • Google recommends keeping the filenames of the index sitemap and the sub-sitemaps as consistent as possible.
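
The size limits above can be respected automatically when sitemaps are generated. The following sketch splits a long URL list into chunks of at most 50,000 entries and checks the file size; both helper names are hypothetical, and the byte value assumes that 50 MB means 52,428,800 bytes, as stated in the sitemaps.org protocol.

```python
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAP_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, uncompressed


def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit into one sitemap."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def fits_size_limit(sitemap_xml):
    """Check whether a serialized sitemap stays within the size limit."""
    return len(sitemap_xml.encode("utf-8")) <= MAX_SITEMAP_BYTES


if __name__ == "__main__":
    # 120,001 sample URLs need three sitemaps: 50,000 + 50,000 + 20,001.
    urls = [f"https://www.example-url.de/page-{n}" for n in range(120_001)]
    print([len(chunk) for chunk in split_into_sitemaps(urls)])
```

Each chunk would then be written to its own sub-sitemap and referenced from the sitemap index.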

Conclusion

If you have correctly created, integrated, and submitted your XML sitemap to the Google Search Console, you indicate to Google which pages you consider to be of high quality and worthy of indexing. XML sitemaps are not magical methods that ensure automatic indexing but provide the Googlebot with important clues that increase your chances of indexing.


What can we do for you?

Would you like to create a sitemap on your website? Do you have questions about the correct integration or creation of a sitemap? We are happy to assist you in creating the perfect sitemap. We look forward to your inquiry.

Nadine Wolff

As a long-time expert in SEO (and web analytics), Nadine Wolff has been working with internetwarriors since 2015. She leads the SEO & Web Analytics team and is passionate about all the (sometimes quirky) innovations from Google and the other major search engines. In the SEO field, Nadine has published articles in Website Boosting and looks forward to professional workshops and sustainable organic exchanges.

Address

Bülowstraße 66

Aufgang D3

10783 Berlin

Legal Information

Newsletter
