Nadine Wolff

published on: 23.06.2016

The Most Common Causes of Duplicate Content


Search engines want to deliver the most relevant information to users and to organize their data efficiently. One phenomenon that works against this goal is duplicate content: identical content reachable at multiple URLs. But how does duplicate content arise? We have compiled the most common causes of duplicate content for you and show you how to avoid it.

The most common cause of duplicate content is the conscious or unconscious copying of content. Copying product descriptions, definitions, press releases, and other content from the web to publish on your own site poses a significant problem. Depending on the extent to which website operators follow this strategy, in the worst case scenario, search engines may penalize the affected pages. As a result, the keywords gradually disappear from the rankings.

Online shops are also affected. Each product with its own URL requires an individual product description. An alternative would be to exclude the product from indexing, which is rather counterproductive for online shops. After all, the goal of a shop is to sell goods. If potential buyers do not find the product pages in search engines, there is a risk that your shop will not be found and the goods will remain in stock.

Problem: Domains are accessible with and without www

Many websites are accessible with both www and without www. This is problematic because, from the perspective of search engines, all URLs are duplicated.

https://www.domain.de/

https://domain.de/

Decide on one primary variant. Most website operators opt for the www variant. Make sure all internal links point to this variant in order to make optimal use of the internal link juice. You have two options to handle this problem and establish your main variant.

  • .htaccess file: With this file, you can specify that the variant without www permanently redirects to the variant with www.

  • Google Search Console: Google is aware of this issue. You have the option in Google Search Console to specify which variant you prefer.
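For the first option, a minimal .htaccess sketch could look like this (assuming an Apache server with mod_rewrite enabled; replace domain.de with your own domain):

```apache
RewriteEngine On
# Permanently (301) redirect all requests without www to the www variant
RewriteCond %{HTTP_HOST} ^domain\.de$ [NC]
RewriteRule ^(.*)$ https://www.domain.de/$1 [R=301,L]
```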

Are trailing slashes problematic?

Anyone surfing the web notices different URL writing styles. There are URLs with and without a slash at the end. For example:

https://www.domain.de/

https://www.domain.de

Strictly speaking, these are also two different URLs or documents, which can cause duplicate content if they deliver the same content. Even though Google, by its own announcement, can often canonicalize such URLs automatically, it is advisable to use one consistent scheme.
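One way to enforce a consistent scheme is a server-side redirect. A hedged .htaccess sketch (again assuming Apache with mod_rewrite) that appends a trailing slash to URLs that do not point to a physical file:

```apache
RewriteEngine On
# Append a trailing slash to URLs that are not files and lack one
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ https://www.domain.de/$1/ [R=301,L]
```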

Homepage with or without index.html – what's the difference?

The homepage of a domain should never be reachable via multiple URLs.

https://www.domain.de

https://www.domain.de/

https://www.domain.de/index.html


As described, the trailing slash is not a problem for modern browsers, as they normalize the root URL before sending the request. Things are different with "index.html". Apply a canonical tag on the https://www.internetwarriors.de/index.html page. This tag prevents duplicate content and directs the complete link power to the correct URL. It tells search engines that the homepage is always defined as https://www.internetwarriors.de/.

This tag looks like this:

<link rel="canonical" href="https://www.domain.de/">
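Alternatively, or in addition, the index.html variant can be redirected to the root URL so that only one address remains reachable. A sketch, again assuming Apache with mod_rewrite:

```apache
RewriteEngine On
# Permanently redirect /index.html to the root URL
RewriteRule ^index\.html$ https://www.domain.de/ [R=301,L]
```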

Handling test and development servers

When website operators carry out major work on their sites, they usually create a copy of the site. On this copy they test new design elements or new programming without affecting the public website. Depending on the size of the website, several people may work on this test system, so the test environment is usually reachable over the web. URLs for such test systems typically look like this:

https://test.domain.de

https://www.domain.de/test/

https://www.test-domain.de


If you forget to protect the subdomain or path from search engine access, duplicate content arises since search engines index both the test and live pages.

To prevent this, the following options are available:

  • Protect the test page with a password using the .htaccess file.

  • Restrict access to all web crawlers through the robots.txt.

This ensures your test page does not end up in the search engine index, preventing duplicate content.
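Hedged sketches of both options follow. For the password protection, Apache is assumed and the path to the .htpasswd file is a placeholder; the robots.txt would live in the root of the test subdomain:

```apache
# .htaccess on the test system: require a login for everything
AuthType Basic
AuthName "Test system - staff only"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

```text
# robots.txt on the test system: block all crawlers
User-agent: *
Disallow: /
```

Note that password protection is the more reliable of the two, since robots.txt only asks well-behaved crawlers to stay away.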

If you consider protecting the test page only with a noindex meta tag, remember to remove the tag when going live. Otherwise you risk search engines removing the public website from the index. This is only a problem, however, if the test system replaces the current live system.

How to accurately depict print views

Websites often offer a print view. There are two options to implement this. For the printer output, you can style the page differently via a CSS media query. This is done with a line in the head section of the page code. It can look like this (the file name print.css is a placeholder):

<link rel="stylesheet" media="print" href="print.css">

Since this is the same document, this variant is safe and free from duplicate content.

Another variant is controlling the print view through a standalone URL or parameter. They can look like this:

https://www.domain.de/blog.html?print=1

https://www.domain.de/blog-druckansicht.html

Since these are two different URLs with the same content, the likelihood of duplicate content at this point is high. It is recommended to exclude the print view from indexing. Furthermore, you should mark the link to the print view with a nofollow attribute.
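A sketch of both measures, using the example URLs above: the print view carries a robots meta tag, and the link pointing to it is marked nofollow:

```html
<!-- In the <head> of the print view: keep it out of the index -->
<meta name="robots" content="noindex">

<!-- Link from the article to the print view -->
<a href="blog.html?print=1" rel="nofollow">Print view</a>
```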

Dealing with functional parameters

In shop systems and content management systems, functional parameters often exist to control views. These are usually parameters in product categories that, for example, sort by brand names or prices. They might look like this:

Sort by brand: https://www.domain.de/kategorie.html?sort=brand

Sort by price: https://www.domain.de/kategorie.html?sort=price

Note that these parameters only change the sorting, not the content, so duplicate content can arise. Prevent this by blocking the parameters via robots.txt, Google Search Console, or a noindex meta tag. Keep this in mind for session IDs, pagination, internal search, and product variants such as size or color as well.
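For the robots.txt route, one possible sketch that blocks the sorting parameter from the example URLs above (the parameter name sort is taken from those examples; adapt it to your shop system):

```text
User-agent: *
# Block all URLs that carry the sorting parameter
Disallow: /*?sort=
Disallow: /*&sort=
```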

What we can do for you

Are you concerned about duplicate content on your website or want to learn more about the topic? Contact us and we will help you identify duplicate content on your site and resolve the causes.

Nadine Wolff

As a long-time expert in SEO (and web analytics), Nadine Wolff has been working at internetwarriors since 2015. She leads the SEO & Web Analytics team and is passionate about all the (sometimes quirky) innovations from Google and the other major search engines. In the SEO field, Nadine has published articles in Website Boosting, and she looks forward to professional workshops and exchanges about organic search.

Address

Bülowstraße 66

Aufgang D3

10783 Berlin

Legal Information

Newsletter
