Identifying Common Issues With Duplicate Content

So, what’s duplicate content and why should we care?

For the purpose of this post and search engine optimisation (SEO) in general, we will define ‘duplicate content’ as substantial blocks of matching or nearly identical content published on multiple web pages or domains.

For example, if you publish content on one website or domain and then add the exact same piece to another website or domain, you’ll have added duplicate content to the web.

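Why should we care about this? From an SEO perspective, the main concern is how search engines such as Google see and treat duplicate content. When several pages carry the same content, a search engine generally tries to show only one of them, so it has to decide which version to index and rank. That may not be the version you want, and links pointing at the different copies are split between them rather than consolidated on one page. Simply put, search engines are not keen on duplicate content, and significant duplication problems can have a negative effect on your website’s SEO performance.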

There are two main ways by which duplicate content issues can develop.

One is simply people duplicating content: for example, doing a quick copy-and-paste job instead of putting effort into creating unique content. The solution here is simple: write unique content for your website rather than copying and pasting from other websites.

The second, and probably the most common, cause of duplication is technical. In this post, we describe some of the most common technical causes, along with methods and tools you can use to identify them.

Canonicalisation

Canonical issues are probably the most common cause of duplicate content: what may appear at first glance to be a set of URLs pointing to the same page can look very different from a search engine’s perspective.

For example:

example.com
www.example.com
www.example.com/index.php
example.com/home.asp

To a search engine, the four URLs above are four separate pages. Unless you implement redirects or canonical tags, search engines will therefore likely treat them as duplicates of one another.

This applies not just to the homepage (as in the example above) but to any page that can be accessed via more than one URL.

For example:

example.com/page/
www.example.com/page/

In addition to the ‘www’ and non-www versions, search engines may also identify the HTTP and HTTPS versions of a URL as duplicate pages:

http://www.example.com/page/
https://www.example.com/page/
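
To make the idea concrete, here is a minimal Python sketch that collapses URL variants like the ones above into a single comparison key, so you can see which addresses may be treated as the same page. It assumes, purely for illustration, that the preferred version is HTTPS with ‘www’ and that index.php, index.html and home.asp are served as default documents; `PREFERRED_SCHEME`, `DEFAULT_DOCUMENTS`, `normalise` and `group_variants` are hypothetical names to adapt to your own site.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

# Illustrative assumptions: HTTPS + "www" is the preferred version, and
# these filenames are served as the default document for a directory.
PREFERRED_SCHEME = "https"
DEFAULT_DOCUMENTS = {"index.php", "index.html", "home.asp"}


def normalise(url: str) -> str:
    """Map a URL variant to a single comparison key."""
    if "://" not in url:
        url = "http://" + url  # allow bare entries such as "example.com/page/"
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    # Drop a trailing default document, e.g. "/index.php" -> "/".
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    query = f"?{parts.query}" if parts.query else ""  # fragments are ignored
    return f"{PREFERRED_SCHEME}://{host}{path}{query}"


def group_variants(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs that normalise to the same key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return dict(groups)
```

With these assumptions, all four homepage variants listed earlier land in one group under `https://www.example.com/`, which is exactly the set you would want redirected or canonicalised to a single URL.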

This issue is easy to identify and fix. To identify it, type the different variations of a URL into your browser’s address bar and check whether they all end up at the same address. There are also some handy free tools for checking redirects, such as the Redirect Path browser plugin.
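
If you have many pages to check, typing each variant by hand gets tedious. The Python sketch below, which uses only the standard library, builds the four scheme/host combinations for a path and sends a HEAD request to each without following redirects, so you can see whether a variant answers with a 301 and where its Location header points. `url_variants` and `first_hop` are illustrative names, not part of any existing tool, and some servers may answer HEAD requests differently from normal GET requests.

```python
import http.client
from itertools import product
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def url_variants(domain: str, path: str = "/") -> List[str]:
    """Every http/https and www/non-www combination for one path."""
    bare = domain[4:] if domain.startswith("www.") else domain
    hosts = (bare, "www." + bare)
    return [f"{scheme}://{host}{path}" for scheme, host in product(("http", "https"), hosts)]


def first_hop(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """HEAD the URL without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `for url in url_variants("example.com", "/page/"): print(url, first_hop(url))` checks all four versions of one page. Ideally, one variant answers 200 and the other three answer 301 with a Location header pointing at it.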

How to fix it:

Google suggests implementing a 301 redirect, setting a preferred domain in Search Console and being consistent with your internal linking (the preferred domain setting has since been retired from Search Console, which makes redirects and consistent linking all the more important). For example, if your preferred URL is http://www.example.com/, link internally to http://www.example.com/ and not to http://example.com/.
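
Redirects are usually configured in the web server or hosting platform, and the exact setup depends on your stack. As a stack-neutral illustration only, the Python sketch below is a small WSGI middleware that answers any request not already on the preferred scheme and host with a 301 to the same path on the preferred origin. `PREFERRED_ORIGIN` and `redirect_to_preferred` are hypothetical names, and the sketch assumes the application sees the original scheme and Host header, which may not hold behind a proxy or load balancer.

```python
# Hypothetical preferred origin (scheme + host); change to your own.
PREFERRED_ORIGIN = "https://www.example.com"


def redirect_to_preferred(app):
    """Wrap a WSGI app so requests on any other scheme/host get a 301."""
    def middleware(environ, start_response):
        scheme = environ.get("wsgi.url_scheme", "http")
        host = environ.get("HTTP_HOST", "")
        # Exact comparison: any other scheme, host or explicit port redirects.
        if f"{scheme}://{host}" != PREFERRED_ORIGIN:
            # Keep the path and query string so each page maps to its own
            # preferred URL rather than to the homepage.
            location = PREFERRED_ORIGIN + (environ.get("PATH_INFO") or "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)
    return middleware
```

The key design point carries over to any server configuration: one permanent redirect per non-preferred variant, preserving the path, so that every duplicate URL points at exactly one preferred URL.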

You can also use a canonical tag to indicate the preferred URL; Google’s guidance is here: https://support.google.com/webmasters/answer/139066?hl=en#2
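
A canonical tag is a link element in the page’s head, for example `<link rel="canonical" href="https://www.example.com/page/">`. To check that each URL variant declares the preferred URL, you could fetch its HTML and pull out that tag. The Python sketch below does the extraction with the standard library’s `html.parser`. `CanonicalFinder` and `canonical_url` are illustrative helpers, and `canonical_url` deliberately returns `None` when a page has zero or several canonical tags, since both cases deserve a closer look.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values; valueless attrs are None.
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_url(html: str) -> Optional[str]:
    """Return the single canonical URL, or None if there are zero or several."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None
```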

Thin page content and large boilerplate content

This often happens on eCommerce websites where we see a large amount of boilerplate content and very little unique content. This is most common on product pages which include big menus, large footers, lengthy delivery info (which is duplicated on every product page) and usually just one or two sentences of unique content in the product description.

Combined, these elements can make such pages look like duplicates of one another. The solution is to increase the amount of unique content and minimise unnecessary boilerplate. In the example of our eCommerce product pages, instead of repeating a lengthy block of text about the delivery policy on every single product page, create a dedicated delivery page and link to it from the product pages.
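
One common way to put a number on how similar two pages are is shingling: break each page’s text into overlapping runs of k words and compare the two sets with the Jaccard index (shared shingles divided by all distinct shingles). The Python sketch below is a simplified illustration of that technique, not how any particular search engine or tool measures duplication. The product and delivery text is made up, and k = 5 is an arbitrary choice.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 5) -> Set[Tuple[str, ...]]:
    """Set of overlapping k-word sequences in the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # both texts too short to shingle: treat as identical
    return len(sa & sb) / len(sa | sb)


# Made-up boilerplate repeated on every product page.
DELIVERY_INFO = (
    "Free UK delivery on all orders over fifty pounds. Orders placed before "
    "2pm are dispatched the same working day. Deliveries to some postcodes "
    "may take up to five working days. Items can be returned within thirty "
    "days of receipt for a full refund."
)

page_a = DELIVERY_INFO + " Red cotton t-shirt with a crew neck."
page_b = DELIVERY_INFO + " Blue linen shirt with long sleeves."
```

Here `similarity(page_a, page_b)` comes out at about 0.74, while the two product descriptions on their own share no five-word run at all, so nearly all of the measured similarity comes from the repeated delivery text. Replacing that block with a short link to a dedicated page lets the unique descriptions dominate.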

CMS creating duplication

It’s very important that you understand how your website’s content management system (CMS) works, as it can sometimes be the root cause of duplicate content. For example, a CMS can display the same content in many places, each at its own URL. A good example of this would be a blog post displayed on the blog’s homepage, within its category, buried in the archive and in the author’s own archive.
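
A quick way to spot this on your own site is to crawl it, extract the main content of each URL and group the URLs whose content is identical. The Python sketch below assumes you already have a `pages` dictionary mapping each URL to its extracted text; crawling and extraction are out of scope, and the URLs are hypothetical, imagining a CMS that shows the full post on each listing page. It hashes a lightly normalised copy of the text, so it only catches exact matches; near-duplicates need a similarity measure instead.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and lowercasing."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of two or more URLs whose content fingerprints match."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl output: URL -> extracted main content.
pages = {
    "https://www.example.com/blog/my-post/": "Duplicate content explained.",
    "https://www.example.com/blog/category/seo/": "Duplicate  content\nexplained.",
    "https://www.example.com/blog/author/jane/": "duplicate content explained.",
    "https://www.example.com/blog/another-post/": "Something else entirely.",
}
```

Here `find_exact_duplicates(pages)` returns one group holding the post, category and author URLs, which would be candidates for showing excerpts instead of full posts, or for a canonical tag pointing at the post itself.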

Faceted navigation

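Faceted navigation (filters such as colour, size or price on a category page), although often very useful from a user experience perspective, can cause duplicate content issues, because each combination of filters can generate a new URL showing the same or nearly the same set of products.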

This is a huge subject that we cannot cover in this short post, but for now, see Google’s Webmaster Central Blog article on faceted navigation.
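
To gauge the scale of the problem on your own site, one simple check is to take a list of crawled URLs, strip the parameters that only filter or sort, and count how many distinct URLs collapse onto each base page. The Python sketch below does that. The parameter names in `FACET_PARAMS` are hypothetical, so substitute the ones your site actually uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter/sort parameters that do not change which page this is.
FACET_PARAMS = {"colour", "size", "sort"}


def strip_facets(url: str) -> str:
    """Remove facet parameters and sort the rest so their order does not matter."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in FACET_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(sorted(kept)), ""))


def facet_variants(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Base URLs that are reachable through more than one faceted URL."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for url in urls:
        groups[strip_facets(url)].add(url)
    return {base: sorted(variants) for base, variants in groups.items() if len(variants) > 1}
```

A base page with dozens or hundreds of variants is a sign that its faceted navigation deserves a closer look.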

Useful tools

Copyscape – quickly check for duplicates across the web.

Siteliner – check content duplication across your own website.

SEMrush – although not dedicated specifically to finding duplicate content, its site audit feature will help you find pages with duplicate content.

Wrap up

There you have it: some of the most common duplicate content issues. This is by no means an exhaustive list, but we hope it gives you a good idea of the issues you are most likely to encounter.