Duplicate page found
Duplicate content is an SEO issue that arises when the same content appears multiple times under different URLs on the same domain.
It's a problem for website owners and SEO professionals because there is no clear definition of how similar two pages must be to count as duplicates. What is known, however, is that duplicate content hurts rankings.
Duplicate content can exist when
- redirects are not properly set up,
- the server isn't correctly configured (e.g. the same page is reachable with and without a trailing slash),
- content is published twice or more on a website under different URLs.
Why it is important
In general, Google doesn't want to rank multiple pages with the same content. When Googlebot encounters duplicates, it will try to save crawl budget for the website and will simply guess which page is the original. Google names several reasons for explicitly declaring a preferred, canonical URL (a markup example follows the list):
- To specify which URL you want people to see in search results. You might prefer people reach your green dresses product page via https://www.example.com/dresses/green/greendress.html rather than https://example.com/dresses/cocktail?gclid=ABCD.
- To consolidate link signals for similar or duplicate pages. It helps search engines to be able to consolidate the information they have for the individual URLs (such as links to them) into a single, preferred URL. This means that links from other sites to http://example.com/dresses/cocktail?gclid=ABCD get consolidated with links to https://www.example.com/dresses/green/greendress.html.
- To simplify tracking metrics for a single product or topic. With a variety of URLs, it's more challenging to get consolidated metrics for a specific piece of content.
- To manage syndicated content. If you syndicate your content for publication on other domains, you want to ensure that your preferred URL appears in search results.
- To avoid spending crawling time on duplicate pages. You want Googlebot to get the most out of your site, so it's better for it to spend time crawling new (or updated) pages on your site, rather than crawling the desktop and mobile versions of the same pages.
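The standard way to declare the preferred URL is a `rel="canonical"` link element in the `<head>` of every duplicate or variant page. A minimal sketch, reusing the example URLs above:

```html
<!-- On https://example.com/dresses/cocktail?gclid=ABCD and any other
     variant, point search engines at the preferred URL. -->
<head>
  <link rel="canonical"
        href="https://www.example.com/dresses/green/greendress.html">
</head>
```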
How the audit works
The duplicate content audit uses several signals to identify potentially duplicated pages.
A page will be flagged as duplicate if:
- the body and HTML of two pages are exactly the same (see the hash sketch below),
- the page has a link element indicating the canonical URL,
- the hashing algorithm finds a nearly identical page.
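The exact-duplicate check is the simplest one. The audit's actual implementation isn't documented, but conceptually it boils down to comparing content hashes, as in this minimal Python sketch:

```python
import hashlib

def exact_duplicate(html_a: str, html_b: str) -> bool:
    """Two pages are exact duplicates when their raw HTML
    hashes to the same digest."""
    def digest(html: str) -> str:
        return hashlib.sha256(html.encode("utf-8")).hexdigest()
    return digest(html_a) == digest(html_b)
```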
Nearly identical
The nearly-identical flag is raised when the text content of two pages is almost, but not exactly, the same.
This audit is only triggered if the page has more than 50 unique words; pages with less content are excluded from it.
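The documentation doesn't specify which hashing algorithm is used. A common approach for this kind of near-duplicate detection is word shingling combined with a set-similarity measure; the following Python sketch (with hypothetical shingle size and threshold values) illustrates the idea:

```python
import re

def shingles(text: str, k: int = 5) -> set[int]:
    """Hash every run of k consecutive words into a fingerprint set."""
    words = re.findall(r"\w+", text.lower())
    return {hash(" ".join(words[i:i + k])) for i in range(len(words) - k + 1)}

def nearly_identical(a: str, b: str, threshold: float = 0.9) -> bool:
    """Flag two texts as near-duplicates when their shingle sets
    overlap strongly (Jaccard similarity >= threshold)."""
    # Mirror the audit's rule: pages with 50 or fewer unique words
    # are excluded from this check.
    if len(set(re.findall(r"\w+", a.lower()))) <= 50:
        return False
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return False
    return len(sa & sb) / len(sa | sb) >= threshold
```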
Fixing the problem
How to resolve a duplicate content problem depends largely on why the same content appears more than once.
If it is a server problem, e.g. a URL is available both with and without a trailing slash, it can be resolved in the web server configuration.
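For example, with nginx (assuming the slash-terminated variant is the preferred one), a single rewrite rule in the relevant `server` block can permanently redirect the other variant:

```nginx
server {
    listen 80;
    server_name www.example.com;

    # Permanently redirect any extension-less URL without a trailing
    # slash to its slash-terminated counterpart, so every page is
    # reachable under exactly one URL.
    rewrite ^([^.]*[^/])$ $1/ permanent;
}
```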
Exact or nearly identical
When content is nearly the same or published twice on a website, the duplicates can be removed once one of the two (or more) versions has been identified as the original.
To avoid losing out on traffic, a web professional should consult the web analytics and identify which version receives the most traffic.
Once the primary document is identified, a permanent (301) redirect should be created for the duplicate pages. This allows any traffic that still arrives at the duplicate URLs to reach the original page.
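Sticking with nginx and the example URLs from earlier in this text, such a redirect could look like this (again inside the relevant `server` block):

```nginx
# Send visitors of the duplicate URL to the page identified
# as the original.
location = /dresses/cocktail {
    return 301 https://www.example.com/dresses/green/greendress.html;
}
```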