Duplicate page found

Duplicate content is an SEO issue that arises from the same content appearing multiple times on different URLs on the same domain.

This is a problem for website owners and SEO professionals because there is no single, clear definition of duplicate content. What is known, however, is that duplicate content hurts rankings.

Duplicate content can arise when:

redirects are not properly set up,
the server isn't correctly configured (for example, the same page is served both with and without a trailing slash at the end of the URL),
content is published two or more times on a website under different URLs.
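
As an illustration of the first two causes, the sketch below groups URLs that differ only by scheme, a leading "www.", a trailing slash, or tracking parameters. The function names and the list of tracking parameters are assumptions made for this example; they are not part of the audit.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of parameters assumed not to change the page content.
TRACKING_PARAMS = {"gclid", "utm_source", "utm_medium", "utm_campaign"}


def comparison_key(url: str) -> str:
    """Reduce a URL to a key that ignores common, content-neutral variations."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/page" and "/page/" as the same path; keep the root as "/".
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so their order does not matter.
    query = urlencode(
        sorted((k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS)
    )
    return f"{host}{path}" + (f"?{query}" if query else "")


def group_variants(urls):
    """Return only the keys that are reachable through more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[comparison_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Each group returned by group_variants is a set of URLs that probably serve the same content and should resolve to a single address.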

Why it is important

In general, Google doesn't want to rank pages with duplicate content. Googlebot will try to optimise its crawl budget for the website, or will simply guess which page is the original.

There are several reasons to indicate a single preferred URL for each piece of content:

To specify which URL you want people to see in search results. You might prefer people reach your green dresses product page via https://www.example.com/dresses/green/greendress.html rather than https://example.com/dresses/cocktail?gclid=ABCD.
To consolidate link signals for similar or duplicate pages. It helps search engines to be able to consolidate the information they have for the individual URLs (such as links to them) into a single, preferred URL. This means that links from other sites to http://example.com/dresses/cocktail?gclid=ABCD get consolidated with links to https://www.example.com/dresses/green/greendress.html.
To simplify tracking metrics for a single product or topic. With a variety of URLs, it's more challenging to get consolidated metrics for a specific piece of content.
To manage syndicated content. If you syndicate your content for publication on other domains, you want to ensure that your preferred URL appears in search results.
To avoid spending crawling time on duplicate pages. You want Googlebot to get the most out of your site, so it's better for it to spend time crawling new (or updated) pages on your site, rather than crawling the desktop and mobile versions of the same pages.

How the audit works

The duplicate content audit uses several signals to identify potential issues with duplicated content.

A page will be flagged as duplicate if:

its body and HTML are exactly the same as those of another page,
it has an element indicating the canonical URL (normally a link element with rel="canonical"),
the hashing algorithm finds a nearly identical page.
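
The first two signals can be sketched with the Python standard library. This is only an illustration under stated assumptions: the audit's real implementation is not described here, the helper names are hypothetical, and SHA-256 is chosen simply as a convenient way to compare pages for exact equality. The canonical URL is read from a link element with rel="canonical", which is the usual way to declare it.

```python
import hashlib
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html: str):
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def exact_duplicates(pages: dict) -> list:
    """Group URLs whose HTML is exactly the same (pages maps URL to HTML)."""
    by_digest = {}
    for url, html in pages.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        by_digest.setdefault(digest, []).append(url)
    return [urls for urls in by_digest.values() if len(urls) > 1]
```

Given a crawl stored as a dictionary of URL to HTML, exact_duplicates returns the groups of identical pages, and canonical_url shows which address each page claims as its preferred one.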

Nearly identical

The nearly-identical flag is raised when the text and content of two pages are very close.

This check only runs if the page has more than 50 unique words. Pages with less content are excluded from this audit.
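
The hashing algorithm used by the audit is not specified here, so the sketch below substitutes a simple, well-known technique: Jaccard similarity over three-word shingles. The 50-unique-word minimum comes from the text above; the 0.9 similarity threshold and all function names are assumptions made for illustration.

```python
import re

MIN_UNIQUE_WORDS = 50        # minimum stated in the text above
SIMILARITY_THRESHOLD = 0.9   # assumed value, not taken from the audit


def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, size=3):
    """Return the set of overlapping word sequences of the given size."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Share of shingles that the two sets have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b):
    tokens_a, tokens_b = words(text_a), words(text_b)
    # Pages with 50 unique words or fewer are excluded from the check.
    if min(len(set(tokens_a)), len(set(tokens_b))) <= MIN_UNIQUE_WORDS:
        return False
    similarity = jaccard(shingles(tokens_a), shingles(tokens_b))
    return similarity >= SIMILARITY_THRESHOLD
```

Comparing every pair of pages this way becomes slow on large sites, which is one reason real tools tend to rely on a hashing scheme instead.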

Fixing the problem

How to resolve a duplicate content problem depends largely on why the same content appears more than once.

If it is a server problem, where a URL is available both with and without a trailing slash, it can be resolved in the web server configuration so that only one form is served and the other redirects to it.
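
The rule itself is normally written in the web server's own configuration language. To show the logic independently of any particular server, here is a minimal sketch as a Python WSGI middleware. The function name is hypothetical, and the choice to keep the form without the slash is an assumption; the opposite choice works equally well as long as it is applied consistently.

```python
def strip_trailing_slash(app):
    """Wrap a WSGI app so that '/page/' permanently redirects to '/page'."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if len(path) > 1 and path.endswith("/"):
            # Rebuild the path with a single leading slash so the redirect
            # always stays on the same host.
            target = "/" + path.strip("/")
            query = environ.get("QUERY_STRING", "")
            if query:
                target += "?" + query
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        # Any other request is passed through unchanged.
        return app(environ, start_response)

    return middleware
```

A permanent (301) redirect is used so that search engines treat the target as the lasting address of the page.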

Exact or nearly identical

Content that is nearly the same, or that is published more than once on a website, can be removed once one of the two (or more) versions has been identified as the original.

To avoid losing traffic, consult the web analytics to identify which version receives the most visits.

Once the primary document is identified, a redirect should be created for each duplicate page. This allows any traffic that still comes to the duplicate URLs to arrive at the original page.
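
The selection step can be sketched as follows, assuming the duplicate groups and the visit counts have already been exported from the audit and from the analytics tool. The function name and the data shapes are hypothetical.

```python
def build_redirects(duplicate_groups, visits):
    """Map every duplicate URL to the most-visited URL in its group.

    duplicate_groups: list of lists of URLs with the same content.
    visits: dict of URL to visit count; missing URLs count as zero.
    """
    redirects = {}
    for group in duplicate_groups:
        # The URL with the most traffic is kept as the primary document.
        primary = max(group, key=lambda url: visits.get(url, 0))
        for url in group:
            if url != primary:
                redirects[url] = primary
    return redirects
```

The resulting mapping of duplicate URL to primary URL can then be turned into redirect rules for the web server.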