Canonicals

Dan Sharp

Posted 19 September, 2018 by Dan Sharp in Canonicals

The canonicals tab shows canonical link elements and HTTP canonicals discovered during a crawl. The filters show common issues discovered for canonicals.

The rel="canonical" element helps specify a single preferred version of a page when it’s available via multiple URLs. It’s a hint to search engines that helps prevent duplicate content issues by consolidating indexing and link properties onto a single URL to use in ranking.

The canonical link element should be placed in the head of the document and looks like this in HTML:

<link rel="canonical" href="https://www.example.com/" >
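To illustrate how a crawler might pick these elements out of a page, here is a minimal Python sketch using the standard library’s html.parser. It is not how the SEO Spider is implemented, and the names CanonicalFinder and find_canonicals are hypothetical. It records every rel="canonical" link element, whether it sits inside the head, and whether it carries an hreflang, lang, media or type attribute. It assumes the page has an explicit head element and does not model browsers closing the head implicitly.

```python
from html.parser import HTMLParser

# Attributes that change the meaning of a rel="canonical" annotation,
# per the "Invalid Attribute In Annotation" filter.
INVALID_ATTRS = {"hreflang", "lang", "media", "type"}


class CanonicalFinder(HTMLParser):
    """Hypothetical helper: collects every <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []  # one dict per canonical link element found

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # anything after <body> opens is outside <head>
        elif tag == "link":
            attr_map = {name: (value or "") for name, value in attrs}
            rel_types = attr_map.get("rel", "").lower().split()
            if "canonical" in rel_types:
                self.canonicals.append({
                    "href": attr_map.get("href", ""),
                    "in_head": self.in_head,
                    "invalid_attrs": sorted(INVALID_ATTRS.intersection(attr_map)),
                })

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html):
    """Return all canonical link elements, with their position and attributes."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


page = """<html><head>
<link rel="canonical" href="https://www.example.com/" >
</head><body>
<link rel="canonical" href="https://www.example.com/" hreflang="en">
</body></html>"""
for found in find_canonicals(page):
    print(found)
```

For the sample page, the first element is reported inside the head with no invalid attributes, while the second is reported outside the head with ['hreflang'] in invalid_attrs, the two situations covered by the Outside <head> and Invalid Attribute In Annotation filters.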

You can also set a canonical via an HTTP Link header with rel="canonical", which looks like this:

Link: <https://www.example.com/>; rel="canonical"
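HTTP canonicals arrive in the Link response header, which can carry several comma-separated links and can be sent more than once in a response. Below is a minimal Python sketch of extracting the canonical targets. It assumes a simplified grammar in which "<" never appears inside a parameter value, and the function name http_canonicals is hypothetical; a complete parser would follow the full Link header grammar in RFC 8288.

```python
import re

# One link-value: "<URI>" followed by its parameters, up to the next "<".
# Simplification: assumes "<" never appears inside a parameter value.
LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter, quoted or unquoted; it may list several space-separated types.
REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def http_canonicals(link_headers):
    """Return every URI marked rel="canonical" across one or more Link headers."""
    found = []
    for header in link_headers:
        for match in LINK_VALUE.finditer(header):
            uri, params = match.group(1), match.group(2)
            rel = REL_PARAM.search(params)
            if rel is None:
                continue
            rel_value = rel.group(1) if rel.group(1) is not None else rel.group(2)
            if "canonical" in rel_value.lower().split():
                found.append(uri)
    return found


headers = ['<https://www.example.com/>; rel="canonical"']
print(http_canonicals(headers))
```

Taking a list of header values, rather than a single string, means a response that sends two separate canonical Link headers yields two entries, which is what the Multiple filter needs to see.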

Columns

This tab includes the following columns.

Address – The URL crawled.
Occurrences – The number of canonicals found (via both link element and HTTP).
Indexability – Whether the URL is Indexable or Non-Indexable.
Indexability Status – The reason why a URL is Non-Indexable. For example, if it’s canonicalised to another URL.
Canonical Link Element 1/2 etc. – Canonical link element data on the URL. The SEO Spider will find all instances if there are multiple.
HTTP Canonical 1/2 etc. – Canonical issued via HTTP header. The SEO Spider will find all instances if there are multiple.
Meta Robots 1/2 etc. – Meta robots found on the URL. The SEO Spider will find all instances if there are multiple.
X-Robots-Tag 1/2 etc. – X-Robots-Tag data. The SEO Spider will find all instances if there are multiple.
rel="next" and rel="prev" – The SEO Spider collects these HTML link elements, which are designed to indicate the relationship between URLs in a paginated series.

Filters

This tab includes the following filters.

Contains Canonical – The page has a canonical URL set (either via link element, HTTP header or both). This could be a self-referencing canonical URL where the page URL is the same as the canonical URL, or it could be ‘canonicalised’, where the canonical URL is different to the page URL.
Self Referencing – The URL has a canonical which is the same URL as the page URL crawled (hence, it’s self referencing). Ideally, only canonical versions of URLs would be linked to internally, and every URL would have a self-referencing canonical to help avoid potential duplicate content issues, which can occur naturally on the web (for example, tracking parameters added to URLs, or other websites linking to an incorrect variant of a URL that still resolves).
Canonicalised – The page has a canonical URL that is different to itself, so the URL is ‘canonicalised’ to another location. This signals to search engines that the page should not be indexed, and that its indexing and linking properties should be consolidated to the target canonical URL. These URLs should be reviewed carefully. In a perfect world, a website wouldn’t need to canonicalise any URLs, as only canonical versions would be linked to, but canonicals are often required to prevent duplicate content due to circumstances outside of the site owner’s control.
Missing – There’s no canonical URL present, either as a link element or via HTTP header. If a page doesn’t indicate a canonical URL, Google will identify what it thinks is the best version of the URL. This can lead to unpredictable rankings, so generally all URLs should specify a canonical version.
Multiple – There are multiple canonicals set for a URL (either multiple link elements, HTTP headers, or both combined). This can lead to unpredictability, as there should only be a single canonical URL set by a single implementation (link element or HTTP header) for a page.
Multiple Conflicting – Pages with multiple canonicals set for a URL that have different URLs specified (via either multiple link elements, HTTP header, or both combined). This can lead to unpredictability, as there should only be a single canonical URL set by a single implementation (link element, or HTTP header) for a page.
Non-Indexable Canonical – The canonical URL is a non-indexable page. This will include canonicals which are blocked by robots.txt, no response, redirect (3XX), client error (4XX), server error (5XX) or are ‘noindex’. Canonical versions of URLs should always be indexable, ‘200’ response pages. Therefore, canonicals that go to non-indexable pages should be corrected to the resolving indexable versions.
Canonical Is Relative – Pages that have a relative rather than absolute rel="canonical" link tag. While the tag, like many HTML tags, accepts both relative and absolute URLs, it’s easy to make subtle mistakes with relative paths that could cause indexing-related issues. For example, omitting the leading slash in href="products/" makes it resolve against the current directory rather than the site root.
Unlinked – URLs that are only discoverable via rel="canonical" and are not linked to via hyperlinks on the website. This might be a sign of a problem with internal linking, or with the URLs contained in the canonical.
Invalid Attribute In Annotation – Pages with a rel="canonical" annotation that also includes an hreflang, lang, media, or type attribute. Adding these attributes to the link element changes the meaning of the annotation to denote an alternate device or language version. These annotations are ignored and not used for canonicalisation by Google.
Contains Fragment URL – Pages with a rel="canonical" that includes a fragment (the part of the URL after ‘#’) in the href attribute. Google generally doesn’t support fragment URLs. These annotations are ignored and not used for canonicalisation by Google.
Outside <head> – Pages with a canonical link element that is outside of the head element in the HTML. The canonical link element should be within the head element, or search engines will ignore it.
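Several of the filters above can be checked from a page’s own canonicals without any further crawling. The Python sketch below is a simplified illustration, not the SEO Spider’s actual logic. The function name canonical_filters is hypothetical. It compares URLs as exact strings after resolving relative hrefs against the page URL, and it assumes the first canonical found decides between Self Referencing and Canonicalised. Non-Indexable Canonical and Unlinked need crawl data, and Invalid Attribute In Annotation and Outside <head> need the parsed HTML, so they are left out.

```python
from urllib.parse import urljoin, urlsplit


def canonical_filters(page_url, link_hrefs, http_hrefs):
    """Hypothetical helper: name the canonical filters one page falls into.

    Only covers checks that need nothing beyond the page's own canonicals.
    """
    link_hrefs = list(link_hrefs)
    hrefs = link_hrefs + list(http_hrefs)
    if not hrefs:
        return {"Missing"}

    filters = {"Contains Canonical"}
    # Resolve relative hrefs against the page URL; compare as exact strings.
    resolved = [urljoin(page_url, href) for href in hrefs]

    if len(hrefs) > 1:
        filters.add("Multiple")
        if len(set(resolved)) > 1:
            filters.add("Multiple Conflicting")
    if any("#" in href for href in hrefs):
        filters.add("Contains Fragment URL")
    # No scheme means a relative reference (including "//host/path").
    if any(not urlsplit(href).scheme for href in link_hrefs):
        filters.add("Canonical Is Relative")

    # Simplification: the first canonical found decides self vs canonicalised.
    if resolved[0] == page_url:
        filters.add("Self Referencing")
    else:
        filters.add("Canonicalised")
    return filters


print(sorted(canonical_filters(
    "https://www.example.com/a",
    link_hrefs=["/b"],
    http_hrefs=["https://www.example.com/c"],
)))
# → ['Canonical Is Relative', 'Canonicalised', 'Contains Canonical', 'Multiple', 'Multiple Conflicting']
```

Returning a set rather than a single label reflects that one page can appear under several filters at once, as in the example above, where a relative link element and a different HTTP canonical make the page Multiple Conflicting as well as Canonicalised.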

Please see our Learn SEO guide on canonicals, and our ‘How to Audit Canonicals’ tutorial.

Dan Sharp

Dan Sharp is founder & Director of Screaming Frog. He has developed search strategies for a variety of clients from international brands to small and medium-sized businesses and designed and managed the build of the innovative SEO Spider software.