Table of Contents
- Installation on Windows
- Installation on macOS
- Installation on Ubuntu
- Saving, opening, exporting & importing crawls
- User agent
- Checking memory allocation
- XML sitemap creation
- Crawl analysis
- Command line interface set-up
- Command line interface
- Search function
- User Interface
Spider Crawl Tab
Spider Extraction Tab
Spider Limits Tab
Spider Rendering Tab
Spider Advanced Tab
- Cookie storage
- Ignore non-indexable URLs for on-page filters
- Ignore paginated URLs for duplicate filters
- Always follow redirects
- Always follow canonicals
- Respect noindex
- Respect canonical
- Respect next/prev
- Respect HSTS policy
- Respect self referencing meta refresh
- Extract images from img srcset attribute
- Crawl fragment identifiers
- Response timeout
- 5XX response retries
Spider Preferences Tab
Other Configuration Options
- Content area
- Spelling & grammar
- Robots.txt settings
- Custom robots.txt
- URL rewriting
- User agent
- HTTP header
- Custom search
- Custom extraction
- Custom link positions
- User Interface
- Google Analytics integration
- Google Search Console integration
- PageSpeed Insights integration
- Memory allocation
- Storage mode
Lower Window Tabs
Right Side Window Tabs
The canonicals tab shows canonical link elements and HTTP canonicals discovered during a crawl. The filters show common issues discovered for canonicals.
The rel=”canonical” element helps specify a single preferred version of a page when it’s available via multiple URLs. It’s a hint to the search engines to help prevent duplicate content, by consolidating indexing and link properties to a single URL to use in ranking.
The canonical link element should be placed in the head of the document and looks like this in HTML:
<link rel="canonical" href="https://www.example.com/" >
You can also use rel=”canonical” HTTP headers, which looks like this:
Link: <http://www.example.com>; rel="canonical"
- Address – The URL crawled.
- Occurrences – The number of canonicals found (via both link element and HTTP).
- Indexability – Whether the URL is indexable or Non-Indexable.
- Indexability Status – The reason why a URL is Non-Indexable. For example, if it’s canonicalised to another URL.
- Canonical Link Element 1/2 etc – Canonical link element data on the URL. The SEO Spider will find all instances if there are multiple.
- HTTP Canonical 1/2 etc – Canonical issued via HTTP. The SEO Spider will find all instances if there are multiple.
- Meta Robots 1/2 etc – Meta robots found on the URL. The SEO Spider will find all instances if there are multiple.
- X-Robots-Tag 1/2 etc – X-Robots-tag data. The SEO Spider will find all instances if there are multiple.
- rel=“next” and rel=“prev” – The SEO Spider collects these HTML link elements designed to indicate the relationship between URLs in a paginated series.
This tab includes the following columns.
This tab includes the following filters.
- Contains Canonical – The page has a canonical URL set (either via link element, HTTP header or both). This could be a self-referencing canonical URL where the page URL is the same as the canonical URL, or it could be ‘canonicalised’, where the canonical URL is different to the page URL.
- Self Referencing – The URL has a canonical which is the same URL as the page URL crawled (hence, it’s self referencing). Ideally only canonical versions of URLs would be linked to internally, and every URL would have a self-referencing canonical to help avoid any potential duplicate content issues that can occur (even naturally on the web, such as tracking parameters on URLs, other websites incorrectly linking to a URL that resolves etc).
- Canonicalised – The page has a canonical URL that is different to itself. The URL is ‘canonicalised’ to another location. This means the search engines are being instructed to not index the page, and the indexing and linking properties should be consolidated to the target canonical URL. These URLs should be reviewed carefully. In a perfect world, a website wouldn’t need to canonicalise any URLs as only canonical versions would be linked to, but often they are required due to various circumstances outside of control, and to prevent duplicate content.
- Missing – There’s no canonical URL present either as a link element, or via HTTP header. If a page doesn’t indicate a canonical URL, Google will identify what they think is the best version or URL. This can lead to ranking unpredicatability, and hence generally all URLs should specify a canonical version.
- Multiple – There’s multiple canonicals set for a URL (either multiple link elements, HTTP header, or both combined). This can lead to unpredictability, as there should only be a single canonical URL set by a single implementation (link element, or HTTP header) for a page.
- Non-Indexable Canonical – The canonical URL is a non-indexable page. This will include canonicals which are blocked by robots.txt, no response, redirect (3XX), client error (4XX), server error (5XX) or are ‘noindex’. Canonical versions of URLs should always be indexable, ‘200’ response pages. Therefore, canonicals that go to non-indexable pages should be corrected to the resolving indexable versions.
For more information on canonicals, please read our guide on ‘How to Audit Canoncials‘.
Join the mailing list for updates, tips & giveawaysHow we use the data in this form
Back to top