How To Audit AMP Using The SEO Spider
This tutorial walks you through how you can use the Screaming Frog SEO Spider to audit Accelerated Mobile Pages (AMP) quickly and efficiently. The SEO Spider uses the official AMP validator to allow bulk validation of URLs.
To get started, you’ll need to download the SEO Spider, which is free in lite form for up to 500 URLs. Crawling AMP URLs via the rel="amphtml" link tag requires paid access. However, you can upload a list of AMP URLs in the free version and analyse and validate them as well.
The SEO Spider will find AMP URLs, report on common SEO issues and validate them by checking on required HTML mark-up, prohibited HTML elements as per the specifications and more.
You have two options for analysing and validating AMP: crawl a site to discover AMP URLs, or upload a list of AMP URLs directly. Both are covered in the sections below.
Crawl A Site To Audit AMP
This section of the guide shows how to set up a crawl to discover, audit and validate AMP URLs.
1) Select ‘Extract AMP Links’ & ‘Crawl AMP Links’ under ‘Config > Spider’
2) Crawl The Website
Open up the SEO Spider, type or copy in the website you wish to crawl in the ‘enter url to spider’ box and hit ‘Start’.
The website will be crawled, and AMP URLs will be discovered via any rel="amphtml" link tags within the HTML. Wait until the crawl finishes and reaches 100%.
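The following is a minimal Python sketch of discovery via rel="amphtml". It uses the standard library's html.parser to pull AMP URLs out of a page's HTML. This is not the SEO Spider's implementation, and the names AmpLinkExtractor and find_amp_links are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class AmpLinkExtractor(HTMLParser):
    """Collects the href of every <link rel="amphtml"> tag in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.amp_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here by HTMLParser's default handler
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before matching
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "amphtml" in rel_tokens and href:
            # Resolve relative hrefs against the URL of the page being parsed
            self.amp_urls.append(urljoin(self.base_url, href))


def find_amp_links(html: str, base_url: str) -> List[str]:
    parser = AmpLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.amp_urls
```

For example, a desktop page at https://example.com/a/ containing `<link rel="amphtml" href="/a/amp/">` yields `["https://example.com/a/amp/"]`.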
3) View The AMP Tab
The AMP tab shows any AMP URLs discovered. It has 17 filters (as shown in the image below) that help you identify common SEO or validation issues.
Fifteen of the filters can be viewed during or at the end of a crawl. However, the remaining two require calculation at the end of the crawl via post ‘Crawl Analysis’ before they are populated with data (more on this in just a moment).
The right-hand ‘overview’ pane displays a ‘(Crawl Analysis Required)’ message against filters that need post-crawl analysis to be populated with data.
4) Click ‘Crawl Analysis > Start’ To Populate AMP Filters
To populate these two AMP filters you simply need to click a button to start crawl analysis.
However, if you have configured ‘Crawl Analysis’ previously, you may wish to double check, under ‘Crawl Analysis > Configure’ that ‘AMP’ is ticked.
You can also untick other items that require post-crawl analysis to make this step quicker.
When crawl analysis has completed the ‘analysis’ progress bar will be at 100% and the filters will no longer have the ‘(Crawl Analysis Required)’ message.
5) Click ‘AMP’ & View Populated Filters
After performing post-crawl analysis, all AMP filters will be populated with data where applicable. In the example below, some of the AMP URLs fall under ‘Non-200 Response’, and in this case they are 404 errors.
You’re able to filter by the following SEO-related items –
- Non-200 Response – The AMP URLs do not respond with a 200 ‘OK’ status code. These will include URLs blocked by robots.txt, no responses, redirects, client and server errors.
- Non-Confirming Canonical – The canonical desktop version of the URL does not contain a rel="amphtml" link back to the AMP URL. The link could simply be missing from the desktop version, or there might be a configuration issue with the AMP canonical.
- Missing Non-AMP Canonical – The AMP URL’s canonical does not point to a desktop version, but to another AMP URL.
- Non-Indexable Canonical – The AMP canonical URL is a non-indexable page. Generally the desktop equivalent should be an indexable page.
- Indexable – The AMP URL is indexable. AMP URLs with a desktop equivalent should be non-indexable (as they should have a canonical to the desktop equivalent). Standalone AMP URLs (without an equivalent) should be indexable.
- Non-Indexable – The AMP URL is non-indexable. This is usually because it is correctly canonicalised to the desktop equivalent.
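The relationships behind these SEO filters can be sketched in Python. This is a simplified approximation, not the SEO Spider's logic, and it rests on several assumptions. The Page record and amp_seo_flags function are hypothetical, and indexability is supplied as an input rather than computed. Self-referencing canonicals on standalone AMP URLs are treated as not triggering the canonical checks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Page:
    # Hypothetical record of what a crawl learned about one URL
    url: str
    status: int
    is_amp: bool
    canonical: Optional[str] = None
    amphtml_links: List[str] = field(default_factory=list)
    indexable: bool = True


def amp_seo_flags(amp: Page, pages: Dict[str, Page]) -> Set[str]:
    """Approximate the SEO filters above for a single AMP URL."""
    flags = set()
    if amp.status != 200:
        flags.add("Non-200 Response")
    # Only compare against a canonical that was actually crawled
    target = pages.get(amp.canonical) if amp.canonical else None
    # Assumption: a self-referencing canonical (standalone AMP) skips these checks
    if target is not None and target.url != amp.url:
        if target.is_amp:
            flags.add("Missing Non-AMP Canonical")
        elif amp.url not in target.amphtml_links:
            # The desktop page doesn't link back via rel="amphtml"
            flags.add("Non-Confirming Canonical")
        if not target.indexable:
            flags.add("Non-Indexable Canonical")
    flags.add("Indexable" if amp.indexable else "Non-Indexable")
    return flags
```

A correctly paired AMP URL (canonicalised to an indexable desktop page that links back via rel="amphtml") only picks up the "Non-Indexable" flag, which is the expected state.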
The following filters help identify common issues relating to AMP specifications. The SEO Spider uses the official AMP Validator for validation of AMP URLs.
- Missing HTML AMP Tag – AMP HTML documents must contain a top-level <html ⚡> or <html amp> tag.
- Missing/Invalid Doctype HTML Tag – AMP HTML documents must start with the doctype <!doctype html>.
- Missing Head Tag – AMP HTML documents must contain head tags (they are optional in HTML).
- Missing Body Tag – AMP HTML documents must contain body tags (they are optional in HTML).
- Missing Canonical – AMP URLs must contain a canonical tag inside their head that points to the regular HTML version of the AMP HTML document, or to itself if no such HTML version exists.
- Missing/Invalid Meta Charset Tag – AMP HTML documents must contain a <meta charset="utf-8"> tag as the first child of their head tag.
- Missing/Invalid Meta Viewport Tag – AMP HTML documents must contain a <meta name="viewport" content="width=device-width,minimum-scale=1"> tag inside their head tag. It’s also recommended to include initial-scale=1.
- Missing/Invalid AMP Script – AMP HTML documents must contain a <script async src="https://cdn.ampproject.org/v0.js"></script> tag inside their head tag.
- Missing/Invalid AMP Boilerplate – AMP HTML documents must contain the AMP boilerplate code in their head tag.
- Contains Disallowed HTML – This flags any AMP URLs with disallowed HTML for AMP.
- Other Validation Errors – This flags any AMP URLs with other validation errors not already covered by the above filters.
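The following Python sketch gives a rough sense of what the structural checks above look for. It parses a page with html.parser and reports a subset of the filter names. It is deliberately simplified and is no substitute for the official AMP Validator, which the SEO Spider uses. The viewport check looks only for width=device-width, and DISALLOWED_TAGS is a small illustrative subset of the elements the AMP specification prohibits. The names AmpStructureCheck and amp_issues are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

AMP_RUNTIME = "https://cdn.ampproject.org/v0.js"
# Small illustrative subset; the AMP spec prohibits or replaces more elements
DISALLOWED_TAGS = {"frame", "frameset", "object", "param", "applet", "embed", "iframe"}


class AmpStructureCheck(HTMLParser):
    """Records a few structural facts about an AMP page (not the official validator)."""

    def __init__(self) -> None:
        super().__init__()
        self.doctype_ok = self.html_amp = self.has_head = self.has_body = False
        self.in_head = False
        self.head_first_child = None  # (tag, attrs) of the first element in <head>
        self.canonical = self.viewport = self.amp_script = self.boilerplate = False
        self.disallowed: List[str] = []

    def handle_decl(self, decl):
        # "<!doctype html>" is reported here as "doctype html"
        if decl.strip().lower() == "doctype html":
            self.doctype_ok = True

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "html":
            self.html_amp = "⚡" in a or "amp" in a
        elif tag == "head":
            self.has_head = self.in_head = True
        elif tag == "body":
            self.has_body, self.in_head = True, False
        if tag in DISALLOWED_TAGS:
            self.disallowed.append(tag)
        if not self.in_head or tag == "head":
            return
        if self.head_first_child is None:
            self.head_first_child = (tag, a)
        rel = (a.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and a.get("href"):
            self.canonical = True
        elif tag == "meta" and (a.get("name") or "").lower() == "viewport":
            # Simplification: only width=device-width is checked
            content = (a.get("content") or "").replace(" ", "")
            self.viewport = "width=device-width" in content
        elif tag == "script" and "async" in a and a.get("src") == AMP_RUNTIME:
            self.amp_script = True
        elif tag == "style" and "amp-boilerplate" in a:
            self.boilerplate = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def amp_issues(html: str) -> List[str]:
    """Return the names of the filters (from the list above) this page would trip."""
    c = AmpStructureCheck()
    c.feed(html)
    c.close()
    first = c.head_first_child
    charset_first = (
        first is not None
        and first[0] == "meta"
        and (first[1].get("charset") or "").lower() == "utf-8"
    )
    checks = [
        (c.html_amp, "Missing HTML AMP Tag"),
        (c.doctype_ok, "Missing/Invalid Doctype HTML Tag"),
        (c.has_head, "Missing Head Tag"),
        (c.has_body, "Missing Body Tag"),
        (c.canonical, "Missing Canonical"),
        (charset_first, "Missing/Invalid Meta Charset Tag"),
        (c.viewport, "Missing/Invalid Meta Viewport Tag"),
        (c.amp_script, "Missing/Invalid AMP Script"),
        (c.boilerplate, "Missing/Invalid AMP Boilerplate"),
        (not c.disallowed, "Contains Disallowed HTML"),
    ]
    return [name for ok, name in checks if not ok]
```

Checking the charset via the first element in the head, rather than anywhere in it, mirrors the "first child of their head tag" requirement listed above.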
6) View The AMP URL Source By Clicking ‘Inlinks’
If an AMP URL errors, you’ll want to know the source of those errors. To do this, simply click on a URL in the top window pane and then click on the ‘Inlinks’ tab at the bottom to populate the lower window pane.
The ‘amphtml’ type refers to links from rel="amphtml" link tags within the head of the HTML.
Here’s a close-up view of the ‘inlinks’ lower window tab –
This shows that the desktop URL (https://www.telegraph.co.uk/business/essential-insights/cyber-resilience/) has a rel="amphtml" link tag pointing to the AMP version (https://www.telegraph.co.uk/business/essential-insights/cyber-resilience/amp/), which returns a 404 error.
7) Use The ‘Bulk Export > AMP > X Inlinks’ Exports
To bulk export AMP inlink data, use the ‘bulk export > AMP’ top level menu.
In the screenshot above, this would export all AMP URLs that don’t respond with a ‘200’ response code, and the respective inlinks (the source pages that link to the 404s).
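The inlinks export pairs each problem AMP URL with the pages that reference it. Here is a minimal Python sketch of that inversion. It assumes you already have each page's rel="amphtml" targets and each AMP URL's status code. The function names and CSV column headings are hypothetical, not the SEO Spider's export format.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, TextIO


def amp_inlinks(amphtml_links: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert 'source page -> AMP URLs' into 'AMP URL -> source pages'."""
    inlinks: Dict[str, List[str]] = defaultdict(list)
    for source, targets in amphtml_links.items():
        for amp_url in targets:
            inlinks[amp_url].append(source)
    return dict(inlinks)


def export_non_200_amp_inlinks(
    amphtml_links: Dict[str, List[str]],
    statuses: Dict[str, Optional[int]],
    out: TextIO,
) -> int:
    """Write one CSV row per (source, AMP URL) pair whose AMP URL isn't a 200.

    Returns the number of data rows written (excluding the header).
    """
    writer = csv.writer(out)
    writer.writerow(["Type", "Source", "Destination", "Status Code"])  # hypothetical headings
    rows = 0
    for amp_url, sources in sorted(amp_inlinks(amphtml_links).items()):
        status = statuses.get(amp_url)  # None: not crawled or no response
        if status == 200:
            continue
        for source in sources:
            writer.writerow(["amphtml", source, amp_url, "" if status is None else status])
            rows += 1
    return rows


if __name__ == "__main__":
    links = {"https://example.com/a/": ["https://example.com/a/amp/"]}
    buffer = io.StringIO()
    export_non_200_amp_inlinks(links, {"https://example.com/a/amp/": 404}, buffer)
    print(buffer.getvalue())
```

When writing to a file rather than a StringIO buffer, open it with newline="", as the csv module documentation recommends.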
Upload & Audit AMP URLs Separately
Alternatively, you can audit AMP URLs separately by uploading them directly in list mode. It is possible to crawl both the AMP URLs and their desktop equivalents in list mode by uploading only the AMP versions. This process is outlined below.
1) Click ‘Mode > List’
Select this via the top-level menu. List mode enables you to upload a list of AMP URLs.
2) Tick ‘Always Follow Canonicals’ Under ‘Config > Spider > Advanced’
In list mode, only the URLs uploaded will be crawled. Therefore, to crawl the desktop equivalents as well (or the canonicals of standalone AMP URLs), use this configuration.
This will mean the canonicals of AMP URLs uploaded will be crawled, regardless of the crawl depth set by list mode.
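The effect of ‘Always Follow Canonicals’ in list mode can be sketched as follows. The uploaded URLs are fetched first, then any canonical they declare is fetched too, even though it was not part of the upload. This is an illustration under assumptions, not the SEO Spider's crawler. Here fetch is a hypothetical function returning a status code and canonical URL, and canonical chains beyond one hop are deliberately not followed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fetcher: returns (status_code, canonical_url or None) for a URL
Fetcher = Callable[[str], Tuple[int, Optional[str]]]


def list_mode_crawl(
    uploaded: List[str], fetch: Fetcher, follow_canonicals: bool = True
) -> Dict[str, Tuple[int, Optional[str]]]:
    """Crawl only the uploaded URLs, plus their canonicals when enabled."""
    results: Dict[str, Tuple[int, Optional[str]]] = {}
    for url in dict.fromkeys(uploaded):  # de-duplicate, keep upload order
        results[url] = fetch(url)
    if follow_canonicals:
        # One hop only: canonicals of the uploaded URLs, not canonicals of canonicals
        canonicals = [canonical for _, canonical in results.values() if canonical]
        for canonical in dict.fromkeys(canonicals):
            if canonical not in results:
                results[canonical] = fetch(canonical)
    return results
```

Uploading an AMP URL whose canonical is a desktop page therefore crawls both. A standalone AMP URL that canonicalises to itself is not fetched a second time.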
3) Copy AMP URLs, Then Click ‘Upload > Paste’
This uploads them into the SEO Spider so they can be crawled.
Click ‘OK’ twice, and crawl the AMP URLs until the crawl finishes.
4) Follow The Process Outlined From Point 3 In The Guide Above
Now you can follow the same process outlined from point 3 in the ‘Crawl A Site To Audit AMP’ section above. This includes running crawl analysis at the end of the crawl to populate the filters within the AMP tab.
While a list mode crawl is not as comprehensive as a full website crawl, uploading the AMP URLs and crawling their canonicals still lets the SEO Spider analyse the relationships between them. This makes it a great way to quickly spot check AMP.
The guide above should help illustrate the simple steps required to bulk audit and validate Accelerated Mobile Pages (AMP) across a website.