How To Audit Hreflang Using The SEO Spider
Hreflang lets Google know that you have multiple versions of a page for different languages, or languages and regions, so that it can return the appropriate version to users in search.
While hreflang is a very simple concept, it’s incredibly difficult to get right at scale, and is often implemented incorrectly, or with numerous errors.
This tutorial walks you through how you can use the Screaming Frog SEO Spider tool to check hreflang implementation quickly and efficiently. The SEO Spider will crawl rel=”alternate” hreflang annotations in HTML, via HTTP Header or in XML Sitemaps and report upon their set-up and common errors.
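To make the HTML form of these annotations concrete, here is a minimal Python sketch that pulls rel="alternate" hreflang link elements out of a page. It only illustrates what the annotations look like and is not how the SEO Spider itself works; the class name, function name and sample URLs are assumptions made up for this example.

```python
from html.parser import HTMLParser


class HreflangLinkParser(HTMLParser):
    """Collects (hreflang, href) pairs from <link rel="alternate" hreflang="..."> elements."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link hreflang>), so normalise to "".
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        if "alternate" in rel_tokens and attr.get("hreflang") and attr.get("href"):
            self.annotations.append((attr["hreflang"], attr["href"]))


def extract_hreflang(html):
    """Return a list of (hreflang value, URL) tuples found in the HTML."""
    parser = HreflangLinkParser()
    parser.feed(html)
    return parser.annotations


# Hypothetical page head with three hreflang annotations and one unrelated link.
sample_html = """
<html><head>
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/uk/" />
<link rel="alternate" hreflang="de" href="https://www.example.com/de/" />
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />
<link rel="stylesheet" href="/style.css" />
</head><body></body></html>
"""

for code, url in extract_hreflang(sample_html):
    print(code, url)
```

The stylesheet link is skipped because it has no hreflang attribute and its rel value is not "alternate".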
1) Select ‘Extract Hreflang’ & ‘Crawl Hreflang’ under ‘Config > Spider’
‘Configuration’ is available in the top level menu of the SEO Spider.
This will mean URLs referenced in hreflang annotations will also be crawled, as well as extracted and reported.
Once these options have been selected, click ‘OK’.
2) To Crawl Hreflang In XML Sitemaps, Select ‘Crawl Linked XML Sitemaps’ Under ‘Config > Spider’
Then choose to discover the XML Sitemaps via robots.txt (this requires a ‘Sitemap: https://www.example.com/sitemap.xml’ entry), or supply the destination of the XML Sitemap.
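As a rough illustration of what discovery via robots.txt relies on, the sketch below reads ‘Sitemap:’ entries out of a robots.txt file. The function name and the sample file contents are assumptions for this example, and it does not describe the SEO Spider’s own implementation.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs listed in 'Sitemap:' lines of a robots.txt file."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split the "field: value" pair.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt contents.
sample_robots = """User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
sitemap: https://www.example.com/sitemap-de.xml
"""

print(sitemap_urls_from_robots(sample_robots))
```

If no ‘Sitemap:’ line is present the function returns an empty list, which is the case where you would supply the XML Sitemap location directly instead.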
If hreflang is implemented via link elements or HTTP header, you don’t need to follow this step.
If you’re not sure if or how hreflang is implemented, crawl the XML Sitemap anyway, and the SEO Spider will discover hreflang annotations wherever they are implemented.
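For reference, hreflang in an XML Sitemap is expressed as xhtml:link elements inside each url entry. The following Python sketch, using only the standard library, shows one way to read them; the function name and the sample sitemap are assumptions for illustration, not the tool’s internals.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the sitemap protocol and its hreflang extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def sitemap_hreflang(xml_text):
    """Map each <loc> URL to its list of (hreflang value, URL) alternates."""
    root = ET.fromstring(xml_text)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        alternates = []
        for link in url.findall("xhtml:link", NS):
            if link.get("rel") == "alternate" and link.get("hreflang"):
                alternates.append((link.get("hreflang"), link.get("href")))
        result[loc] = alternates
    return result


# Hypothetical sitemap with a single URL and two alternates.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/uk/</loc>
    <xhtml:link rel="alternate" hreflang="en-gb" href="https://www.example.com/uk/"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/"/>
  </url>
</urlset>
""".strip()

print(sitemap_hreflang(sample_sitemap))
```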
3) Crawl The Website
Open up the SEO Spider, type or copy in the website you wish to crawl in the ‘Enter URL to spider’ box and hit ‘Start’.
The website will be crawled and rel=”alternate” hreflang annotations in HTML, via HTTP Header or in XML Sitemaps will be discovered.
Now grab a coffee and wait until the progress bar reaches 100%, and the crawl is completed.
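If you are curious what the HTTP header form of hreflang looks like while the crawl runs, it is carried in a ‘Link’ response header. The sketch below parses such a header value with regular expressions; it is a simplified illustration that assumes well-formed values without commas or semicolons inside quoted parameters, and the function name and sample header are made up for this example.

```python
import re

# One link-value: <URL> followed by its ;param=value pairs (up to the next "<").
_LINK_VALUE = re.compile(r"<([^>]+)>([^<]*)")
# One parameter such as ; rel="alternate" or ; hreflang=de
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*"?([^";,]+)"?')


def parse_link_header(value):
    """Return (hreflang value, URL) tuples from an HTTP Link header value."""
    annotations = []
    for url, params_text in _LINK_VALUE.findall(value):
        params = {k.lower(): v.strip() for k, v in _PARAM.findall(params_text)}
        rel_tokens = params.get("rel", "").lower().split()
        if "alternate" in rel_tokens and "hreflang" in params:
            annotations.append((params["hreflang"], url))
    return annotations


# Hypothetical header value with two hreflang alternates and one unrelated link.
sample_header = (
    '<https://www.example.com/uk/>; rel="alternate"; hreflang="en-gb", '
    '<https://www.example.com/de/>; rel="alternate"; hreflang="de", '
    '<https://www.example.com/page/2/>; rel="next"'
)

print(parse_link_header(sample_header))
```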
4) View The Hreflang Tab
The Hreflang tab shows all URLs found in a crawl, with any rel=”alternate” hreflang annotations discovered and referenced by a URL shown in columns to the right in the main window pane. ‘Occurrences’ counts the number of hreflang annotations that have been discovered for each URL.
The Hreflang tab has 13 filters (as shown in the image below) that help you identify common SEO issues.
12 of the 13 filters are available to view immediately during, or at the end of, a crawl. The ‘unlinked hreflang URLs’ filter requires post ‘Crawl Analysis’ at the end of the crawl to be populated with data (more on this in just a moment).
The right hand ‘overview’ pane displays a ‘(Crawl Analysis Required)’ message against this filter, as it requires post crawl analysis to be populated with data.
5) Click ‘Crawl Analysis > Start’ To Populate Hreflang Filters
To populate the ‘unlinked hreflang URLs’ filter, you simply need to click a button to start crawl analysis.
However, if you have configured ‘Crawl Analysis’ previously, you may wish to double check under ‘Crawl Analysis > Configure’ that ‘Hreflang’ is ticked.
You can also untick other items that require post crawl analysis to make this step quicker.
When crawl analysis has completed, the ‘analysis’ progress bar will be at 100% and the filters will no longer have the ‘(Crawl Analysis Required)’ message.
6) Click ‘Hreflang’ & View Populated Filters
After performing post crawl analysis, all hreflang filters will now be populated with data where applicable.
The hreflang data collected can then be reviewed in columns, to ensure the implementation is as required. You’re able to filter by the following SEO related items –
- Contains Hreflang – These are simply any URLs that have rel=”alternate” hreflang annotations from any implementation, whether link element, HTTP header or XML Sitemap.
- Non-200 Hreflang URLs – These are URLs contained within rel=”alternate” hreflang annotations that do not have a 200 response code, such as URLs blocked by robots.txt, no responses, 3XX (redirects), 4XX (client errors) or 5XX (server errors). Hreflang URLs must be crawlable and indexable and therefore non-200 URLs are treated as errors, and ignored by the search engines. The non-200 hreflang URLs can be seen in the lower window ‘URL Info’ pane with a ‘non-200’ confirmation status. They can be exported in bulk via the ‘Reports > Hreflang > Non-200 Hreflang URLs’ export.
- Unlinked Hreflang URLs – These are URLs that are only discoverable via rel=”alternate” hreflang link annotations. Hreflang annotations do not pass PageRank like a traditional anchor tag, so this might be a sign of a problem with internal linking, or the URLs contained in the hreflang annotation.
- Missing Confirmation Links – These are URLs with missing return links (or ‘return tags’ in Google Search Console) to them, from their alternate pages. Hreflang is reciprocal, so all alternate versions must confirm the relationship. When page X links to page Y using hreflang to specify it as its alternate page, page Y must have a return link. No return links means the hreflang annotations may be ignored or not interpreted correctly. The missing confirmation links URLs can be seen in the lower window ‘URL Info’ pane with a ‘missing’ confirmation status. They can be exported in bulk via the ‘Reports > Hreflang > Missing Confirmation Links’ export.
- Inconsistent Language & Region Confirmation Links – This filter includes URLs with inconsistent language and regional return links to them. This is where a return link has a different language or regional value than the URL is referencing itself. The inconsistent language confirmation URLs can be seen in the lower window ‘URL Info’ pane with an ‘Inconsistent’ confirmation status. They can be exported in bulk via the ‘Reports > Hreflang > Inconsistent Language Confirmation Links’ export.
- Non Canonical Confirmation Links – URLs with non canonical confirmation links to them. Hreflang should only include canonical versions of URLs. So this filter picks up return links that go to URLs that are not canonical versions of URLs. The non canonical confirmation URLs can be seen in the lower window ‘URL Info’ pane with a ‘Non Canonical’ confirmation status. They can be exported in bulk via the ‘Reports > Hreflang > Non Canonical Confirmation Links’ export.
- Noindex Confirmation Links – Confirmation links which have a ‘noindex’ meta tag. All pages within a set should be indexable, and hence any return URLs with ‘noindex’ may result in the hreflang relationship being ignored. The noindex confirmation links URLs can be seen in the lower window ‘URL Info’ pane with a ‘noindex’ confirmation status. They can be exported in bulk via the ‘Reports > Hreflang > Noindex Confirmation Links’ export.
- Incorrect Language & Region Codes – This simply verifies the language (in ISO 639-1 format) and optional regional (in ISO 3166-1 Alpha 2 format) code values are valid. Unsupported hreflang values can be viewed in the lower window ‘URL Info’ pane with an ‘invalid’ status.
- Multiple Entries – URLs with multiple entries to a language or regional code. For example, if page X links to page Y and Z using the same ‘en’ hreflang value annotation. This filter will also pick up multiple implementations, for example, if hreflang annotations were discovered as link elements and via HTTP header.
- Missing Self Reference – URLs missing a self referencing hreflang attribute. URLs should have their own self referencing rel=”alternate” hreflang annotation.
- Not Using Canonical – URLs not using the canonical URL on the page, in its own hreflang annotation. Hreflang should only include canonical versions of URLs.
- Missing X-Default – URLs missing an X-Default hreflang attribute. This is optional, and not necessarily an error or issue.
- Missing – URLs missing an hreflang attribute completely. These might be valid of course, if they aren’t multiple versions of a page.
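Several of the checks in the list above come down to simple comparisons between pages. The Python sketch below shows, under stated assumptions, how missing return links, missing self references, missing x-default and multiple entries could be detected from already-extracted annotations. The data structure (a dictionary of URL to a list of (hreflang value, URL) pairs), the function name and the sample URLs are all hypothetical, and this is not the SEO Spider’s own logic.

```python
from collections import Counter


def audit_hreflang(pages):
    """Run a few reciprocity and completeness checks over extracted annotations.

    `pages` maps each crawled URL to a list of (hreflang value, URL) tuples.
    """
    issues = {
        "missing_return_link": [],
        "missing_self_reference": [],
        "missing_x_default": [],
        "multiple_entries": [],
    }
    for url, annotations in pages.items():
        targets = {href for _, href in annotations}
        codes = [code.lower() for code, _ in annotations]
        if url not in targets:
            issues["missing_self_reference"].append(url)
        if "x-default" not in codes:
            issues["missing_x_default"].append(url)
        if any(count > 1 for count in Counter(codes).values()):
            issues["multiple_entries"].append(url)
        for href in sorted(targets - {url}):
            # Only judge alternates that were actually crawled.
            if href in pages and url not in {h for _, h in pages[href]}:
                # (source page, alternate page that does not link back)
                issues["missing_return_link"].append((url, href))
    return issues


UK = "https://www.example.com/uk/"
DE = "https://www.example.com/de/"
ROOT = "https://www.example.com/"

# Hypothetical crawl: the German page does not link back to the UK page.
sample_pages = {
    UK: [("en-gb", UK), ("de", DE), ("x-default", ROOT)],
    DE: [("de", DE)],
}

for name, found in audit_hreflang(sample_pages).items():
    print(name, found)
```

In this sample the German page is reported for the missing return link and the missing x-default, while the root URL is not judged at all because it was never crawled.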
7) View The Lower Window Pane ‘URL Info’ tab To View Errors
The ‘URL Info’ tab at the bottom displays helpful granular information on specific hreflang errors encountered.
For example, for the ‘Noindex Confirmation Links’ filter, it will show the ‘hreflang confirmation status’ of the alternate pages. As per the example below, you can see which URLs have been marked as ‘noindex’.
These can then be bulk exported via ‘Reports > Hreflang > Noindex Confirmation Links’.
Another example comes direct from Google. Yes, even Google find it difficult to get hreflang right! Google include hreflang annotations in their XML Sitemaps, and there are a number of issues. Reviewing Google’s search XML Sitemap (https://www.google.com/sitemap_search.xml), we can see it has 169 ‘Incorrect Language & Region Codes’.
There are 362 hreflang annotations for each URL in the XML Sitemap, which would be extremely painful to audit manually. But by using the ‘Incorrect Language & Region Codes’ filter and then reviewing (or exporting and filtering) the lower window ‘URL Info’ tab, you can quickly find the rows where the ‘language code valid’ column has an ‘invalid’ status.
Reviewing the list of ISO 639-1 language codes, you can see that ‘fl’ isn’t a valid language code for Filipino; it should be ‘tl’.
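The kind of check behind this filter can be sketched in a few lines of Python. The code sets below are deliberately tiny subsets chosen for illustration, so any code outside them would be wrongly flagged; a real check would need the complete ISO 639-1 and ISO 3166-1 Alpha 2 lists. The function name is an assumption for this example.

```python
# Deliberately tiny subsets for illustration only; not the full ISO lists.
ISO_639_1_SUBSET = {"en", "de", "fr", "es", "it", "tl"}
ISO_3166_1_ALPHA2_SUBSET = {"GB", "US", "DE", "FR", "ES", "IT", "PH"}


def check_hreflang_value(value):
    """Return (language_valid, region_valid) for an hreflang value.

    Handles 'language' and 'language-region' values plus 'x-default'.
    """
    if value.lower() == "x-default":
        return (True, True)
    language, _, region = value.partition("-")
    language_ok = language.lower() in ISO_639_1_SUBSET
    # The region is optional, so an absent region counts as valid.
    region_ok = region == "" or region.upper() in ISO_3166_1_ALPHA2_SUBSET
    return (language_ok, region_ok)


for value in ["fl", "tl", "en-GB", "en-UK", "x-default"]:
    print(value, check_hreflang_value(value))
```

Here ‘fl’ fails the language check, and ‘en-UK’ fails the region check because the ISO 3166-1 Alpha 2 code for the United Kingdom is ‘GB’.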
8) Use The ‘Reports > Hreflang > X’ Exports To Bulk Export Source URLs & Errors
To bulk export details of source pages that contain errors or issues for hreflang, use the ‘Reports > Hreflang’ options.
For example, the ‘Reports > Hreflang > Non-200 Hreflang URLs’ export will include details of the source pages that contain the rel=”alternate” hreflang annotations to the exact URLs that error or redirect.
This can sometimes be easier to digest than the user interface, as source URLs and hreflang links are included in individual rows.
The guide above should help illustrate the simple steps required to test and validate hreflang annotations across a website using the SEO Spider tool.
If you have any further queries about hreflang testing in the SEO Spider tool, then just get in touch via support.