Screaming Frog SEO Spider Update – Version 11.0
We are delighted to announce the release of Screaming Frog SEO Spider version 11.0, codenamed internally as ‘triples’, which is a big hint for those in the know.
In version 10 we introduced many new features all at once, so we wanted to make this update smaller, which also means we can release it more quickly. This version includes one significant, exciting new feature and a number of smaller updates and improvements. Let’s get to them.
1) Structured Data & Validation
Structured data is becoming increasingly important, both to provide search engines with explicit clues about the meaning of pages and to enable special search result features and enhancements in Google.
The SEO Spider now allows you to crawl and extract structured data from the three supported formats (JSON-LD, Microdata and RDFa) and validate it against Schema.org specifications and Google’s 25+ search features at scale.
To extract and validate structured data you just need to select the options under ‘Config > Spider > Advanced’.
Structured data itemtypes will then be pulled into the ‘Structured Data’ tab with columns for totals, errors and warnings discovered. You can filter URLs to those containing structured data, missing structured data, the specific format, and by validation errors or warnings.
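To make the idea of extracting itemtypes concrete, here is a minimal sketch of pulling JSON-LD out of a page using only Python's standard library. It is an illustration of the general technique, not how the SEO Spider itself is implemented, and it ignores Microdata and RDFa entirely. The class and function names and the sample HTML are our own assumptions.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # a real tool would report this as a parse error


def extract_item_types(html):
    """Return the @type of every top-level JSON-LD item found in the HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    types = []
    for item in parser.items:
        # The top level of a JSON-LD block may be a single object or a list.
        nodes = item if isinstance(item, list) else [item]
        for node in nodes:
            if isinstance(node, dict) and "@type" in node:
                types.append(node["@type"])
    return types


SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example policy"}
</script>
</head><body></body></html>
"""

print(extract_item_types(SAMPLE_HTML))
```

Nested items (for example an Offer inside a Product) are not walked in this sketch, which is one of the many nuances a full validator has to handle.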
The structured data details lower window pane provides specifics on the items encountered. The left-hand side of the lower window pane shows property values and icons against them when there are errors or warnings, and the right-hand window provides information on the specific issues discovered.
The right-hand side of the lower window pane will detail the validation type (Schema.org, or a Google Feature), the severity (an error, warning or just info) and a message for the specific issue to fix. It will also provide a link to the specific Schema.org property.
In the random example below from a quick analysis of the ‘car insurance’ SERPs, we can see lv.com have Google Product feature validation errors and warnings. The right-hand window pane lists the missing properties that are required (with an error) and those that are recommended (with a warning).
As ‘product’ is used on these pages, it will be validated against Google product feature guidelines, where an image is required, and there are half a dozen other recommended properties that are missing.
Validation also catches formatting problems. In one case, the right-hand window pane explains that a country value is invalid because the format needs to be a two-letter ISO 3166-1 alpha-2 country code (and the United Kingdom is ‘GB’). If you check the page in Google’s Structured Data Testing Tool, this error isn’t picked up. Screaming Frog FTW.
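The two kinds of check described above (missing required or recommended properties, and badly formatted values) can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the SEO Spider's own rules. Only the ‘image’ requirement comes from the example above; the recommended property names, the function names and the short country code list are placeholders.

```python
# Illustrative subset only; the full ISO 3166-1 alpha-2 list is much longer.
ISO_ALPHA_2_SAMPLE = {"DE", "ES", "FR", "GB", "IE", "IT", "US"}

# Hypothetical rule set. 'image' being required is taken from the example
# above; the recommended names are placeholders for illustration.
PRODUCT_RULES = {
    "required": ["image"],
    "recommended": ["description", "brand", "sku"],
}


def check_properties(item, rules):
    """Return (severity, message) pairs for missing properties."""
    issues = []
    for prop in rules["required"]:
        if not item.get(prop):
            issues.append(("error", f"Missing required property '{prop}'"))
    for prop in rules["recommended"]:
        if not item.get(prop):
            issues.append(("warning", f"Missing recommended property '{prop}'"))
    return issues


def check_country_code(value):
    """Return an error tuple if the value is not in the sample code list."""
    if value in ISO_ALPHA_2_SAMPLE:
        return None
    return ("error", f"'{value}' is not a two-letter ISO 3166-1 alpha-2 country code")


product = {"@type": "Product", "name": "Car insurance", "brand": "Example"}
for severity, message in check_properties(product, PRODUCT_RULES):
    print(severity, message)

print(check_country_code("UK"))
print(check_country_code("GB"))
```

Keeping the severity alongside each message mirrors the error, warning and info split shown in the right-hand window pane.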
The SEO Spider will validate against 26 of Google’s 28 search features currently and you can see the full list in our structured data section of the user guide.
As many of you will be aware, frustratingly Google don’t currently provide an API for their own Structured Data Testing Tool (at least a public one we can legitimately use) and they are slowly rolling out new structured data reporting in Search Console. As useful as the existing SDTT is, our testing found inconsistency in what it validates, and the results sometimes just don’t match Google’s own documented guidelines for search features (it often mixes up required or recommended properties for example).
We researched alternatives, like using the Yandex structured data validator (which does have an API), but again, found plenty of inconsistencies and fundamental differences to Google’s feature requirements – which we wanted to focus upon, due to our core user base.
There are plenty of nuances in structured data and this feature will not be perfect initially, so please do let us know if you spot any issues and we’ll fix them up quickly. We obviously recommend using this new feature in combination with Google’s Structured Data Testing Tool as well.
2) Structured Data Bulk Exporting
As you would expect, you can bulk export all errors and warnings via the ‘Reports’ top-level menu.
The ‘Validation Errors & Warnings Summary’ report is a particular favourite, as it aggregates the data to unique issues discovered (rather than reporting every instance) and shows the number of URLs affected by each issue, with a sample URL with the specific issue. An example report can be seen below.
This means the report is highly condensed and ideal for a developer who wants to know the unique validation issues that need to be fixed across the site.
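The aggregation behind a summary like this can be sketched in a few lines. The example below is our own illustration and assumes the raw data is a list of (URL, issue message) pairs; the function name and output field names are hypothetical and do not reflect the actual export columns.

```python
def summarise_issues(rows):
    """Collapse (url, issue) rows into one entry per unique issue."""
    summary = {}
    for url, issue in rows:
        # The first URL seen for an issue is kept as its sample URL.
        entry = summary.setdefault(issue, {"urls": set(), "sample_url": url})
        entry["urls"].add(url)
    return [
        {
            "issue": issue,
            "url_count": len(entry["urls"]),
            "sample_url": entry["sample_url"],
        }
        for issue, entry in summary.items()
    ]


rows = [
    ("https://example.com/a", "Missing required property 'image'"),
    ("https://example.com/b", "Missing required property 'image'"),
    ("https://example.com/a", "Missing recommended property 'sku'"),
]

for line in summarise_issues(rows):
    print(line)
```

Three raw rows collapse to two unique issues here, which is the same condensing effect the report has across a whole site.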
3) Multi-Select Details & Bulk Exporting
You can now select multiple URLs in the top window pane, view specific lower window details for all the selected URLs together, and export them. For example, if you click on three URLs in the top window, then click on the lower window ‘inlinks’ tab, it will display the ‘inlinks’ for those three URLs.
You can also export them via the right click or the new export button available for the lower window pane.
Obviously this scales, so you can do it for thousands, too.
This should provide a nice balance between exporting everything in bulk via the ‘Bulk Export’ menu and then filtering in spreadsheets, or the previous singular option via the right click.
4) Tree-View Export
If you didn’t already know, you can switch from the usual ‘list view’ of a crawl to a more traditional directory ‘tree view’ format by clicking the tree icon on the UI.
However, while you were able to view this format within the tool, it hasn’t been possible to export it into a spreadsheet. So, we went to the drawing board and worked on an export which seems to make sense in a spreadsheet.
When you export from tree view, you’ll now see the results in tree view form, with columns split by path, but all URL level data still available. Screenshots of spreadsheets generally look terrible, but here’s an export of our own website for example.
This allows you to quickly see the breakdown of a website’s structure.
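If you want a feel for what ‘columns split by path’ means, the sketch below builds a similar layout from a plain list of URLs. It is only an approximation of the idea: the ‘Path 1’, ‘Path 2’ column headings and the function names are assumptions, and the real export carries all the other URL level data too.

```python
import csv
import io
from urllib.parse import urlsplit


def path_columns(url):
    """Split a URL into its host followed by each path segment."""
    parts = urlsplit(url)
    return [parts.netloc] + [segment for segment in parts.path.split("/") if segment]


def tree_view_csv(urls):
    """Build CSV text with one column per path level, plus the full URL."""
    ordered = sorted(urls)
    split = [path_columns(url) for url in ordered]
    depth = max(len(columns) for columns in split)
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow([f"Path {level}" for level in range(1, depth + 1)] + ["URL"])
    for url, columns in zip(ordered, split):
        # Pad shallower URLs so every row has the same number of columns.
        writer.writerow(columns + [""] * (depth - len(columns)) + [url])
    return output.getvalue()


print(tree_view_csv([
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/blog/seo/post/",
]))
```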
5) Visualisations Improvements
We have introduced a number of small improvements to our visualisations. First of all, you can now search for URLs, to find specific nodes within the visualisations.
By default, the visualisations have used the last URL component for naming of nodes, which can be unhelpful if this isn’t descriptive. Therefore, you’re now able to adjust this to page title, h1 or h2.
Finally, you can now also save visualisations as HTML, as well as SVGs.
6) Smart Drag & Drop
You can drag and drop any file types supported by the SEO Spider directly into the GUI, and it will intelligently work out what to do. For example, you can drag and drop a saved crawl and it will open it.
You can drag and drop a .txt file with URLs, and it will auto switch to list mode and crawl them.
You can even drop in an XML Sitemap and it will switch to list mode, upload the file and crawl that for you as well.
Nice little time savers for hardcore users.
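Conceptually, this is a simple dispatch on the type of file dropped. The sketch below decides purely on the file extension, which is an assumption on our part; the function name and the returned action strings are hypothetical, and the real logic may well inspect file contents too.

```python
from pathlib import Path


def action_for_dropped_file(filename):
    """Pick an action for a dropped file based on its extension alone."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".seospider":
        return "open saved crawl"
    if suffix == ".txt":
        return "switch to list mode and crawl the URLs"
    if suffix == ".xml":
        return "switch to list mode and crawl the XML Sitemap"
    return "unsupported file type"


for name in ["site-audit.seospider", "urls.txt", "sitemap.xml", "notes.docx"]:
    print(name, "->", action_for_dropped_file(name))
```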
7) Queued URLs Export
You’re now able to view URLs remaining to be crawled via the ‘Queued URLs’ export available under ‘Bulk Export’ in the top level menu.
This provides an export of URLs that have been discovered and are in the queue to be crawled, listed in the order they will be crawled (based upon a breadth-first crawl).
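For anyone unfamiliar with breadth-first ordering, the toy example below shows why the queue comes out in the order it does. It uses a hard-coded link graph instead of real HTTP requests, and the function name and `limit` parameter are our own inventions to simulate pausing a crawl part-way through.

```python
from collections import deque


def crawl_order(start, links, limit=None):
    """Breadth-first walk of a link graph; returns (crawled, still_queued)."""
    queue = deque([start])
    seen = {start}
    crawled = []
    while queue and (limit is None or len(crawled) < limit):
        url = queue.popleft()
        crawled.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)  # newly found URLs join the back of the queue
    return crawled, list(queue)


# Hypothetical site: the homepage links to /a and /b, which each link deeper.
links = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": ["/b/1"],
}

crawled, queued = crawl_order("/", links, limit=2)
print("Crawled:", crawled)
print("Queued:", queued)
```

After two pages, ‘/b’ is ahead of ‘/a/1’ in the queue because it was discovered first, at a shallower depth.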
8) Configure Internal CDNs
You can now supply a list of CDNs to be treated as ‘Internal’ URLs by the SEO Spider.
This feature is available under ‘Configuration > CDNs’ and both domains and subfolder combinations can be supplied. URLs will then be treated as internal, meaning they appear under the ‘Internal’ tab, will be used for discovery of new URLs, and will have data extracted like other internal URLs.
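A rough sketch of how domain and subfolder entries might be matched is shown below. The entry format (‘host’ or ‘host/subfolder’), the function name and the example hostnames are assumptions for illustration, not the SEO Spider's actual configuration syntax or matching logic.

```python
from urllib.parse import urlsplit


def is_internal(url, site_host, cdn_entries):
    """Treat a URL as internal if it is on the site host or a listed CDN."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == site_host.lower():
        return True
    for entry in cdn_entries:
        cdn_host, _, cdn_path = entry.partition("/")
        if host != cdn_host.lower():
            continue
        if not cdn_path:
            return True  # whole domain is listed
        prefix = "/" + cdn_path.rstrip("/")
        # Match the subfolder itself or anything beneath it.
        if parts.path == prefix or parts.path.startswith(prefix + "/"):
            return True
    return False


cdn_entries = ["cdn.example.net", "static.examplecdn.com/assets"]

print(is_internal("https://cdn.example.net/img/logo.png", "www.example.com", cdn_entries))
print(is_internal("https://static.examplecdn.com/assets/app.js", "www.example.com", cdn_entries))
print(is_internal("https://static.examplecdn.com/other/app.js", "www.example.com", cdn_entries))
```

Matching on a path boundary (‘/assets/’ rather than just ‘/assets’) avoids accidentally treating a folder such as ‘/assets-old/’ as internal.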
9) GA Extended URL Matching
Finally, some accounts use extended URL rewrite filters in Google Analytics to view the full page URL in the interface (converting /example/ to www.example.com/example). These filters break what is returned from the API, as well as shortcuts in the interface (i.e. they return www.example.comwww.example.com/example).
Obviously, this means URLs won’t match when you perform a crawl. We’ve now introduced an algorithm which will take this into account automatically and match the data for you, as it was really quite annoying.
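One simple way to handle the doubled hostname is to strip any leading copies of it before comparing paths. The sketch below shows that idea only; it is not the algorithm the SEO Spider actually uses, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalise_ga_path(ga_value, hostname):
    """Strip any leading copies of the hostname and return a path."""
    value = ga_value
    while hostname and value.startswith(hostname):
        value = value[len(hostname):]
    return value if value.startswith("/") else "/" + value


def ga_matches_crawl_url(ga_value, crawl_url):
    """Compare a GA page value with a crawled URL on path alone."""
    parts = urlsplit(crawl_url)
    return normalise_ga_path(ga_value, parts.netloc) == parts.path


print(normalise_ga_path("www.example.comwww.example.com/example", "www.example.com"))
print(ga_matches_crawl_url("www.example.comwww.example.com/example", "https://www.example.com/example"))
```

A production version would also need to respect the trailing slash and case sensitivity options, which this sketch leaves out.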
Version 11.0 also includes a number of smaller updates and bug fixes, outlined below.
- The ‘URL Info’ and ‘Image Info’ lower window tabs have been renamed to ‘URL Details’ and ‘Image Details’ respectively.
- ‘Auto Discover XML Sitemaps via robots.txt’ has been unticked by default for list mode (it was annoyingly ticked by default in version 10.4!).
- There’s now a ‘Max Links per URL to Crawl’ configurable limit under ‘Config > Spider > Limits’ set at 10k max.
- There’s now a ‘Max Page Size (KB) to Crawl’ configurable limit under ‘Config > Spider > Limits’ set at 50k.
- There are new tool tips across the GUI to provide more helpful information on configuration options.
- The HTML parser has been updated to fix an error with unquoted canonical URLs.
- A bug has been fixed where GA Goal Completions were not showing.
That’s everything. If you experience any problems with the new version, then please do just let us know via support and we can help. Thank you to everyone for all their feature requests, bug reports and general support. Screaming Frog would not be what it is without you all.
Now, go and download version 11.0 of the Screaming Frog SEO Spider.
Small Update – Version 11.1 Released 13th March 2019
We have just released a small update to version 11.1 of the SEO Spider. This release is mainly bug fixes and small improvements –
- Add 1:1 hreflang URL report, available under ‘Reports > Hreflang > All hreflang URLs’.
- Clean up the preset user-agent list.
- Fix issue reading XML sitemaps with leading blank lines.
- Fix issue with parsing and validating structured data.
- Fix issue with list mode crawling more than the list.
- Fix issue with list mode crawling of XML sitemaps.
- Fix issue with scheduling UI unable to delete/edit tasks created by 10.x.
- Fix issue with visualisations, where the directory tree diagrams were showing the incorrect URL on hover.
- Fix issue with GA/GSC case insensitivity and trailing slash options.
Small Update – Version 11.2 Released 9th April 2019
We have just released a small update to version 11.2 of the SEO Spider. This release is mainly bug fixes and small improvements –
- Update to Schema.org 3.5, which was released on the 1st of April.
- Update splash screen, so it’s not always on top and can be dragged.
- Ignore HTML inside amp-list tags.
- Fix crash in visualisations when focusing on a node and using search.
- Fix issue with ‘Bulk Export > Queued URLs’ failing for crawls loaded from disk.
- Fix issue loading scheduling UI with task scheduled by version 10.x.
- Fix discrepancy between master and detail view Structured Data warnings when loading in a saved crawl.
- Fix crash parsing RDF.
- Fix ID stripping issue with Microdata parsing.
- Fix crashing in Google Structured Data validation.
- Fix issue with JSON-LD parse errors not being shown for pages with multiple JSON-LD sections.
- Fix displaying of Structured Data values to not include escape characters.
- Fix issue with not being able to read Sitemaps containing a BOM (Byte Order Mark).
- Fix Forms based Authentication so forms can be submitted by pressing enter.
- Fix issue with URLs ending ?foo.xml throwing off list mode.
- Fix GA to use URL with highest number of sessions when configuration options lead to multiple GA URLs matching.
- Fix issue opening crawls via .seospider files with ++ in their file name.