Screaming Frog SEO Spider Update – Version 7.0
I’m delighted to announce Screaming Frog SEO Spider version 7.0, codenamed internally as ‘Spiderman’.
Since the release of rendered crawling in version 6.0, our development team have been busy working on more new and exciting features. Let’s take a look at what’s new in 7.0.
1) ‘Fetch & Render’ (Rendered Screen Shots)
Viewing the rendered page is vital when analysing what a modern search bot is able to see, and it’s particularly useful when reviewing a staging site, where you can’t rely on Google’s own Fetch & Render tool in Search Console.
2) Blocked Resources
The SEO Spider now reports on blocked resources, which can be seen individually for each page within the ‘Rendered Page’ tab, adjacent to the rendered screen shots.
The blocked resources can also be seen under the ‘Response Codes > Blocked Resource’ tab and filter. The pages this impacts and the individual blocked resources can also be exported in bulk via the ‘Bulk Export > Response Codes > Blocked Resource Inlinks’ report.
3) Custom robots.txt
You can download, edit and test a site’s robots.txt using the new custom robots.txt feature under ‘Configuration > robots.txt > Custom’. The new feature allows you to add multiple robots.txt files at subdomain level, test directives in the SEO Spider and view URLs which are blocked or allowed.
During a crawl you can filter blocked URLs based upon the custom robots.txt (‘Response Codes > Blocked by robots.txt’) and see the matching robots.txt directive line.
Custom robots.txt is a useful alternative if you’re uncomfortable using the regex exclude feature, or if you’d just prefer to use robots.txt directives to control a crawl.
The custom robots.txt uses the selected user-agent in the configuration and works well with the new fetch and render feature, where you can test how a web page might render with blocked resources.
We considered including a check for a double UTF-8 byte order mark (BOM), which can be a problem for Google. According to the spec it invalidates the line; however, this will generally only ever be due to user error. We don’t have any problem parsing it, and believe Google should really update their behaviour to accommodate this potential mistake.
Please note – The changes you make to the robots.txt within the SEO Spider do not impact your live robots.txt uploaded to your server. You can read more about testing robots.txt in our user guide.
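Under the hood, this kind of directive testing follows standard robots.txt matching rules. As a minimal sketch (not the SEO Spider’s own implementation), Python’s standard urllib.robotparser can parse a custom robots.txt and test whether individual URLs are blocked or allowed:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical custom robots.txt, edited locally rather than on the live server.
custom_robots = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.modified()  # mark as fetched, otherwise can_fetch() always returns False
parser.parse(custom_robots.splitlines())

# Test individual URLs against the directives, as you would in the SEO Spider.
print(parser.can_fetch("*", "https://example.com/private/page"))  # False - blocked
print(parser.can_fetch("*", "https://example.com/blog/"))         # True - allowed
```

Note that urllib.robotparser applies rules in file order rather than Google’s longest-match precedence, so it is only a rough approximation of how Googlebot evaluates directives.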
4) hreflang Attributes
First of all, apologies, this one has been a long time coming. The SEO Spider now extracts, crawls and reports on hreflang attributes delivered via the HTML link element and HTTP header. They are also extracted from Sitemaps when crawled in list mode.
While users have historically used custom extraction to collect hreflang, by default these can now be viewed under the ‘hreflang’ tab, with filters for common issues.
While hreflang is a fairly simple concept, there are plenty of issues that can be encountered in its implementation. We believe this is the most comprehensive hreflang auditing currently available anywhere, and it includes checks for missing confirmation links, inconsistent languages, incorrect language/regional codes, non-canonical confirmation links, multiple entries, missing self-reference, not using the canonical, missing the x-default, and missing hreflang completely.
Additionally, there are four new hreflang reports available to allow data to be exported in bulk (under the ‘reports’ top level menu) –
- Errors – This report shows any hreflang attributes which do not return a 200 response (no response, blocked by robots.txt, or a 3XX, 4XX or 5XX response), or which are unlinked on the site.
- Missing Confirmation Links – This report shows the page that is missing a confirmation link, and the page which requires it.
- Inconsistent Language Confirmation Links – This report shows confirmation pages which use different language codes for the same page.
- Non Canonical Confirmation Links – This report shows confirmation links which point to non-canonical URLs.
This feature can be fairly resource-intensive on large sites, so extraction and crawling are entirely configurable under ‘Configuration > Spider’.
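To illustrate what the missing confirmation link check involves (a hypothetical sketch, not the SEO Spider’s actual implementation): every page a URL references via hreflang should link back to it, and any pair that doesn’t reciprocate gets flagged.

```python
# Hypothetical hreflang data: each page mapped to its declared hreflang entries.
hreflang_map = {
    "https://example.com/":    {"en": "https://example.com/",
                                "de": "https://example.com/de/"},
    "https://example.com/de/": {"de": "https://example.com/de/"},  # no return link to the en page
}

def missing_confirmations(pages):
    """Return (source, target) pairs where target does not link back to source."""
    issues = []
    for source, entries in pages.items():
        for lang, target in entries.items():
            if target == source:
                continue  # self-reference, nothing to confirm
            return_links = pages.get(target, {})
            if source not in return_links.values():
                issues.append((source, target))
    return issues

print(missing_confirmations(hreflang_map))
# [('https://example.com/', 'https://example.com/de/')]
```

Here the German page never links back to the English page, so the pair is reported as a missing confirmation link.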
5) rel=”next” and rel=”prev” Errors
This report highlights errors and issues with rel=”next” and rel=”prev” attributes, which are of course used to indicate paginated content.
The report will show any rel=”next” and rel=”prev” URLs which return no response, are blocked by robots.txt, or respond with a 3XX redirect or a 4XX or 5XX error (anything other than a 200 ‘OK’ response).
This report also provides data on any URLs which are discovered only via a rel=”next” or rel=”prev” attribute and are not linked to from the site (shown as ‘true’ in the ‘unlinked’ column).
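The underlying check is simple: every URL referenced by a rel=”next” or rel=”prev” attribute should return a 200 ‘OK’ response. A hypothetical sketch in Python (the URLs and status codes here are made up for illustration):

```python
# Hypothetical crawl results: the HTTP status observed for each URL.
statuses = {
    "https://example.com/page/1": 200,
    "https://example.com/page/2": 200,
    "https://example.com/page/3": 301,  # redirected - should be flagged
}

# rel="next"/"prev" attributes found on each paginated page.
pagination = {
    "https://example.com/page/1": {"next": "https://example.com/page/2"},
    "https://example.com/page/2": {"prev": "https://example.com/page/1",
                                   "next": "https://example.com/page/3"},
}

def pagination_errors(links, statuses):
    """Return (source, rel, target) tuples where the target is not a 200 OK."""
    errors = []
    for source, rels in links.items():
        for rel, target in rels.items():
            if statuses.get(target) != 200:
                errors.append((source, rel, target))
    return errors

print(pagination_errors(pagination, statuses))
# [('https://example.com/page/2', 'next', 'https://example.com/page/3')]
```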
6) Maintain List Order Export
One of our most requested features has been the ability to maintain the order of URLs when uploaded in list mode, so users can then export the data in the same order and easily match it up against the original data.
Unfortunately, it’s not as simple as keeping the order within the interface, as the SEO Spider performs some normalisation under the covers and removes duplicates, which meant it made more sense to produce a way to export data in the original order.
Hence, we have introduced a new ‘export’ button which appears next to the ‘upload’ and ‘start’ buttons at the top of the user interface (when in list mode) which produces an export with data in the same order as it was uploaded.
The data in the export will be in the same order and include all of the exact URLs in the original upload, including duplicates and any fix-ups performed.
7) Web Forms Authentication (Crawl Behind A Login)
The SEO Spider has supported basic and digest standards-based authentication for some time, which enables users to crawl staging and development sites. However, there are other web forms and areas which require you to log in with cookies which have been inaccessible, until now.
We have introduced a new ‘authentication’ configuration (under ‘Configuration > Authentication’), which allows users to log in to any web form within the SEO Spider Chromium browser, and then crawl it.
This means virtually all password-protected areas, intranets and anything which requires a web form login can now be crawled.
Please note – This feature is extremely powerful and often areas behind logins will contain links to actions which a user doesn’t want to press (for example ‘delete’). The SEO Spider will obviously crawl every link, so please use responsibly, and not on your precious fantasy football team. With great power comes great responsibility(!).
We have also included some other smaller updates and bug fixes in version 7.0 of the Screaming Frog SEO Spider, which include the following –
- All images now appear under the ‘Images’ tab. Previously the SEO Spider would only show ‘internal’ images from the same subdomain under the ‘images’ tab. All other images would appear under the ‘external’ tab. We’ve changed this behaviour as it was outdated, so now all images appear under ‘images’ regardless.
- The URL rewriting ‘remove parameters’ input is now a blank field (similar to the ‘include’ and ‘exclude’ configurations), which allows users to bulk upload parameters one per line, rather than manually inputting and entering each separate parameter.
- The SEO Spider will now find the page title element anywhere in the HTML (not just the HEAD), like Googlebot. Not that we recommend having it anywhere else!
- Introduced tri-state row sorting, allowing users to clear a sort and revert back to crawl order.
- The maximum XML sitemap size has been increased to 50MB from 10MB, in line with the updated Sitemaps.org protocol.
- Fixed a crash in custom extraction!
- Fixed a crash when using the date range Google Analytics configuration.
- Fixed exports ignoring column order and visibility.
- Fixed issue where SERP title and description widths were different for master view and SERP Snippet table on Windows for Thai language.
We hope you like the update! Please do let us know if you experience any problems, or discover any bugs at all.
Thanks to everyone as usual for all the feedback and suggestions for improving the Screaming Frog SEO Spider.
Now go and download version 7.0 of the SEO Spider!
Small Update – Version 7.1 Released 15th December 2016
We have just released a small update to version 7.1 of the SEO Spider. This release includes –
- Fix crash on startup for users of OSX 10.8 and below.
- Show decoded versions of hreflang URLs in the UI.
- Fix issue with connecting to SSLv3-only web servers.
- Handle standards based authentication when performing forms based authentication.
- Handle popup windows when performing forms based authentication.
- Fix typo in hreflang filter.
Small Update – Version 7.2 Released 30th January 2017
We have just released a small update to version 7.2 of the SEO Spider. This release includes –
- Basic High DPI support for Linux (Configuration > User Interface > ‘Enable GTK Windows Look and Feel’).
- Fix issue with SERP panel truncating.
- Fix crash in hreflang processing.
- Fix unable to start on 32-bit Linux.
- Fix crash in tree view when moving columns.
- Fix hreflang ‘missing confirmation links’ filter not checking external URLs.
- Fix status code of ‘illegal cookie’.
- Fix crash when going to ‘Configuration > API Access > Google Analytics’.
- Fix crash when sorting on the redirect column.
- Fix crash in custom extraction.
- Fix ‘Enable Rendered Page Screen Shots’ setting not saving.
- Fix ‘Inconsistent Language Confirmation Links’ report, reporting the wrong ‘Actual Language’.
- Fix NullPointerException when saving a crawl.