Posted 12 December, 2016 by in Screaming Frog SEO Spider

Screaming Frog SEO Spider Update – Version 7.0

I’m delighted to announce Screaming Frog SEO Spider version 7.0, codenamed internally as ‘Spiderman’.

Since the release of rendered crawling in version 6.0, our development team have been busy working on more new and exciting features. Let’s take a look at what’s new in 7.0.

1) ‘Fetch & Render’ (Rendered Screen Shots)

You can now view the rendered page the SEO Spider crawled in the new ‘Rendered Page’ tab which dynamically appears at the bottom of the user interface when crawling in JavaScript rendering mode. This populates the lower window pane when selecting URLs in the top window.

Screaming Frog SEO Spider 7.0

This feature is enabled by default when using the new JavaScript rendering functionality, and allows you to set the AJAX timeout and viewport size to view and test various scenarios. With Google’s much-discussed mobile-first index, this allows you to set the user-agent and viewport to Googlebot Smartphone and see exactly how every page renders on mobile.

rendered page screen shots

Viewing the rendered page is vital when analysing what a modern search bot is able to see and is particularly useful when performing a review in staging, where you can’t rely on Google’s own Fetch & Render in Search Console.

2) Blocked Resources

The SEO Spider now reports on blocked resources, which can be seen individually for each page within the ‘Rendered Page’ tab, adjacent to the rendered screen shots.

Rendered page blocked resources

The blocked resources can also be seen under the ‘Response Codes > Blocked Resource’ tab and filter. The pages impacted and the individual blocked resources can be exported in bulk via the ‘Bulk Export > Response Codes > Blocked Resource Inlinks’ report.

Blocked Resources

3) Custom robots.txt

You can download, edit and test a site’s robots.txt using the new custom robots.txt feature under ‘Configuration > robots.txt > Custom’. The new feature allows you to add multiple robots.txt files at subdomain level, test directives in the SEO Spider and view URLs which are blocked or allowed.

Custom Robots.txt

During a crawl you can filter blocked URLs based upon the custom robots.txt (‘Response Codes > Blocked by robots.txt’) and see the matched robots.txt directive line.

Blocked by robots.txt
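Outside the tool, the same allowed/blocked logic can be sketched with Python’s standard `urllib.robotparser`. The directives and URLs below are illustrative only; note that Python’s parser matches rules in file order rather than by longest match (as Google does), which is why the Allow line comes first here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical custom robots.txt, as you might paste into the tester.
rules = """User-agent: *
Allow: /private/press/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ("https://example.com/private/report.html",
            "https://example.com/private/press/launch.html",
            "https://example.com/index.html"):
    allowed = parser.can_fetch("*", url)
    print(url, "allowed" if allowed else "blocked")
```

Anything not matched by a rule defaults to allowed, mirroring how the SEO Spider’s filter only lists URLs a directive actually blocks.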

Custom robots.txt is a useful alternative if you’re uncomfortable using the regex exclude feature, or if you’d just prefer to use robots.txt directives to control a crawl.

The custom robots.txt uses the selected user-agent in the configuration and works well with the new fetch and render feature, where you can test how a web page might render with blocked resources.

We considered including a check for a double UTF-8 byte order mark (BOM), which can be a problem for Google. According to the spec, a double BOM invalidates the first line; however, this will generally only ever be due to user error. We don’t have any problem parsing it, and believe Google should really update their behaviour to account for potential mistakes.
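For illustration, detecting a double BOM at the start of a robots.txt file is a simple byte comparison. This is a generic sketch, not the SEO Spider’s implementation:

```python
BOM = b"\xef\xbb\xbf"  # the UTF-8 byte order mark

def has_double_bom(raw: bytes) -> bool:
    """True if the file starts with two consecutive UTF-8 BOMs,
    which would corrupt the first directive line for strict parsers."""
    return raw.startswith(BOM * 2)

print(has_double_bom(BOM * 2 + b"User-agent: *\n"))  # True  (double BOM)
print(has_double_bom(BOM + b"User-agent: *\n"))      # False (single BOM is fine)
```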

Please note – The changes you make to the robots.txt within the SEO Spider do not impact your live robots.txt uploaded to your server.

4) hreflang Attributes

First of all, apologies, this one has been a long time coming. The SEO Spider now extracts, crawls and reports on hreflang attributes delivered via the HTML link element and HTTP header. They are also extracted from Sitemaps when crawled in list mode.

While users have historically used custom extraction to collect hreflang, by default these can now be viewed under the ‘hreflang’ tab, with filters for common issues.

hreflang

While hreflang is a fairly simple concept, there are plenty of issues that can be encountered in implementation. We believe this is the most comprehensive hreflang auditing currently available anywhere, with checks for missing confirmation links, inconsistent languages, incorrect language/regional codes, non-canonical confirmation links, multiple entries, missing self-references, not using the canonical, missing the x-default, and missing hreflang completely.

Additionally, there are four new hreflang reports available to allow data to be exported in bulk (under the ‘reports’ top level menu) –

  • Errors – This report shows any hreflang attributes which do not return a 200 response (no response, blocked by robots.txt, 3XX, 4XX or 5XX responses) or are unlinked on the site.
  • Missing Confirmation Links – This report shows pages which are missing a confirmation link, and which pages require it.
  • Inconsistent Language Confirmation Links – This report shows confirmation pages which use different language codes for the same page.
  • Non Canonical Confirmation Links – This report shows confirmation links which point to non-canonical URLs.

This feature can be fairly resource-intensive on large sites, so extraction and crawling are entirely configurable under ‘Configuration > Spider’.
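As a rough illustration of what a ‘missing confirmation link’ check involves, the sketch below models each page’s hreflang annotations as a dictionary and flags targets that don’t link back. The URLs and language codes are made up for the example, and real auditing (as in the SEO Spider) covers far more cases:

```python
# Hypothetical hreflang annotations: page URL -> {language code: target URL}.
pages = {
    "https://example.com/":    {"en": "https://example.com/",
                                "de": "https://example.com/de/"},
    "https://example.com/de/": {"de": "https://example.com/de/"},  # no link back to en
}

def missing_confirmations(pages):
    """Return (source, target) pairs where the target page does not
    reference the source in its own hreflang annotations."""
    issues = []
    for source, entries in pages.items():
        for lang, target in entries.items():
            if target == source:
                continue  # self-reference, nothing to confirm
            back_links = pages.get(target, {})
            if source not in back_links.values():
                issues.append((source, target))
    return issues

print(missing_confirmations(pages))
# The German page above never links back to the English page.
```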

5) rel=”next” and rel=”prev” Errors

This report highlights errors and issues with rel=”next” and rel=”prev” attributes, which are of course used to indicate paginated content.

The report will show any rel=”next” and rel=”prev” URLs which return no response, are blocked by robots.txt, or respond with a 3XX redirect, 4XX or 5XX error (anything other than a 200 ‘OK’ response).

This report also provides data on any URLs which are discovered only via a rel=”next” and rel=”prev” attribute and are not linked to from the site (shown as ‘true’ in the ‘unlinked’ column).
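The ‘unlinked’ check above boils down to set arithmetic: collect every rel=”next”/rel=”prev” target, then subtract the URLs found via ordinary links. A minimal sketch with invented URLs:

```python
# Hypothetical crawl data: URL -> (rel_prev, rel_next), None where absent.
pagination = {
    "https://example.com/page/1": (None, "https://example.com/page/2"),
    "https://example.com/page/2": ("https://example.com/page/1",
                                   "https://example.com/page/3"),
    "https://example.com/page/3": ("https://example.com/page/2", None),
}

def unlinked_targets(pagination, crawled):
    """Return rel=next/prev targets never discovered as ordinary links."""
    targets = set()
    for prev_url, next_url in pagination.values():
        targets.update(u for u in (prev_url, next_url) if u)
    return sorted(targets - crawled)

# Pretend page 3 was only discovered via rel="next", never via an <a> link.
crawled_links = {"https://example.com/page/1", "https://example.com/page/2"}
print(unlinked_targets(pagination, crawled_links))
```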

6) Maintain List Order Export

One of our most requested features has been the ability to maintain the order of URLs when uploaded in list mode, so users can then export the data in the same order and easily match it up against the original data.

Unfortunately, it’s not as simple as keeping the order within the interface, as the SEO Spider performs some normalisation under the covers and removes duplicates, which meant it made more sense to produce a way to export data in the original order.

Hence, we have introduced a new ‘export’ button which appears next to the ‘upload’ and ‘start’ buttons at the top of the user interface (when in list mode) which produces an export with data in the same order as it was uploaded.

Maintain list order export

The data in the export will be in the same order and include all of the exact URLs in the original upload, including duplicates or any fix-ups performed.

7) Web Forms Authentication (Crawl Behind A Login)

The SEO Spider has supported basic and digest standards-based authentication for some time, which enables users to crawl staging and development sites. However, other web forms and areas which require you to log in with cookies have been inaccessible, until now.

We have introduced a new ‘authentication’ configuration (under ‘Configuration > Authentication’), which allows users to log in to any web form within the SEO Spider Chromium browser, and then crawl it.

web form authentication

This means virtually all password-protected areas, intranets and anything which requires a web form login can now be crawled.

Please note – This feature is extremely powerful and often areas behind logins will contain links to actions which a user doesn’t want to press (for example ‘delete’). The SEO Spider will obviously crawl every link, so please use responsibly, and not on your precious fantasy football team. With great power comes great responsibility(!).

You can block the SEO Spider from crawling links or areas by using the exclude or custom robots.txt.
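As a rough illustration of the exclude approach, a regular expression matched against the full URL can keep a crawler away from destructive action URLs. The pattern and URLs below are made up for the example:

```python
import re

# Hypothetical exclude pattern for destructive or session-ending actions.
EXCLUDE = re.compile(r".*/(delete|remove|logout)(/|\?|$).*")

urls = [
    "https://intranet.example.com/docs/handbook",
    "https://intranet.example.com/items/42/delete",
    "https://intranet.example.com/logout?next=/",
]

# Keep only URLs the exclude pattern does not match.
to_crawl = [u for u in urls if not EXCLUDE.match(u)]
print(to_crawl)
```

Anything matching the pattern is simply never requested, so links such as ‘delete’ actions behind the login are left untouched.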

Other Updates

We have also included some other smaller updates and bug fixes in version 7.0 of the Screaming Frog SEO Spider, which include the following –

  • All images now appear under the ‘Images’ tab. Previously the SEO Spider would only show ‘internal’ images from the same subdomain under the ‘images’ tab. All other images would appear under the ‘external’ tab. We’ve changed this behaviour as it was outdated, so now all images appear under ‘images’ regardless.
  • The URL rewriting ‘remove parameters’ input is now a blank field (similar to the ‘include’ and ‘exclude’ configurations), which allows users to bulk upload parameters one per line, rather than inputting each parameter manually.
  • The SEO Spider will now find the page title element anywhere in the HTML (not just the HEAD), like Googlebot. Not that we recommend having it anywhere else!
  • Introduced tri-state row sorting, allowing users to clear a sort and revert back to crawl order.
  • The maximum XML sitemap size has been increased to 50MB from 10MB, in line with Sitemaps.org updated protocol.
  • Fixed a crash in custom extraction!
  • Fixed a crash when using the date range Google Analytics configuration.
  • Fixed exports ignoring column order and visibility.
  • Fixed cookies set via JavaScript not working in rendered mode.
  • Fixed issue where SERP title and description widths were different for master view and SERP Snippet table on Windows for Thai language.

We hope you like the update! Please do let us know if you experience any problems, or discover any bugs at all.

Thanks to everyone as usual for all the feedback and suggestions for improving the Screaming Frog SEO Spider.

Now go and download version 7.0 of the SEO Spider!

Small Update – Version 7.1 Released 15th December 2016

We have just released a small update to version 7.1 of the SEO Spider. This release includes –

  • Fix crash on startup for users of OSX 10.8 and below.
  • Show decoded versions of hreflang URLs in the UI.
  • Fix issue with connecting to SSLv3-only web servers.
  • Handle standards based authentication when performing forms based authentication.
  • Handle popup windows when performing forms based authentication.
  • Fix typo in hreflang filter.