Screaming Frog SEO Spider Update – Version 6.0
I’m excited to announce version 6.0 of the Screaming Frog SEO Spider, codenamed internally as ‘render-Rooney’.
Our team have been busy in development and have some very exciting new features ready to release in the latest update. This includes the following –
There were two things we set out to do at the start of the year. Firstly, understand exactly what the search engines are able to crawl and index. This is why we created the Screaming Frog Log File Analyser, as a crawler will only ever be a simulation of search bot behaviour.
After much research and testing, we integrated the Chromium project library for our rendering engine to emulate Google as closely as possible. Some of you may remember the excellent ‘Googlebot is Chrome‘ post from Joshua G on Mike King’s blog back in 2011, which discusses Googlebot essentially being a headless browser.
The new rendering mode is really powerful, but there are a few things to remember –
- Typically crawling is slower even though it’s still multi-threaded, as the SEO Spider has to wait longer for the content to load and gather all the resources to be able to render a page. Our internal testing suggests Google wait approximately 5 seconds for a page to render, so this is the default AJAX timeout in the SEO Spider. Google may adjust this based upon server response and other signals, so you can configure this to your own requirements if a site is slower to load a page.
- The crawling experience is quite different, as it can take time for anything to appear in the UI at first, and then lots of URLs appear at once. This is because the SEO Spider waits for all the resources needed to render a page to be fetched before the data is displayed.
2) Configurable Columns & Ordering
You’re now able to configure which columns are displayed in each tab of the SEO Spider (by clicking the ‘+’ in the top window pane).
You can also drag and drop the columns into any order and this will be remembered (even after a restart).
To revert to the default columns and ordering, simply right-click on the ‘+’ symbol and click ‘Reset Columns’, or click on ‘Configuration > User Interface > Reset Columns For All Tables’.
3) XML Sitemap & Sitemap Index Crawling
The SEO Spider already allows crawling of XML sitemaps in list mode by uploading the .xml file (number 8 in the ‘10 features in the SEO Spider you should really know’ post). However, having to save the file first was always a little clunky if the sitemap was already live (but handy when it wasn’t uploaded!).
So we’ve now introduced the ability to enter a sitemap URL to crawl it (‘List Mode > Download Sitemap’).
Previously if a site had multiple sitemaps, you’d have to upload and crawl them separately as well.
Now if you have a sitemap index file to manage multiple sitemaps, you can enter the sitemap index file URL and the SEO Spider will download all sitemaps and subsequent URLs within them!
This should help save plenty of time!
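To make the sitemap index behaviour more concrete, here is a minimal Python sketch of the general idea: read a sitemap, and if it turns out to be an index, follow each child sitemap and gather the URLs inside. This is not the SEO Spider’s own code. The `fetch` callable, the `collect_urls` helper and the example.com data are all hypothetical, and the download step is replaced by an in-memory dictionary so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return (kind, locs): kind is 'index' or 'urlset', locs the <loc> values."""
    root = ET.fromstring(xml_text)
    if root.tag.endswith("sitemapindex"):
        kind, path = "index", "sm:sitemap/sm:loc"
    else:
        kind, path = "urlset", "sm:url/sm:loc"
    locs = [el.text.strip() for el in root.findall(path, NS) if el.text]
    return kind, locs


def collect_urls(sitemap_url, fetch):
    """Follow a sitemap or sitemap index and return every page URL found.

    `fetch` is a hypothetical callable that maps a URL to its XML text.
    """
    kind, locs = parse_sitemap(fetch(sitemap_url))
    if kind == "urlset":
        return locs
    urls = []
    for child in locs:
        urls.extend(collect_urls(child, fetch))
    return urls


# Made-up data standing in for a live site.
FAKE_SITE = {
    "https://example.com/sitemap_index.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/sitemap-a.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/sitemap-b.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/sitemap-a.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    ),
    "https://example.com/sitemap-b.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/blog</loc></url>"
        "</urlset>"
    ),
}

all_urls = collect_urls("https://example.com/sitemap_index.xml", FAKE_SITE.__getitem__)
print(all_urls)
```

Passing the fetcher in as an argument keeps the parsing logic separate from the download step, so a real HTTP client could be swapped in later.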
4) Improved Custom Extraction – Multiple Values & Functions
We listened to feedback that users often wanted to extract multiple values, without having to use multiple extractors. For example, previously to collect 10 values, you’d need to use 10 extractors and index selectors with XPath.
We’ve changed this behaviour, so by default a single extractor will collect and report all values found for XPath, CSS Path and Regex. If you have 20 hreflang values, you can use a single extractor to collect them all and the SEO Spider will dynamically add additional columns for however many are required. You’ll still have 9 extractors left to play with as well. So a single XPath such as –
Will now collect all values discovered.
You can still choose to extract just the first instance by using an index selector as well. For example, if you just wanted to collect the first h3 on a page, you could use the following XPath –
Functions can also be used anywhere in XPath, but you can now use them on their own as well via the ‘function value’ dropdown. So if you wanted to count the number of links on a page, you might use the following XPath –
I’d recommend reading our updated guide to web scraping for more information.
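As a rough illustration of the three behaviours described in this section (collect every match, collect only the first instance, and count matches), here is a small Python sketch. It uses the standard library’s `xml.etree.ElementTree`, which only supports a subset of XPath, so the count is emulated with `len()` rather than an XPath function. The sample markup and the expressions are made up for this example and are not taken from the SEO Spider.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for this example.
PAGE = """<html><body>
<h3>First</h3>
<div><h3>Second</h3><a href="/a">A</a></div>
<h3>Third</h3>
<a href="/b">B</a>
</body></html>"""

root = ET.fromstring(PAGE)

# Multiple values: one expression returns every matching element in document order.
all_h3 = [el.text for el in root.findall(".//h3")]

# First instance only: find() stops at the first match.
first_h3 = root.find(".//h3").text

# Counting: ElementTree has no count() function, so count the matches instead.
link_count = len(root.findall(".//a"))

print(all_h3, first_h3, link_count)
```

A full XPath engine would let you express the index and the count inside the expression itself. The point here is only the difference between the three kinds of result.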
5) rel=“next” and rel=“prev” Elements Now Crawled
The SEO Spider can now crawl rel=“next” and rel=“prev” elements, whereas previously the tool merely reported them. Now if a URL has not already been discovered, it will be added to the queue and crawled, provided the configuration is enabled (‘Configuration > Spider > Basic Tab > Crawl Next/Prev’).
rel=“next” and rel=“prev” elements are not counted as ‘Inlinks’ (in the lower window tab) as they are not links in a traditional sense. Hence, if a URL does not have any ‘Inlinks’ in the crawl, it might well be due to discovery from a rel=“next” or rel=“prev” element, or a canonical. We recommend using the ‘Crawl Path Report’ to show how the page was discovered, which will show the full path.
There’s also a new ‘respect next/prev’ configuration option (under ‘Configuration > Spider > Advanced tab’) which will hide any URLs with a ‘prev’ element, so they are not considered as duplicates of the first page in the series.
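The queueing behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the SEO Spider is implemented: the `crawl` function, its `crawl_next_prev` flag and the `fetch` callable are hypothetical names, and the pages come from an in-memory dictionary rather than the web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PaginationParser(HTMLParser):
    """Collect href values of <link rel="next"> and <link rel="prev">."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if attr.get("href") and ("next" in rels or "prev" in rels):
            self.found.append(attr["href"])


def crawl(start_url, fetch, crawl_next_prev=True):
    """Breadth-first crawl that only follows rel=next/prev when enabled."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        if not crawl_next_prev:
            continue
        parser = PaginationParser()
        parser.feed(fetch(url))
        for href in parser.found:
            target = urljoin(url, href)
            if target not in seen:  # only queue URLs not already discovered
                seen.add(target)
                queue.append(target)
    return order


# Made-up paginated series.
PAGES = {
    "https://example.com/list?page=1": '<link rel="next" href="/list?page=2">',
    "https://example.com/list?page=2": (
        '<link rel="prev" href="/list?page=1"><link rel="next" href="/list?page=3">'
    ),
    "https://example.com/list?page=3": '<link rel="prev" href="/list?page=2">',
}

start = "https://example.com/list?page=1"
with_pagination = crawl(start, PAGES.__getitem__)
without_pagination = crawl(start, PAGES.__getitem__, crawl_next_prev=False)
print(with_pagination)
print(without_pagination)
```

The `seen` set is what stops page 2’s ‘prev’ element from re-queueing page 1.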
6) Updated SERP Snippet Emulator
Earlier this year, in May, Google increased the column width of the organic SERPs from 512px to 600px on desktop, which means titles and description snippets are longer. Google displays and truncates SERP snippets based on characters’ pixel width rather than number of characters, which can make it challenging to optimise.
Our previous research showed Google used to truncate page titles at around 482px on desktop. With the change, we have updated our research and logic in the SERP snippet emulator to match Google’s new truncation point before an ellipsis (…), which for page titles on desktop is around 570px.
Our research shows that, while the space for descriptions has also increased, they are still being truncated far earlier, at a similar point to the older 512px width SERP. The SERP snippet emulator will only bold keywords within the snippet description, not in the title, in the same way as the Google SERPs.
Please note – You may occasionally see our SERP snippet emulator be a word out in either direction compared to what you see in the Google SERP. There will always be some pixel differences, which mean that the pixel boundary might not be in the exact same spot that Google calculate 100% of the time.
We are still seeing Google play to different rules at times as well, where some snippets have a longer pixel cut-off point, particularly for descriptions! The SERP snippet emulator is therefore not always exact, but a good rule of thumb.
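To show why pixel width matters more than character count, here is a minimal Python sketch of pixel-based truncation at a word boundary. The per-character widths are invented for the example (real widths depend on the font and size used in the SERP), and the `truncate` function is a hypothetical helper, not the emulator’s actual logic.

```python
# Hypothetical per-character pixel widths; real values depend on the font
# and size used in the SERP and are not taken from the SEO Spider.
NARROW = set("iljtfI.,'|! ")
WIDE = set("mwMW@")


def char_width(ch):
    if ch in NARROW:
        return 5
    if ch in WIDE:
        return 15
    return 10


def pixel_width(text):
    return sum(char_width(ch) for ch in text)


def truncate(text, limit_px, ellipsis=" ..."):
    """Cut text at a word boundary so it fits in limit_px including the ellipsis."""
    if pixel_width(text) <= limit_px:
        return text
    budget = limit_px - pixel_width(ellipsis)
    kept = []
    for word in text.split():
        candidate = " ".join(kept + [word])
        if pixel_width(candidate) > budget:
            break
        kept.append(word)
    return " ".join(kept) + ellipsis


# Two strings with the same number of characters but different pixel widths.
narrow_title = truncate("iiii iiii", 100)
wide_title = truncate("mmmm mmmm", 100)
print(repr(narrow_title), repr(wide_title))
```

Both inputs are nine characters long, yet only the wide one is cut. That is the effect that makes a fixed character limit unreliable for titles and descriptions.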
We have also included some other smaller updates and bug fixes in version 6.0 of the Screaming Frog SEO Spider, which include the following –
- A new ‘Text Ratio’ column has been introduced in the internal tab which calculates the text to HTML ratio.
- Google updated their Search Analytics API, so the SEO Spider can now retrieve more than 5k rows of data from Search Console.
- There’s a new ‘search query filter’ for Search Console, which allows users to include or exclude keywords (under ‘Configuration > API Access > Google Search Console > Dimension tab’). This should be useful for excluding brand queries for example.
- There’s a new configuration to extract images from the IMG srcset attribute under ‘Configuration > Advanced’.
- The new Googlebot smartphone user-agent has been included.
- Updated our support for relative base tags.
- Removed the blank line at the start of Excel exports.
- Fixed a bug with word count which could make it less accurate.
- Fixed a bug with GSC CTR numbers.
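For anyone curious about the ‘Text Ratio’ column mentioned in the list above, a text to HTML ratio can be sketched roughly as follows. The exact formula the SEO Spider uses isn’t stated in this post, so this assumes the ratio is the number of visible text characters divided by the total number of characters in the HTML, expressed as a percentage. The `text_ratio` helper is a hypothetical name.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def text_ratio(html):
    """Assumed definition: visible text characters / total HTML characters, in %."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not inflate the text length.
    text = " ".join(" ".join(parser.parts).split())
    return 100.0 * len(text) / len(html)


sample = "<script>var x=1;</script><p>Hi</p>"
print(round(text_ratio(sample), 2))
```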
I think that’s just about everything! As always, please do let us know if you have any problems or spot any bugs at all.
Thanks to everyone for all the support and continued feedback. Apologies for any features we couldn’t include in this update. We are already working on the next set of updates and there’s plenty more to come!
Now go and download version 6.0 of the SEO Spider!
Small Update – Version 6.1 Released 3rd August 2016
We have just released a small update to version 6.1 of the SEO Spider. This release includes –
- Java 8 update 66 is now required on all platforms, as this update fixes several bugs in Java.
- Reduced certificate verification to be more tolerant when crawling HTTPS sites.
- Fixed a crash when using the date range configuration for Google Analytics integration.
- Fixed an issue with the lower window pane obscuring the main data window for some users.
- Fixed a crash in custom extraction.
- Fixed a crash when starting a crawl without a selection in the overview window.
- Fixed an issue with being too strict on parsing title tags. Google seem to use them regardless of valid HTML head elements.
- Fixed a crash for Windows XP/Vista/Server 2003/Linux 32-bit users; these platforms are not supported for rendering mode.
Update – Version 6.2 Released 16th August 2016
We have just released a small update to version 6.2 of the SEO Spider. This release includes –
- Fix for several crashes.
- Fix for the broken unavailable_after in the directives filter.
- Fix for double-clicking .seospider files on OS X not loading the crawl file.
- Multiple extraction instances are now grouped together.
- Export now respects column order and visibility preferences.