Screaming Frog SEO Spider Update – Version 9.0
I’m delighted to announce the release of Screaming Frog SEO Spider 9.0, codenamed internally as ‘8-year Monkey’.
Our team have been busy working on exciting new features. In our last update, we released a new user interface; in this release, we have a new and extremely powerful hybrid storage engine. Here’s what’s new.
1) Configurable Database Storage (Scale)
The SEO Spider has traditionally used RAM to store data, which has given it some amazing advantages: it is lightning fast and super flexible, and it provides real-time data and reporting, filtering, sorting and search during crawls.
However, storing data in memory also has downsides, notably crawling at scale. This is why version 9.0 now allows users to choose to save to disk in a database, which enables the SEO Spider to crawl at truly unprecedented scale for any desktop application while retaining the same, familiar real-time reporting and usability.
The default crawl limit is now set at 5 million URLs in the SEO Spider, but it isn’t a hard limit; the SEO Spider is capable of crawling significantly more (with the right hardware). For example, here are 10 million URLs crawled of 26 million (with 15 million sat in the queue).
We hate pagination, so we made sure the SEO Spider is powerful enough to let users view data seamlessly. For example, you can scroll through 8 million page titles as if it were 800.
The reporting and filters are all instant as well, although sorting and searching at huge scale will take some time.
It’s important to remember that crawling remains a memory-intensive process regardless of how data is stored. If data isn’t stored in RAM, then plenty of disk space will be required, along with adequate RAM and ideally SSDs. Fairly powerful machines are still required; otherwise crawl speeds will be slower than in RAM, as the bottleneck becomes the writing speed to disk. SSDs allow the SEO Spider to crawl at close to RAM speed and read the data instantly, even at huge scale.
By default, the SEO Spider will store data in RAM (‘memory storage mode’), but users can select to save to disk instead by choosing ‘database storage mode’ within the interface (via ‘Configuration > System > Storage’), based upon their machine specifications and crawl requirements.
Users who don’t have an SSD, or who are low on disk space and have lots of RAM, may prefer to continue to crawl in memory storage mode, while users with SSDs might prefer to crawl using ‘database storage mode’ by default. The configurable storage allows users to dictate their experience, as both storage modes have advantages and disadvantages, depending on machine specifications and scenario.
Please see our guide on how to crawl very large websites for more detail on both storage modes.
The saved crawl format (.seospider files) is the same in both storage modes, so you are able to start a crawl in RAM, save, and resume the crawl at scale while saving to disk (and vice versa).
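To picture the trade-off between the two storage modes, here is a minimal Python sketch. It is not how the SEO Spider is implemented internally; the class names, the `make_store` helper and the use of SQLite are all assumptions chosen purely for illustration. Both stores expose the same small interface, so the calling code does not care whether records live in RAM or in a database.

```python
import sqlite3


class MemoryStore:
    """Hypothetical store that keeps crawl records in a dict (RAM)."""

    def __init__(self):
        self._rows = {}

    def save(self, url, status):
        self._rows[url] = status

    def get(self, url):
        return self._rows.get(url)

    def count(self):
        return len(self._rows)


class DatabaseStore:
    """Hypothetical store that keeps crawl records in a SQLite database."""

    def __init__(self, path):
        # Pass a file path to persist to disk, or ":memory:" for a throwaway database.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, status INTEGER)"
        )

    def save(self, url, status):
        self._db.execute(
            "INSERT OR REPLACE INTO urls (url, status) VALUES (?, ?)", (url, status)
        )
        self._db.commit()

    def get(self, url):
        row = self._db.execute(
            "SELECT status FROM urls WHERE url = ?", (url,)
        ).fetchone()
        return row[0] if row else None

    def count(self):
        return self._db.execute("SELECT COUNT(*) FROM urls").fetchone()[0]


def make_store(mode, path=":memory:"):
    """Pick a store by mode name ("memory" or "database")."""
    if mode == "memory":
        return MemoryStore()
    if mode == "database":
        return DatabaseStore(path)
    raise ValueError(f"unknown storage mode: {mode}")
```

Calling `make_store("database", "crawl.db")` would write records to a file named `crawl.db` (a made-up path), while `make_store("memory")` keeps everything in the process. The rest of the code uses `save`, `get` and `count` in exactly the same way in either case.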
2) In-App Memory Allocation
First of all, apologies for making everyone manually edit a .ini file to increase memory allocation for the last 8 years. You’re now able to set memory allocation within the application itself, which is a little more user-friendly. This can be set under ‘Configuration > System > Memory’. The SEO Spider will even report the physical memory installed on the system, and allow you to configure it quickly.
Increasing memory allocation will enable the SEO Spider to crawl more URLs, particularly when in RAM storage mode, but also when storing to database. The memory acts like a cache when saving to disk, which allows the SEO Spider to perform quicker actions and crawl more URLs.
3) Store & View HTML & Rendered HTML
You can turn this feature on under ‘Configuration > Spider > Advanced’ by ticking the appropriate ‘Store HTML’ & ‘Store Rendered HTML’ options, and you can export all the HTML code using the ‘Bulk Export > All Page Source’ top-level menu.
We have some additional features planned here, to help users identify the differences between the static and rendered HTML.
4) Custom HTTP Headers
The SEO Spider already provided the ability to configure user-agent and Accept-Language headers, but now users are able to completely customise the HTTP request headers.
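For readers who want to see what customising request headers means at the HTTP level, here is a minimal Python sketch using the standard library. It does not reflect the SEO Spider's own code; the `build_request` helper, the default header values and the `X-Example-Header` name are all made up for illustration. The request object is only built, not sent.

```python
import urllib.request

# Illustrative defaults only; these are not the SEO Spider's real values.
DEFAULT_HEADERS = {
    "User-Agent": "ExampleCrawler/1.0",
    "Accept-Language": "en-GB",
}


def build_request(url, extra_headers=None):
    """Build a request whose headers are the defaults plus any overrides."""
    headers = dict(DEFAULT_HEADERS)
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


request = build_request(
    "https://example.com/",
    {"Accept-Language": "de-DE", "X-Example-Header": "test"},
)

# urllib stores header names with only the first letter capitalised.
for name, value in sorted(request.header_items()):
    print(f"{name}: {value}")
```

Passing the resulting object to `urllib.request.urlopen` would send the request with those headers, which is a quick way to check how a server responds to a particular header combination.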
5) XML Sitemap Improvements
You’re now able to create XML Sitemaps with any response code, rather than just 200 ‘OK’ status pages. This allows flexibility to quickly create sitemaps for a variety of scenarios, such as pages that don’t yet exist, pages that 301 to new URLs which you wish to force Google to re-crawl, or pages that return a 404/410 which you want removed quickly from the index.
If you have hreflang set up correctly on the website, then you can also select to include hreflang within the XML Sitemap.
Please note – The SEO Spider can only create XML Sitemaps with hreflang if it is already present (as attributes or via the HTTP header). More to come here.
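As a rough illustration of what such a sitemap looks like, the Python sketch below builds one from a list of crawled pages, filtered by response code and with hreflang alternates written as `xhtml:link` elements. The `build_sitemap` function and the shape of the `pages` data are assumptions for the example, not the SEO Spider's export logic.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(pages, statuses=(200,)):
    """Return sitemap XML for the pages whose status code is in `statuses`."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        if page["status"] not in statuses:
            continue
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        # Only hreflang annotations already found on the page are written out.
        for lang, href in page.get("alternates", {}).items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                rel="alternate",
                hreflang=lang,
                href=href,
            )
    return ET.tostring(urlset, encoding="unicode")


# Made-up crawl data for illustration.
pages = [
    {
        "loc": "https://example.com/en/",
        "status": 200,
        "alternates": {
            "en": "https://example.com/en/",
            "de": "https://example.com/de/",
        },
    },
    {"loc": "https://example.com/old-page", "status": 301},
    {"loc": "https://example.com/gone", "status": 410},
]

print(build_sitemap(pages, statuses=(301, 410)))
```

With `statuses=(301, 410)` only the redirected and removed URLs are listed, which matches the re-crawl and removal scenarios described above. The default `statuses=(200,)` gives a conventional sitemap.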
6) Granular Search Functionality
Previously when you performed a search in the SEO Spider it would search across all columns, which wasn’t configurable. The SEO Spider will now search against just the address (URL) column by default, and you’re able to select which columns to run the regex search against.
This obviously makes the search functionality quicker, and more useful.
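The idea of restricting a regex search to chosen columns can be sketched in a few lines of Python. The `search_rows` function, the column names and the sample rows are hypothetical; this only illustrates why searching fewer columns is both faster and more precise.

```python
import re


def search_rows(rows, pattern, columns=("address",)):
    """Return rows where the regex matches in at least one selected column."""
    regex = re.compile(pattern)
    return [
        row
        for row in rows
        if any(regex.search(str(row.get(column, ""))) for column in columns)
    ]


# Made-up crawl rows for illustration.
rows = [
    {"address": "https://example.com/blog/post-1", "title": "Running shoes"},
    {"address": "https://example.com/shop/shoes", "title": "Blog archive"},
]

# Default: only the address column is searched.
address_only = search_rows(rows, r"/blog/")

# Opt in to more columns when needed.
address_and_title = search_rows(rows, r"(?i)blog", columns=("address", "title"))
```

In this sample, searching the address column alone returns just the first row, while adding the title column also picks up the second row because of its ‘Blog archive’ title.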
7) Updated SERP Snippet Emulator
Google increased the average length of SERP snippets significantly in November last year, where they jumped from around 156 characters to over 300. Based upon our research, the default max description length filters have been increased to 320 characters and 1,866 pixels on desktop within the SEO Spider.
The lower window SERP snippet preview has also been updated to reflect this change, so you can view how your page might appear in Google.
It’s worth remembering that this is for desktop. Mobile search snippets also increased, but from our research, are quite a bit smaller – approx. 1,535px for descriptions, which is generally below 230 characters. So, if a lot of your traffic and conversions are via mobile, you may wish to update your max description preferences under ‘Config > Spider > Preferences’. You can switch ‘device’ type within the SERP snippet emulator to view how these appear differently to desktop.
As outlined previously, the SERP snippet emulator might still occasionally be a word out in either direction compared to what you see in the Google SERP, due to exact pixel sizes and boundaries. Google also sometimes cuts descriptions off much earlier (particularly for video), so please use it as an approximate guide only.
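If you want to run a quick check of your own descriptions against the character figures above, a simple Python sketch might look like this. It only counts characters (320 for desktop, roughly 230 for mobile, taken from the text above) and does not model pixel widths, which depend on font metrics. The `check_description` function and constant names are made up for the example.

```python
DESKTOP_MAX_CHARS = 320  # desktop default quoted in the text above
MOBILE_MAX_CHARS = 230   # approximate mobile figure quoted in the text above


def check_description(description, device="desktop"):
    """Report whether a meta description exceeds the character limit."""
    limit = MOBILE_MAX_CHARS if device == "mobile" else DESKTOP_MAX_CHARS
    length = len(description)
    return {
        "device": device,
        "length": length,
        "limit": limit,
        "over_limit": length > limit,
    }


# A made-up 250-character description.
sample = "x" * 250
desktop_result = check_description(sample)
mobile_result = check_description(sample, device="mobile")
```

Here the same 250-character description is within the desktop limit but over the mobile one, which is the situation the paragraph above warns about for mobile-heavy sites.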
8) Post Crawl API Requests
Finally, if you forget to connect to Google Analytics, Google Search Console, Majestic, Ahrefs or Moz after you’ve started a crawl, or realise at the very end of a crawl, you can now connect to their API and ‘request API data’, without re-crawling all the URLs.
Version 9.0 also includes a number of smaller updates and bug fixes, outlined below.
- While we have introduced the new database storage mode to improve scalability, regular memory storage performance has also been significantly improved. The SEO Spider uses less memory, which will enable users to crawl more URLs than previous iterations of the SEO Spider.
- The ‘exclude’ configuration now works instantly, as it is applied to URLs already waiting in the queue. Previously the exclude would only work on newly discovered URLs, rather than those already found and waiting in the queue. This meant you could apply an exclude, and it would be some time before the SEO Spider stopped crawling URLs that matched your exclude regex. Not anymore.
- The ‘inlinks’ and ‘outlinks’ tabs (and exports) now include all sources of a URL, not just links (HTML anchor elements) as the source. Previously if a URL was discovered only via a canonical, hreflang, or rel next/prev attribute, the ‘inlinks’ tab would be blank and users would have to rely on the ‘crawl path report’, or various error reports to confirm the source of the crawled URL. Now these are included within ‘inlinks’ and ‘outlinks’ and the ‘type’ defines the source element (ahref, HTML canonical etc).
- You can now choose to ‘cancel’ either loading in a crawl, exporting data or running a search or sort.
- We’ve added some rather lovely line numbers to the custom robots.txt feature.
- To match Google’s rendering characteristics, we now allow blob URLs during JavaScript rendering crawls.
- We renamed the old ‘GA & GSC Not Matched’ report to the ‘Orphan Pages’ report, so it’s a bit more obvious.
- URL Rewriting now applies to list mode input.
- There’s now a handy ‘strip all parameters’ option within URL Rewriting for ease.
- The Chromium version used for rendering is now reported in the ‘Help > Debug’ dialog.
- List mode now supports .gz file uploads.
- The SEO Spider now includes Java 8 update 161, with several bug fixes.
- Fix: Ahrefs integration requesting domain and subdomain data multiple times.
- Fix: Ahrefs integration not requesting information for HTTP and HTTPS on (sub)domain level.
- Fix: The crawl path report was missing some link types, which has now been corrected.
- Fix: Incorrect robots.txt behaviour for rules ending *$.
- Fix: Auth Browser cookie expiration date invalid for non-UK locales.
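The change to the ‘exclude’ configuration in the list above, where excludes now apply to URLs already waiting in the queue, can be pictured with a short Python sketch. This is an illustration only: the `apply_exclude` function is hypothetical, and it assumes an exclude pattern has to match the whole URL.

```python
import re
from collections import deque


def apply_exclude(queue, exclude_patterns):
    """Return a new queue without URLs that fully match any exclude regex."""
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return deque(
        url
        for url in queue
        if not any(regex.fullmatch(url) for regex in compiled)
    )


# Made-up queue of URLs that were discovered before the exclude was added.
queue = deque(
    [
        "https://example.com/",
        "https://example.com/search?q=frogs",
        "https://example.com/about",
        "https://example.com/search?q=spiders",
    ]
)

# Exclude every URL containing a query string.
queue = apply_exclude(queue, [r".*\?.*"])
```

Filtering the pending queue at the moment the exclude is set means the unwanted URLs are never requested, rather than being crawled until the queue drains.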
That’s everything for now. This is a big release and one which we are proud of internally, as it’s new ground for what’s achievable for a desktop application. It makes crawling at scale more accessible for the SEO community, and we hope you all like it.
As always, if you experience any problems with our latest update, then do let us know via support and we will help and resolve any issues.
We’re now starting work on version 10, where some long-standing feature requests will be included. Thanks to everyone for all their patience, feedback, suggestions and continued support of Screaming Frog. It’s really appreciated.
Now, please go and download version 9.0 of the Screaming Frog SEO Spider and let us know your thoughts.
Small Update – Version 9.1 Released 8th March 2018
We have just released a small update to version 9.1 of the SEO Spider. This release is mainly bug fixes and small improvements –
- Monitor disk usage on user configured database directory, rather than home directory. Thanks to Mike King, for that one!
- Stop monitoring disk usage in Memory Storage Mode.
- Make sitemap reading support UTF-16.
- Fix crash using Google Analytics in Database Storage mode.
- Fix issue with depth stats not displaying when loading in a saved crawl.
- Fix crash when viewing Inlinks in the lower window pane.
- Fix crash in Custom Extraction when using XPath.
- Fix crash when embedded browser initialisation fails.
- Fix crash importing crawl in Database Storage Mode.
- Fix crash when sorting/searching main master view.
- Fix crash when editing custom robots.txt.
- Fix jerky scrolling in View Source tab.
- Fix crash when searching in View Source tab.