Screaming Frog SEO Spider Update – Version 3.0
I’m delighted to announce version 3.0 of the Screaming Frog SEO Spider, named internally as ‘walkies’.
This update includes a new way of analysing a crawl, additional sitemap features and insecure content reporting, which will help with all those HTTPS migrations! As always thanks to everyone for their continued support, feedback and suggestions for the tool.
So, let’s get straight to it. The new features in version 3.0 of the tool include the following –
1) Tree View
You can now switch from the usual ‘list view’ of a crawl to a more traditional directory ‘tree view’ format, while still maintaining the granular detail of each URL crawled that you see in the standard list view.
This additional view will hopefully help provide an alternative perspective when analysing a website’s architecture.
The SEO Spider doesn’t crawl this way natively, so switching to ‘tree view’ from ‘list view’ will take a little time to build, and you may see a progress bar on larger crawls. This has been requested as a feature for quite some time, so thanks to all for their feedback.
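To illustrate why the tree has to be built after the crawl, here is a minimal sketch of grouping a flat list of crawled URLs into a directory tree. It is not how the SEO Spider does it internally; the function names `build_tree` and `render` and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def build_tree(urls):
    """Group a flat list of crawled URLs into a nested dict keyed by host, then path segment."""
    tree = {}
    for url in urls:
        parts = urlsplit(url)
        node = tree.setdefault(parts.netloc, {})
        # Empty segments (leading or trailing slashes) are skipped.
        for segment in filter(None, parts.path.split("/")):
            node = node.setdefault(segment, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, one per directory or page."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render(tree[name], indent + 1))
    return lines


# Hypothetical crawl results in 'list view' order.
crawled = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/blog/post-1/",
    "https://example.com/blog/post-2/",
    "https://example.com/contact/",
]
print("\n".join(render(build_tree(crawled))))
```

Every URL has to be visited once to build the nested structure, which is roughly why a large crawl takes a moment to switch views.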
2) Insecure Content Report
We have introduced a ‘protocol’ tab, to allow you to easily filter and analyse by secure and non-secure URLs at a glance (as well as other protocols potentially in the future). As an extension to this, there’s also a new ‘insecure content’ report which will show any HTTPS URLs which have insecure elements on them. It’s very easy to miss some insecure content, which often only gets picked up in a browser at go-live.
So if you’re working on HTTP to HTTPS migrations, this should be particularly useful. This report will identify any secure pages which link out to insecure content, such as internal HTTP links, images, JS, CSS, external CDNs, social profiles etc.
Here’s a quick example of how a report might look (with insecure images in this case) –
Please note, this report will only pick up on items we crawl, rather than everything rendered in a browser.
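For anyone who wants to sanity check a single page by hand, the idea behind the report can be sketched in a few lines of Python. This is only an approximation under some assumptions: it looks at `href` and `src` attributes in the raw HTML only, and the names `InsecureRefFinder` and `find_insecure` are invented for this example rather than taken from the tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly reference another resource (an assumption, not an exhaustive list).
URL_ATTRS = {"href", "src"}


class InsecureRefFinder(HTMLParser):
    """Collect http:// references found in the markup of a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative and protocol-relative references against the page URL.
                absolute = urljoin(self.page_url, value.strip())
                if absolute.lower().startswith("http://"):
                    self.insecure.append((tag, absolute))


def find_insecure(page_url, html):
    """Return (tag, url) pairs for insecure references on an HTTPS page."""
    if not page_url.lower().startswith("https://"):
        return []
    finder = InsecureRefFinder(page_url)
    finder.feed(html)
    return finder.insecure


sample = (
    '<link rel="stylesheet" href="/style.css">'
    '<img src="http://cdn.example.com/logo.png">'
    '<a href="http://example.com/about">About</a>'
    '<script src="//cdn.example.com/app.js"></script>'
)
for tag, url in find_insecure("https://example.com/", sample):
    print(tag, url)
```

Relative and protocol-relative references inherit the HTTPS scheme of the page, so only the image and the anchor are flagged here. Like the report itself, this only sees what is in the HTML source, not anything injected when the page is rendered.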
3) Image Sitemaps & Updated XML Sitemap Features
You can now add images to your XML sitemap or create an image sitemap file.
As shown in the screenshot above, you now have the ability to include images which appear under the ‘internal’ tab from a normal crawl, or images which sit on a CDN (and appear under the ‘external’ tab).
Typically you don’t want to include images like logos in an image sitemap, so you can also choose to only include images with a certain number of source attribute references. To help with this, we have introduced a new column in the ‘images’ tab which shows how many times an image is referenced (IMG Inlinks).
This is a nice easy way to exclude logos or social media icons, which are often referenced sitewide. You can also right-click and ‘remove’ any images or URLs you don’t want to include. The ‘IMG Inlinks’ column is also very useful when viewing images with missing alt text, as you may wish to ignore social profile icons without them, for example.
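As a rough illustration of the ‘IMG Inlinks’ idea, the sketch below counts how many pages reference each image and leaves anything above a threshold out of an image sitemap. It assumes the threshold works as an upper limit and that an image is counted once per page; the function name `image_sitemap`, the `max_inlinks` parameter and the sample URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def image_sitemap(page_images, max_inlinks):
    """Build an image sitemap, skipping images referenced by more than max_inlinks pages."""
    # Count each image once per page that references it.
    inlinks = Counter(img for images in page_images.values() for img in set(images))
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page, images in page_images.items():
        kept = [img for img in images if inlinks[img] <= max_inlinks]
        if not kept:
            continue  # no qualifying images, so the page is left out
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page
        for img in kept:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = img
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl data: page URL -> images found on that page.
pages = {
    "https://example.com/": ["https://example.com/logo.png", "https://example.com/hero.jpg"],
    "https://example.com/about/": ["https://example.com/logo.png", "https://example.com/team.jpg"],
}
print(image_sitemap(pages, max_inlinks=1))
```

With a limit of 1, the logo (referenced on both pages) drops out, while the two page-specific images stay in.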
There’s now also plenty more options when generating an XML sitemap. You can choose whether to include ‘noindex’, canonicalised or paginated URLs, or PDFs, in the sitemap for example. Plus you now also have greater control over the lastmod, priority and change frequency.
4) Paste URLs In List Mode
To help save time, you can now paste URLs directly into the SEO Spider in ‘list’ mode or enter URLs manually (into a window), as well as upload a file like normal.
Hopefully these additional options will be useful and help save time, particularly when you don’t want to save a file first to upload.
5) Improved Bulk Exporting
We plan on making the exporting function entirely customisable, but for now bulk exporting has been improved so you can export all inlinks (or ‘source’ links) to the custom filter and directives, such as ‘noindex’ or ‘canonicalised’ pages if you wish to analyse crawl efficiency for example.
Thanks to the awesome Aleyda for this suggestion.
6) Windows Look & Feel
There’s a new ‘user interface’ configuration for Windows only, that allows users to enable ‘Windows look and feel’. This will then adhere to the scaling settings a user has, which can be useful for some newer systems with very high resolutions.
It’s also rather colourful in ‘tree view’.
We have also performed other updates in the version 3.0 of the Screaming Frog SEO Spider, which include the following –
- You can now view the ‘Last-Modified’ header response within a column in the ‘Internal’ tab. This can be helpful for tracking down new or old pages, or pages within a certain date range. ‘Response time’ of URLs has also been added to the ‘Internal’ tab (it used to be in the ‘Response Codes’ tab only, thanks to RaphSEO for that one).
- The parser has been updated so it’s less strict about the validity of HTML mark-up. For example, in the past if you had invalid HTML mark-up in the HEAD, page titles, meta descriptions or word count may not always be collected. Now the SEO Spider will simply ignore it and collect the content of elements regardless.
- There’s now a ‘mobile-friendly’ entry in the description prefix dropdown menu of the SERP panel. From our testing, these are not used within the description truncation calculations by Google (so you have the same amount of space for characters as before their introduction).
- We now read the contents of robots.txt files only if the response code is 200 OK. Previously we read the contents irrespective of the response code.
- Loading of large crawl files has been optimised, so this should be much quicker.
- We now remove ‘tabs’ from links, just like Google do (again, as per internal testing). So if a link on a page contains the tab character, it will be removed.
- We have formatted numbers displayed in filter total and progress at the bottom. This is useful when crawling at scale! For example, you will see 500,000 rather than 500000.
- The number of rows in the filter drop down has been increased, so users don’t have to scroll.
- The default response timeout has been increased from 10 secs to 20 secs, as there appears to be plenty of slow responding websites still out there unfortunately!
- The lower window pane cells are now individually selectable, like the main window pane.
- The ‘search’ button next to the search field has been removed, as it was fairly redundant as you can just press ‘Enter’ to search.
- There’s been a few updates and improvements to the GUI that you may notice.
- (Updated) – The ‘Overview Report’ now also contains the data you can see in the right hand window pane ‘Response Times’ tab. Thanks to Nate Plaunt and I believe a couple of others who also made the suggestion (apologies for forgetting anyone).
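The robots.txt change in the list above can be sketched as follows, using Python’s standard `urllib.robotparser`. The helper name `load_robots` is hypothetical, and treating any non-200 response as ‘no rules’ is an assumption made for this example rather than a description of exactly what the SEO Spider does.

```python
from urllib.robotparser import RobotFileParser


def load_robots(status_code, body):
    """Parse robots.txt rules only when the fetch returned 200 OK; otherwise apply no rules."""
    parser = RobotFileParser()
    if status_code == 200:
        parser.parse(body.splitlines())
    else:
        # Ignore the body of error pages, redirects and so on (assumed behaviour).
        parser.parse([])
    return parser


rules = "User-agent: *\nDisallow: /private/"
url = "https://example.com/private/page"

print(load_robots(200, rules).can_fetch("MyBot", url))  # rules are read and applied
print(load_robots(404, rules).can_fetch("MyBot", url))  # body is ignored
```

This matters because a custom 404 or soft error page served at /robots.txt can otherwise be misread as a set of directives.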
We have also fixed a number of reported bugs, which include –
- Fixed a bug with ‘Depth Stats’, where the percentage didn’t always add up to 100%.
- Fixed a bug when crawling from the domain root (without www.) with the ‘crawl all subdomains’ configuration ticked, which caused all external domains to be treated as internal.
- Fixed a bug with inconsistent URL encoding. The UI now always shows the non URL encoded version of a URL. If a URL is linked to both encoded and unencoded, we’ll now only show the URL once.
- Fixed a crash in Configuration->URL Rewriting->Regex Replace, as reported by a couple of users.
- Fixed a crash for a bound checking issue, as reported by Ahmed Khalifa.
- Fixed a bug where unchecking the ‘Check External’ tickbox still checked external links that are not HTML anchors (so images, CSS etc. were still checked).
- Fixed a bug where the leading international character was stripped out from SERP title preview.
- Fixed a bug when crawling links which contained a new line. Google removes and ignores them, so we do now as well.
- Fixed a bug where AJAX URLs are UTF-16 encoded using a BOM. We now derive encoding from a BOM, if it’s present.
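Deriving the encoding from a BOM, as mentioned in the last fix above, generally looks something like this sketch. It uses the BOM constants from Python’s `codecs` module; the function name `decode_body` and the UTF-8 fallback are assumptions for the example.

```python
import codecs

# Longer BOMs first, because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_body(raw, fallback="utf-8"):
    """Decode bytes using the BOM if one is present, else a fallback encoding."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # Strip the BOM itself before decoding the remaining bytes.
            return raw[len(bom):].decode(encoding)
    return raw.decode(fallback, errors="replace")


utf16_response = codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le")
print(decode_body(utf16_response))
```

In practice a crawler would fall back to the charset from the HTTP header or a meta tag when no BOM is found; that part is left out here to keep the example short.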
Hopefully that covers everything! We hope the new features are helpful and we expect our next update to be significantly larger. If you have any problems with the latest release, do just pop through the details to support, and as always, we welcome any feedback or suggestions.
You can download the SEO Spider 3.0 now. Thanks to everyone for their awesome support.
Small Update – Version 3.1 Released 24th February 2015
We have just released another small update to version 3.1 of the Screaming Frog SEO Spider. There’s a couple of tweaks and some bug fixes from the update, which include –
- The insecure content report has been improved to also include canonicals. So if you have a secure HTTPS URL, with an insecure HTTP canonical, these will be identified within the ‘insecure content’ report now, as well.
- Increased the size of the URL input field by 100px in Spider mode.
- Fixed a bug with ‘Respect Canonicals’ option, not respecting HTTP Header Canonicals.
- Fixed a bug with ‘Crawl Canonicals’ not crawling HTTP Header Canonicals.
- Fixed a crash on Windows, when users try to use the ‘Windows look and feel’, but have an older version of Java, without JavaFX.
- Fixed a bug where we were not respecting ‘nofollow’ directives in the X-Robots-Tag Header, as reported by Merlinox.
- Fixed a bug with the Sitemaps file writing ‘priorities’ attribute with a comma, rather than a full stop, due to user locale.
- Updated the progress percentage & average response time to format according to default locale.
- Fixed a crash caused by parsing pages with an embed tag containing an invalid src attribute, e.g. embed src="about:blank".
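The sitemap ‘priority’ bug above is a classic locale problem: in many locales the decimal separator is a comma, but the sitemap protocol expects values such as 0.5. A minimal, locale-independent way to format the value might look like this (the function name `format_priority` is made up for illustration):

```python
def format_priority(value):
    """Format a sitemap priority with a full stop, whatever the system locale is."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    # Python's fixed-point 'f' format ignores the locale, so the separator is always '.'.
    return f"{value:.1f}"


print(format_priority(0.5))
print(format_priority(1))
```

The general rule is to use locale-aware formatting for numbers shown to users (such as the progress percentage) and locale-independent formatting for anything written to a machine-readable file.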
Small Update – Version 3.2 Released 4th March 2015
We have just released another small update to version 3.2 of the Screaming Frog SEO Spider. Again, this is just a smaller update with feedback from users and includes –
- Updated the insecure content report to report insecure HTTP content on HTTPS URLs more accurately.
- Fixed a bug causing a crash during a right click ‘re-spider’ of URLs reported by a few users.
- Fixed slow loading of CSV files.
- Fixed a bug reported with double URL encoding.
Small Update – Version 3.3 Released 23rd March 2015
We have just released another small update to version 3.3 of the Screaming Frog SEO Spider. Similar to the above, this is just a small release with a few updates, which include –
- Fixed a relative link bug for URLs.
- Updated the right click options for ‘Show Other Domains On This IP’, ‘Check Index > Yahoo’ and OSE to a new address.
- CSV files now don’t include a BOM (Byte Order Mark). This was needed before we had excel export integration. It causes problems with some tools parsing the CSV files, so has been removed, as suggested by Kevin Ellen.
- Fixed a couple of crashes when using the right click option.
- Fixed a bug where images only linked to via an HREF were not included in a sitemap.
- Fixed a bug affecting users of Java 8u31 & JDK 7u75 and above trying to connect to SSLv3 web servers.
- Fixed a bug with handling of mixed encoded links.
You can download the SEO Spider 3.3 now.
Thanks to everyone for all their comments on the latest version and feedback as always.