SEO Spider FAQ


This section covers some frequently asked questions about the Screaming Frog SEO Spider. The FAQ includes:

- Purchasing A Licence
- Common Problems
- Contact & Support
- SEO Spider
  - Common Queries
  - Custom Extraction
  - Google Analytics Integration

We will be adding to the FAQ as we get more feedback.

What Additional Features Does A Licence Provide? Top ↑

A licence removes the 500 URI crawl limit, allows you to save and upload crawls, opens up all the configuration options and the custom source code search, custom extraction and Google Analytics integration features. We also provide support for technical issues related to the SEO spider for licensed users.

In the same way as the free ‘lite’ version, there are no restrictions on the number of websites you can crawl with a licence. Licences are, however, individual per user. If you have five members of the team who would like to use the licenced version, you will need five licences.

How Many Users Are Permitted To Use One Licence? Top ↑

Licences are individual per user. A single licence key is for a single authorised user. If you have five users using five copies of the Screaming Frog SEO Spider software, you will require 5 separate licences.

Please see section 3 of our terms and conditions for full details.

Can I Use My Licence On More Than One Device? Top ↑

Yes. The licence allows you to install the SEO Spider on multiple computers. Licences are individual per user. Please see section 3 of our terms and conditions for full details.

Why Is My Licence Key Saying It’s Invalid? Top ↑

If the SEO Spider says your ‘licence key is invalid’, then please check the following, as the licence keys we provide always work.

If your licence key still does not work, then please contact support with the details.

Why Can’t My Licence Key Be Saved? Top ↑

The SEO Spider stores the licence in a file called licence.txt in a ‘.ScreamingFrogSEOSpider’ folder in the user’s home directory. You can see this location by going to Help->Debug and looking at the line labelled “Licence File”.
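If the key appears not to save, one quick check is whether the file actually exists and is readable. The path below assumes a default home directory on Mac/Linux; substitute the exact ‘Licence File’ location shown under Help->Debug.

```shell
# Path is an assumption -- use the location shown under Help->Debug.
ls -l ~/.ScreamingFrogSEOSpider/licence.txt
cat ~/.ScreamingFrogSEOSpider/licence.txt
```

If the `ls` fails or the `cat` shows nothing, the key was never written, which usually points to a permissions problem on the folder.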

Please check the following to resolve this issue:

I Have Lost My Licence or Invoice, How Do I Get Another One? Top ↑

If you have lost your licence key or invoice from the 22nd of September 2014 onwards, please login to your account to retrieve the details. If you have lost your account password, then simply request it again via the form.

If you purchased a licence before the 22nd of September 2014, then please contact support[at] with your username or e-mail you used to pay for the premium version.

Is It Possible To Move My Licence To A New Computer? Top ↑

Yes, please take a note of your licence key (you can find this under ‘licence’ and ‘enter licence key’ in the software), then uninstall the SEO spider on the old computer, before installing and entering your licence on the new machine. If you experience any issues during this move, please contact our support.

How Do I Buy A Licence? Top ↑

Simply click on the ‘buy a licence’ option in the SEO Spider ‘licence’ menu or visit our purchase a licence page directly.

You can then create an account & make payment. When this is complete, you will be provided with your licence key to open up the tool & remove the crawl limit. If you have just purchased a licence and have not received your licence, please check your spam / junk folder. You can also view your licence(s) details and invoice(s) by logging into your account.

Please note, the account login has only been active from the 22nd of September 2014. If you purchased before this date, your details won’t be available there, so please contact us for any information.

How Much Does The Screaming Frog SEO Spider Cost? Top ↑

As standard you download the lite version of the tool which is free. However, without a licence the SEO spider is limited to crawling a maximum of 500 URIs each crawl. The configuration options of the spider and the custom source code search feature are also only available in the licensed version.

For £99 per annum you can purchase a licence, which opens up the spider’s configuration options and removes the 500 URI crawl limit. A licence is required per individual using the tool. When the licence expires, the SEO spider returns to the restricted free lite version.

Do You Offer Discounts On Bulk Licence Purchases? Top ↑

Yes, please see our SEO spider licence page for more details on discounts.

What Payment Methods Do You Accept & From Which Countries? Top ↑

We accept PayPal and most major credit and debit cards. Payment via American Express is covered here. The price of the SEO spider is in pounds sterling (GBP). If you are outside of the UK, please take a look at the current exchange rate to work out the cost. (The automatic currency conversion will depend on the current foreign exchange rate and perhaps your card issuer.)

We do not accept cheques (or checks!).

Do You Accept Payment Via American Express? Top ↑

Not directly, but you can use your Amex card by selecting the “PayPal” option on our Billing page, then choosing “Check Out as Guest” when you get to PayPal. This option is available in most countries. If you don’t get this option, please use another card or PayPal, or contact us via support and we can set you up to pay via bank transfer.

How Do I Renew My Licence? Top ↑

At the moment the best way is simply to purchase another licence upon expiry. Licences do not auto-renew, so if you do not want to renew your licence you will not be charged and no action is needed.

I Have Purchased A Licence, Why Have I Not Received It? Top ↑

If you have just purchased a licence and have not received your licence, please check your spam / junk folder. Licences are sent immediately upon purchase. You can also view your licence(s) details and invoice(s) by logging into your account.

Please also check your payment method. If you have paid via an e-cheque, the licence will only be sent when it has cleared. PayPal explains this as well.

I’m A Business In The EU, Can I Pay Without VAT? Top ↑

Yes. To do this you must have a valid VAT number and enter it on the Billing page during checkout: select ‘business’ and enter your VAT number.


Your VAT number will be checked against the VIES system and VAT removed if it is valid. The VIES system does go down from time to time, so if this happens please try again later. Unfortunately we cannot refund VAT once a purchase has been made.

Why Is My Credit Card Payment Being Declined? Top ↑

There are a few reasons this could happen:

Do You Work With Resellers? Top ↑

Resellers can purchase an SEO spider licence online on behalf of a client. Please be aware that licence usernames are automatically generated from the first and last names used in the Billing Address entered during checkout. If you require a custom username, then please request a PayPal invoice in advance.

For resellers who are unable to purchase online with PayPal or a credit card and encumber us with admin such as vendor forms, we reserve the right to charge an administration fee of £50.

How Is The Software Delivered? Top ↑

The software is downloaded from our website; the licence key is delivered electronically by email.

What Is The Part Number? Top ↑

There is no part number or SKU.

What Is The Reseller Price? Top ↑

We do not offer discounted rates for resellers. The price is £99 (GBP) per year, per user.

Where Can I Get Company Information? Top ↑

On our contact page.

Where Can I Get Licencing Terms? Top ↑

Licencing details can be found here.

Where Can I Get Form W-9 Information? Top ↑

Screaming Frog is a UK based company, so this is not applicable.

Can I Get A Quote In A Currency Other Than GBP? Top ↑

No, we only sell in GBP.

Why Won’t The SEO Spider Start? Top ↑

This is nearly always due to an out-of-date version of Java. If you are running the PC version, please make sure you have the latest version of Java. If you are running the Mac version, please make sure you have the most up-to-date version of the OS, which will update Java. Please uninstall, then reinstall the spider and try again.

Why Won’t The SEO Spider Crawl My Website? Top ↑

This could be for a number of reasons:

Why Am I Experiencing Slow Down? Top ↑

There are a number of reasons why you might be experiencing a slow crawl rate or slow down of the spider. These include:

Why Am I Experiencing Slow Down Or Hanging Upon Exports & Saving Crawls? Top ↑

This will generally be due to the SEO spider reaching its memory limit. Please read how to increase memory.

Why Does The SEO Spider Freeze? Top ↑

This will generally be due to the SEO spider reaching its memory limit. Please read how to increase memory.

Why Am I Experiencing A ‘Could not create the Java virtual machine’ Message After Increasing Memory? Top ↑

If you have just increased your memory allocation and now receive a ‘Could not create the Java virtual machine’ error message, it will be due to one of the following two reasons:

Please note, this is covered in the memory section of the user guide as well.

Why Do I Get A “Connection Refused” Response when Connecting to a Secure Site? Top ↑

You may get connection refused on sites that use stronger crypto algorithms than are supported by default in Java. You will see “Connection Refused” in the Status column of the SEO Spider interface. The log file will show a line like this:

2015-01-19 09:10:03,218 [SpiderWorker 1] WARN - IO Exception for url: '' reason: ' Received fatal alert: handshake_failure'

You can view the log file(s) by either going to the location shown for ‘Log File’ under Help->Debug, or downloading and unzipping the log files from Help->Debug->Save Logs.

Due to import restrictions, Java cannot supply this stronger crypto support by default. You can, however, install the Java higher strength crypto support by downloading the following:

For Java 8: Java 8 Security Fix

For Java 7: Java 7 Security Fix

If you download, unzip and follow the instructions in the README.txt file you should be able to crawl your site successfully. Note that you can find where your <java-home> directory is set to by starting the SEO Spider and going to Help->Debug and looking at the Java section.

For more background information see here and scroll down to the section “Adding stronger algorithms: JCE Unlimited Strength”.
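As an illustration only (the README.txt in the download is authoritative), the procedure boils down to replacing two policy jars inside your Java install. The JAVA_HOME path and zip/folder names below are assumptions based on the Java 8 download; use the <java-home> shown under Help->Debug.

```shell
# Illustrative sketch -- follow the README.txt in the download for exact steps.
# JAVA_HOME is an assumed path; substitute the <java-home> from Help->Debug.
JAVA_HOME="/usr/lib/jvm/java-8-oracle"
unzip -o jce_policy-8.zip   # assumed name of the Java 8 policy download
cp UnlimitedJCEPolicyJDK8/local_policy.jar \
   UnlimitedJCEPolicyJDK8/US_export_policy.jar \
   "$JAVA_HOME/jre/lib/security/"
```

Note the target folder is the JRE’s lib/security directory, so if <java-home> already points at a JRE, drop the `jre/` segment.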

Why Do I Get A “Connection Refused” Response? Top ↑

Connection refused is displayed in the Status column when the SEO Spider’s connection attempt has been refused at some point between the local machine and the website.

If this happens for all sites consistently then it is an issue with the local machine/network. Please check the following:

If this is preventing you from crawling a particular site at all, please try the following:

If this is happening intermittently during a crawl then please try the following:

Why Do I Get A “Connection Error” Response? Top ↑

A connection error, or connection timeout, is shown when there is an issue receiving a response at all.

This is generally due to network issues or proxy settings.

Please check that you can connect to the internet. If you have changed the SEO spider proxy settings (under Configuration > Proxy), please ensure that these are correct (or that they are switched off).

Why Do I Get A “Connection Timeout” Response? Top ↑

Connection timeout occurs when the SEO Spider struggles to receive an HTTP response at all and the request times out. It can often be due to a slow-responding website or a server under load, or it can be due to network issues. We recommend the following:

Why Do I Get A “403 Forbidden” Error Response? Top ↑

The 403 Forbidden status code occurs when a web server denies access to the SEO spider’s request for some reason.

If this happens consistently and you can see the website in a browser, it could be that the web server behaves differently depending on the User Agent. In the premium version, try adjusting the User Agent setting under Configuration->HTTP Header->User Agent. For example, try crawling as a bot, such as ‘Googlebot’, or as a browser, such as ‘Chrome’.

If this happens intermittently during a crawl, it could be that the speed at which the spider is requesting pages is overwhelming the server. In the premium version of the SEO spider you can reduce the speed of requests. If you are running the ‘lite’ version, you may find that right-clicking the URL and choosing re-spider will help.

Why Do I Get A “503 Service Unavailable” Error Response? Top ↑

The 503 Service Unavailable status code occurs when a web server is temporarily unable to handle the SEO spider’s request, often because it is overloaded or down for maintenance.

If this happens consistently and you can see the website in a browser, it could be that the web server behaves differently depending on the User Agent. In the premium version, try adjusting the User Agent setting under Configuration->HTTP Header->User Agent. For example, try crawling as a bot, such as ‘Googlebot’, or as a browser, such as ‘Chrome’.

If this happens intermittently during a crawl, it could be that the speed at which the spider is requesting pages is overwhelming the server. In the premium version of the SEO spider you can reduce the speed of requests. If you are running the ‘lite’ version, you may find that right-clicking the URL and choosing re-spider will help.

Why Is The Character Encoding Incorrect? Top ↑

The SEO spider determines the character encoding of a web page from the “charset=” parameter in the HTTP Content-Type header, e.g.:

"text/html; charset=UTF-8"

You can see this in the SEO spider’s interface in the ‘Content’ columns (in various tabs). If this is not present in the HTTP header, the SEO spider will then read the first 2048 bytes of the HTML page to see if there is a charset within the HTML.

For example:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1255">

If no charset is found there either, the SEO spider assumes the page is UTF-8.

The spider does log any character encoding issues. If there is a specific page that is causing problems, perform a crawl of only that page by setting the maximum number of URLs to crawl to 1, then crawling the URL. You may see a line in the trace.txt log file (the location is C:\Users\Yourprofile\.ScreamingFrogSEOSpider\trace.txt):

20-06-12 20:32:50 INFO Unsupported Encoding ‘windows-‘ reverting to ‘UTF-8’ on page ‘’. This could be an error on the site or you may need to install an additional language pack.

The solution is to specify the encoding either in the Content-Type field of the accompanying HTTP header, or by ensuring the charset parameter appears within the first 2048 bytes of the HTML, inside the head element.
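The first-2048-bytes check described above can be sketched in a terminal. This is purely illustrative (the HTML snippet is made up), but it shows what the SEO spider is looking for:

```shell
# Emulate the charset check on a sample HTML snippet: scan only the first
# 2048 bytes for a charset declaration.
html='<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1255"></head>'
printf '%s' "$html" | head -c 2048 | grep -io 'charset=[a-z0-9-]*'
# prints: charset=windows-1255
```

If your declaration sits after byte 2048 (for example below a large block of inline CSS in the head), the grep above finds nothing, and the SEO spider likewise falls back to UTF-8.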

Why Are Page Titles &/Or Meta Descriptions Not Being Displayed/Displayed Incorrectly? Top ↑

If the site or URL in question has page titles and meta descriptions, but one (or both!) are not showing in the SEO Spider, this is generally due to invalid HTML markup between the opening html element and the closing head element. The HTML markup between these elements in the source code has to be valid, without errors, for page titles and meta descriptions to be parsed and collected by the SEO Spider.

The SEO Spider does not execute JavaScript. Modifications to any html elements via JavaScript will not be seen by the SEO Spider.

We recommend validating the html using the free W3C markup validation tool. A really nice feature here is the ‘Show Source’ button, which can be very insightful to identify specific errors.

We recommend fixing any html markup errors and then crawling the URL(s) again for these elements to be collected.

Why is the SEO Spider Not Finding Images? Top ↑

There are generally two reasons for this:

How Do I View Alt Text Of Images Hosted On A CDN? Top ↑

A content delivery network (CDN) will operate from either an external domain or a subdomain, so if one is being used to serve images they will typically appear under the ‘External’ tab rather than the ‘Images’ tab. Please ensure robots.txt is not blocking the SEO Spider from crawling the CDN. To view the alt text of the images, you can still use the ‘image info’ tab in the lower window pane.

To export all image alt text, simply use the ‘bulk export’ and ‘all in links’ export. When you have the data in a spreadsheet, simply filter for ‘type’ as ‘IMG’ and the destination URL to ‘does not contain’ ‘’. This will then display all images and their alt text on the CDN.

Why Can’t I View The Graphs? Top ↑

If you’re using Ubuntu and are unable to view the graphs, this is because the SEO Spider makes use of the JavaFX library for its graphing function. This requires Java 7 from Oracle to be installed. This is an optional step, but required if you would like the spider to display graphs/charts etc. Click here to view our Java 7 Installation Guide.

Why Do I Receive A Warning Saying Chrome Does Not Support Java 7 On Mac OS X? Top ↑

This warning is about running Java in your web browser, not JavaScript. If you download Java 7 and need to run Java content (Java applets, for example) you will have to visit those websites using either Safari or Firefox. Google are working on a 64-bit version of Chrome for Mac OS X that will alleviate this issue.

In our testing internally over a 6-month period using the Chrome browser every day alongside Java 7, we haven’t experienced any websites where there was an issue, ever. So, we recommend upgrading and you’ll probably never notice a problem. You can always revert if there was an issue.

Alternatively, we do still support version 2.40 which uses Java 6 and can be downloaded here. The only difference is the graphs, which require Java 7 for the feature.

Does The SEO Spider Crawl PDFs? Top ↑

The SEO Spider will check links to PDF documents. These URLs can be seen under the PDF filter in the Internal and External tabs. It does not parse PDF documents to find links to crawl.

Why Do I Get A ‘Project Open Failed’ When Attempting To Open A Saved Crawl? Top ↑

This means the crawl did not save completely, which is why it can’t be opened. EOF stands for ‘end of file’, which means the SEO Spider was unable to read to the expected end of the file.


This can be due to the SEO Spider crashing during save, which is normally due to running out of memory. This can also happen if you exit the SEO Spider during save, or your machine crashes for example.

Unfortunately there is no way to open or retrieve the crawl data, as it’s incomplete and therefore lost. Please also consider increasing your memory allocation, which will help reduce any problems saving a crawl in the future.

Why Won’t My Crawl Complete? Top ↑

First, check whether the spider is still crawling the site and, if so, what the URLs it has been finding look like. The pattern of URLs being found will usually explain why the crawl percentage is not increasing:

Why isn’t my Include/Exclude function working? Top ↑

Please note Include/Exclude functions are case sensitive, so any patterns need to match the URL exactly as it appears.

Functions will only be applied to URLs that have not yet been discovered by the spider. Any URLs that have already been discovered and queued for crawling will not be affected, so it is recommended the crawl is restarted between updates to ensure the results are accurate.

Functions will not be applied to the starting URL of a crawl or to URLs in list mode.

.* is the regex wildcard.
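The wildcard behaviour can be demonstrated in a terminal. The URLs and pattern below are made up, and grep is used purely to show the regex matching; in the SEO Spider you would paste the pattern into the Include/Exclude configuration instead:

```shell
# .* matches any sequence of characters, so this pattern covers everything under /blog/
echo 'http://example.com/blog/post-1' | grep -qE 'http://example.com/blog/.*' && echo 'matched'
echo 'http://example.com/shop/item'   | grep -qE 'http://example.com/blog/.*' || echo 'not matched'
# Matching is case sensitive: a /Blog/ URL would not match the pattern above.
```

The first command prints ‘matched’ and the second ‘not matched’, mirroring which URLs an Include pattern of `http://example.com/blog/.*` would keep in the crawl.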

Why Does The Installer Take A While To Start? Top ↑

This is because Windows Defender runs a security scan on the installer, which can take up to a couple of minutes. Unfortunately, when downloading the file using Google Chrome there is no indication that the scan is running. Internet Explorer does give an indication of this, and Firefox does not scan at all. If you go directly to your downloads folder and run the installer from there, you don’t have to wait for the security scan to run.

Why Do I Get “error opening file for writing” When Installing? Top ↑

Try running the file as administrator by right clicking the installer and choosing “Run as administrator”. Alternatively log in to an administrator account. You may need to request assistance from your IT department depending on your company setup.

Why Are The Fonts So Small On My High Res Display? Top ↑

This is because the SEO Spider is not obeying your Windows scaling settings. Choose the Windows Look and Feel by going to Configuration -> User Interface, selecting “Enable Windows Look and Feel” and restarting the SEO Spider.

Do You Support Macs below OS X Version 10.7.3 (& 32-Bit Macs)? Top ↑

Version 2.50 requires Java 7 to run, which is only available from OS X version 10.7.3 and above. This means the latest version of the SEO Spider cannot be used on older 32-bit Macs (the last of which we understand were made 7-8 years ago), or on newer 64-bit Macs which haven’t updated their OS X.

We do still support version 2.40 for OS X versions below 10.7.3 (and 32-bit) Macs which can be downloaded here. The only difference is the graphs, which require Java 7 for the feature.

Why Does The SEO Spider User Interface Run Slowly On My MacBook Pro? Top ↑

There is a bug in the Java graphics library (JavaFX) that the SEO Spider uses for the graphs. This only affects the latest MacBook Pro – Late 2013 model with Intel Iris Pro GPU.
If you are affected you will see the following message under Configuration->User Interface:


A bug has been raised with Oracle, who are looking into this; we currently don’t have any information on when a fix will be released. We will update this FAQ when we have that information.

In the meantime, there are two workarounds available. You can either run the application in “Low Resolution Mode”, keeping the graphs, or disable the graphs. There is no need to do both.

1) – Set the SEO Spider to be opened in “Low Resolution Mode”. To do this:

You can read more about this on the Apple website here.

2) – You can disable the graphs by going to Configuration->User Interface and unticking the ‘Enable Graphs’ option. This should fix the issue, but we do also still support version 2.40, which simply doesn’t have the graphs, and which you can download and use.

Why Does The SEO Spider Open Then Immediately Close? Top ↑

This is a known Java 8 bug relating to fonts. You can see details of this bug here.

The issue is around the use of non-standard fonts. If you restore standard fonts the SEO Spider will be able to start. To do this, open the “Font Book” application and choose File->Restore Standard Fonts. The SEO Spider should now be able to start without issue.

The removed fonts will now appear in the folder /Library/Fonts (Removed). You could add them back one by one, by double-clicking the fonts, to identify which one was causing the issue. If you manage to identify which font caused the issue, please let us know and we can update the Java bug to get this fixed. Oracle is currently waiting on this information from us. Unfortunately we have been unable to reproduce this issue ourselves.

The Spider GUI doesn’t have the latest flat style used in Yosemite Top ↑

Unfortunately we are at the mercy of Oracle to update their Mac look and feel to more closely match the new style introduced in Mac OS X Yosemite. There is a Java bug related to this at JDK-8052173. This will be updated in a future Java release.

Why Do I Get A Message About Installing Java – It’s Already Installed? Top ↑

If the SEO Spider shows a pop-up saying “Screaming Frog SEO Spider needs Java 7 or greater.” and you have already installed Java, it sounds like you have multiple versions installed. Please uninstall Java, then reinstall.

How Can I Open Multiple Instances Of The SEO Spider? Top ↑

To open additional instances of the SEO Spider open a Terminal and type the following:

open -n /Applications/Screaming\ Frog\ SEO\

How Do I Submit A Bug / Receive Support? Top ↑

Please follow the steps on the support page so we can help you as quickly as possible. Please note, we only offer full support for premium users of the tool although we will generally try and fix any issues.

How Do I Provide Feedback? Top ↑

Feedback is welcome, please just follow the steps on the support page to submit feedback. Please note we will try to read all messages but might not be able to reply to all of them. We will update this FAQ as we receive additional questions and feedback.

What Operating Systems Does The SEO Spider Run On? Top ↑

The SEO Spider runs on Windows, Mac and Linux. It’s a Java application and requires a Java 7 runtime environment or later to run. You can check here to see the system requirements to run Java. You can download the SEO Spider for free and try it.

Mac: If you are using OS X 10.7.2 or lower please see this faq.

Linux: We provide an Ubuntu package for Linux. If you would like to run the SEO Spider on a non-Debian based distribution please extract the jar file from the .deb and run it manually.

Windows: The SEO Spider can also be run on the server variants and Windows 10.

How Do I Use The Configuration Options? Top ↑

You cannot use the configuration options in the lite version of the tool. You will need to buy a licence to open up this menu, which you can do by clicking the ‘buy a licence’ option in the spider’s interface under ‘licence’.

How Do I Check For Broken Links (404 Errors)? Top ↑

Read our ‘How To Find Broken Links‘ tutorial, which explains how to identify broken links, view the source of the errors and export them and the source URLs in bulk.

What Do Each Of The Configuration Options Do? Top ↑

Please read our user guide, specifically the configuration options section.

How Do I Bulk Export All Inlinks To 3XX, 4XX (404 error etc) or 5XX pages? Top ↑

You can bulk export data via the ‘bulk export’ option in the top level navigation menu. You can then choose to export all links discovered or all in links to specific status codes such as 2XX, 3XX, 4XX or 5XX responses. For example, selecting the ‘Client Error 4XX In Links’ option will export all in links to all error pages (such as 404 error pages). Please see more on exporting in our user guide.

How Do I Bulk Export All Images Missing Alt Text? Top ↑

You can bulk export data via the ‘bulk export’ option in the top level navigation menu. Simply choose the ‘images missing alt text’ option to export all references of images without alt text. Please see more on exporting in our user guide.

How Do I Bulk Export All Image Alt Text? Top ↑

You can bulk export data via the ‘export’ option in the top level navigation menu. Simply choose the ‘all links’ option to export all images and associated alt text found in your crawl. This export actually includes data for all link instances found in your crawl, so please filter for images using the ‘type’ column in Excel. Please see more on exporting in our user guide.

How Is The Response Time Calculated? Top ↑

It is calculated from the time it takes to issue an HTTP request and get the full HTTP response back from the server. The figure displayed on the SEO Spider interface is in seconds. Note that this figure may not be 100% reproducible as it depends very much on server load and client network activity at the time the request was made.
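If you want a rough external cross-check of this figure, curl can report the same request-to-full-response timing. This assumes curl is installed, and https://example.com is a placeholder URL:

```shell
# %{time_total} is curl's time from issuing the request to receiving the full
# response, in seconds -- the same definition as the SEO Spider's figure.
curl -o /dev/null -s -w 'time_total: %{time_total}s\n' https://example.com
```

As the answer above notes, expect the number to vary between runs with server load and network conditions.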

What’s The Difference Between ‘Crawl Outside Of Start Folder’ & ‘Check Links Outside Folder’? Top ↑

The ‘Crawl outside of start folder’ configuration means you can crawl an entire website from any start point. For example, if you start a crawl in a sub folder with this configuration ticked, the SEO Spider will still crawl the whole website. This provides nice flexibility on where you start, or helps where some (sometimes poor!) set-ups have ‘homepages’ as sub folders.

The ‘check links outside of folder’ option is different. It provides the ability to crawl ‘within’ a sub folder, but still see details of any URLs that are linked to outside of that sub folder. It won’t crawl any further than this, though. An example –

If you started a crawl at and it linked to which returns a 404 page.

If you untick the ‘check links outside of folder’ option, the spider won’t crawl this 404 page as it sits outside the start folder. With it ticked, this page will be included under the ‘internal’ tab as a 404.

We felt users sometimes need to know about potential issues which start within the start folder but which link outside, without needing to crawl the entire website at the same time! This option provides that flexibility.

How Do I Increase Memory? Top ↑

Please see the how to increase memory section in our user guide.
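As background, the SEO Spider is a Java application, so the memory ceiling is ultimately a JVM -Xmx heap setting. The user guide names the exact configuration file to edit for your platform and version; purely as an illustration, raising the heap to 4GB means a line like:

```
-Xmx4g
```

The flag itself is standard Java; the file it lives in and the maximum safe value depend on your install and how much RAM your machine has.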

How Does The Spider Treat Robots.txt? Top ↑

The Screaming Frog SEO Spider is robots.txt compliant. It checks robots.txt in the same way as Google: it will check the robots.txt of the (sub)domain and follow directives for all robots, and specifically any for Googlebot. The tool also supports URL matching of file values (wildcards * / $) like Googlebot. Please see Google’s robots.txt documentation for more information, or our robots.txt section in the user guide. You can turn this feature off in the premium version.
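As an illustration of the wildcard matching mentioned above (the paths are made up; the syntax follows Google’s robots.txt handling):

```
User-agent: *
Disallow: /*?       # * matches any sequence of characters, blocking URLs with a query string
Disallow: /*.pdf$   # $ anchors the end of the URL, blocking any URL ending in .pdf
```

Directives like these would be respected by the SEO Spider in the same way Googlebot respects them.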

Where Can I See The Pages Blocked By Robots.txt? Top ↑

You can simply view URLs blocked via robots.txt in the UI (within the ‘Internal’ and ‘Response Codes’ tabs for example). Ensure you have the ‘Show internal URLs blocked by robots.txt’ configuration ticked under the ‘Configuration > Spider > ‘Basic’ tab.

Disallowed URLs will appear with a ‘status’ as ‘Blocked by Robots.txt’ and there’s a ‘Blocked by Robots.txt’ filter under the ‘Response Codes’ tab, where these can be viewed.

The ‘Blocked by Robots.txt’ filter also displays a ‘Matched Robots.txt Line’ column, which provides the line number and disallow path of the robots.txt entry that’s excluding each URL. If multiple lines in robots.txt block a URL, the SEO Spider will just report on the first encountered, similar to Google within Search Console.


If you’re using the older 2.40 Mac version of the SEO Spider, you can view the ‘Total Blocked by robots.txt’ for a crawl on the right-hand side of the user interface in the ‘Summary’ section of the overview tab. This count includes both internal and external URLs. Currently, there isn’t a way of seeing which URLs have been blocked in the user interface. However, it is possible to get this information from the SEO Spider log file, after a crawl. Each time a URL is blocked by robots.txt, it will be reported like this:

2015-02-18 08:56:09,652 [RobotsMain 1] INFO - robots.txt file prevented the spider of '', reason 'Blocked by line 2: Disallow:'. You can choose to ignore robots.txt files in the Spider configuration.

You can view the log file(s) by either going to the location shown for ‘Log File’ under Help->Debug, or downloading and unzipping the log files from Help->Debug->Save Logs.

How Many URI Can The Spider Crawl? Top ↑

The spider cannot crawl an unlimited number of URIs; it is restricted by the memory allocated. There is not a set number of pages it can crawl; it depends on the complexity of the site and a number of other factors. Generally speaking, with the standard memory allocation of 512MB the spider can crawl between 10K and 100K URI of a site. You can increase the SEO spider’s memory, and as a very rough guide, a 64-bit machine with 8GB of RAM will generally allow you to crawl a couple of hundred thousand URLs.

We recommend crawling large sites in sections. You can use the configuration menu to just crawl html (rather than images, CSS or JS) or exclude certain sections of the site. Alternatively if you have a nicely structured IA you can crawl by directory (/holidays/, /blog/ etc). The tool was not built to crawl entire sites with hundreds of thousands of pages to pick up every single issue as it currently uses RAM rather than a hard disk database.

Why Does The URI Total Not Match What I Export? Top ↑

The ‘Completed’ URI total is the number of URIs the SEO Spider has encountered. This is the total URI crawled, plus any ‘Internal’ and ‘External’ URI blocked by robots.txt.

Depending on the settings in the robots.txt section of the ‘Configuration > Spider > Basic’ menu, these blocked URI may not be visible in the SEO Spider interface.

If the ‘Respect Canonical’ or ‘Respect Noindex’ options in the ‘Configuration > Spider > Advanced’ tab are checked, then these URI will count towards the ‘Total Encountered’ (Completed Total) and ‘Crawled’, but will not be visible within the SEO Spider interface.

No single tab allows you to export all URI. To export all URIs shown in the interface, you need to export both the ‘Internal’ and ‘External’ tabs.

Can The SEO Spider Crawl Staging Or Development Sites That Are Password Protected Or Behind a Login? Top ↑

The SEO Spider supports basic and digest authentication. If you visit the website and your browser gives you a pop-up requesting a username and password, that will be basic or digest authentication, which is supported. If the login screen is contained in the page itself, it will be a custom login system, which is not supported.
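As a rough illustration of how basic authentication works under the hood: the browser (or a crawler) simply sends the username and password base64-encoded in an `Authorization` header. Digest authentication is more involved (it uses a server-issued challenge and nonce) and isn't shown here. The credentials below are placeholders:

```python
import base64

def basic_auth_header(username, password):
    """Build the HTTP Authorization header value used by basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Placeholder credentials, not real ones
print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Because the credentials are only base64-encoded (not encrypted), basic authentication should always be used over HTTPS.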

Sites in development are often blocked via robots.txt as well, so make sure this is not the case, or use the ‘ignore robots.txt’ configuration. Then simply insert the staging site URL and crawl; a pop-up box will appear, just like it does in a web browser, asking for a username and password.

SEO Spider authentication

Enter your credentials and the crawl will continue as normal. You cannot pre-enter login credentials – they are entered when URLs that require authentication are crawled. This feature does not require a licence key.

Try the following pages to see how authentication works in your browser, or in the SEO Spider.

How Do I Block The SEO Spider From Crawling My Site? Top ↑

The spider obeys robots.txt protocol. Its user agent is ‘Screaming Frog SEO Spider’ so you can include the following in your robots.txt if you wish the spider not to crawl your site –

User-agent: Screaming Frog SEO Spider

Disallow: /

Please note – There are options to ‘ignore’ robots.txt and to change the user-agent; using these is entirely the responsibility of the user.
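You can sanity-check a robots.txt rule like the one above with Python's standard library parser (example.com is just a placeholder domain here):

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above
rules = [
    "User-agent: Screaming Frog SEO Spider",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# The SEO Spider's user-agent is disallowed from the whole site...
print(parser.can_fetch("Screaming Frog SEO Spider", "http://example.com/page"))  # False
# ...while user-agents not matched by the rule are unaffected.
print(parser.can_fetch("Googlebot", "http://example.com/page"))  # True
```

This is useful for confirming a rule does what you intend before deploying it, since a robots.txt mistake can block more crawlers than you meant to.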

Do You Collect Data & Can You See The Websites I Am Crawling? Top ↑

No. The Screaming Frog SEO Spider does not communicate any data back to us. All data is stored locally on your machine in its memory. The software does not contain any spyware, malware or adware (as verified by Softpedia) and it does not ‘phone home’ in any way. You crawl from your machine and we don’t see it!

Google APIs use the OAuth 2.0 protocol for authentication and authorisation, and obviously the data provided via Google Analytics is only accessible locally on your machine. We don’t (and technically can’t!) see or store any data ourselves.

Why Does The Number of URLs Crawled Not Match The Number Of Results Indexed In Google Or Errors Reported Within Google Webmaster Tools? Top ↑

There are a number of reasons why the number of URLs found in a crawl might not match the number of results indexed in Google (via a site: query), or why errors reported in the SEO Spider might not match those in Google Webmaster Tools.

First of all, crawling and indexing are quite separate, so there will always be some disparity. URLs might be crawled, but it doesn’t always mean they will actually be indexed in Google. This is an important area to consider, as there might be content in Google’s index which you didn’t know existed, or no longer want indexed for example. Equally, you may find more URLs in a crawl than in Google’s index due to directives used (noindex, canonicalisation) or even duplicate content, low site reputation etc.

Secondly, the SEO Spider only crawls the internal links of a website at the time of the crawl. Google (more specifically Googlebot) crawls the entire web, so it uses not just the internal links of a website for discovery, but also external links pointing to a website. Googlebot’s crawl is also not a snapshot in time; it spans the duration of a site’s lifetime from when it’s first discovered. Therefore, you may find old URLs in Google's index (perhaps from discontinued products, or an old section of the site, which still serve a 200 ‘OK’ response), or content that is only linked to via external sources. The SEO Spider won’t be able to discover URLs which are not linked to internally, like orphan pages or URLs only accessible via external links.

There are other reasons as well; these may include –

Why Does The Number of URLs Crawled (Or Errors Discovered) Not Match Another Crawler? Top ↑

First of all, the free ‘lite’ version is restricted to a 500 URL crawl limit, and obviously a website might be significantly larger. If you have a licence, the main reason an SEO Spider crawl might discover more or fewer URLs (and indeed broken links etc) than another crawler is simply down to the different default configuration of each.

By default the SEO Spider will respect robots.txt, respect ‘nofollow’ on internal and external URLs, and crawl canonicals. Other crawlers sometimes don’t respect these by default, which is why there might be differences. Obviously these can all be adjusted to your own preferences within the configuration.

While crawling more URLs might seem to be a good thing, actually it might be completely unnecessary and a waste of time and effort. So please choose wisely what you want to crawl.

We believe the SEO Spider is the most advanced crawler available, and it will often find more URLs than other crawlers, as it crawls canonicals and AJAX just like Googlebot, which other crawlers might not do as standard, or within their current capability.

There are other reasons as well; these may include –

Can I Crawl More Than One Site At A Time? Top ↑

Yes. There are two ways you can do this:

1) Open up multiple instances of the SEO Spider, one for each domain you want to crawl. Mac users check here.

2) Use list mode (Mode->List). Remove the search depth limit (Configuration->Spider->Limits, untick “Limit Search Depth”), untick “Ignore robots.txt” (Configuration->Spider->Basic), then upload your list of domains to crawl.

How Do I Create An XML Sitemap? Top ↑

Read our ‘How To Create An XML Sitemap‘ tutorial, which explains how to generate an XML Sitemap, include or exclude pages or images and runs through all the configuration settings available.

Why Is My Sitemap Missing Some URIs? Top ↑

Canonicalised, noindex and paginated URIs are not included in the sitemap by default. You may choose to include these in your sitemap by ticking the appropriate checkbox(es) in the “Pages” tab when you export the sitemap. Please read our user guide on XML Sitemap Creation.

How Can I Extract All Tags Matching My XPath? Top ↑

You will need to extract one value per extractor using an index selector. For example, if you want to select all the h2 elements from a page, configure extractor 1 to get the first, using an XPath such as (//h2)[1], configure extractor 2 to get the second with (//h2)[2], and so on. If you run out of extractors, you will have to do a separate crawl to grab any additional elements.
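The index-selector idea can be sketched with Python's standard library: an unindexed expression like //h2 returns every match, and an index picks out one at a time, which is what XPath expressions such as (//h2)[1] and (//h2)[2] do. The sample HTML below is made up:

```python
import xml.etree.ElementTree as ET

# Made-up page fragment containing several h2 elements
html = "<body><h2>First</h2><p>copy</p><h2>Second</h2><h2>Third</h2></body>"
root = ET.fromstring(html)

h2s = root.findall(".//h2")   # all h2 elements, like //h2
print(h2s[0].text)            # equivalent to (//h2)[1]
print(h2s[1].text)            # equivalent to (//h2)[2]
```

Note XPath indices start at 1, while Python list indices start at 0, hence (//h2)[1] corresponding to h2s[0].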

Why Is My Regex Extracting More Than Expected? Top ↑

If you are using a regex like .* that contains a greedy quantifier, you may end up matching more than you want. The solution is to use a lazy quantifier instead, such as .*?. For example, if you are trying to extract the id from the following JSON:

"id":"007", "name":"James Bond"

Using "id":"(.*)" you will get:

007", "name":"James Bond

If you use "id":"(.*?)" you will extract:

007

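You can check the greedy versus lazy behaviour with a few lines of Python, using the JSON snippet from the example above:

```python
import re

text = '"id":"007", "name":"James Bond"'

# Greedy: .* consumes as much as possible, up to the last closing quote
greedy = re.search(r'"id":"(.*)"', text).group(1)
# Lazy: .*? stops at the first closing quote after the id value
lazy = re.search(r'"id":"(.*?)"', text).group(1)

print(greedy)  # 007", "name":"James Bond
print(lazy)    # 007
```

The same principle applies in the SEO Spider's custom extraction: prefer .*? whenever the pattern is delimited by a character that also appears later in the text.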
How Do I Extract Multiple Matches Of A Regex? Top ↑

This isn’t possible. Only the first match will be returned. What you will need to do is adjust your regular expression so that there is only one match. For example, the following HTML has 2 <h1> tags:

<title>2 h1s</title>
<h1>First</h1>
<h1>Second</h1>

To extract the first h1, we can use: <h1>(.*?)</h1>. To extract the second one, we could use something like: </h1>.*<h1>(.*?)</h1>
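The workaround can be tested in Python. A general regex engine could return every match via findall, but since the extractor only returns the first match, the expression for the second h1 has to anchor past the first one. The HTML below is illustrative:

```python
import re

html = "<title>2 h1s</title><h1>First</h1><p>copy</p><h1>Second</h1>"

# First h1: lazy group stops at the first closing tag
first = re.search(r"<h1>(.*?)</h1>", html).group(1)
# Second h1: skip past the first closing </h1> before matching again
second = re.search(r"</h1>.*<h1>(.*?)</h1>", html).group(1)

print(first)   # First
print(second)  # Second
```

Note the second pattern relies on the greedy .* between the tags; with more than two h1 elements it would match the last one, so patterns like this need adjusting per page structure.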

Why Doesn’t GA Data Populate Against My URLs? Top ↑

The URLs in your chosen Google Analytics view have to match the URLs discovered in the SEO Spider crawl exactly, for data to be matched and populated accurately. If they don’t match, then GA data won’t be able to be matched and won’t populate. This is the single most common reason.

If Google Analytics data does not get pulled into the SEO Spider as you expected, then analyse the URLs under ‘Behaviour > Site Content > Landing Pages’ and ‘Behaviour > Site Content > All Pages’ depending on which dimension you choose in your query. Try clicking on the URLs to open them in a browser to see if they load correctly.

You can also export the ‘GA & GSC Not Matched’ report, which shows a list of URLs returned from the Google Analytics & Search Analytics (from Search Console) APIs for your query that didn’t match URLs in the crawl. Check the URLs with source as ‘GA’ for Google Analytics specifically (those marked as ‘GSC’ are Google Search Analytics, from Google Search Console). The URLs here need to match those in the crawl for the data to be matched accurately.

If they don’t match, then the SEO Spider won’t be able to match up the data accurately. We recommend checking your default Google Analytics view settings (such as ‘default page’) and filters such as ‘extended URL’ hacks, which all impact how URLs are displayed and hence matched against a crawl. If you want URLs to match up, you can often make the required amends within Google Analytics or use a ‘raw’ unedited view (you should always have one of these ideally).

Please note – There are some very common scenarios where URLs in Google Analytics might not match URLs in a crawl, so we cover these by matching trailing and non-trailing slash URLs and case sensitivity (upper and lowercase characters in URLs). Google doesn’t pass the protocol (HTTP or HTTPS) via their API, so we also match this data automatically as well.
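As a sketch of the kind of normalisation described above (trailing slash and case matching), here is a hypothetical helper; it is not the SEO Spider's actual code, just an illustration of the idea:

```python
def normalise(path):
    """Normalise a URL path for matching: lowercase and strip any trailing slash."""
    path = path.lower()
    if path.endswith("/") and path != "/":
        path = path[:-1]
    return path

# A crawled URL path and a GA landing-page path that differ only in
# case and trailing slash are treated as the same page.
print(normalise("/Blog/") == normalise("/blog"))  # True
```

With normalisation like this, /Blog/ and /blog match, which mirrors the matching behaviour described above.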

Why Doesn’t The GA API Data In The SEO Spider Match What’s Reported In The GA Interface? Top ↑

There are a number of reasons why data fetched via the Google API into the SEO Spider might differ from the data reported within the Google Analytics interface. First of all, we recommend triple-checking that you’re viewing the exact same account, property, view, segment, date range, metrics and dimensions. LandingPagePath and PagePath will of course provide very different results, for example!

If data still doesn’t match, then there are some common reasons why –

We actually recommend using the Google Analytics API query explorer and viewing the data that comes back, with the following query parameters which we use as default (obviously using the account, property and view of the site you’re testing) –

Google Analytics API explorer

You should see that data returned via the API matches pretty closely to what is reported within the SEO Spider.



Contact Us

Screaming Frog Ltd
6 Greys Road,

Tel: +44 (0)1491 415070
Fax: +44 (0)1491 578134