Table of Contents
- Installation on Windows
- Installation on macOS
- Installation on Ubuntu
- Saving, opening, exporting & importing crawls
- User agent
- Checking memory allocation
- XML sitemap creation
- Crawl analysis
- Command line interface set-up
- Command line interface
- Search function
- User Interface
- Spider Crawl Tab
- Spider Extraction Tab
- Spider Limits Tab
- Spider Rendering Tab
- Spider Advanced Tab
  - Cookie storage
  - Ignore non-indexable URLs for on-page filters
  - Ignore paginated URLs for duplicate filters
  - Always follow redirects
  - Always follow canonicals
  - Respect noindex
  - Respect canonical
  - Respect next/prev
  - Respect HSTS policy
  - Respect self referencing meta refresh
  - Extract images from img srcset attribute
  - Crawl fragment identifiers
  - Response timeout
  - 5XX response retries
- Spider Preferences Tab
- Other Configuration Options
  - Content area
  - Spelling & grammar
  - Robots.txt settings
  - Custom robots.txt
  - URL rewriting
  - User agent
  - HTTP header
  - Custom search
  - Custom extraction
  - Custom link positions
  - User Interface
  - Google Analytics integration
  - Google Search Console integration
  - PageSpeed Insights integration
  - Memory allocation
  - Storage mode
- Lower Window Tabs
- Right Side Window Tabs
TL;DR: If you have experienced a memory warning or crash, or are trying to perform large crawls, we recommend using a machine with an SSD and switching to database storage mode. If you don’t have an SSD, allocate as much memory as possible. For example, if you have 8gb of RAM, we recommend allocating a maximum of 4-6gb.
The Screaming Frog SEO Spider uses a configurable hybrid storage engine, which enables it to crawl millions of URLs. However, crawling at that scale requires memory and storage configuration, as well as the recommended hardware.
By default the SEO Spider will crawl using RAM, rather than saving to disk. This has speed advantages, but it cannot crawl at scale without a large RAM allocation.
The SEO Spider can be configured to store to disk using database storage mode, which allows it to crawl at scale, open saved crawls far quicker and save crawl data continuously, helping avoid ‘lost crawls’ caused by, for example, the machine being accidentally restarted or the crawl being ‘cleared’.
Memory Storage Mode
In standard memory storage mode there isn’t a set number of pages that can be crawled; it depends on the complexity of the site and the user’s machine specifications. The SEO Spider sets a maximum memory of 1gb for 32-bit and 2gb for 64-bit machines, which typically enables it to crawl between 10k-100k URLs of a site.
You can increase the SEO Spider’s memory allocation, and crawl hundreds of thousands of URLs using RAM alone. A 64-bit machine with 8gb of RAM will generally allow you to crawl a couple of hundred thousand URLs, if the memory allocation is increased.
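Under the hood the SEO Spider is a Java application, so the memory allocation is ultimately a JVM maximum heap (`-Xmx`) setting. As a sketch only – the file name and location below are how older versions exposed this, and may vary by version and platform – the allocation can also be set manually via a `.screamingfrogseospider` file in your user directory, though the in-app setting (‘Configuration > System > Memory’) is the supported route:

```
# ~/.screamingfrogseospider  (assumed manual override – prefer
# 'Configuration > System > Memory' within the application)
-Xmx8g
```

Here `-Xmx8g` caps the Java heap at 8gb; the application must be restarted for the value to take effect.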
Database Storage Mode
The SEO Spider can be configured to save crawl data to disk, which enables it to crawl millions of URLs. Crawls are also automatically saved in database storage mode and they open significantly quicker via the ‘File > Crawls’ menu.
We recommend database storage mode as the default storage configuration for all users with Solid State Drives (SSD), as hard disk drives are significantly slower at writing and reading data. This can be configured by selecting Database Storage mode (under ‘Configuration > System > Storage’).
As a rough guide, an SSD and 4gb of RAM allocated in database storage mode should allow the SEO Spider to crawl approx. 2 million URLs. We recommend this configuration as the default set-up for most users day to day.
High Memory Usage
If you have received a ‘high memory usage’ warning message when performing a crawl, or if you are experiencing a slow-down of the crawl or the program itself during a large crawl, this may be because the memory allocation has been reached.
The warning means that the SEO Spider has reached its current memory allocation; to be able to crawl more URLs, there are two options.
- Switching To Database Storage Mode – This is our recommended next step. Database storage mode saves all crawl data to disk and allows you to crawl more URLs for the same memory allocation.
- Increasing Memory Allocation – We only recommend increasing memory allocation if you can’t move to database storage mode, or if you have reached your memory allocation in database storage mode. This increases the amount of data that can be held in RAM, to allow you to crawl more URLs.
These options can also be combined to improve performance.
First of all, if you’re in memory storage mode, save the crawl via the ‘File > Save’ menu before changing settings, so the crawl can be resumed after the changes have been made. If you’re already in database storage mode, the crawl is stored automatically and will be available to resume under ‘File > Crawls’.
Switching To Database Storage
As discussed above, you can switch to database storage mode to increase the number of URLs that can be crawled. We recommend using an SSD for this storage mode, and it can be quickly configured within the application (‘Configuration > System > Storage’).
We recommend this as the default storage for users with an SSD, and for crawling at scale. Database storage mode allows for more URLs to be crawled for a given memory setting, with close to RAM storage crawling speed for set-ups with an SSD.
The default crawl limit is 5 million URLs, but it isn’t a hard limit – the SEO Spider is capable of crawling more (with the right set-up). For crawls under 2 million URLs, we recommend database storage and allocating just 4gb of RAM.
While not recommended, if you have a fast hard disk drive (HDD), rather than an SSD, this mode can still allow you to crawl more URLs. However, the write and read speed of a hard drive becomes the bottleneck in crawling – so both crawl speed and the interface itself will be significantly slower.
If you’re working on the machine while crawling, it can also impact machine performance, so crawl speed might need to be reduced to cope with the load. SSDs are so fast they generally don’t have this problem, which is why database storage can be used as the default for both small and large crawls.
To import a crawl from memory storage mode, please read our guide on saving, opening, exporting and importing crawls.
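Database storage mode also pairs well with running large crawls unattended via the command line interface (covered in its own section of this guide). As a sketch only – the flag names below are taken from the CLI documentation, but check `--help` on your installed version before relying on them – a headless crawl saved to disk might look like:

```
# Run a headless crawl and save it to a folder (assumes database
# storage mode is enabled in the application/system settings)
screamingfrogseospider --crawl https://www.example.com --headless \
  --save-crawl --output-folder /tmp/crawls
```

Because database storage saves continuously, a long headless crawl interrupted part-way can be resumed from ‘File > Crawls’.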
Increasing Memory Allocation
You’re able to set memory allocation within the application itself by selecting ‘Configuration > System > Memory’. This will allow the SEO Spider to crawl more URLs, even in database storage mode.
The SEO Spider will report the physical memory installed on the system, and allow you to configure the allocation quickly. We recommend leaving at least 2-4gb of RAM free for your system. For example, if you have 8gb of RAM, we’d recommend allocating a maximum of 4-6gb.
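The guidance above amounts to a simple rule of thumb: allocate your total RAM minus 2-4gb of headroom for the operating system. A minimal sketch of that arithmetic (the function name and defaults are our own illustration, not part of the application):

```python
def recommended_allocation_gb(total_ram_gb: int, headroom_gb: int = 2) -> int:
    """Rule of thumb from this guide: leave 2-4gb of RAM free for the system.

    Returns the maximum RAM (in gb) to allocate to the SEO Spider.
    Hypothetical helper for illustration only.
    """
    if not 2 <= headroom_gb <= 4:
        raise ValueError("the guide recommends leaving 2-4gb of headroom")
    # Never suggest less than 1gb, whatever the inputs.
    return max(total_ram_gb - headroom_gb, 1)

# With 8gb of RAM, the guide's 4-6gb maximum falls out of the two
# headroom extremes:
print(recommended_allocation_gb(8, headroom_gb=2))  # 6
print(recommended_allocation_gb(8, headroom_gb=4))  # 4
```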
Please remember to restart the application for the changes to take effect. You can verify your settings have taken effect by following the guide here.
To open a saved crawl, please read our guide on saving, opening, exporting and importing crawls.