Table of Contents
- Installation on Windows
- Installation on macOS
- Installation on Ubuntu
- Saving, opening, exporting & importing crawls
- User agent
- Checking memory allocation
- XML sitemap creation
- Crawl analysis
- Command line interface set-up
- Command line interface
- Search function
- User Interface
Spider Crawl Tab
Spider Extraction Tab
Spider Limits Tab
Spider Rendering Tab
Spider Advanced Tab
- Cookie storage
- Ignore non-indexable URLs for on-page filters
- Ignore paginated URLs for duplicate filters
- Always follow redirects
- Always follow canonicals
- Respect noindex
- Respect canonical
- Respect next/prev
- Respect HSTS policy
- Respect self referencing meta refresh
- Extract images from img srcset attribute
- Crawl fragment identifiers
- Response timeout
- 5XX response retries
Spider Preferences Tab
Other Configuration Options
- Content area
- Spelling & grammar
- Robots.txt settings
- Custom robots.txt
- URL rewriting
- User agent
- HTTP header
- Custom search
- Custom extraction
- Custom link positions
- User Interface
- Google Analytics integration
- Google Search Console integration
- PageSpeed Insights integration
- Memory allocation
- Storage mode
Lower Window Tabs
Right Side Window Tabs
Configuration > System > Storage Mode
The Screaming Frog SEO Spider uses a configurable hybrid engine, allowing users to choose to store crawl data in RAM, or in a database.
By default the SEO Spider uses RAM, rather than your hard disk to store and process data. This provides amazing benefits such as speed and flexibility, but it does also have disadvantages, most notably, crawling at scale.
However, if you have an SSD the SEO Spider can also be configured to save crawl data to disk, by selecting ‘Database Storage’ mode (under ‘Configuration > System > Storage’), which enables it to crawl at truly unprecedented scale, while retaining the same, familiar real-time reporting and usability.
Fundamentally both storage modes can still provide virtually the same crawling experience, allowing for real-time reporting, filtering and adjusting of the crawl. However, there are some key differences, and the ideal storage, will depend on the crawl scenario, and machine specifications.
Memory storage mode allows for super fast and flexible crawling for virtually all set-ups. However, as machines have less RAM than hard disk space, it means the SEO Spider is generally better suited for crawling websites under 500k URLs in memory storage mode.
Users are able to crawl more than this with the right set-up, and depending on how memory intensive the website is that’s being crawled. As a very rough guide, a 64-bit machine with 8gb of RAM will generally allow you to crawl a couple of hundred thousand URLs.
As well as being a better option for smaller websites, memory storage mode is also recommended for machines without an SSD, or where there isn’t much disk space.
We recommend this as the default storage for users with an SSD, and for crawling at scale. Database storage mode allows for more URLs to be crawled for a given memory setting, with close to RAM storage crawling speed for set-ups with a solid state drive (SSD).
The default crawl limit is 5 million URLs, but it isn’t a hard limit – the SEO Spider is capable of crawling significantly more (with the right set-up). As an example, a machine with a 500gb SSD and 16gb of RAM, should allow you to crawl up to 10 million URLs approximately.
While not recommended, if you have a fast hard disk drive (HDD), rather than a solid state disk (SSD), then this mode can still allow you to crawl more URLs. However, writing and reading speed of a hard drive does become the bottleneck in crawling – so both crawl speed, and the interface itself will be significantly slower. Using a network drive is not supported – this will be much too slow and the connection unreliable. Using a local folder that syncs remotely, such as Dropbox or OneDrive is not supported due to these processes locking files. Vault drives are also not supported.
If you’re working on the machine while crawling, it can also impact machine performance, so the crawl speed might require to be reduced to cope with the load. SSDs are so fast, they generally don’t have this problem and this is why ‘database storage’ can be used as the default for both small and large crawls.
Check out our video guide on storage modes.
- ExFAT/MS-DOS (FAT) file systems are not supported on macOS due to JDK-8205404.
Join the mailing list for updates, tips & giveawaysHow we use the data in this form
Back to top