The SEO Spider Getting Started Guide
This guide is designed to help beginners get started using the Screaming Frog SEO Spider. It covers the initial set-up, how to get started crawling, and viewing reports and issues discovered.
To get started, you’ll need to download and install the SEO Spider, which is free for crawling up to 500 URLs at a time. It’s available for Windows, macOS and Ubuntu. Just click on the download button below.
Next, double click on the downloaded SEO Spider installation file and follow the steps in the installer.
If you wish to use the free version, you can ignore this step. However, if you wish to crawl more than 500 URLs, save and re-open crawls and access the advanced features, then you can buy a licence.
When you purchase a licence, you are provided with a username and licence key which should be entered within the application under ‘Licence > Enter Licence Key’.
When entered correctly, the application will confirm the licence is valid and show its expiry date. You will then need to restart the application to remove the crawl limit and enable access to the configuration options and paid features.
If the licence says it’s invalid, please read our FAQ on common invalid licence issues to troubleshoot.
Memory & Storage Set-Up
You can ignore this step if you’re using the free version, or just want to get straight to crawling. However, if you’re using the paid version we recommend setting this up at the outset.
If you have an SSD, we recommend switching to database storage mode. Go to ‘Configuration > System > Storage’ and select ‘Database Storage Mode’.
Database storage mode provides major benefits: it allows the SEO Spider to crawl more URLs, automatically stores crawl data, and lets you open old crawls far more quickly.
If you don’t have an SSD, then stick to RAM storage mode. You can still save crawls, and crawl lots of URLs if you have plenty of RAM available. You’re now set to start crawling!
Starting A Crawl
There are two main modes for crawling: the default ‘Spider’ mode, which crawls a website, and ‘List’ mode, which allows you to upload a list of URLs to be crawled.
You can start a regular ‘Spider’ crawl by inserting the homepage into the ‘Enter URL to spider’ field and clicking ‘Start’.
This will crawl and audit the URL entered, and all URLs it can discover via hyperlinks in the HTML of pages on the same subdomain.
The crawl will update in real-time, and the speed and total number of URLs completed and remaining can be viewed at the bottom of the application.
You can pause and resume the crawl at any time. You can also save the crawl and resume it later (more on saving in a moment).
If you’d rather crawl a list of URLs than a full site, click ‘Mode > List’ to upload or paste in a list of URLs.
Configuring The Crawl
You don’t need to adjust the configuration to crawl a website, as the SEO Spider is set up by default to crawl in a similar way to Google.
However, there are myriad ways you can configure the crawl to get the data you want. Check out the options under ‘Configuration’ within the tool and refer to our user guide for detail on each setting.
Viewing Crawl Data
Data from the crawl populates in real-time within the SEO Spider and is displayed in tabs. The ‘Internal’ tab includes all data discovered in a crawl for the website being crawled. You can scroll up and down, and to the right, to see all the data in various columns.
The tabs focus on different elements and each have filters that help refine data by type, and by potential issues discovered.
For example, the ‘Response Codes’ tab and ‘Client Error (4xx)’ filter will show you any 404 pages discovered.
You can click on URLs in the top window and then on the tabs at the bottom to populate the lower window pane.
These tabs provide more detail on the selected URL, such as its inlinks (the pages that link to it), outlinks (the pages it links out to), images, resources and more.
In the example above, you can see the inlinks to a broken link discovered during the crawl.
Viewing Errors & Potential Issues
The ‘Overview’ tab in the right-hand window displays a summary of the crawl data contained within each tab and filter. You can scroll through each of these sections to identify potential errors and issues, without needing to click into each tab and filter.
The number of URLs that are affected is updated in real-time during the crawl for most filters and you can click on them to be taken directly to the relevant tab and filter.
The SEO Spider doesn’t tell you how to do SEO; it provides you with data to make more informed decisions. However, the filters do provide hints on specific issues that should be addressed, or at least considered further in the context of your site.
Explore these hints, and if you’re unsure about the meaning of a tab or filter, just refer to our user guide. Every tab has a section (such as page titles, canonicals and directives) which explains every column and filter.
You’re able to export all data from the crawl into spreadsheets. Simply click the ‘Export’ button in the top left-hand corner to export data from the top window tabs and filters.
To export lower window data, right click on the URL(s) that you wish to export data from in the top window, then click on one of the options.
There’s also a ‘Bulk Export’ option located under the top level menu. This allows you to export the source links, for example the ‘inlinks’ to URLs with specific status codes such as 2XX, 3XX, 4XX or 5XX responses.
For example, selecting the ‘Client Error 4XX In Links’ option will export all inlinks to error pages (i.e. the pages that link to 404 error pages).
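As a rough sketch, an export like this can be post-processed to group broken URLs under the pages that link to them, which is handy when handing fixes to a developer. The ‘Source’ and ‘Destination’ column names below are assumptions, so check them against the headers of your own export:

```python
import csv
import io
from collections import defaultdict

# Sample rows standing in for a 'Client Error 4XX In Links' export.
# The 'Source' and 'Destination' column names are assumptions; adjust
# them to match the headers of your actual export file.
sample_export = """Source,Destination,Status Code
https://example.com/blog/,https://example.com/old-page,404
https://example.com/about/,https://example.com/old-page,404
https://example.com/blog/,https://example.com/missing.pdf,404
"""

def broken_links_by_source(csv_text):
    """Group broken destination URLs under the pages that link to them."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Source"]].append(row["Destination"])
    return dict(grouped)

report = broken_links_by_source(sample_export)
for source, targets in report.items():
    print(f"{source} links to {len(targets)} broken URL(s)")
```

In practice you would read the exported file with `csv.DictReader` directly rather than an inline string.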
Saving & Opening Crawls
You can only save and open crawls with a licence. In the default memory storage mode, you can save a crawl at any time (when paused, or once it has finished) via ‘File > Save’, and re-open it via ‘File > Open’.
In database storage mode, crawls are automatically ‘saved’ and committed in the database during a crawl. To open a crawl, click on ‘File > Crawls’ in the main menu.
The ‘Crawls’ window displays an overview of automatically stored crawls, where you can open, rename, organise into project folders, duplicate, export, or delete them in bulk.
Popular Uses & Advanced Features
We can’t cover every use and feature in the tool, so we recommend you explore the options available and refer to our thorough user guide where necessary.
However, we’ve compiled a list of some of the most common uses of the SEO Spider with links for additional reading.
- Find Broken Links – Crawl a website instantly and find broken links (404s) and server errors. Bulk export the errors and source URLs to fix, or send to a developer.
- Audit Redirects – Find temporary and permanent redirects, identify redirect chains and loops, or upload a list of URLs to audit in a site migration.
- Analyse Page Titles & Meta Descriptions – Review page titles and meta descriptions of every page to discover unoptimised, missing, duplicate, long or short elements.
- Review Directives & Canonicals – View URLs blocked by robots.txt, meta robots or X-Robots-Tag directives such as ‘noindex’ or ‘nofollow’, and audit canonicals.
- Find Missing Image Alt Text & Attributes – Find images that are missing alt text and view the alt text of every image in a crawl.
- Visualise Site Architecture – Evaluate internal linking and URL structure using interactive crawl and directory force-directed diagrams and tree graph site visualisations.
- Generate XML Sitemaps – Quickly create XML Sitemaps and Image XML Sitemaps, with advanced configuration over URLs to include, last modified, priority and change frequency.
- Audit International Set-Up (hreflang) – Find common errors and issues with hreflang annotations in HTML, via HTTP Header or in XML Sitemaps at scale.
- Analyse PageSpeed – Connect to the PSI API for Lighthouse metrics, speed opportunities, diagnostics and Chrome User Experience Report (CrUX) data at scale.
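To illustrate the kind of output the XML Sitemap feature produces, here is a minimal sketch using Python’s standard library. The URLs and values are placeholders, and the real feature handles the configuration of last modified, priority and change frequency for you:

```python
import xml.etree.ElementTree as ET

# Placeholder entries standing in for pages selected from a crawl.
urls = [
    {"loc": "https://example.com/", "lastmod": "2024-01-15", "priority": "1.0"},
    {"loc": "https://example.com/about/", "lastmod": "2024-01-10", "priority": "0.5"},
]

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a minimal XML Sitemap document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url_el = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "priority"):
            if tag in entry:
                ET.SubElement(url_el, tag).text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap(urls)
print(sitemap_xml)
```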
We have also compiled a list of some of the most popular features.
- Schedule Crawls – Schedule crawls to run automatically within the SEO Spider, as a one-off, or at chosen intervals.
- Integrate with GA, GSC & PSI – Connect to the Google Analytics, Search Console and PageSpeed Insights APIs and fetch user and performance data for all URLs in a crawl for greater insight.
- Custom Search HTML – Find anything you want in the source code of a website, whether that’s an analytics snippet, specific text, or code.
- Extract Data With XPath – Collect any data from the HTML of a web page using CSS Path, XPath or regex. This might include social meta tags, additional headings, prices, SKUs or more.
- Crawl Staging & Development Sites – Login to a staging website using basic, digest or web forms authentication.
- Operate Via Command Line – Run crawls programmatically via command line to integrate with your own internal systems.
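The custom extraction idea above can be sketched outside the tool too. The following uses Python’s standard library to pull values out of a page with an XPath-style expression; the page structure and class names are illustrative only, and ElementTree supports just a subset of XPath rather than the tool’s full implementation:

```python
import xml.etree.ElementTree as ET

# A well-formed HTML fragment standing in for a crawled product page.
# The structure and class names here are illustrative, not from any real site.
page = """<html><body>
  <h1>Example Widget</h1>
  <span class="price">19.99</span>
  <span class="sku">WID-001</span>
</body></html>"""

def extract(html, path):
    """Return the text of every element matching an XPath-style expression.

    ElementTree supports only a subset of XPath, which is enough for
    simple attribute lookups like the ones sketched here.
    """
    root = ET.fromstring(html)
    return [el.text for el in root.findall(path)]

prices = extract(page, ".//span[@class='price']")
skus = extract(page, ".//span[@class='sku']")
print(prices, skus)
```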
The guide above should help illustrate the simple steps required to get started using the SEO Spider.
If you have any further queries, then just get in touch via support.