The SEO Spider Getting Started Guide

This guide is designed to help beginners get started using the Screaming Frog SEO Spider. It covers the initial set-up, how to get started crawling, and viewing reports and issues discovered.

Installation

To get started, you’ll need to download and install the SEO Spider, which is free for crawling up to 500 URLs at a time. It’s available for Windows, macOS and Ubuntu from the SEO Spider download page.

Next, double click on the downloaded SEO Spider installation file and follow the steps in the installer.

You can buy a licence, which removes the 500 URL crawl limit, opens up the configuration and provides access to more advanced features. Check out our pricing page to compare free vs paid features.

Licence Activation

If you wish to use the free version, you can ignore this step. However, if you wish to crawl more than 500 URLs, save and re-open crawls and access the advanced features, then you can buy a licence.

When you purchase a licence, you are provided with a username and licence key which should be entered within the application under ‘Licence > Enter Licence Key’.

When entered correctly, the licence will say it’s valid and show the expiry date. You will then need to restart the application to remove the crawl limit and enable access to the configuration and paid features.

If the licence says it’s invalid, please read our FAQ on common invalid licence issues to troubleshoot.

Memory & Storage Set-Up

You can ignore this step if you’re using the free version, or just want to get straight to crawling. However, if you’re using the paid version we recommend setting this up at the outset.

If you have an SSD, we recommend using database storage, which is the default. Go to ‘File > Settings > Storage Mode’ and ensure ‘Database Storage’ is selected.

Database storage provides massive benefits: it allows the SEO Spider to crawl more URLs, stores crawl data automatically and lets you open old crawls more quickly.

If you don’t have an SSD, then use memory storage mode. You can still save crawls, and crawl lots of URLs if you have plenty of RAM available. You’re now set to start crawling!

Starting A Crawl

There are two main modes for crawling: the default ‘Spider’ mode, which crawls a website, and ‘List’ mode, which allows you to upload a list of URLs to be crawled.

You can start a regular ‘Spider’ crawl by inserting the homepage into the ‘Enter URL to spider’ field and clicking ‘Start’.

This will crawl and audit the URL entered, and all URLs it can discover via hyperlinks in the HTML of pages on the same subdomain.
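
As a rough illustration of what discovering URLs via hyperlinks on the same subdomain means, the Python sketch below collects the links from a snippet of HTML and keeps only those on the same hostname as the page. It is a simplified illustration, not how the SEO Spider itself is implemented; the function names and the example.com URLs are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_subdomain_links(page_url, html):
    """Return absolute links found in html that share page_url's hostname."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).hostname
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).hostname == host and absolute not in links:
            links.append(absolute)
    return links


# Hypothetical page content: two same-subdomain links, two that are not.
SAMPLE_HTML = """
<a href="/about/">About</a>
<a href="https://www.example.com/contact/">Contact</a>
<a href="https://blog.example.com/post/">Blog</a>
<a href="https://other.example.org/">External</a>
"""

print(same_subdomain_links("https://www.example.com/", SAMPLE_HTML))
```

In this sketch only the ‘/about/’ and ‘/contact/’ links are kept, because the other two sit on a different subdomain or a different domain.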

The crawl will update in real-time, and the speed and total number of URLs completed and remaining can be viewed at the bottom of the application.

You can click ‘pause’ and then ‘resume’ the crawl at any time. You can also save the crawl and resume it later; there’s more on saving in a moment.

If you’d prefer to crawl a list of URLs rather than a full site, click ‘Mode > List’ to upload or paste in a list of URLs.

Configuring The Crawl

You don’t need to adjust the configuration to crawl a website, as the SEO Spider is set up by default to crawl in a similar way to Google.

However, there are many ways that you can configure the crawl to get the data you want. Check out the options under ‘Configuration’ within the tool and refer to our user guide for detail on each setting.

Some of the most common ways to control what’s crawled are to crawl a specific subfolder, to use the exclude feature (to avoid crawling URLs that match a URL pattern) or to use the include feature (to crawl only URLs that match a pattern).
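
To make the idea of controlling a crawl by URL pattern more concrete, here is a minimal Python sketch of an include and exclude check. It assumes patterns are regular expressions that must match the whole URL, and that an exclude match wins over an include match; both are assumptions of this sketch, so check the user guide for the exact matching rules in the tool. The function name and patterns are hypothetical.

```python
import re


def should_crawl(url, include_patterns=(), exclude_patterns=()):
    """Decide whether a URL passes hypothetical include and exclude rules.

    Assumptions: a pattern has to match the whole URL (re.fullmatch), and
    an exclude match takes precedence over an include match.
    """
    if any(re.fullmatch(pattern, url) for pattern in exclude_patterns):
        return False
    if include_patterns:
        return any(re.fullmatch(pattern, url) for pattern in include_patterns)
    return True  # no include rules means everything not excluded is crawled


# Crawl only the blog subfolder, but skip paginated URLs.
include = [r"https://www\.example\.com/blog/.*"]
exclude = [r".*\?page=\d+"]

for url in [
    "https://www.example.com/blog/seo-tips/",
    "https://www.example.com/blog/?page=2",
    "https://www.example.com/shop/",
]:
    print(url, should_crawl(url, include, exclude))
```

Here the first URL would be crawled, the second is dropped by the exclude pattern, and the third is dropped because it does not match the include pattern.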

If your website relies on JavaScript to populate content, you can also switch to JavaScript rendering mode under ‘Configuration > Spider > Rendering’.

This will mean JavaScript is executed and the SEO Spider will crawl content and links within the rendered HTML.

Viewing Crawl Data

Data from the crawl populates in real-time within the SEO Spider and is displayed in tabs. The ‘Internal’ tab includes all data discovered in a crawl for the website being crawled.

You can scroll up and down, and to the right to see all the data in various columns.

The tabs focus on different elements, and each has filters that help refine data by type and by potential issues discovered.

The ‘Response Codes’ tab and ‘Client Error (4xx)’ filter will show you any 404 pages discovered, for example.
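
The grouping behind a filter such as ‘Client Error (4xx)’ can be sketched in a few lines of Python: each HTTP status code falls into a class based on its first digit. The labels other than ‘Client Error (4xx)’ and the sample URLs are illustrative assumptions, not taken from the tool.

```python
from collections import Counter


def response_class(status_code):
    """Map an HTTP status code to a response-class label."""
    if 200 <= status_code < 300:
        return "Success (2xx)"
    if 300 <= status_code < 400:
        return "Redirection (3xx)"
    if 400 <= status_code < 500:
        return "Client Error (4xx)"
    if 500 <= status_code < 600:
        return "Server Error (5xx)"
    return "Other"


# Hypothetical crawl results: URL -> status code.
crawled = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page/": 404,
    "https://www.example.com/moved/": 301,
    "https://www.example.com/gone/": 410,
}

counts = Counter(response_class(code) for code in crawled.values())
client_errors = [url for url, code in crawled.items()
                 if response_class(code) == "Client Error (4xx)"]

print(counts)
print(client_errors)
```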

You can click on URLs in the top window and then on the tabs at the bottom to populate the lower window pane.

These tabs provide more detail on the selected URL, such as its inlinks (the pages that link to it), outlinks (the pages it links out to), images, resources and more.

For a broken link discovered during the crawl, for example, the inlinks show which pages link to it.

Finding Errors & Issues

The right-hand ‘Overview’ tab displays a summary of crawl data contained within each tab and filter.

Scroll through each of these sections to explore data and identify potential errors and issues, without needing to click into each tab and filter.

The number of URLs that are affected is updated in real-time during the crawl for most filters and you can click on them to be taken directly to the relevant tab and filter.

There’s also an ‘Issues’ right-hand tab, which details issues, warnings and opportunities discovered.

An in-app explanation of each issue and potential actions is provided in English, German, Spanish, French and Italian.

Each issue has a ‘type’ and an estimated ‘priority’ based upon the potential impact.

Issues are errors that should ideally be fixed.
Opportunities are potential areas for optimisation and improvement.
Warnings are not necessarily an issue, but should be checked – and potentially fixed.

For example, an ‘Internal URL Blocked by Robots.txt’ will be classed as a ‘warning’, but with a ‘High’ priority, as it could have a big impact if the URL has been disallowed incorrectly.
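
One way to think about ‘type’ and ‘priority’ is as two independent fields on each finding, so a warning can still sit at the top of the list. The Python sketch below orders a few findings by priority; the ‘Medium’ and ‘Low’ levels, the placeholder names and the URL counts are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Assumed priority levels; only 'High' is named in the text above.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Finding:
    name: str
    kind: str      # "Issue", "Warning" or "Opportunity"
    priority: str  # "High", "Medium" or "Low"
    urls: int      # number of affected URLs (made-up figures below)


def triage(findings):
    """Order findings by priority first, then by number of affected URLs."""
    return sorted(findings, key=lambda f: (PRIORITY_RANK[f.priority], -f.urls))


findings = [
    Finding("Hypothetical low-priority opportunity", "Opportunity", "Low", 120),
    Finding("Internal URL Blocked by Robots.txt", "Warning", "High", 14),
    Finding("Hypothetical medium-priority issue", "Issue", "Medium", 37),
]

for finding in triage(findings):
    print(f"{finding.priority:<6} {finding.kind:<11} {finding.urls:>4}  {finding.name}")
```

The warning is listed first despite affecting the fewest URLs, because its priority is the highest.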

The ‘Issues’ tab is a useful way to quickly identify top-level problems and dive straight into them as an alternative to the ‘Overview’ tab. For users with less SEO expertise, it helps provide more direction and guidance on improving a website.

Explore these hints and if you’re unsure about the meaning of an issue, just refer to the in-app descriptions in the Issues tab, or read our SEO Issues explanations.

Over 300 SEO issues, warnings and opportunities can be identified and each has a clear explanation of what the issue is, why it’s important, and how to fix it.

Exporting Data

You’re able to export all data from the crawl into spreadsheets. Simply click the ‘export’ button in the top left-hand corner to export data from the top window tabs and filters.

To export lower window data, right click on the URL(s) that you wish to export data from in the top window, then click on one of the options.

There’s also a ‘Bulk Export’ option located under the top level menu. This allows you to export the source links, for example the ‘inlinks’ to URLs with specific status codes such as 2XX, 3XX, 4XX or 5XX responses.

Selecting the ‘Client Error 4XX In Links’ option, for example, will export all inlinks to all client error pages (such as the pages that link to 404 error pages).
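
Once an inlinks export like this is saved as a CSV, it can be summarised with a short script. The Python sketch below groups source pages by the broken URL they link to. The column names ‘Source’, ‘Destination’ and ‘Status Code’ and the sample rows are assumptions for illustration, so check the header row of your own export before relying on them.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported file; column names and rows are assumptions.
EXPORT = """Source,Destination,Status Code
https://www.example.com/,https://www.example.com/old-page/,404
https://www.example.com/blog/,https://www.example.com/old-page/,404
https://www.example.com/blog/,https://www.example.com/missing.pdf,404
"""


def sources_by_broken_url(csv_file):
    """Group the linking pages (Source) by the broken URL (Destination)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["Destination"]].append(row["Source"])
    return dict(grouped)


broken = sources_by_broken_url(io.StringIO(EXPORT))
for destination, sources in broken.items():
    print(destination, "is linked from", len(sources), "page(s)")
```

To read a real export, replace `io.StringIO(EXPORT)` with `open(path, newline="", encoding="utf-8")`, where `path` points to the exported file.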

All issues can be exported in bulk via ‘Bulk Export > Issues > All’. This will export each issue discovered (including their ‘inlinks’ variants for things like broken links) as a separate spreadsheet in a folder (as CSV, Excel or Google Sheets).

Saving & Opening Crawls

You can only save and open crawls with a licence.

In default database storage, crawls are automatically ‘saved’ and committed in the database during a crawl. To open a crawl, click on ‘File > Crawls’ in the main menu.

The ‘Crawls’ window displays an overview of automatically stored crawls, where you can open, rename, organise into project folders, duplicate, export, or delete them in bulk.

In the older memory storage mode, you can save a crawl at any time (when it’s paused or finished) via ‘File > Save’, and re-open it via ‘File > Open’.

Popular Uses & Advanced Features

We can’t cover every use and feature in the tool, so we recommend you explore the options available and refer to our thorough user guide where necessary.

However, we’ve compiled a list of some of the most common uses of the SEO Spider with links for additional reading.

Find Broken Links – Crawl a website instantly and find broken links (404s) and server errors. Bulk export the errors and source URLs to fix, or send to a developer.
Audit Redirects – Find temporary and permanent redirects, identify redirect chains and loops, or upload a list of URLs to audit in a site migration.
Analyse Page Titles & Meta Descriptions – Review page titles and meta descriptions of every page to discover unoptimised, missing, duplicate, long or short elements.
Review Directives & Canonicals – View URLs blocked by robots.txt, meta robots or X-Robots-Tag directives such as ‘noindex’ or ‘nofollow’, and audit canonicals.
Find Missing Image Alt Text & Attributes – Find images that are missing alt text and view the alt text of every image in a crawl.
Check For Duplicate Content – Analyse the website for exact duplicate pages, and near duplicate ‘similar’ content.
Crawl JavaScript Websites – Render web pages using the integrated Chromium WRS to crawl dynamic, JavaScript rich websites and frameworks, such as Angular, React and Vue.js.
Visualise Site Architecture – Evaluate internal linking and URL structure using interactive crawl and directory force-directed diagrams and tree graph site visualisations.
Generate XML Sitemaps – Quickly create XML Sitemaps and Image XML Sitemaps, with advanced configuration over URLs to include, last modified, priority and change frequency.
Audit International Set-Up (hreflang) – Find common errors and issues with hreflang annotations in HTML, via HTTP Header or in XML Sitemaps at scale.
Analyse PageSpeed & Core Web Vitals – Connect to the PSI API for Core Web Vitals (CrUX field data), Lighthouse metrics, speed opportunities and diagnostics at scale.
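
For readers unfamiliar with the XML Sitemap format mentioned in the list above, the Python sketch below builds a small sitemap by hand, using the standard sitemaps.org elements for last modified, change frequency and priority. It only illustrates the file format, not how the SEO Spider generates sitemaps; the function name, URLs and values are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build an XML Sitemap string from a list of page dictionaries.

    Each page needs a 'loc'; 'lastmod', 'changefreq' and 'priority'
    are optional and written in the order the protocol defines.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for illustration.
pages = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "https://www.example.com/about/", "lastmod": "2023-11-02"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The output is a single `urlset` element containing one `url` entry per page. A file saved to disk would normally also start with an XML declaration.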

We have also compiled a list of some of the most popular features.

Schedule Crawls – Schedule crawls to run automatically within the SEO Spider, as a one-off, or at chosen intervals.
Compare Crawls & Staging – Track progress of SEO issues and opportunities and see what’s changed between crawls. Compare staging against production environments using advanced URL Mapping.
Integrate with GA, GSC & PSI – Connect to the Google Analytics, Search Console and PageSpeed Insights APIs and fetch user and performance data for all URLs in a crawl for greater insight.
Custom Search HTML – Find anything you want in the source code of a website, whether that’s Google Analytics code, specific text or other code.
Extract Data With XPath – Collect any data from the HTML of a web page using CSS Path, XPath or regex. This might include social meta tags, additional headings, prices, SKUs or more.
Crawl Staging & Development Sites – Log in to a staging website using basic, digest or web forms authentication.
Operate Via Command Line – Run crawls programmatically via command line to integrate with your own internal systems.
Crawl with ChatGPT – Use prompts against elements of a page while crawling to generate image alt text, analyse language, sentiment or scrape data.
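
As an illustration of the command line feature in the list above, the Python sketch below assembles a crawl command that could be handed to a scheduler or another internal system. The executable name and the flags (`--crawl`, `--headless`, `--save-crawl`, `--output-folder`, `--export-tabs`) are assumptions to verify against the command line section of the user guide or the tool’s `--help` output for your version and operating system. The helper function and paths are hypothetical.

```python
import subprocess


def build_crawl_command(executable, url, output_folder):
    """Assemble a headless crawl command as an argument list.

    The flag names are assumptions; confirm them against the user guide.
    """
    return [
        executable,
        "--crawl", url,
        "--headless",
        "--save-crawl",
        "--output-folder", output_folder,
        "--export-tabs", "Internal:All",
    ]


def run_crawl(command):
    """Run the command and raise an error if the crawl exits unsuccessfully."""
    subprocess.run(command, check=True)


command = build_crawl_command(
    "screamingfrogseospider",      # assumed executable name on Linux
    "https://www.example.com/",
    "/tmp/crawls",                 # hypothetical output folder
)
print(" ".join(command))
```

The sketch only prints the command. Calling `run_crawl(command)` would start a real crawl, so it is left out of the example.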