Include

Table of Contents

General

Configuration Options

Spider Crawl Tab

Spider Extraction Tab

Spider Limits Tab

Spider Rendering Tab

Spider Advanced Tab

Spider Preferences Tab

Other Configuration Options

Tabs

Include

Configuration > Include

This feature allows you to control which URL path the SEO Spider will crawl using partial regex matching. It narrows the default search by only crawling the URLs that match the regex which is particularly useful for larger sites, or sites with less intuitive URL structures. Matching is performed on the encoded version of the URL.

The page that you start the crawl from must have an outbound link which matches the regex for this feature to work, or it just won’t crawl onwards. If there is not a URL which matches the regex from the start page, the SEO Spider will not crawl anything!

  • As an example, if you wanted to crawl pages from https://www.screamingfrog.co.uk which have ‘search’ in the URL string you would simply include the regex: search in the ‘include’ feature. This would find the /search-engine-marketing/ and /search-engine-optimisation/ pages as they both have ‘search’ in them.

Check out our video guide on the include feature.

Troubleshooting

  • Matching is performed on the URL encoded address, you can see what this is in the URL Info tab in the lower window pane or respective column in the Internal tab.
  • The regular expression must match the whole URL, not just part of it.
  • If you experience just a single URL being crawled and then the crawl stopping, check your outbound links from that page. If you crawl http://www.example.com/ with an include of ‘/news/’ and only 1 URL is crawled, then it will be because http://www.example.com/ does not have any links to the news section of the site.

Join the mailing list for updates, tips & giveaways

Back to top