Robots.txt
si digital
Posted 27 November, 2015 by si digital in Robots.txt
The Screaming Frog SEO Spider is robots.txt compliant. It obeys robots.txt in the same way as Google.
It checks the robots.txt of each subdomain and follows the allow/disallow directives for the Screaming Frog SEO Spider user-agent. If there are none, it falls back to the directives for Googlebot, and then to those for all robots. Because it currently follows Googlebot's directives by default, any pages or areas of the site disallowed for Googlebot will not be crawled by the SEO Spider either. Like Googlebot, the tool also supports wildcard URL matching (* and $) in robots.txt path values.
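To make the wildcard matching concrete, here is a minimal Python sketch that translates a robots.txt path value containing `*` and a trailing `$` into a regular expression, following Google's documented matching rules. It is not the SEO Spider's own code. The helper names `pattern_to_regex` and `path_matches` are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path value into a compiled regular expression.

    '*' matches any run of characters (including none) and a trailing '$'
    anchors the match to the end of the URL. Everything else is literal, and
    matching always starts at the beginning of the path, so a value with no
    wildcards behaves as a simple prefix match.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces, then rejoin them with '.*' where '*' stood.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """True if the URL path (including any query string) matches the value."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(path_matches("/private", "/private/report.html"))     # True
    print(path_matches("/*.pdf$", "/docs/guide.pdf"))           # True
    print(path_matches("/*.pdf$", "/docs/guide.pdf?page=2"))    # False
    print(path_matches("/*?sessionid=", "/shop?sessionid=42"))  # True
```

A value without `$` acts as a prefix match, which is why `/private` also covers `/private/report.html`. Adding `$` restricts the match to URLs that end exactly there, so a query string after `.pdf` no longer matches.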
You can choose to ignore robots.txt entirely (the SEO Spider will not even download the file) in the paid (licensed) version of the software by selecting ‘Configuration > robots.txt > Ignore robots.txt’.
You can also view URLs blocked by robots.txt under the ‘Response Codes’ tab, using the ‘Blocked by Robots.txt’ filter. This filter also shows the matching robots.txt disallow line against each blocked URL.
Finally, there is also a custom robots.txt configuration under ‘Configuration > robots.txt’, in the ‘Custom Robots’ section, which allows you to download, edit and test a site’s robots.txt. Please read our user guide on using the Screaming Frog SEO Spider as a robots.txt tester.
A few things to remember about robots.txt:
- The SEO Spider follows only one set of user-agent directives, as per the robots.txt protocol. Priority goes to the Screaming Frog SEO Spider user-agent, if the file contains directives for it. If not, the SEO Spider follows the commands for the Googlebot user-agent, or lastly the ‘ALL’ (global) directives.
- To reiterate, if you specify directives for the Screaming Frog SEO Spider or Googlebot, the ALL (or ‘global’) bot commands will be ignored. If you want the global directives to be obeyed, you will have to repeat those lines under the specific user-agent section for the SEO Spider or Googlebot.
- If you have conflicting directives (i.e. an allow and a disallow that both match the same file path), the matching allow directive beats the matching disallow if it contains an equal or greater number of characters.
- If the robots user-agent is left blank, the SEO Spider will only obey the rules for * (if present).
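The rules in the list above can be modelled in a short Python sketch. This is an illustrative model under stated assumptions, not the SEO Spider's implementation.

- The function names and the `robots_user_agent` parameter are hypothetical.
- User-agent names are compared exactly and case-insensitively, which is a simplification.
- Groups naming the same agent are merged, as Google documents. Whether the SEO Spider merges them the same way is an assumption.

`select_rules` picks the single group to obey: the SEO Spider, then Googlebot, then `*`, or only `*` when the robots user-agent is blank. `is_allowed` resolves conflicts by pattern length, with Allow winning ties. It also returns the matched line, much as the ‘Blocked by Robots.txt’ filter reports it.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rule:
    allow: bool   # True for Allow, False for Disallow
    pattern: str  # path value, may contain '*' and a trailing '$'
    line: str     # the robots.txt line itself, kept for reporting


@dataclass
class Group:
    agents: List[str] = field(default_factory=list)  # lower-cased names
    rules: List[Rule] = field(default_factory=list)


def pattern_to_regex(pattern: str) -> re.Pattern:
    """'*' matches any characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def parse_groups(robots_txt: str) -> List[Group]:
    """Split a robots.txt file into user-agent groups and their rules."""
    groups: List[Group] = []
    current: Optional[Group] = None
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (s.strip() for s in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or seen_rule:
                current = Group()
                groups.append(current)
                seen_rule = False
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            seen_rule = True
            if value:  # an empty value ("Disallow:") restricts nothing
                current.rules.append(Rule(key == "allow", value, line))
    return groups


def select_rules(groups: List[Group],
                 robots_user_agent: str = "Screaming Frog SEO Spider") -> List[Rule]:
    """Return the rules of the one group obeyed: own UA, then Googlebot, then '*'."""
    name = robots_user_agent.strip().lower()
    candidates = [name, "googlebot", "*"] if name else ["*"]  # blank UA: '*' only
    for agent in candidates:
        matching = [g for g in groups if agent in g.agents]
        if matching:
            # Every other group, including '*', is ignored.
            return [rule for g in matching for rule in g.rules]
    return []


def is_allowed(path: str, rules: List[Rule]) -> Tuple[bool, Optional[str]]:
    """Return (allowed, matched robots.txt line) for a URL path."""
    best: Optional[Rule] = None
    for rule in rules:
        if pattern_to_regex(rule.pattern).match(path) is None:
            continue
        # The longer pattern wins; at equal length, Allow beats Disallow.
        if best is None or (len(rule.pattern), rule.allow) > (len(best.pattern), best.allow):
            best = rule
    if best is None:
        return True, None  # no rule matches, so the URL may be crawled
    return best.allow, best.line


if __name__ == "__main__":
    robots = """User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /checkout/
Allow: /checkout/help
"""
    rules = select_rules(parse_groups(robots))  # no SEO Spider group here
    for path in ("/products", "/checkout/cart", "/checkout/help"):
        print(path, is_allowed(path, rules))
```

With the sample file, `/products` is allowed because the global `Disallow: /` is ignored once a Googlebot group exists. `/checkout/cart` is blocked by `Disallow: /checkout/`. `/checkout/help` is crawlable because the longer `Allow` line wins the conflict.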