HTTP Status Codes – Why Won’t My Website Crawl?


When a website does not crawl as expected and only a single URL is returned, the ‘Status’ and ‘Status Code’ columns are the first things to check to help identify the issue.

A status is part of the Hypertext Transfer Protocol (HTTP) and is found in the server response header. It is made up of a numerical status code and an equivalent text status.

When a URL is entered into the Screaming Frog SEO Spider and a crawl is initiated, the numerical status of the URL from the response header is shown in the ‘Status Code’ column, while the text equivalent is shown in the ‘Status’ column within the default ‘Internal’ tab view.
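For illustration, the status line of a raw HTTP response carries both parts together; the headers below are a typical example rather than output from any particular server:

  HTTP/1.1 200 OK
  Content-Type: text/html; charset=UTF-8

Here ‘200’ would appear in the ‘Status Code’ column and ‘OK’ in the ‘Status’ column.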


The most common status codes you are likely to encounter when a site cannot be crawled, and the steps to troubleshoot them, can be found below:

 

Status Code – Status

0 – Blocked By Robots.txt
0 – DNS Lookup Failed
0 – Connection Timeout
0 – Connection Refused
0 – Connection Error / 0 – No Response
200 – OK
301 – Moved Permanently / 302 – Moved Temporarily
403 – Forbidden
404 – Page Not Found / 410 – Removed
500 / 502 / 503 – Internal Server Error

 

0 – Blocked by robots.txt

This isn’t technically a valid HTTP response; it’s how the SEO Spider indicates that no HTTP response was actually received, because the site’s robots.txt disallows the SEO Spider’s configured user agent from accessing the requested URL.


Things to check: What is being disallowed in the site’s robots.txt? (Append /robots.txt to the subdomain of the URL being crawled to view it.)

Things to try: Set the SEO Spider to ignore robots.txt (Configuration > Robots.txt > Settings > Ignore Robots.txt) or use the custom robots.txt configuration to allow crawling.

Reason: The SEO Spider obeys disallow robots.txt directives by default.
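As a simple illustration, a robots.txt containing the following rules would cause every URL on the site to be reported with this status, because it disallows all user agents (including the SEO Spider) from every path:

  User-agent: *
  Disallow: /

A site may also disallow specific user agents only, so it is worth checking for any rules that target the user agent the SEO Spider is configured to use.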

 

0 – DNS Lookup Failed

The website is not being found at all, often because the site does not exist or because your internet connection is down.


Things to check: The domain is being entered correctly.

Things to check: The site can be seen in your browser.

Reason: If the site can’t be viewed in your browser either, you could be experiencing PC / network connectivity issues.

 

0 – Connection Timeout

A connection timeout occurs when the SEO Spider does not receive an HTTP response from the server within a set amount of time (20 seconds by default).


Things to check: Can you view the site in a browser, does it load slowly?

Things to try: If the site is slow, try increasing the response timeout and lowering the speed of the crawl.

Reason: This gives the SEO Spider more time to receive information and puts less strain on the server.

 

Things to check: Can other sites be crawled? (bbc.co.uk and screamingfrog.co.uk are good control tests).

Things to try: Setting up exceptions for the SEO Spider in firewall / antivirus software (please consult your IT team).

Reason: If this issue occurs for every site, then it is likely an issue local to you or your PC / network.

 

Things to check: Is the proxy enabled? (Configuration > Proxy).

Things to try: If enabled, disable the proxy.

Reason: If the proxy is not set up correctly, the SEO Spider may not be sending or receiving requests properly.

 

Things to check: IPv6 Connectivity.

Things to try: Add the following line to the ScreamingFrogSEOSpider.l4j.ini file to encourage the use of IPv4:

-Djava.net.preferIPv4Stack=true

Reason: It could be that your machine’s configuration allows DNS resolution over IPv6 but the machine has no IPv6 connectivity.
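For example, after the change the ScreamingFrogSEOSpider.l4j.ini file might look something like this (the memory setting shown is purely illustrative; keep whatever lines are already in your file and simply add the new flag on its own line):

  -Xmx2g
  -Djava.net.preferIPv4Stack=true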

 

0 – Connection Refused

A ‘Connection Refused’ status is returned when the SEO Spider’s connection attempt has been refused at some point between the local machine and the website.


Things to check: Can you crawl other sites? (bbc.co.uk and screamingfrog.co.uk are good control tests).

Things to try: Setting up exceptions for the SEO Spider in firewall/antivirus software (please consult your IT team).

Reason: If this issue occurs for every site, then it is likely an issue local to you or your PC / network.

 

Things to check: Can you view the page in the browser or does it return a similar error?

Things to try: If the page can be viewed, set Googlebot or Chrome as the user agent (Configuration > HTTP Header > User-Agent).

Reason: The server is refusing the SEO Spider’s request of the page (possibly as protection/security against unknown user-agents).

 

Things to check: Is the site HTTPS?

Things to try: If so, install the relevant Java security fix. Please note, this is not necessary for version 8.0 onwards.

Reason: If the site uses stronger crypto algorithms than are supported by default in Java, the connection will be refused.

 

0 – Connection Error / 0 – No Response

The SEO Spider is having trouble making connections or receiving responses.


Things to check: Proxy Settings (Configuration > Proxy).

Things to try: If enabled, disable the proxy.

Reason: If not set up correctly then this might mean the SEO Spider is not sending/receiving requests properly.

 

Things to check: Can you view the page in the browser or does it return a similar error?

Reason: If there are issues with the network or site, the browser would likely have a similar issue.

 

200 – OK

There was no issue receiving a response from the server, so the problem must be with the content that was returned.


Things to check: Does the requested page have a meta robots ‘nofollow’ directive on the page / in the HTTP header, or do all the links on the page have rel=’nofollow’ attributes?

Things to try: Set the configuration to follow Internal/External Nofollow (Configuration > Spider).

Reason: By default the SEO Spider obeys ‘nofollow’ directives.
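For reference, the page-level and link-level ‘nofollow’ signals mentioned above look like this in the HTML (the URL is a placeholder), while the HTTP header equivalent is ‘X-Robots-Tag: nofollow’:

  <!-- Page-level directive in the <head> -->
  <meta name="robots" content="nofollow">

  <!-- Link-level attribute -->
  <a href="/page/" rel="nofollow">Page</a>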

 

Things to check: Are the links JavaScript-generated? (View the page in a browser with JavaScript disabled.)

Things to try: Enable JavaScript Rendering (Configuration > Spider > Rendering > JavaScript). For more details on JavaScript crawling, please see our JavaScript Crawling Guide.

Reason: By default the SEO Spider will only crawl <a href="">, <img src=""> and <link rel="canonical"> links in the HTML source code; it does not read the DOM. If available, the SEO Spider will use Google’s deprecated AJAX crawling scheme, which essentially means crawling an HTML snapshot of the rendered JavaScript page, instead of the JavaScript version of the page.
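As a simplified illustration, the first link below exists in the raw HTML source and is crawled by default, while the second is only created when JavaScript runs and is therefore only discovered with JavaScript rendering enabled (the element ID and URL are made up for the example):

  <!-- Present in the HTML source: crawled by default -->
  <a href="/category/">Category</a>

  <!-- Injected by JavaScript: requires rendering mode -->
  <script>
    document.getElementById('nav').innerHTML = '<a href="/category/">Category</a>';
  </script>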

 

Things to check: ‘Limits’ tab of ‘Configuration > Spider’ particularly ‘Limit Search Depth’ and ‘Limit Search Total’.

Reason: If these are set to 0 or 1 respectively, then the SEO Spider is being instructed to crawl only a single URL.

 

Things to check: Does the site require cookies? (View page in browser with cookies disabled).

Things to try: Configuration > Spider > Advanced Tab > Allow Cookies.

Reason: If cookies are disabled, the SEO Spider may be served a separate message or page that does not link to other pages on the site.

 

Things to try: Change the user agent to Googlebot (Configuration > HTTP Header > User-Agent).

Reason: The site/server may be set up to serve the HTML to search bots without requiring cookies to be accepted.

 

Things to check: What is specified in the ‘Content’ Column?

Things to try: If this is blank, enable JavaScript Rendering (Configuration > Spider > Rendering > JavaScript) and retry the crawl.

Reason: If no content type is specified in the HTTP header, the SEO Spider does not know whether the URL is an image, PDF, HTML page, etc., so it cannot crawl it to determine whether there are any further links. This can be bypassed with rendering mode, as the SEO Spider then checks whether a <meta http-equiv> is specified in the <head> of the document.
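For example, a page served without a Content-Type HTTP header can still declare its type within the document itself, which rendering mode is able to pick up:

  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>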

 

Things to check: Is there an age gate?

Things to try: Change the user agent to Googlebot (Configuration > HTTP Header > User-Agent).

Reason: The site/server may be set up to serve the HTML to search bots without requiring an age to be entered.

 

301 – Moved Permanently / 302 – Moved Temporarily

This means the requested URL has moved and been redirected to a different location.


Things to check: What is the redirect destination? (Check the outlinks of the returned URL).

Things to try: If this is the same as the starting URL, follow the steps described in our why do URLs redirect to themselves FAQ.

Reason: The redirect is in a loop where the SEO Spider never gets to a crawlable HTML page. If this is due to a cookie being dropped, this can be bypassed by following the steps in the FAQ linked above.
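As a rough illustration, a cookie-based redirect loop often looks something like this in the raw response (the header values are placeholders); the server sets a cookie and redirects, but because the cookie is never accepted, the same redirect is served on every subsequent request:

  HTTP/1.1 302 Moved Temporarily
  Location: https://www.example.com/page
  Set-Cookie: accepted=true; Path=/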

 

Things to check: External Tab.

Things to try: Configuration > Spider > Crawl All Subdomains.

Reason: The SEO Spider treats different subdomains as external and will not crawl them by default. If you are trying to crawl a subdomain that redirects to a different subdomain, it will be reported in the external tab.

 

Things to check: Does the site require cookies? (View the page in a browser with cookies disabled).

Things to try: Configuration > Spider > Advanced Tab > Allow Cookies.

Reason: The SEO Spider is being redirected to a URL where a cookie is dropped, but the SEO Spider does not accept cookies by default.

 

403 – Forbidden

The server is denying the SEO Spider’s request to view the requested URL.


Things to check: Can you view the page in a browser or does it return a similar error?

Things to try: If the page can be viewed, set Googlebot or Chrome as the user agent (Configuration > HTTP Header > User-Agent).

Reason: The site is denying the SEO Spider’s request of the page (possibly as protection/security against unknown user agents).

 

404 – Page Not Found / 410 – Removed

The server is indicating that the page cannot be found (404) or has been removed (410).


Things to check: Does the requested URL load a normal page in the browser?

Things to try: Check whether the status code is the same in other tools (Websniffer, Rexswain, Fetch as Google, etc.).

Reason: If every tool reports the same error status code despite the page loading normally, the site/server may be misconfigured and serving the error response code even though the page exists.
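If you have command-line access, curl is another quick way to compare responses; the command below sends a HEAD request with a simplified Googlebot user agent string (the URL is a placeholder, and note that some servers respond differently to HEAD than to GET):

  curl -I -A "Googlebot" https://www.example.com/page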

 

Things to try: If the page loads, try setting Googlebot or Chrome as the user agent (Configuration > HTTP Header > User-Agent).

Reason: The site is serving the error response to the SEO Spider (possibly as protection/security against unknown user agents).

 

500 / 502 / 503 – Internal Server Error

The server is saying that it has a problem.


Things to check: Can you view your site in the browser or is it down?

Things to try: If the site is up then try Googlebot or Chrome as the user agent (Configuration > HTTP Header > User-Agent).

Reason: The site is serving the server error to the SEO Spider (possibly as protection/security against unknown user agents).

 

It is possible for more than one of these issues to be present on the same page, for example, a JavaScript page could also have a meta ‘nofollow’ tag.

There are many more response codes than those listed here, but in our experience they are encountered infrequently, if at all. Many of them are likely to be resolved by following the same steps as the similar response codes described above.

More details on response codes can be found at https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
