- Response Codes
  - Internal No Response
  - Internal Client Error (4XX)
  - Internal Server Error (5XX)
  - Internal Redirect Loop
  - Internal Blocked by Robots.txt
  - Internal Blocked Resource
  - Internal Redirect Chain
  - External Blocked Resource
  - Internal Redirection (3XX)
  - Internal Redirection (Meta Refresh)
  - Internal Redirection (JavaScript)
  - External No Response
  - External Client Error (4XX)
  - External Server Error (5XX)
- Security
  - HTTP URLs
  - Mixed Content
  - Form URL Insecure
  - Form On HTTP URL
  - Missing HSTS Header
  - Unsafe Cross Origin Links
  - Protocol-Relative Resource Links
  - Missing Content-Security-Policy Header
  - Missing X-Content-Type-Options Header
  - Missing X-Frame-Options Header
  - Missing Secure Referrer-Policy Header
  - Bad Content Type
- Hreflang
  - Non-200 Hreflang URLs
  - Missing Return Links
  - Inconsistent Language & Region Return Links
  - Non-Canonical Return Links
  - Noindex Return Links
  - Incorrect Language & Region Codes
  - Multiple Entries
  - Not Using Canonical
  - Outside <head>
  - Unlinked Hreflang URLs
  - Missing Self Reference
  - Missing X-Default
- JavaScript
  - Noindex Only in Original HTML
  - Nofollow Only in Original HTML
  - Canonical Mismatch
  - Uses Old AJAX Crawling Scheme URLs
  - Uses Old AJAX Crawling Scheme Meta Fragment Tag
  - Pages with Blocked Resources
  - Contains JavaScript Links
  - Contains JavaScript Content
  - Page Title Only in Rendered HTML
  - Page Title Updated by JavaScript
  - Meta Description Only in Rendered HTML
  - Meta Description Updated by JavaScript
  - H1 Only in Rendered HTML
  - H1 Updated by JavaScript
  - Canonical Only in Rendered HTML
  - Pages With JavaScript Errors
- Links
  - Outlinks To Localhost
  - Pages Without Internal Outlinks
  - Non-Indexable Page Inlinks Only
  - Internal Nofollow Outlinks
  - Pages With High External Outlinks
  - Pages With High Internal Outlinks
  - Follow & Nofollow Internal Inlinks To Page
  - Internal Nofollow Inlinks Only
  - Pages With High Crawl Depth
  - Internal Outlinks With No Anchor Text
  - Non-Descriptive Anchor Text In Internal Outlinks
- AMP
  - Non-200 Response
  - Missing Non-AMP Return Link
  - Missing Canonical to Non-AMP
  - Non-Indexable Canonical
  - Missing <html amp> Tag
  - Missing/Invalid Doctype HTML Tag
  - Missing Head Tag
  - Missing Body Tag
  - Missing Canonical
  - Missing/Invalid Meta Charset Tag
  - Missing/Invalid Meta Viewport Tag
  - Missing/Invalid AMP Script
  - Missing/Invalid AMP Boilerplate
  - Contains Disallowed HTML
  - Other Validation Errors
  - Indexable
- PageSpeed
  - Eliminate Render-Blocking Resources
  - Properly Size Images
  - Defer Offscreen Images
  - Minify CSS
  - Minify JavaScript
  - Reduce Unused CSS
  - Reduce Unused JavaScript
  - Efficiently Encode Images
  - Serve Images in Next-Gen Formats
  - Enable Text Compression
  - Preconnect to Required Origins
  - Reduce Server Response Times (TTFB)
  - Preload Key Requests
  - Reduce JavaScript Execution Time
  - Serve Static Assets With An Efficient Cache Policy
  - Minimize Main-Thread Work
  - Image Elements Do Not Have Explicit Width & Height
  - Avoid Large Layout Shifts
  - Avoid Serving Legacy JavaScript to Modern Browsers
  - Avoid Multiple Page Redirects
  - Use Video Format for Animated Images
  - Avoid Excessive DOM Size
  - Ensure Text Remains Visible During Webfont Load
Internal Blocked by Robots.txt
Internal URLs blocked by the site’s robots.txt. These URLs cannot be crawled, which is a critical issue if you want the page content to be crawled and indexed by search engines.
How to Analyse in the SEO Spider
Use the ‘Response Codes’ tab with the ‘Internal’ and ‘Blocked by Robots.txt’ filters to view these URLs.
View URLs that link to URLs blocked by robots.txt using the lower ‘Inlinks’ tab and export them in bulk via ‘Bulk Export > Response Codes > Internal > Blocked by Robots.txt Inlinks’.
What Triggers This Issue
The issue is triggered when a directive within the site’s robots.txt file matches the URL in question. For example, the directive:
Disallow: /private/
would block search engines from crawling any URL that begins with:
https://www.screamingfrog.co.uk/private/
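To sanity-check which URLs a given set of rules will block, the rules can be run through a robots.txt parser. Below is a minimal sketch using Python’s standard-library urllib.robotparser with the example directive above; the URLs are illustrative, and this parser does not match Google’s interpretation in every edge case, so treat it as a quick check rather than a definitive verdict (see the ‘Real Robots.txt Parser’ link under Further Reading for an alternative tester).

```python
from urllib.robotparser import RobotFileParser

# The example directive from above. In practice you would fetch the live
# file instead, e.g. rp.set_url("https://www.screamingfrog.co.uk/robots.txt")
# followed by rp.read().
rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Illustrative URLs: the first matches the Disallow rule, the second does not.
for url in (
    "https://www.screamingfrog.co.uk/private/report.html",
    "https://www.screamingfrog.co.uk/blog/",
):
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(f"{url} -> {verdict}")
```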
How To Fix
Review the URLs to confirm they should be disallowed. If any are disallowed by mistake, the site’s robots.txt should be updated to allow them to be crawled.
For URLs that are correctly disallowed, consider whether you should be linking to them internally and remove links where appropriate.
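If only certain URLs under a disallowed path should be crawlable, a more specific Allow rule can carve them out of the broader Disallow. The sketch below extends the previous one; the /private/whitepaper.pdf path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical fix: an Allow rule carves a single URL out of the broader
# Disallow (the /private/whitepaper.pdf path is illustrative). Google applies
# the most specific (longest) matching rule regardless of order; Python's
# parser applies the first matching rule in file order, so the Allow is
# listed first to keep both interpretations in agreement.
rules = """\
User-agent: *
Allow: /private/whitepaper.pdf
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot",
      "https://www.screamingfrog.co.uk/private/whitepaper.pdf"))  # True
print(rp.can_fetch("Googlebot",
      "https://www.screamingfrog.co.uk/private/anything-else"))   # False
```

Google resolves conflicting rules by picking the most specific (longest) matching path, with Allow winning ties, so the carve-out works regardless of rule order; the Allow is listed first here only because Python’s simpler parser applies the first matching rule.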
Further Reading
- Robots.txt Testing In The SEO Spider - From Screaming Frog
- How Google interprets the robots.txt specification - From Google
- Real Robots.txt Parser - From realrobotstxt.com