Table of Contents
- Installation on Windows
- Installation on macOS
- Installation on Ubuntu
- Saving, opening, exporting & importing crawls
- User agent
- Checking memory allocation
- XML sitemap creation
- Crawl analysis
- Command line interface set-up
- Command line interface
- Search function
- User Interface
Spider Crawl Tab
Spider Extraction Tab
Spider Limits Tab
Spider Rendering Tab
Spider Advanced Tab
- Cookie storage
- Ignore non-indexable URLs for on-page filters
- Ignore paginated URLs for duplicate filters
- Always follow redirects
- Always follow canonicals
- Respect noindex
- Respect canonical
- Respect next/prev
- Respect HSTS policy
- Respect self referencing meta refresh
- Extract images from img srcset attribute
- Crawl fragment identifiers
- Response timeout
- 5XX response retries
Spider Preferences Tab
Other Configuration Options
- Content area
- Spelling & grammar
- Robots.txt settings
- Custom robots.txt
- URL rewriting
- User agent
- HTTP header
- Custom search
- Custom extraction
- Custom link positions
- User Interface
- Google Analytics integration
- Google Search Console integration
- PageSpeed Insights integration
- Memory allocation
- Storage mode
Lower Window Tabs
Right Side Window Tabs
The Structured Data tab includes details of structured data and validation issues discovered from a crawl.
‘JSON-LD’, ‘Microdata’, ‘RDFa’, ‘Schema.org Validation’ and ‘Google Rich Result Feature Validation’ configuration options need to be enabled (under ‘Config > Spider > Extraction’) for this tab and respective filters to be fully populated.
This tab includes the following columns.
- Address – The URL crawled.
- Errors – The total number of validation errors discovered for the URL.
- Warnings – The total number of validation warnings discovered for the URL.
- Total Types – The total number of itemtypes discovered for the URL.
- Unique Types – The unique number of itemtypes discovered for the URL.
- Type 1 – The first itemtype discovered for the URL.
- Type 2 etc. – The second itemtype discovered for the URL, and so on for subsequent types.
This tab includes the following filters.
- Contains Structured Data – These are simply any URLs that contain structured data. You can see the different types in columns in the upper window.
- Missing Structured Data – These are URLs that do not contain any structured data.
- Validation Errors – These are URLs that contain validation errors. The errors can be for Schema.org, Google rich result features, or both, depending on your configuration. Schema.org issues are always classed as errors, rather than warnings. Google rich result feature validation will show errors for missing required properties, or for problems with how required properties are implemented. Google’s ‘required properties’ must be included and be valid for content to be eligible for display as a rich result.
- Validation Warnings – These are URLs that contain validation warnings for Google rich result features. These are always for ‘recommended properties’, rather than required properties. Recommended properties can be included to add more information about content, which may provide a better user experience, but they are not essential for rich result eligibility, which is why they are reported only as warnings. There are no ‘warnings’ for Schema.org validation issues, although there is a warning for using the older data-vocabulary.org schema.
- Parse Errors – These are URLs with structured data that failed to parse correctly, often due to incorrect mark-up. If you’re using Google’s preferred format, JSON-LD, the JSON-LD Playground is an excellent tool for debugging parsing errors.
- Microdata URLs – These are URLs that contain structured data in microdata format.
- JSON-LD URLs – These are URLs that contain structured data in JSON-LD format.
- RDFa URLs – These are URLs that contain structured data in RDFa format.
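Parse errors in JSON-LD most commonly come down to invalid JSON syntax. As an illustrative sketch (not the SEO Spider’s actual parser), Python’s standard `json` module surfaces the same class of error you would see flagged under the Parse Errors filter:

```python
import json

# A hypothetical JSON-LD block with a common mark-up mistake:
# a trailing comma, which is invalid JSON and would be reported
# as a parse error.
broken_json_ld = '{"@context": "https://schema.org", "@type": "Book",}'

try:
    data = json.loads(broken_json_ld)
    print("Parsed OK, type:", data.get("@type"))
except json.JSONDecodeError as e:
    # JSONDecodeError carries the character offset and a message,
    # which helps pinpoint the broken mark-up.
    print(f"Parse error at position {e.pos}: {e.msg}")
```

Pasting the same block into the JSON-LD Playground would highlight the offending character in a similar way.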
Structured Data & Google Rich Snippet Feature Validation
Structured Data validation includes checks on whether types and properties exist in the Schema.org vocabulary, and will show ‘errors’ for any issues encountered.
For example, it checks whether https://schema.org/author exists as a property, or https://schema.org/Book exists as a type. It validates against the main and pending Schema.org vocabularies from the latest Schema.org release.
There might be a short time between a Schema.org vocabulary release, and it being updated in the SEO Spider.
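This kind of existence check can be sketched as below. The vocabulary here is a tiny hand-picked sample for illustration; a real implementation would load the full Schema.org release file rather than hard-code names:

```python
# Illustrative sample only - the real Schema.org vocabulary contains
# hundreds of types and thousands of properties.
KNOWN_TYPES = {"Book", "Article", "Person", "Organization"}
KNOWN_PROPERTIES = {"author", "name", "datePublished", "publisher"}

def validate_item(item_type: str, properties: list[str]) -> list[str]:
    """Return Schema.org-style validation errors for unknown names."""
    errors = []
    if item_type not in KNOWN_TYPES:
        errors.append(f"Unknown type: {item_type}")
    for prop in properties:
        if prop not in KNOWN_PROPERTIES:
            errors.append(f"Unknown property: {prop}")
    return errors

# A typo such as 'authr' instead of 'author' would be flagged:
print(validate_item("Book", ["authr", "name"]))
```

Because the check is against names in the vocabulary, a misspelled type or property is reported as an error rather than silently ignored.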
The SEO Spider also performs validation against Google rich result features to check the presence of required and recommended properties and their values are accurate.
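The required/recommended split maps directly onto errors and warnings, and can be sketched as follows. The property sets here are illustrative placeholders for a Job Posting-style feature, not Google’s authoritative lists, which are defined in Google’s structured data documentation:

```python
# Placeholder property sets for illustration - consult Google's
# structured data documentation for the definitive requirements.
REQUIRED = {"title", "description", "datePosted", "hiringOrganization", "jobLocation"}
RECOMMENDED = {"baseSalary", "employmentType", "validThrough"}

def check_job_posting(item: dict) -> tuple[list[str], list[str]]:
    """Missing required properties -> errors; missing recommended -> warnings."""
    errors = [f"Missing required property: {p}"
              for p in sorted(REQUIRED - item.keys())]
    warnings = [f"Missing recommended property: {p}"
                for p in sorted(RECOMMENDED - item.keys())]
    return errors, warnings

job = {"@type": "JobPosting", "title": "Technical Editor",
       "description": "Edit things.", "datePosted": "2024-01-01"}
errors, warnings = check_job_posting(job)
print(errors)    # missing required properties are errors
print(warnings)  # missing recommended properties are warnings
```

This mirrors how the Validation Errors and Validation Warnings filters are populated: only missing or invalid required properties make a page ineligible for the rich result.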
The full list of Google rich result features that the SEO Spider is able to validate against includes –
- Article & AMP Article
- COVID-19 announcements
- Critic Review
- Employer Aggregate Rating
- Estimated Salary
- Fact Check
- How To
- Image License
- Job Posting
- Job Training
- Local Business
- Q&A Page
- Review Snippet
- Sitelinks Searchbox
- Software App
- Subscription and Paywalled Content
There are currently no Google rich result features that the SEO Spider is unable to validate against – all Google features are supported.
For more information on structured data validation, please read our guide on ‘How To Test & Validate Structured Data’.