How To Test & Validate Structured Data
Structured Data Testing Using The SEO Spider
Structured data provides search engines with explicit clues about the meaning of pages and their components and can enable special search result features and enhancements in Google.
The different Google ‘rich result’ search features require different types of structured data and implementing them can help achieve rich snippets (a more stand-out, detailed ‘snippet’ in the search results) which may result in more traffic.
Google’s Structured Data Testing Tool can help review and validate structured data implementation against their rich result feature requirements. Unfortunately, it has no API, doesn’t allow auditing of URLs in bulk, and sometimes misses or misclassifies validation issues between required and recommended properties.
Google are working on enhancement reports to help monitor structured data in Search Console as well, although they don’t yet support all rich result features.
With the above in mind, our team built our own structured data validator into the Screaming Frog SEO Spider to make the auditing process more efficient and workable at scale.
1) Enable Structured Data Options Under ‘Configuration > Spider > Extraction’
Tick ‘JSON-LD’, ‘Microdata’, ‘RDFa’, ‘Schema.org Validation’ and ‘Google Validation’.
Schema.org vocabulary is case sensitive, and you can enable case-sensitive validation as an option. However, Google is not as strict, so this isn’t required for Google rich result features and their understanding of structured data.
2) Crawl The Website
Open up the SEO Spider, type or copy in the website you wish to crawl in the ‘Enter URL to spider’ box and hit ‘Start’.
The crawl will then start and structured data will be extracted from pages and validated. Grab a coffee and wait until the progress bar reaches 100%, and the crawl is completed.
3) View The Structured Data Tab
The Structured Data tab shows all URLs found in a crawl and the different structured data types in separate corresponding columns, as well as totals, errors and warnings in the main window pane.
The Structured Data tab has 8 filters that help you understand the structured data implementation and identify validation issues. The ‘Total Types’ and ‘Unique Types’ columns count the number of structured data itemtypes that have been discovered for each URL.
The right-hand overview window pane provides a summary of data contained within each tab and filter, so you know where to click, without having to check each filter to see if there’s data.
You’re able to filter by the following –
- Contains Structured Data – These are simply any URLs that contain structured data. You can see the different types in columns in the upper window.
- Missing Structured Data – These are URLs that do not contain any structured data.
- Validation Errors – These are URLs that contain validation errors. The errors can be either Schema.org, Google rich result features, or both – depending on your configuration. Schema.org issues will always be classed as errors, rather than warnings. Google rich result feature validation will show errors for missing required properties or problems with the implementation of required and recommended properties. Google’s ‘required properties’ must be included and valid for content to be eligible for display as a rich result.
- Validation Warnings – These are URLs that contain validation warnings for Google rich result features. These will always be for ‘recommended properties’, rather than required properties. Recommended properties can be included to add more information about content, which could provide a better user experience, but they are not essential to be eligible for rich snippets, which is why they are only a warning. There are no ‘warnings’ for Schema.org validation issues, although there is a warning for using the older data-vocabulary.org schema.
- Parse Errors – These are URLs which have structured data that failed to parse correctly. This is often due to incorrect mark-up. If you’re using Google’s preferred format JSON-LD, then the JSON-LD Playground is an excellent tool to help debug parsing errors.
- Microdata URLs – These are URLs that contain structured data in microdata format.
- JSON-LD URLs – These are URLs that contain structured data in JSON-LD format.
- RDFa URLs – These are URLs that contain structured data in RDFa format.
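For the ‘Parse Errors’ filter, a quick way to reproduce a JSON-LD parsing failure outside the tool is to pull the script blocks out of a page and try to decode them. The sketch below is a minimal illustration using only Python’s standard library, not a description of how the SEO Spider works internally. The class and function names are made up for this example, and it only checks JSON syntax in JSON-LD blocks (it does not handle Microdata or RDFa, or validate JSON-LD semantics).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = dict(attrs).get("type") or ""
        if tag == "script" and script_type.lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html):
    """Return (parsed_items, parse_errors) for the JSON-LD blocks in an HTML string."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    parsed, errors = [], []
    for raw in extractor.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"line {exc.lineno}, column {exc.colno}: {exc.msg}")
    return parsed, errors


# Made-up page: the second block has a trailing comma, which is invalid JSON.
sample_html = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Product", "name": "Widget"}</script>
<script type="application/ld+json">{"@type": "Product", "name": "Broken",}</script>
</head></html>"""

items, parse_errors = check_jsonld(sample_html)
print(len(items), "block(s) parsed;", len(parse_errors), "parse error(s)")
for message in parse_errors:
    print(message)
```

The reported line and column give a starting point for finding the faulty mark-up before pasting it into a tool such as the JSON-LD Playground.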
4) View The Lower Window Pane ‘Structured Data Details’ Tab To Analyse Validation Errors & Warnings
The structured data details lower window pane provides further information on the items and issues discovered. The left-hand side of the lower window pane shows property values and icons against them when there are errors or warnings, and the right-hand window provides detail on the specific validation issues.
The right-hand side of the lower window pane will detail the exact validation type (Schema.org, or the relevant Google rich result feature), the severity (an error, warning or just info) and a message for the specific issue to fix. It will also provide a link to the specific Schema.org property to provide more detail on requirements.
5) Refer To Schema.org or Google Rich Result Feature Documentation To Better Understand Validation Issues
Structured data can be challenging, even with the help of tools. So always refer to the relevant documentation to provide more context and follow the guidelines.
A simple rule for structured data auditing is to fix validation errors, so that content is eligible for Google rich result features and rich snippets, and then to consider whether the information flagged in warnings would be useful for users, and either implement or ignore it.
Validation issues are based upon Google rich result feature required and recommended properties and Schema.org specifications. Reviewing their guidelines will provide a better understanding of the validation issue. Let’s walk through some examples of this process.
Google Product Validation Error Example
In the example below, we can see lv.com have ‘Google Product’ feature validation errors and warnings. The right-hand window pane lists those required (with an error), and recommended (with a warning).
You could argue ‘product’ shouldn’t be used here, but as it is, it will be validated against Google’s product feature guidelines. Per the Google documentation an image is required, and there are half a dozen other recommended properties that are missing.
The recommended properties highlighted as warnings can either be implemented to add more information about the content (which may provide a better user experience) or just ignored.
Google Corporate Contact Validation Error Example
In the next example below, Direct Line have a Google corporate contact feature validation error against the use of ‘customer service general enquiries’ in the ‘contactType’ schema property.
The right-hand window pane explains that ‘http://schema.org/contactType’ is required to be ‘customer service’ or ‘customer support’ or ‘technical support’ or ‘billing support’ or ‘bill payment’ or ‘sales’ or ‘reservations’ or ‘credit card support’ or ‘emergency’ or ‘baggage tracking’ or ‘roadside assistance’ or ‘package tracking’ in ‘ContactPoint’.
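The rule in that message amounts to checking a value against a fixed list. As a rough sketch (the function name is hypothetical, and exact, case-sensitive matching is an assumption rather than something the message specifies), the check could look like this in Python, using the twelve values quoted above:

```python
# Allowed values copied from the validation message quoted above.
ALLOWED_CONTACT_TYPES = {
    "customer service", "customer support", "technical support",
    "billing support", "bill payment", "sales", "reservations",
    "credit card support", "emergency", "baggage tracking",
    "roadside assistance", "package tracking",
}


def validate_contact_type(contact_point):
    """Return an error message for a ContactPoint dict, or None if it passes.

    Assumption: values must match the allowed list exactly (no trimming,
    no case folding). A real validator may be more or less lenient.
    """
    value = contact_point.get("contactType")
    if value is None:
        return "contactType is missing"
    if not isinstance(value, str) or value not in ALLOWED_CONTACT_TYPES:
        return f"contactType {value!r} is not one of the allowed values"
    return None


print(validate_contact_type({"@type": "ContactPoint",
                             "contactType": "customer service general enquiries"}))
print(validate_contact_type({"@type": "ContactPoint",
                             "contactType": "customer service"}))
```

The first call reports the value as not allowed, mirroring the Direct Line error, while the second passes.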
As shown above, the validation error matches up with Google’s requirements from their documentation. However, the Google Structured Data Testing Tool does not pick up on this as a validation error.
While Google’s tool might be less strict or miss some items, we recommend following the guidelines so all structured data is in the correct format to ensure it’s machine-readable and there are no issues.
Google Aggregate Rating (Review Snippet) Validation Error Example
In this example, Admiral have a Google Aggregate Rating error, which is part of Google’s review snippet rich result feature.
The issue says that the worstRating property is required for AggregateRating. Once again, referring to Google’s review snippet documentation, we can see that both bestRating and worstRating are actually only recommended properties.
However, reading further, they are required if the rating system is not a 5-point scale. In this case, Admiral are using a 10-point scale. So this makes them a requirement, and they are correctly using bestRating, but haven’t included worstRating. This isn’t picked up by the Google Structured Data Testing Tool.
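The conditional nature of this rule is easy to miss, so here is a small Python sketch of it. It assumes a non-default scale can be detected from a declared bestRating other than 5 or a declared worstRating other than 1, which is one plausible reading of the guideline rather than a confirmed implementation. The function name and sample values are made up.

```python
def _as_number(value):
    """Convert a rating value to float, or return None if that is not possible."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_rating_scale(aggregate_rating):
    """Return (severity, property) tuples for missing bestRating/worstRating.

    Assumption: the scale counts as non-default when bestRating is declared
    and is not 5, or worstRating is declared and is not 1. In that case both
    properties are treated as required (error); otherwise they are only
    recommended (warning).
    """
    best = _as_number(aggregate_rating.get("bestRating"))
    worst = _as_number(aggregate_rating.get("worstRating"))
    non_default = (best is not None and best != 5) or (worst is not None and worst != 1)
    severity = "error" if non_default else "warning"
    return [
        (severity, prop)
        for prop in ("bestRating", "worstRating")
        if prop not in aggregate_rating
    ]


# Made-up values on a 10-point scale: bestRating is declared, worstRating is not.
ten_point = {"@type": "AggregateRating", "ratingValue": "9.1",
             "bestRating": "10", "ratingCount": "250"}
print(check_rating_scale(ten_point))
```

With the 10-point example, the missing worstRating is escalated from a warning to an error, which is the behaviour described for the Admiral page.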
Google Breadcrumb Validation Error Example
In our final example below, HSBC have a Google breadcrumb rich result feature error. The issue states that ‘http://schema.org/item’ property is required for ‘ListItem’.
Google’s breadcrumb feature guidelines state that the required properties for each ListItem include item (the URL of the webpage), name (the title of the breadcrumb) and position (of the breadcrumb in the trail). HSBC are simply missing the item property and associated URL, which they need in order to take advantage of this rich result feature.
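A check along these lines can be sketched in a few lines of Python. This is a simplified illustration that treats the three ListItem properties named above as mandatory on every entry and assumes itemListElement is a list of dictionaries; a real validator may apply more nuanced rules. The function name and sample data are hypothetical.

```python
LIST_ITEM_PROPERTIES = ("item", "name", "position")


def check_breadcrumb(breadcrumb_list):
    """Return one message per ListItem property that is absent.

    Assumes `breadcrumb_list` is a parsed BreadcrumbList dict whose
    'itemListElement' is a list of ListItem dicts.
    """
    issues = []
    elements = breadcrumb_list.get("itemListElement", [])
    for index, element in enumerate(elements, start=1):
        for prop in LIST_ITEM_PROPERTIES:
            if prop not in element:
                issues.append(f"ListItem {index}: '{prop}' is required")
    return issues


# Made-up trail in which the second entry has no 'item' URL.
trail = {
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Home",
         "item": "https://www.example.com/"},
        {"@type": "ListItem", "position": 2, "name": "Accounts"},
    ],
}
for issue in check_breadcrumb(trail):
    print(issue)
```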
In summary, always review the appropriate documentation and guidelines to verify validation errors and warnings.
6) Bulk Export Validation Errors & Warnings Using ‘Reports > Structured Data’ Reporting
There are two bulk exports available for structured data via the ‘reports’ top-level menu.
The ‘Validation Errors & Warnings Summary’ report is particularly useful, as it aggregates the data to unique issues discovered (rather than reporting every instance) and shows the number of URLs affected by each issue, with a sample URL with the specific issue. An example report can be seen below.
This means the report is highly condensed and ideal for a developer who wants to know the unique validation issues that need to be fixed across the site. The ‘Validation Errors & Warnings’ export is a bulk export of every error and warning discovered alongside the URL it’s found upon.
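To make the difference between the two exports concrete, the Python sketch below collapses a per-URL list of issues into one row per unique issue, with a count of affected URLs and a sample URL. The field names and input layout are assumptions made for illustration and do not reflect the actual column headings of the exported reports.

```python
from collections import defaultdict


def summarise_issues(rows):
    """Group (url, validation_type, severity, message) rows by unique issue."""
    grouped = defaultdict(list)
    for url, validation_type, severity, message in rows:
        urls = grouped[(validation_type, severity, message)]
        if url not in urls:  # count each URL once per issue
            urls.append(url)
    summary = [
        {
            "validation_type": validation_type,
            "severity": severity,
            "issue": message,
            "urls_affected": len(urls),
            "sample_url": urls[0],
        }
        for (validation_type, severity, message), urls in grouped.items()
    ]
    # Most widespread issues first.
    summary.sort(key=lambda row: row["urls_affected"], reverse=True)
    return summary


# Made-up rows in the style of a full per-URL export.
all_issues = [
    ("https://www.example.com/a", "Google Product", "Error", "image is required"),
    ("https://www.example.com/b", "Google Product", "Error", "image is required"),
    ("https://www.example.com/b", "Google Breadcrumb", "Error", "item is required"),
]
for row in summarise_issues(all_issues):
    print(row)
```

Three rows in the full export become two rows in the summary, which is the kind of condensing that makes the summary report easier to hand to a developer.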
Frequently Asked Questions
What Schema.org and Google Rich Result Feature Validation Does The SEO Spider Perform?
Schema.org validation includes checks against whether the types and properties exist against Schema vocabulary and will show ‘errors’ for any issues encountered. For example, the Schema.org validation will check to see whether http://schema.org/author exists for a property, or http://schema.org/Book exists as a type.
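As an illustration of this kind of existence check, and of the case-sensitivity option mentioned earlier in the guide, the Python sketch below looks up types and properties in a tiny hand-picked subset of the vocabulary. The subset, the function name and the `case_sensitive` parameter are all assumptions for the example; a real validator would load the full Schema.org vocabulary and also inspect nested items.

```python
# Tiny illustrative subset only; the real vocabulary is far larger.
KNOWN_TYPES = {"Book", "Person", "Product"}
KNOWN_PROPERTIES = {"author", "name", "image"}


def check_vocabulary(item, case_sensitive=True):
    """Return errors for top-level types and properties not in the subset."""

    def known(term, vocabulary):
        if case_sensitive:
            return term in vocabulary
        return term.lower() in {entry.lower() for entry in vocabulary}

    errors = []
    declared = item.get("@type", [])
    type_names = declared if isinstance(declared, list) else [declared]
    for type_name in type_names:
        if not known(type_name, KNOWN_TYPES):
            errors.append(f"type '{type_name}' not found in vocabulary")
    for key in item:
        if key.startswith("@"):  # skip JSON-LD keywords such as @context
            continue
        if not known(key, KNOWN_PROPERTIES):
            errors.append(f"property '{key}' not found in vocabulary")
    return errors


wrong_case = {"@context": "https://schema.org", "@type": "book", "Author": "A. Writer"}
print(check_vocabulary(wrong_case))                        # strict check
print(check_vocabulary(wrong_case, case_sensitive=False))  # relaxed check
```

With the strict check both ‘book’ and ‘Author’ are reported, while the relaxed check accepts them.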
It validates against main and pending Schema vocabulary from Schema.org and is updated regularly to the latest versions with new releases of the SEO Spider.
The SEO Spider also validates structured data against Google rich result features. It uses Google’s own documentation and guidelines to perform validations against required and recommended properties.
As noted above, ‘required’ property issues will result in errors, and ‘recommended’ property issues will result in errors if there are problems with the existing implementation, or warnings if they are missing. This is similar to the way Google’s own Structured Data Testing Tool classifies errors and warnings.
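That severity rule can be written down compactly. The following Python sketch is a minimal interpretation of it, with a deliberately hypothetical rule set (the property lists are placeholders, not Google’s actual requirements for any feature) and made-up function names.

```python
def classify(item, required, recommended):
    """Return (severity, message) tuples for one structured data item.

    `required` and `recommended` map a property name to a function that
    returns True when the supplied value is acceptable.
    """
    issues = []
    for prop, is_valid in required.items():
        if prop not in item:
            issues.append(("error", f"required property '{prop}' is missing"))
        elif not is_valid(item[prop]):
            issues.append(("error", f"required property '{prop}' is invalid"))
    for prop, is_valid in recommended.items():
        if prop not in item:
            # Missing recommended properties are only warnings.
            issues.append(("warning", f"recommended property '{prop}' is missing"))
        elif not is_valid(item[prop]):
            # Present but faulty recommended properties are errors.
            issues.append(("error", f"recommended property '{prop}' is invalid"))
    return issues


def non_empty_text(value):
    return isinstance(value, str) and value.strip() != ""


# Hypothetical rule set for illustration only.
EXAMPLE_REQUIRED = {"name": non_empty_text}
EXAMPLE_RECOMMENDED = {"description": non_empty_text}

print(classify({"name": "Widget"}, EXAMPLE_REQUIRED, EXAMPLE_RECOMMENDED))
print(classify({"description": ""}, EXAMPLE_REQUIRED, EXAMPLE_RECOMMENDED))
```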
The full list of Google rich result features that the SEO Spider is able to validate against includes –
- Article & AMP Article
- COVID-19 announcements
- Critic Review
- Employer Aggregate Rating
- Estimated Salary
- Fact Check
- How To
- Image License
- Job Posting
- Job Training
- Local Business
- Q&A Page
- Review Snippet
- Sitelinks Searchbox
- Software App
- Subscription and Paywalled Content
The list of Google rich result features that the SEO Spider doesn’t currently validate against is –
- We currently support all Google features.
Why Do Validation Errors & Warnings Differ To Google’s Structured Data Testing or Rich Results Tools?
We highly recommend using the Google Structured Data Testing Tool. It’s an excellent tool and useful for reviewing and validating structured data.
However, there are occasions where the results will differ between the two tools. We generally find that the SEO Spider picks up on more errors and warnings. Google’s tool is often more relaxed than their documentation, and it sometimes misses issues or misclassifies them.
We shared a couple of small inconsistencies in our examples above, and while it’s generally reliable, like any tool it’s not perfect. The purpose of this isn’t to pick on Google’s SDTT, but to point out that it won’t always be accurate and the results shouldn’t be blindly followed without consideration.
One larger and more common issue we have seen is that Google appear to check AMP required and recommended properties against Non-AMP URLs.
However, reviewing Google’s Article feature guidelines, we can see that these are not requirements for Non-AMP URLs. They are only required for AMP URLs. The SEO Spider won’t show these errors, as it correctly determines that the URL isn’t an AMP page.
The SEO Spider will not be perfect either, so we recommend using both tools in combination and referring to the appropriate documentation to validate results and any differences discovered.
We also recommend using the enhancement reports in Google Search Console to help validate errors. While support for all rich result features is gradually being rolled out, some popular types are available. Google also allows you to test structured data on a URL level using the URL Inspection Tool, see more information about the issue, and validate fixes.
The Rich Results Tool can also help you test whether a site is eligible for rich snippets, although only a small subset of types is supported so far. However, as noted by Dave Ojeda, while currently still in beta, Google announced at I/O 2019 that the Rich Results Tool is the successor to the SDTT. It shows validation errors like the SDTT, although to add to the confusion the two tools can occasionally show different results.
Another point of consideration is that Google’s guidelines continue to evolve, with deprecated features (Social Profile), renaming (LocalBusiness to Local Business Listing) and amendments to required and recommended properties.
A good example is Google’s local business requirements, which have changed recently but the older guidelines can still be viewed courtesy of Archive.org. An old requirement was for the ‘addressCountry’ property to be a 2-letter ISO 3166-1 alpha-2 country code.
This differed to Schema.org/addressCountry requirements, which state that you could just use the country, for example ‘USA’.
While the SEO Spider picked this up as a Google local business validation error based upon their own guidelines, the Google Structured Data Testing Tool didn’t. This caused some confusion, as Google’s documentation was inconsistent with their own tool.
Trying to get confirmation from G on this one. SDTT is inconsistent with their own documentation for local business search feature if you review. https://t.co/DZfI4EmKQP it’s cool obviously. I think it’s fine from experience, but be useful to have doc updated if so. :-)
— Screaming Frog (@screamingfrog) March 12, 2019
Google’s local business rich result feature guidelines have now been updated to remove this requirement, referring only to Schema.org PostalAddress. However, the ‘areaServed’ property for the corporate contact feature still has this requirement, and might also need to be adjusted.
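For reference, the older two-letter requirement discussed above is simple to approximate. The Python sketch below only tests the shape of the value (two uppercase ASCII letters) and does not confirm that the code is actually assigned in ISO 3166-1. Rejecting lowercase values is an assumption, and the function name is hypothetical.

```python
import re

# Shape check only: exactly two uppercase ASCII letters.
ALPHA2_SHAPE = re.compile(r"[A-Z]{2}")


def looks_like_alpha2(value):
    """Return True if `value` has the shape of an ISO 3166-1 alpha-2 code."""
    return isinstance(value, str) and ALPHA2_SHAPE.fullmatch(value) is not None


for candidate in ("US", "USA", "GB"):
    print(candidate, looks_like_alpha2(candidate))
```

A value such as ‘USA’, which the text above notes is acceptable for Schema.org’s addressCountry, fails this shape check. That is the discrepancy between the two sets of guidelines.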
We track these changes as closely as possible and update the SEO Spider accordingly, but there might be a delay before they are made available between SEO Spider releases (unfortunately we don’t get a pre-warning!).
Some of Google’s structured data documentation is open to interpretation as well, so we welcome feedback from the SEO community via our support to continue to improve the tool.
Finally, it’s important to remember that Google recommend following their guidelines even if the Structured Data Testing Tool is more relaxed about certain properties or values. John Mueller (a Webmaster Trends Analyst at Google) replied with advice on inconsistencies discovered between their documentation and tools.
I'd use what's in the docs. Our tools might be a bit more permissive, but if you want to be sure to do it the recommended way, follow the guidance in the docs.
— 🍌 John 🍌 (@JohnMu) March 13, 2019
The guide above should help illustrate the simple steps required to audit and validate structured data across a website using the SEO Spider tool.
If you have any further queries, then just get in touch via support.