How To Automate Crawl Reports In Data Studio
This tutorial explains how to set up the Screaming Frog SEO Spider to create fully automated Google Data Studio crawl reports to monitor site health, detect issues, and track performance.
By connecting a scheduled crawl to a Google Drive account, the SEO Spider can append crawl overview data as a new row within a single Google Sheet.
This Google Sheet will automatically update each time the scheduled crawl is run, allowing you to integrate time series crawl data into any Data Studio reports.
Follow the steps in our tutorial below to get set up with your first automated Google Data Studio crawl report.
Scheduling A Crawl
The ‘Data Studio friendly’ export is only available via automated crawl scheduling, which you can find under ‘File > Scheduling’ on the top menu.
To create a new scheduled crawl and Data Studio export, click ‘Add’.
Please see our user guide on scheduling.
Working with multiple domains and/or clients?
You can configure a single scheduled crawl and Data Studio export, then use the ‘Duplicate’ button to copy these settings. Just remember to update the seed URL appropriately.
In the General tab of the scheduler specify a Task Name. This will be used to identify the Google Sheet export and any saved crawls. You can also provide a project name for any saved database crawls and descriptions to help differentiate similar scheduled reports.
A future date and time will also need to be specified for the first scheduled crawl. For automated reporting this will need to be set to run Daily, Weekly, or Monthly via the dropdown:
For Data Studio integration, we do not recommend changing the task name once set. Doing so will create a new export within Google Sheets, rather than appending to the existing spreadsheet.
Within the start options tab, specify whether you’d like the SEO Spider to crawl in regular Spider mode, or crawl a list of URLs in List mode.
To generate a configuration file, simply enable all required settings within the user interface. Then head to ‘File > Configuration > Save As’. Please see our guide on saving and loading configuration profiles.
Some filters require crawl analysis to run upon crawl completion. To enable this for scheduled crawls, select ‘Crawl Analysis > Configure > Auto-Analyse at End of Crawl’ when building your configuration file.
Headless mode is required for the Data Studio friendly export to run, so you’ll need to enable this.
Select the appropriate Google account from the dropdown. If exporting to Google Sheets for the first time, you’ll need to select ‘Manage’, then click ‘Add’ on the next window to add the Google account you’d like to export to.
This will bring up your browser, where you can select and sign into your Google Account. You’ll need to click ‘allow’ twice, before confirming your choices to ‘allow’ the SEO Spider to export data to your Google Drive account.
Once you’ve authorised the SEO Spider, you can click ‘OK’ and your account email will now be listed. Select this account and click ‘OK’.
For automated Data Studio reporting, tick the ‘Custom Crawl overview’ option and click the ‘Configure’ button.
In this panel, you can customise what crawl overview information you’d like to include within the Google Sheets export. By default, we recommend selecting all available metrics and adding them to the selected box on the right-hand side. This can be done instantly by clicking the double-right arrow.
The order of metrics in the right-hand panel will be reflected in the order of columns within the exported Google Sheet. Therefore, we do not recommend adjusting the order once the initial report has run, as data and columns may become mixed.
Once all the above has been set and a crawl has run, you will have a spreadsheet exported into your specified Google Drive. By default, this will be saved under the folder path: ‘My Drive > Screaming Frog SEO Spider > Project Name > [task_name]_crawl_summary_report’.
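As a rough illustration of the naming convention above, the following Python sketch builds the expected Drive location from a project name and a task name. The function name and the sample values are hypothetical; they only mirror the path pattern described here and are not part of the SEO Spider.

```python
def expected_sheet_path(project_name: str, task_name: str) -> str:
    """Build the Drive path described above for a given project and task name."""
    sheet_name = f"{task_name}_crawl_summary_report"
    return " > ".join(
        ["My Drive", "Screaming Frog SEO Spider", project_name, sheet_name]
    )


# Hypothetical project and task names, for illustration only.
original = expected_sheet_path("Example Client", "example_site")
renamed = expected_sheet_path("Example Client", "example_site_v2")
print(original)
print(renamed)
```

Because the sheet name is derived from the task name, a renamed task resolves to a different sheet, which is why new rows would stop appending to the original spreadsheet.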
It will export a single row of data for the crawl, with all the crawl overview top level metrics. When this has run several times you’ll have multiple rows of crawl information across several days, weeks, or months:
Crawl data is automatically appended as a new row to the Google Sheet, with chosen metrics from the custom crawl overview export in columns.
Please remember, this does not export every single URL from the crawl. There will be a single row per crawl, which contains crawl overview data.
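To make the one-row-per-crawl structure concrete, here is a minimal Python sketch that reads a CSV copy of such a sheet and reports how a metric changed between consecutive crawls. The column names and values in the sample are hypothetical (only the ‘Date’ field is taken from this guide), and it assumes the sheet has been downloaded as CSV.

```python
import csv
import io


def crawl_deltas(csv_text, metric):
    """Return (date, value, change_from_previous_crawl) for each crawl row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    results = []
    previous = None
    for row in rows:
        value = int(row[metric])
        # The first crawl has nothing to compare against.
        change = None if previous is None else value - previous
        results.append((row["Date"], value, change))
        previous = value
    return results


# Hypothetical export: one row per scheduled crawl, metrics in columns.
SAMPLE = """Date,Total URLs Crawled,Missing Titles
2022-01-03,1200,14
2022-01-10,1215,9
2022-01-17,1190,21
"""

deltas = crawl_deltas(SAMPLE, "Missing Titles")
for date, value, change in deltas:
    print(date, value, change)
```

This is the same kind of comparison a Data Studio time-series chart visualises once the sheet is connected as a data source.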
Connecting to Google Data Studio
Once your Google Sheet has been set up you’ll want to pull this through to Data Studio. For this tutorial, we’re using our own Screaming Frog crawl overview template. But you can easily build your own report, or add to an existing report using the overview export.
Adding a Data Source
If copying our report, on the top-right select the ‘three dots > Make a Copy’:
You’ll then be presented with the option to select your Data Source, where you’ll need to choose ‘Create New Data Source’.
Alternatively, if building your own report, select Create > Data Source from the Data Studio homepage.
When presented with a list of connectors, select Google Sheets.
You’ll then need to select the spreadsheet generated by the scheduled crawl. This will be labelled as [task_name]_crawl_summary_report (task name as specified within the scheduler options).
Ensure the ‘use first rows as headers’ option is ticked and select ‘Connect’ in the top-right.
You’ll then be presented with all the overview fields exported. The data source can also be renamed in the top-left to something easily identifiable.
Occasionally Data Studio will mark some fields as a type of ‘Date’ rather than ‘Number’. We recommend sorting the ‘Type’ column and ensuring all fields are appropriately set as ‘Number’ using the dropdown selector, aside from the ‘Date’ field of course:
Once complete select ‘Add to Report’ in the top-right and ‘Copy Report’ in the following window.
With the Google Sheet now added as a data source, any time the automated crawl runs, the Data Studio report will update automatically. You can use this data to incorporate time-series graphs, scorecards or other elements using any of the exported overview metrics.
Crawl Overview Template
Using this data you can begin building your own crawl monitoring reports with any crawl overview information. Our template has several tabs, examining different elements of site health.
For instance, you can monitor sitewide indexability:
Track on-page elements such as missing or duplicate title tags:
We can even bring in Core Web Vitals information from the Chrome UX report and page speed opportunity data via the PSI API:
All the above allows you to easily identify and proactively fix any potential site issues or unintended changes.
The full list of tabs included in our template:
- Summary – overview report of various health metrics.
- Response Codes – monitor counts of response codes or blocked URLs over time.
- URL Types – track counts of internal HTML, images, JS files, etc.
- Indexability – monitor sitewide indexability, easily identify trends or increases in non-indexable URLs.
- Site Structure – track site structure changes, often indicating an adjustment in internal linking.
- On-Page – identify changes in site metadata or headings.
- Content Issues – spot changes to page content, duplicate page counts, or spelling issues.
- PageSpeed – track CWV performance and identify opportunities to improve page experience.
- Structured Data – monitor validation issues and sitewide structured data usage.
- Security – analyse sitewide security issues and non-secure HTTP usage.
- Hreflang – track hreflang validation issues and usage.
- Sitemaps – identify sitemap validation errors, orphan URLs, and URLs not contained in sitemaps.
- URL Inspection – monitor data from Google URL Inspection API, track count of indexed URLs, or URL issues.
Adding Data from API Connectors
The Spider is capable of connecting to various APIs to integrate data from external sources, many of which can also be added to any Data Studio reports.
For instance, the latest URL Inspection API can be integrated to report the number of URLs indexed within Google or the number of indexable URLs that are not being indexed by Google:
To integrate any API data, simply ensure that the option is enabled when building the configuration file to use with the scheduled crawl. In the instance above, this means ensuring the ‘Enable URL Inspection’ option within the Search Console API settings is ticked in the configuration file being used:
Secondly, within the scheduling options ensure that the relevant API is enabled, and connected to the correct Search Console account & property under the ‘Configure’ option.
The same applies to any exported data from other relevant APIs such as Analytics or PageSpeed Insights.
It’s worth noting that any API integrations will be limited by their relevant quota limits. For instance, the URL Inspection API is limited to 2,000 URLs per Search Console property a day. If your site is larger than 2,000 URLs, you will not be able to use the API across your entire site.
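The effect of the quota can be worked through with some simple arithmetic. The sketch below is illustrative only: it takes the 2,000 URLs per property per day figure quoted above, and the function name and site sizes are hypothetical.

```python
# Daily limit per Search Console property, as quoted in this guide.
URL_INSPECTION_DAILY_QUOTA = 2000


def quota_coverage(site_urls, daily_quota=URL_INSPECTION_DAILY_QUOTA):
    """Return (urls_inspected, share_of_site) for a single daily crawl."""
    inspected = min(site_urls, daily_quota)
    share = inspected / site_urls if site_urls else 1.0
    return inspected, share


# Hypothetical site sizes.
small_site = quota_coverage(1500)
large_site = quota_coverage(5000)
print(small_site)
print(large_site)
```

For the hypothetical 5,000 URL site, a single crawl can only return URL Inspection data for 2,000 URLs, so any Data Studio chart built on those columns reflects a sample of the site rather than the whole of it.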
Adding New Filters to Existing Reports
As the SEO Spider is ever-evolving, we’re continually adding new features to help users identify and fix SEO issues. Often, new features also mean new filters are now available for reporting and integration within Data Studio.
No extra work is needed to integrate these into new reports built from scratch. However, for existing reports that have been collecting data for some time a few extra steps are required.
Following a feature release, and after updating the software to the latest version, you should see that any new filters are available to select within the Crawl Overview Export selection:
Simply highlight these new filters and click the right arrow to add them to the bottom of the list of ‘Selected’ filters. Once your next scheduled crawl runs, these data points will be added to the far-right columns of the overview export Google Sheet:
On existing reports, these new columns will need to have their headings manually added to the first row of the spreadsheet. These headings should match the labelling and order within the Crawl Overview Export selection.
For instance, in the example above the following headings will need to be manually added:
- Search Console:URL Is Not on Google
- Search Console:Indexable URL Not Indexed
- Search Console:URL is on Google, But Has Issues
- Search Console:User-Declared Canonical Not Selected
- Search Console:Page is Not Mobile Friendly
- Search Console:AMP URL Invalid
- Search Console:Rich Result Invalid
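If you keep a copy of the selected metric order, working out which headings are missing can be sketched in a few lines of Python. This is a hypothetical helper, not something the SEO Spider provides, and the existing header values in the sample are invented for illustration; the new headings are the ones listed above.

```python
def headings_to_append(existing_header, selected_metrics):
    """Return metrics missing from the sheet's first row, in selection order."""
    present = set(existing_header)
    return [name for name in selected_metrics if name not in present]


# Hypothetical first row of an existing sheet.
existing = ["Date", "Total URLs Crawled"]

# Selected order after the update: existing metrics first, new ones at the end.
selected = existing + [
    "Search Console:URL Is Not on Google",
    "Search Console:Indexable URL Not Indexed",
    "Search Console:URL is on Google, But Has Issues",
    "Search Console:User-Declared Canonical Not Selected",
    "Search Console:Page is Not Mobile Friendly",
    "Search Console:AMP URL Invalid",
    "Search Console:Rich Result Invalid",
]

missing = headings_to_append(existing, selected)
for heading in missing:
    print(heading)
```

The result preserves the selection order, so the headings can be pasted into the first row in the same left-to-right order as the newly appended data columns.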
Once the data and headings have been added to your connected Google Sheet, open up your Data Studio report and head to ‘Resource > Manage Added Data Sources’ on the top navigation:
Select ‘Edit’ on the Data Source for the connected Google Sheet, then click ‘Refresh Fields’ in the lower left-hand corner. This will bring up a window with the new fields added to the Google Sheet, where you can simply select ‘Apply’:
The new data will now be available to use within your existing Data Studio report.
Within Data Studio you can schedule reports to be sent via email on a daily, weekly, or monthly basis. Use this to notify yourself and any stakeholders each time the report has been updated.
Just click the dropdown next to ‘Share’ and select ‘Schedule email delivery’.
In the next window add your recipients, any custom subject or message, time, and how often you would like the email to be sent.
Schedule Email Time
Ensure you allow enough time for the crawl to complete and Google Sheets to sync when setting the email time. For instance, if your crawl normally takes 1 hour to fully complete, set email delivery for at least an hour after the scheduled crawl start time.
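The timing can be worked out as a simple sum. In the Python sketch below, the 30-minute sync buffer is an assumed safety margin rather than a documented figure, and the function name and times are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_email_time(crawl_start, crawl_duration,
                        sync_buffer=timedelta(minutes=30)):
    """Earliest sensible delivery time: crawl start + crawl length + buffer."""
    return crawl_start + crawl_duration + sync_buffer


# Hypothetical schedule: crawl starts at 06:00 and usually takes one hour.
send_at = earliest_email_time(
    crawl_start=datetime(2022, 1, 3, 6, 0),
    crawl_duration=timedelta(hours=1),
)
print(send_at.strftime("%H:%M"))
```

If crawl times vary from run to run, base the duration on the longest crawl you have seen rather than the average.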
Frequently Asked Questions
Some common FAQs we see:
Why didn’t my Scheduled crawl complete?
If a scheduled crawl did not run at a set time, check that your device was turned on and logged in at the scheduled time. You may need to adjust power-saving options to prevent devices from sleeping when needed for a crawl. For instance, many laptops will only stay on if plugged in, but you can adjust this behaviour in power-saving options:
If you are certain the device was turned on head to File > Scheduling > History. This panel will show any errors the Spider encountered that may have prevented the crawl from completing. You can then make adjustments as needed.
Why are the scorecards and graphs showing ‘Error, See Details’?
If you’ve connected your Google Sheet to our Data Studio Crawl Overview template and most of the scorecards show ‘Error, See Details’ alongside blank graphs, this is likely due to one of the following reasons:
- Our Crawl Overview template was designed to only be used with the custom crawl overview export, generated through the crawl scheduler. If you’ve added another data source such as an export of the Internal-HTML tab this will not work in our report.
- When adding the custom overview export as a data source, Data Studio may set several of the metrics to a type other than ‘Number’. All data points aside from the date and URLs should be set to a type of ‘Number’. If not, it won’t be possible to use them in graphs and scorecards.
- Our Crawl Overview report template was designed to work solely with the English export. If you’ve set the Spider language settings to another language, the headings on the export will be different. In this case, you’ll need to manually add each of the data points again.
- There are several Data Studio-related reasons charts may not display data, for instance, if the date range is set to a period with no crawl data.
Why are the top graphs on the PageSpeed tab not populating?
The top three graphs on the PageSpeed tab use data from the Chrome UX connector, rather than the Spider export. You may need to re-add this as a data source and filter to your domain.
See more details on the Chrome UX connector here:
Why are graphs in some tabs not showing data?
Some graphs within our crawl overview template require specific configuration options to be enabled during the scheduled crawl. For instance, to monitor sitemap health, ensure that the configuration file has ‘crawl these sitemaps’ enabled under Configuration > Spider.
Tabs that require specific configuration options include:
- Content Issues
- Structured Data
- URL Inspection
Some tabs, such as Sitemaps and Hreflang, also require post-crawl analysis to run following the scheduled crawl. This can be enabled in the configuration file under Crawl Analysis > Configure > tick ‘auto-analyse at end of crawl’.
Why are some metrics missing from my copied report?
If copying our Crawl Overview template, some custom fields may not be transferred to your copy. For example, we have a custom non-200 field that counts all non-200 URLs in the ‘Summary’ and ‘Indexability’ dashboards for ‘Non-Indexability Status’ graphs. You may see this labelled as ‘Record Count’ after copying. If this does occur, you can add custom fields with bespoke formulas. For example, for a count of non-200 URLs, click on the ‘Non-Indexability Status’ graph to edit, then ‘Add Metric > Create Field’, type in ‘Non-200 URLs’ as the name and the following formula:
Community Data Studio Crawl Reports
We need your help!
There is so much customisation that can be done in Google Data Studio, we want to see how you might utilise this to build your own custom reports.
If you’ve built your own custom crawl report in Data Studio, or have integrated one as part of a wider SEO report, and want to share it, please send it to firstname.lastname@example.org or tweet us (@screamingfrog), as we’d love to feature it here in a community gallery.
Not only will you help others in the SEO community, you’ll also receive a shout out from us.
The guide above should illustrate how to use the SEO Spider to automate crawl reports in Google Data Studio.
If you have any further queries, feedback or suggestions to improve our Google Sheets or Data Studio integration in the SEO Spider then just get in touch with our team via support.