How To Automate Crawl Reports In Data Studio
This tutorial explains how to set up the Screaming Frog SEO Spider to create fully automated Google Data Studio crawl reports to monitor site health, detect issues, and track performance.
By connecting a scheduled crawl to a Google Drive account, the SEO Spider can append crawl overview data as a new row within a single Google Sheet.
This Google Sheet will automatically update each time the scheduled crawl is run, allowing you to integrate time series crawl data into any Data Studio reports.
Follow the steps in our tutorial below to set up your first automated Google Data Studio crawl report.
Scheduling A Crawl
The ‘Data Studio friendly’ export is only available via automated crawl scheduling, which you can find under ‘File > Scheduling’ on the top menu.
To create a new scheduled crawl and Data Studio export, click ‘Add’.
Please see our user guide on scheduling.
You can configure a single scheduled crawl and Data Studio export, then use the ‘Duplicate’ button to copy these settings. Just remember to update the seed URL appropriately.
In the General tab of the scheduler specify a Task Name. This will be used to identify the Google Sheet export and any saved crawls. You can also provide a project name for any saved database crawls and descriptions to help differentiate similar scheduled reports.
A future date and time will also need to be specified for the first scheduled crawl. For automated reporting this will need to be set to run Daily, Weekly, or Monthly via the dropdown:
For Data Studio integration, we do not recommend changing the task name once set. Doing so will create a new export within Google Sheets, rather than appending to the existing spreadsheet.
Within the ‘Start Options’ tab, specify whether you’d like the SEO Spider to crawl in regular Spider mode, or crawl a list of URLs in List mode.
To generate a configuration file, enable all required settings within the user interface, then head to ‘File > Configuration > Save As’. Please see our guide on saving and loading configuration profiles.
Some filters require crawl analysis to run upon crawl completion. To enable this for scheduled crawls select ‘Crawl Analysis > Configure > Auto-Analyse at End of Crawl’ when building your configuration file.
Headless mode is required for the Data Studio friendly export to run, so you’ll need to enable this.
Select the appropriate Google account from the dropdown. If exporting to Google Sheets for the first time, you’ll need to select ‘Manage’, then click ‘Add’ on the next window to add the Google account you’d like to export to.
This will open your browser, where you can select and sign into your Google account. You’ll need to click ‘Allow’ twice, before confirming your choices to allow the SEO Spider to export data to your Google Drive account.
Once you’ve authorised the SEO Spider, you can click ‘OK’ and your account email will now be listed. Select this account and click ‘OK’.
For automated Data Studio reporting, tick the ‘Custom Crawl Overview’ option and click the ‘Configure’ button.
In this panel, you can customise what crawl overview information you’d like to include within the Google Sheets export. By default, we recommend selecting all available metrics and adding them to the selected box on the right-hand side. This can be done instantly by clicking the double-right arrow.
The order of metrics in the right-hand panel will be reflected in the order of columns within the exported Google Sheet. Therefore, we do not recommend adjusting the order once the initial report has run, as data and columns may become mixed.
Once all the above has been set and a crawl has run, a spreadsheet will be exported to your specified Google Drive. By default, it will be saved under the folder path: ‘My Drive > Screaming Frog SEO Spider > Project Name > [task_name]_crawl_summary_report’.
When this has run several times you’ll have multiple rows of crawl information across several days, weeks, or months:
Crawl data is automatically appended as a new row to the Google Sheet, with chosen metrics from the custom crawl overview export in columns.
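To illustrate the structure of this export, the sketch below parses a few hypothetical appended rows (the column names are assumptions based on the metrics selected in the custom crawl overview panel) and computes a change between crawls, much as a Data Studio time series would:

```python
import csv
import io

# Hypothetical sample of the appended crawl summary rows; the real
# column names depend on the metrics chosen in the scheduler.
sample = """Date,Total Internal URLs,Non-Indexable URLs
2023-01-01,1500,120
2023-01-08,1525,98
2023-01-15,1610,143
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Change in non-indexable URLs between the last two scheduled crawls.
latest, previous = rows[-1], rows[-2]
delta = int(latest["Non-Indexable URLs"]) - int(previous["Non-Indexable URLs"])
print(f"{latest['Date']}: non-indexable URLs changed by {delta:+d}")
```

Because each scheduled crawl appends one row, any metric column can be charted over time without further transformation.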
Connecting to Google Data Studio
Once your Google Sheet has been set up you’ll want to pull this through to Data Studio. For this tutorial, we’re using our own Screaming Frog crawl overview template. But you can easily build your own report, or add to an existing report using the overview export.
Adding a Data Source
If copying our report, on the top-right select the ‘three dots > Make a Copy’:
You’ll then be presented with the option to select your data source, where you’ll need to choose ‘Create New Data Source’.
Alternatively, if building your own report, select ‘Create > Data Source’ from the Data Studio homepage.
When presented with a list of connectors, select Google Sheets.
You’ll then need to select the spreadsheet generated by the scheduled crawl. This will be labelled as [task_name]_crawl_summary_report (task name as specified within the scheduler options).
Ensure the ‘Use first row as headers’ option is ticked and select ‘Connect’ in the top-right.
You’ll then be presented with all the overview fields exported. The data source can also be renamed in the top-left to something easily identifiable.
Occasionally, Data Studio will mark some fields with a type of ‘Date’ rather than ‘Number’. We recommend sorting the ‘Type’ column and ensuring all fields are appropriately set as ‘Number’ using the dropdown selector, aside from the ‘Date’ field of course:
Once complete select ‘Add to Report’ in the top-right and ‘Copy Report’ in the following window.
With the Google Sheet now added as a data source, whenever the automated crawl runs, the Data Studio report will update automatically. You can use this data to incorporate time-series graphs, scorecards, or other elements using any of the exported overview metrics.
Crawl Overview Template
Using this data you can begin building your own crawl monitoring reports with any crawl overview information. Our template has several tabs, examining different elements of site health.
For instance, you can monitor sitewide indexability:
Track on-page elements such as missing or duplicate title tags:
We can even bring in Core Web Vitals information from the Chrome UX Report and page speed opportunity data via the PSI API:
All the above allows you to easily identify and proactively fix any potential site issues or unintended changes.
The full list of tabs included in our template:
- Summary – overview report of various health metrics.
- Response Codes – monitor counts of response codes or blocked URLs over time.
- URL Types – track counts of internal HTML, images, JS files, etc.
- Indexability – monitor sitewide indexability, easily identify trends or increases in non-indexable URLs.
- Site Structure – track site structure changes, often indicating an adjustment in internal linking.
- On-Page – identify changes in site metadata or headings.
- Content Issues – spot changes to page content, duplicate page counts, or spelling issues.
- PageSpeed – track CWV performance and identify opportunities to improve page experience.
- Structured Data – monitor validation issues and sitewide structured data usage.
- Security – analyze sitewide security issues and non-secure HTTP usage.
- Hreflang – track hreflang validation issues and usage.
- Sitemaps – identify sitemap validation errors, orphan URLs, and URLs not contained in sitemaps.
Within Data Studio you can schedule reports to be sent via email on a daily, weekly, or monthly basis. Use this to notify yourself and any stakeholders each time the report has been updated.
Just click the dropdown next to ‘Share’ and select ‘Schedule email delivery’.
In the next window, add your recipients, any custom subject or message, the time, and how often you would like the email to be sent.
Ensure you allow enough time for the crawl to complete and Google Sheets to sync when setting the email time. For instance, if your crawl normally takes 1 hour to fully complete, set email delivery for at least an hour after the initial crawl schedule time.
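The timing rule above can be sketched as a simple calculation (the start time and durations here are purely illustrative):

```python
from datetime import datetime, timedelta

# Hypothetical schedule: crawl starts at 02:00 and typically takes 1 hour.
crawl_start = datetime(2023, 1, 15, 2, 0)
typical_crawl_duration = timedelta(hours=1)
sync_buffer = timedelta(minutes=30)  # extra headroom for Google Sheets to sync

# Earliest sensible time to schedule the email delivery.
email_time = crawl_start + typical_crawl_duration + sync_buffer
print(email_time.strftime("%H:%M"))  # → 03:30
```

If crawl times vary, err on the side of a larger buffer so the emailed report never reflects a partially synced sheet.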
There are a few additional items to be aware of:
- For automated crawls to run, the device it’s scheduled on needs to be online and logged in at the appropriate time. The machine can’t be in sleep mode, or scheduled crawls won’t run.
- Our template utilises data from the Chrome User Experience Report for the ‘PageSpeed’ dashboard. When adding it as a data source, you’ll need to update the domain appropriately. You can click on the graphs and change the origin URL parameter (which is set by default to https://www.screamingfrog.co.uk).
- If copying our Crawl Overview template, some custom fields may not be transferred to your copy. For example, we have a custom non-200 field that counts all non-200 URLs in the ‘Summary’ and ‘Indexability’ dashboards for the ‘Non-Indexability Status’ graphs. You may see this labelled as ‘Record Count’ after copying. If this occurs, you can add custom fields with bespoke formulas. For example, for a count of non-200 URLs, click on the ‘Non-Indexability Status’ graph to edit it, then ‘Add Metric > Create Field’, and type in ‘Non-200 URLs’ as the name along with the following formula:
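As an illustration only, a Data Studio calculated field for a non-200 count could subtract successful responses from the total, along these lines (the field names here are assumptions; substitute the matching columns from your own crawl overview export):

```
Total Internal URLs - Success (2xx)
```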
Community Data Studio Crawl Reports
We need your help!
There is so much customisation that can be done in Google Data Studio, and we want to see how you utilise it to build your own custom reports.
If you’ve built your own custom crawl report in Data Studio, or have integrated one as part of a wider SEO report, and want to share it, please send it to email@example.com or tweet us (@screamingfrog), as we’d love to feature it here in a community gallery.
Not only will you help others in the SEO community, you’ll also receive a shout-out from us.
The guide above should illustrate how to use the SEO Spider to automate crawl reports in Google Data Studio.
If you have any further queries, feedback or suggestions to improve our Google Sheets or Data Studio integration in the SEO Spider then just get in touch with our team via support.