How To Automate Crawl Reports In Looker Studio

This tutorial explains how to set up the Screaming Frog SEO Spider to create fully automated Google Looker Studio crawl reports to monitor site health, detect issues, and track performance.

By connecting a scheduled crawl to a Google Drive account, the SEO Spider can append crawl overview data as a new row within a single Google Sheet.

This Google Sheet will automatically update each time the scheduled crawl is run, allowing you to integrate time series crawl data into any Looker Studio reports.

Follow the steps in our tutorial below to get set up with your first automated Google Looker Studio crawl report.


Scheduling A Crawl

The ‘Looker Studio friendly’ export is only available via automated crawl scheduling, which you can find under ‘File > Scheduling’ on the top menu.

To create a new scheduled crawl and Looker Studio export, click ‘Add’.

Please see our user guide on scheduling.

Working with multiple domains and/or clients?

You can configure a single scheduled crawl and Looker Studio export, then use the ‘Duplicate’ button to copy these settings. Just remember to update the seed URL appropriately.


General

In the General tab of the scheduler, specify a Task Name. This will be used to identify the Google Sheet export and any saved crawls. You can also provide a project name for any saved database crawls, and descriptions to help differentiate similar scheduled reports.

A future date and time will also need to be specified for the first scheduled crawl. For automated reporting, this will need to be set to run Daily, Weekly, or Monthly via the dropdown:


Task Name

For Looker Studio integration, we do not recommend changing the task name once set. Doing so will create a new export within Google Sheets, rather than appending to the existing spreadsheet.


Start Options

Within the start options tab, specify whether you’d like the SEO Spider to crawl in regular Spider mode, or crawl a list of URLs in List mode.

By default, the scheduled crawl will run with the default configuration. However, by adding a configuration file to the ‘Crawl Config’ option, these settings will be used instead. This is required if you’d like non-default information within the Looker Studio report, such as sitemap information, structured data validation, or JavaScript rendering.

To generate a configuration file, simply enable all required settings within the user interface, then head to ‘File > Configuration > Save As’. Please see our guide on saving and loading configuration profiles.


Crawl Analysis

Some filters require crawl analysis to run upon crawl completion. To enable this for scheduled crawls select ‘Crawl Analysis > Configure > Auto-Analyse at End of Crawl’ when building your configuration file.


Export

Headless mode is required for the Looker Studio friendly export to run, so you’ll need to enable this.

Select your appropriate Google account from the dropdown. If exporting to Google Sheets for the first time, you’ll need to select ‘Manage’, then click ‘Add’ on the next window to add the Google account you’d like to export to.

This will bring up your browser, where you can select and sign into your Google account. You’ll need to click ‘Allow’ twice before confirming your choices, allowing the SEO Spider to export data to your Google Drive account.
Once you’ve authorised the SEO Spider, you can click ‘OK’ and your account email will now be listed. Select this account and click ‘OK’.

For automated Looker Studio reporting, tick the ‘Custom Crawl Overview’ option and click the ‘Configure’ button.
In this panel, you can customise what crawl overview information you’d like to include within the Google Sheets export. By default, we recommend selecting all available metrics and adding them to the ‘Selected’ box on the right-hand side. This can be done instantly by clicking the double-right arrow.

Metric Order

The order of metrics in the right-hand panel will be reflected in the order of columns within the exported Google Sheet. Therefore, we do not recommend adjusting the order once the initial report has run, as data and columns may become mixed.

Once all the above has been set and a crawl has run, you will have a spreadsheet exported into your specified Google Drive. By default, this will be saved under the folder path: ‘My Drive > Screaming Frog SEO Spider > Project Name > [task_name]_custom_summary_report’.

It will export a single row of data for the crawl, with all the crawl overview top-level metrics. When this has run several times, you’ll have multiple rows of crawl information across several days, weeks, or months:

Crawl data is automatically appended as a new row to the Google Sheet, with chosen metrics from the custom crawl overview export in columns.

Please remember, this does not export every single URL from the crawl. There will be a single row per crawl, which contains crawl overview data.
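
Because the export is a standard Google Sheet with one row per crawl, the same time series can also be pulled into other tools for ad-hoc checks. Below is a minimal sketch, not part of the SEO Spider itself, that reads the sheet into Python with pandas; the sheet ID and the metric column name are hypothetical, and the sheet is assumed to be shared for link viewing:

import pandas as pd

# Hypothetical ID of the '[task_name]_custom_summary_report' spreadsheet,
# taken from its URL in Google Drive.
SHEET_ID = "your-sheet-id-here"

# A sheet shared as 'anyone with the link can view' can be downloaded
# as CSV via the export endpoint.
url = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

df = pd.read_csv(url, parse_dates=["Date"]).sort_values("Date")

# Example sanity check: how did 4xx errors move between crawls?
# The column name is an assumption and should match whichever metrics
# were selected in the custom crawl overview export.
col = "Response Codes:Client Error (4xx)"
if col in df.columns:
    print(df[["Date", col]].assign(change=df[col].diff()))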


Connecting to Google Looker Studio

Once your Google Sheet has been set up, you’ll want to pull this through to Looker Studio. For this tutorial, we’re using our own Screaming Frog crawl overview template, but you can easily build your own report, or add to an existing report using the overview export.


Adding a Data Source

If copying our report, on the top-right select the ‘three dots > Make a Copy’:

You’ll then be presented with the option to select your Data Source, where you’ll need to choose ‘Create New Data Source’.

Alternatively, if building your own report, select ‘Create > Data Source’ from the Looker Studio homepage.

When presented with a list of connectors, select Google Sheets.

You’ll then need to select the spreadsheet generated by the scheduled crawl. This will be labelled as [task_name]_custom_summary_report (task name as specified within the scheduler options).

Ensure the ‘Use first row as headers’ option is ticked and select ‘Connect’ in the top-right.

You’ll then be presented with all the overview fields exported. The data source can also be renamed in the top-left to something easily identifiable.

Occasionally, Looker Studio will mark some fields as a type of ‘Date’ rather than ‘Number’. We recommend sorting the ‘Type’ column and ensuring all fields are appropriately set as ‘Number’ using the dropdown selector, aside from the ‘Date’ field of course:

Once complete select ‘Add to Report’ in the top-right and ‘Copy Report’ in the following window.


Adding Charts

With the Google Sheet now added as a data source, any time the automated crawl runs, the Looker Studio report will update automatically. You can use this data to incorporate time-series graphs, scorecards, or other elements using any of the exported overview metrics.


Crawl Overview Template

Using this data you can begin building your own crawl monitoring reports with any crawl overview information. Our template has several tabs, examining different elements of site health.

For instance, you can monitor sitewide indexability:

Track on-page elements such as missing or duplicate title tags:

We can even bring in Core Web Vitals information from the Chrome UX Report and page speed opportunity data via the PSI API:

All the above allows you to easily identify and proactively fix any potential site issues or unintended changes.

The full list of tabs included in our template:

  • Summary – overview report of various health metrics.
  • Response Codes – monitor counts of response codes or blocked URLs over time.
  • URL Types – track counts of internal HTML, images, JS files, etc.
  • Indexability – monitor sitewide indexability, easily identify trends or increases in non-indexable URLs.
  • Site Structure – track site structure changes, often indicating an adjustment in internal linking.
  • On-Page – identify changes in site metadata or headings.
  • Content Issues – spot changes to page content, duplicate page counts, or spelling issues.
  • PageSpeed – track CWV performance and identify opportunities to improve page experience.
  • Structured Data – monitor validation issues and sitewide structured data usage.
  • Security – analyze sitewide security issues and non-secure HTTP usage.
  • Hreflang – track hreflang validation issues and usage.
  • Sitemaps – identify sitemap validation errors, orphan URLs, and URLs not contained in sitemaps.
  • JavaScript – analyze JavaScript usage and its impact on metadata, content, and internal linking.
  • URL Inspection – monitor data from the Google URL Inspection API, tracking counts of indexed URLs or URL issues.

Adding Data from API Connectors

The Spider is capable of connecting to various APIs to integrate data from external sources, many of which can also be added to any Looker Studio report.

For instance, the latest URL Inspection API can be integrated to report the number of URLs indexed within Google or the number of indexable URLs that are not being indexed by Google:

To integrate any API data, simply ensure that the option is enabled when building the configuration file used with the scheduled crawl. In the instance above, this means ensuring ‘Enable URL Inspection’ is ticked within the Search Console API settings of the configuration file being used:

Secondly, within the scheduling options, ensure that the relevant API is enabled and connected to the correct Search Console account and property under the ‘Configure’ option.

The same applies to any exported data from other relevant APIs such as Analytics or PageSpeed Insights.

It’s worth noting that any API integrations will be limited by their relevant quota limits. For instance, the URL Inspection API is limited to 2,000 URLs per Search Console property per day, so if your site is larger than 2,000 URLs you will not be able to use the API across your entire site in a single day’s crawl.
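
As a quick sketch of that arithmetic (the 2,000-per-day quota is the figure above; the site size is hypothetical):

import math

QUOTA_PER_DAY = 2000   # URL Inspection API quota per property per day
site_urls = 7500       # hypothetical number of URLs to inspect

# Full coverage would have to be spread across multiple daily crawls.
days_needed = math.ceil(site_urls / QUOTA_PER_DAY)
print(f"{days_needed} days of crawls to inspect every URL once")  # -> 4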


Segmenting Automated Crawl Reports

You can create segmented custom crawl overview reports in Google Sheets by setting up segments, saving the configuration, and then supplying the saved configuration file in the scheduling start options tab.

The SEO Spider will generate the default custom crawl overview in Google Sheets, as well as one for each segment.


The name of each segment is appended to the sheet name: ‘[task_name]_custom_summary_report_[segment_name]’. These are stored in: ‘My Drive > Screaming Frog SEO Spider > Project Name’.

This means each segmented crawl overview export could be hooked up to its own Looker Studio crawl report. Alternatively, a ‘Segments’ page could be created within the existing Looker Studio crawl report, with a top-level summary of data from each segmented custom summary report Google Sheet.
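
For example, here is a rough sketch of how the per-segment sheets could be combined into one table for such a summary page; the segment names and sheet IDs are hypothetical, and the sheets are assumed to be shared for link viewing:

import pandas as pd

# Hypothetical IDs of the '[task_name]_custom_summary_report_[segment_name]'
# spreadsheets generated for each segment.
SEGMENT_SHEETS = {
    "blog": "sheet-id-for-blog-segment",
    "products": "sheet-id-for-products-segment",
}

frames = []
for segment, sheet_id in SEGMENT_SHEETS.items():
    url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"
    frame = pd.read_csv(url)
    frame["Segment"] = segment  # tag each row with its segment name
    frames.append(frame)

# One combined table that a 'Segments' summary page could be built from.
combined = pd.concat(frames, ignore_index=True)
print(combined.groupby("Segment").size())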


Adding New Filters to Existing Reports

As the SEO Spider is ever-evolving, we’re continually adding new features to help users identify and fix SEO issues. Often, new features also mean new filters are now available for reporting and integration within Looker Studio.

No extra work is needed to integrate these into new reports built from scratch. However, for existing reports that have been collecting data for some time, a few extra steps are required.

Following a feature release and an update to the latest version of the software, you should see any new filters available to select within the Crawl Overview Export selection:

Simply highlight these new filters and click the right arrow to add them to the bottom of the list of ‘Selected’ filters. Once your next scheduled crawl runs, these data points will be added to the far-right columns of the overview export Google Sheet:

On existing reports, these new columns will need to have their headings manually added to the initial row of the spreadsheet. These headings should match the labelling and order within the Crawl Overview Export selection.

For instance, in the example above, the following headings will need to be manually added (a scripted approach is sketched after the list):

  • Search Console:URL Is Not on Google
  • Search Console:Indexable URL Not Indexed
  • Search Console:URL is on Google, But Has Issues
  • Search Console:User-Declared Canonical Not Selected
  • Search Console:Page is Not Mobile Friendly
  • Search Console:AMP URL Invalid
  • Search Console:Rich Result Invalid
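
If you’d prefer to script this step rather than type the headings by hand, here is a minimal sketch using the third-party gspread library; it assumes a Google service account with edit access to the sheet, and the task name is a placeholder:

import gspread

# Authenticate with a service account that has edit access to the sheet.
gc = gspread.service_account(filename="service_account.json")

# The spreadsheet follows the '[task_name]_custom_summary_report' naming
# convention described earlier ('weekly_crawl' is a placeholder task name).
ws = gc.open("weekly_crawl_custom_summary_report").sheet1

# New headings, matching the labels and order in the Crawl Overview
# Export selection (the URL Inspection filters listed above).
new_headings = [
    "Search Console:URL Is Not on Google",
    "Search Console:Indexable URL Not Indexed",
    "Search Console:URL is on Google, But Has Issues",
    "Search Console:User-Declared Canonical Not Selected",
    "Search Console:Page is Not Mobile Friendly",
    "Search Console:AMP URL Invalid",
    "Search Console:Rich Result Invalid",
]

# Append each heading to the right of the existing first-row headers.
start_col = len(ws.row_values(1)) + 1
for offset, heading in enumerate(new_headings):
    ws.update_cell(1, start_col + offset, heading)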

Once the data and headings have been added to your connected Google Sheet, open up your Looker Studio report and head to ‘Resource > Manage added data sources’ on the top navigation:

Select ‘Edit’ on the data source for the connected Google Sheet, then click ‘Refresh Fields’ in the lower left-hand corner. This will bring up a window with the new fields added to the Google Sheet, where you can simply select ‘Apply’:

The new data will now be available to use within your existing Looker Studio report.


Email Scheduling

Within Looker Studio, you can schedule reports to be sent via email on a daily, weekly, or monthly basis. Use this to notify yourself and any stakeholders each time the report has been updated.

Just click the dropdown next to ‘Share’ and select ‘Schedule email delivery’.

In the next window, add your recipients, any custom subject or message, the delivery time, and how often you would like the email to be sent.

Ensure you allow enough time for the crawl to complete and Google Sheets to sync when setting the email time. For instance, if your crawl normally takes 1 hour to fully complete, set email delivery for at least an hour after the initial crawl schedule time.
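
As a trivial worked example of that timing (all figures hypothetical):

from datetime import datetime, timedelta

crawl_start = datetime(2024, 1, 1, 2, 0)    # crawl scheduled for 02:00
typical_duration = timedelta(hours=1)       # usual crawl time
sync_buffer = timedelta(minutes=30)         # allowance for the Sheets sync

earliest_email = crawl_start + typical_duration + sync_buffer
print(earliest_email.strftime("%H:%M"))     # -> 03:30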


Frequently Asked Questions

Some common questions we see:

Why didn’t my scheduled crawl complete?

If a scheduled crawl did not run at a set time, check that your device was turned on and logged in at the scheduled time. You may need to adjust power-saving options to prevent devices from sleeping when needed for a crawl. For instance, many laptops will only stay on if plugged in, but you can adjust this behaviour in power-saving options:

If you are certain the device was turned on, head to ‘File > Scheduling > History’. This panel will show any errors the Spider encountered that may have prevented the crawl from completing. You can then make adjustments as needed.

Why are the scorecards and graphs showing ‘Error, See Details’?

If you’ve connected your Google Sheet to our Looker Studio Crawl Overview template and most of the scorecards show ‘Error, See Details’ alongside blank graphs, this is likely due to one of the following reasons:

  • Our Crawl Overview template was designed to be used only with the custom crawl overview export, generated through the crawl scheduler. If you’ve added another data source, such as an export of the Internal HTML tab, this will not work in our report. For reference, we’ve created a Google Sheet containing all datatypes for the dimensions.
  • When adding the custom overview export as a data source, Looker Studio may set several of the metrics to a type other than ‘Number’. All data points aside from the date and URLs should be set to a type of ‘Number’; if not, it won’t be possible to use them in graphs and scorecards.
  • Our Crawl Overview report template was designed to work solely with the English export. If you’ve set the Spider’s language settings to another language, the headings on the export will be different. In this case, you’ll need to manually add each of the data points again.
  • There are several Looker Studio-related reasons charts may not display data, for instance, if the date range is set to a period with no crawl data.

Why are the top graphs on the PageSpeed tab not populating?

The top three graphs on the PageSpeed tab use data from the Chrome UX connector, rather than the Spider export. You may need to re-add this as a data source and filter it to your domain.

See more details on the Chrome UX connector here:

https://web.dev/chrome-ux-report-data-studio-dashboard/

Why are graphs in some tabs not showing data?

Some graphs within our crawl overview template require specific configuration options to be enabled during the scheduled crawl. For instance, to monitor sitemap health, ensure that the configuration file has ‘Crawl These Sitemaps’ enabled under ‘Configuration > Spider’.

Tabs that require specific configuration options include:

  • Content Issues
  • PageSpeed
  • Structured Data
  • Hreflang
  • Sitemaps
  • JavaScript
  • URL Inspection

Some tabs, such as Sitemaps and Hreflang, also require post-crawl analysis to run following the scheduled crawl. This can be enabled in the configuration file under ‘Crawl Analysis > Configure’ by ticking ‘Auto-Analyse at End of Crawl’.

Why are some metrics missing from my copied report?

If copying our Crawl Overview template, some custom fields may not be transferred to your copy. For example, we have a custom non-200 field that counts all non-200 URLs in the ‘Summary’ and ‘Indexability’ dashboards for the ‘Non-Indexability Status’ graphs. You may see this labelled as ‘Record Count’ after copying. If this does occur, you can add custom fields with bespoke formulas. For example, for a count of non-200 URLs, click on the ‘Non-Indexability Status’ graph to edit it, then ‘Add Metric > Create Field’, type in ‘Non-200 URLs’ as the name and the following formula:


Community Looker Studio Crawl Reports

We need your help!

There is so much customisation that can be done in Google Looker Studio that we want to see how you might utilise it to build your own custom reports.

If you’ve built your own custom crawl report in Looker Studio or have integrated as part of a wider SEO report and want to share it, please send it to support@screamingfrog.co.uk or tweet us (@screamingfrog), as we’d love to feature it here in a community gallery.

Not only will you help others in the SEO community, you’ll also receive a shout out from us.


Summary

The guide above should illustrate how to use the SEO Spider to automate crawl reports in Google Looker Studio.

Please also read our Screaming Frog SEO Spider FAQs and full user guide for more information on the tool.

If you have any further queries, feedback or suggestions to improve our Google Sheets or Looker Studio integration in the SEO Spider then just get in touch with our team via support.
