Posted 6 August, 2018 in SEO

Bulk Testing PageSpeed Insights with the SEO Spider

Update: With the release of SEO Spider version 12, you can now connect directly to the PageSpeed Insights API and grab all your PageSpeed data automatically in the Spider, no XPath needed. Take a look here.
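
If you're on an older version, or just curious what that integration calls under the hood, the public PageSpeed Insights v5 API can also be queried directly. Here's a minimal Python sketch; the endpoint is Google's, but it's worth double-checking the response structure against the current API docs:

import requests

# Google's public PSI v5 endpoint; an API key is recommended for heavy use.
API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

params = {"url": "https://www.screamingfrog.co.uk/", "strategy": "mobile"}
data = requests.get(API, params=params, timeout=60).json()

# The performance score comes back as 0-1; multiply by 100 to match the UI.
score = data["lighthouseResult"]["categories"]["performance"]["score"]
print(f"Mobile performance score: {score * 100:.0f}")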

Google does a great job of providing a variety of tools for SEOs and webmasters to use, and although they may not provide the most detailed analysis available, they’re about as close as we can get to seeing exactly how Google views our web pages.

The trouble with these tools is that they all test a single URL at a time, and scaling them across an entire domain can be a time-consuming and tedious task. However, with our very own Spider and its extraction capabilities here to lend a hand, you can easily automate most of the process.

For example, if we look at PageSpeed Insights (PSI) and Lighthouse: Google recently launched the Speed Update to its core algorithm, so these scores (while fairly general) will become increasingly valuable metrics for measuring page performance and recommending optimisations. So, in order to bulk test multiple URLs at once, just follow the steps below:

Get your URLs

To get started, you’ll need to change all your existing domain URLs from this:
https://www.screamingfrog.co.uk/
into this:
https://developers.google.com/speed/pagespeed/insights/?url=https://www.screamingfrog.co.uk/

So go ahead and grab an Excel list of every URL you’d like some page speed data on. If you don’t have a list already, just give your site a crawl and take it straight from the tool, or download it via the sitemap.

Next, you’ll need to add a cell containing the default PageSpeed Insights URL:
https://developers.google.com/speed/pagespeed/insights/?url=

Once that’s in, use a quick formula in the adjacent cell to join the two into your nice PSI-friendly URL. Assuming the prefix sits in cell A1 and your URLs run down column B:
=$A$1&B1

Once this is copied down, it should look similar to this:
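
If you’d rather do this outside Excel, here’s a quick Python equivalent; urls.txt is a hypothetical file holding one crawled URL per line:

# Build PSI-friendly URLs from a plain list of crawled URLs.
PREFIX = "https://developers.google.com/speed/pagespeed/insights/?url="

with open("urls.txt") as f:
    psi_urls = [PREFIX + line.strip() for line in f if line.strip()]

with open("psi_urls.txt", "w") as f:
    f.write("\n".join(psi_urls))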

Adjust your settings

Now you’ve got the URLs sorted, you’ll need to make sure Google doesn’t realise you’re using a bot and bring down the CAPTCHA hammer on you straight away.

  • Switch the tool over to list mode (top menu > Mode > List).
  • Head over to the rendering panel (Configuration > Spider > Rendering) and turn on JavaScript rendering. We also want to increase the AJAX timeout from 5 seconds to 15-20 for good measure.
  • Go to the speed panel (Configuration > Speed) and turn Max Threads down to 1 and Max URL/s to somewhere between 0.1 and 0.5 a second. You might need to play around with this to find what works for you; the sketch below shows what that pacing works out to.
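
To put that throttle in context, here’s a rough Python sketch of what Max Threads = 1 at 0.25 URL/s actually means. The Spider handles all of this for you, and a plain GET won’t render the scores, so this is purely illustrative:

import time
import requests

MAX_URLS_PER_SECOND = 0.25
DELAY = 1 / MAX_URLS_PER_SECOND  # one request every 4 seconds

psi_urls = [
    "https://developers.google.com/speed/pagespeed/insights/?url=https://www.screamingfrog.co.uk/",
]

for url in psi_urls:
    # Single-threaded, fixed pacing - the same shape as the Spider's throttle.
    status = requests.get(url, timeout=30).status_code
    print(url, status)
    time.sleep(DELAY)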

Extract

Now that the tool can crawl and render our chosen URLs, we need to tell it what data we actually want to pull out (i.e. those glorious PageSpeed scores).

  • Open up the custom extraction panel (Configuration > Custom > Extraction) and enter the following XPath expressions, depending on which metrics you want to pull (you can sanity-check any of these with the quick sketch after the list below).

Mobile Score

(//div[@class="lh-gauge__percentage"])[2]

Desktop Score

(//div[@class="lh-gauge__percentage"])[3]

Field Data

Mobile First Contentful Paint (FCP)

(//div[@class="metric-value lh-metric__value"]//text())[1]

Desktop First Contentful Paint (FCP)

(//div[@class="metric-value lh-metric__value"]//text())[5]

Mobile First Input Delay (FID)

(//div[@class="metric-value lh-metric__value"]//text())[2]

Desktop First Input Delay (FID)

(//div[@class="metric-value lh-metric__value"]//text())[6]

Lab Data

Mobile First Contentful Paint

(//div[@class="lh-metric__value"]//text())[1]

Desktop First Contentful Paint

(//div[@class="lh-metric__value"]//text())[7]

Mobile First Meaningful Paint

(//div[@class="lh-metric__value"]//text())[4]

Desktop First Meaningful Paint

(//div[@class="lh-metric__value"]//text())[10]

Mobile Speed Index

(//div[@class="lh-metric__value"]//text())[2]

Desktop Speed Index

(//div[@class="lh-metric__value"]//text())[8]

Mobile First CPU Idle

(//div[@class="lh-metric__value"]//text())[5]

Desktop First CPU Idle

(//div[@class="lh-metric__value"]//text())[11]

Mobile Time to Interactive

(//div[@class="lh-metric__value"]//text())[3]

Desktop Time to Interactive

(//div[@class="lh-metric__value"]//text())[9]

Mobile Estimated Input Latency

(//div[@class="lh-metric__value"]//text())[6]

Desktop Estimated Input Latency

(//div[@class="lh-metric__value"]//text())[12]

If done correctly, you should have a nice green tick next to each entry, a bit like this:

(Be sure to add custom labels to each one, set the type to XPath, and change the far-right dropdown from ‘Extract HTML’ to ‘Extract Text’.)

(There are also quite a lot of extractors here, so you may want to split your crawl by mobile and desktop, or take a selection of the metrics you wish to report on.)

Hit OK.
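
If you’d like to sanity-check an XPath before committing to a full crawl, here’s a minimal Python sketch using lxml. psi.html is a hypothetical file: save the rendered DOM from your browser’s DevTools, since the raw HTML of the page won’t contain the scores until the JavaScript has run:

from lxml import html

with open("psi.html", encoding="utf-8") as f:
    tree = html.fromstring(f.read())

# The mobile score XPath from the list above; //text() mirrors
# the Spider's 'Extract Text' behaviour.
mobile_score = tree.xpath('(//div[@class="lh-gauge__percentage"])[2]//text()')
print(mobile_score)  # e.g. ['87'] if the element exists in your saved copy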

Crawl

That’s it, you’re done! Once all the above has been sorted, simply highlight and copy your list of URLs in Excel, switch to the tool and hit Upload > Paste, then sit back and relax, as this will take a while. I’d recommend leaving it running in the background while you scroll through cat videos on YouTube, or whatever your preferred method of procrastination happens to be.

Export & Sort

After a coffee/nap/cat-vid, you should come back to a 100% completed crawl with every page speed score you could hope for.

Navigate over to the custom extraction tab (Custom > Filter > Extraction) and hit Export to download it all into a handy .xls spreadsheet.

Once the export is open in Excel, hit the find-and-replace option and replace https://developers.google.com/speed/pagespeed/insights/?url= with nothing. This will bring back all your URLs in their original order, alongside all their shiny new speed scores for mobile and desktop.
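
If you’d rather script this cleanup, here’s a quick pandas equivalent; the filename and the ‘Address’ column name are assumptions based on a typical Spider export, so adjust to match yours:

import pandas as pd

df = pd.read_excel("custom_extraction.xlsx")  # hypothetical filename

PREFIX = "https://developers.google.com/speed/pagespeed/insights/?url="
df["Address"] = df["Address"].str.replace(PREFIX, "", regex=False)

df.to_excel("psi_scores_clean.xlsx", index=False)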

After a tiny bit of formatting you should end up with a spreadsheet that looks something like this:


Bonus

What I find particularly powerful is the ability to combine this data with other metrics the Spider can pull through in a separate crawl. As list mode exports in the same order it was uploaded, you can run a normal list mode crawl of your original selection of URLs connected to any API, export that, and combine it with your PSI scores.
Essentially, this lets you build an amalgamation of session data, PSI scores, response times and GA trigger times, alongside any other metrics you want!
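
Here’s a minimal sketch of that join in pandas. The filenames and the ‘Address’ column are assumptions based on typical Spider exports; merging on the URL itself is also a little safer than relying on row order:

import pandas as pd

psi = pd.read_excel("psi_scores_clean.xlsx")   # hypothetical filename
crawl = pd.read_excel("list_mode_crawl.xlsx")  # hypothetical filename

# Join on the shared URL column rather than trusting row order.
combined = crawl.merge(psi, on="Address", how="left")
combined.to_excel("combined_metrics.xlsx", index=False)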

Troubleshooting

If set up correctly, this process should be seamless, but occasionally Google might catch wind of what you’re up to and come down to stop your fun with an annoying anti-bot CAPTCHA test.

If this happens, just pause your crawl, load up a PSI page in a browser to solve the CAPTCHA, then jump back into the tool, highlight the URLs that didn’t extract any data, and right click > Re-Spider.

If this continues, the likelihood is you have your crawl speed set too high; lowering it a bit in the options mentioned above should put you back on track.

I’ve also noticed a number of comments reporting the PSI page not rendering properly and nothing being extracted. If this happens, it might be worth clearing to the default config (File > Configuration > Clear to default). Next, make sure the user-agent is set to ScreamingFrog. Finally, ensure you have the following configuration options ticked (Configuration > Spider):

  • Check Images
  • Check CSS
  • Check JavaScript
  • Check SWF
  • Check External Links

If, for any reason, the page is rendering correctly but some scores weren’t extracted, double check the XPaths have been entered correctly and the dropdown is changed to ‘Extract Text’. Secondly, it’s worth checking PSI actually has that data by loading the page in a browser; much of the real-world field data is only available for high-volume pages.

Final Thoughts

What’s great about this is that if you use any other online tools similar to PSI, you can adapt the extraction feature to pull out whatever data you need (however, this won’t work for every tool, and some of Google’s others are a bit less lenient towards bots).

Simply find your chosen metric, right click > Inspect to load up the rendered HTML, then within that panel right click the area with your metric > Copy > Copy XPath, and add it into the extraction settings within the Spider.

Et voilà, you now have access to your very own bulk testing tool. For more details on the scraping potential of the SEO Spider, it’s worth checking out our web scraping run-through here.

I hope this quick guide was helpful. Should you run into any problems, or have any other useful tricks with the extractor function, let us know in the comments below.