Posted 6 August, 2018 in SEO

Bulk Testing PageSpeed Insights with the SEO Spider

Update: With the release of SEO Spider version 12, you can now connect directly to the PageSpeed Insights API and grab all your PageSpeed data automatically in the Spider, no XPath needed. Take a look here.
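
If you're on an older version, or just curious what that integration calls under the hood, the public PageSpeed Insights v5 API can also be queried directly. Here's a minimal Python sketch; the endpoint is Google's, but it's worth double-checking the response structure against the current API docs:

import requests

# Google's public PSI v5 endpoint; an API key is recommended for heavy use.
API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

params = {"url": "https://www.screamingfrog.co.uk/", "strategy": "mobile"}
data = requests.get(API, params=params, timeout=60).json()

# The performance score comes back as 0-1; multiply by 100 to match the UI.
score = data["lighthouseResult"]["categories"]["performance"]["score"]
print(f"Mobile performance score: {score * 100:.0f}")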

Google does a great job of providing a variety of tools for SEOs and webmasters to use, and although they may not provide the most detailed analysis available, they’re about as close as we can get to seeing exactly how Google views our web pages.

The trouble with these tools is that they all test a single URL at a time, and scaling them across an entire domain can be a time-consuming and tedious task. However, with our very own Spider and its extraction capabilities here to lend a hand, you can easily automate most of the process.

For example, if we look at PageSpeed Insights (PSI) and Lighthouse: Google recently launched the Speed Update to its core algorithm, so these scores (while fairly general) will become increasingly valuable metrics for measuring page performance and recommending optimisations. So, in order to bulk test multiple URLs at once, just follow the steps below:

Get your URLs

To get started, you’ll need to change all your existing domain URLs from this:
https://www.screamingfrog.co.uk/
into this:
https://developers.google.com/speed/pagespeed/insights/?url=https://www.screamingfrog.co.uk/

So go ahead and grab an Excel list of every URL you’d like some page speed data on. If you don’t have a list already, just give your site a crawl and take it straight from the tool, or download it via the sitemap.

Next, you’ll need to add a cell containing the default PageSpeed Insights URL:
https://developers.google.com/speed/pagespeed/insights/?url=

Once that’s in, use a quick formula in the adjacent cell to join the two into your nice PSI-friendly URL. Assuming the prefix sits in cell A1 and your URLs run down column B:
=$A$1&B1

Once this is copied down, it should look similar to this:
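
If you’d rather do this outside Excel, here’s a quick Python equivalent; urls.txt is a hypothetical file holding one crawled URL per line:

# Build PSI-friendly URLs from a plain list of crawled URLs.
PREFIX = "https://developers.google.com/speed/pagespeed/insights/?url="

with open("urls.txt") as f:
    psi_urls = [PREFIX + line.strip() for line in f if line.strip()]

with open("psi_urls.txt", "w") as f:
    f.write("\n".join(psi_urls))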

Adjust your settings

Now you’ve got the URLs sorted, you’ll need to make sure Google doesn’t realise you’re using a bot and bring down the CAPTCHA hammer on you straight away.

  • Switch the tool over to list mode (top menu > Mode > List).
  • Head over to the rendering panel (Configuration > Spider > Rendering) and turn on JavaScript rendering. We also want to increase the AJAX timeout from 5 seconds to 15-20 for good measure.
  • Go to the speed panel (Configuration > Speed) and turn Max Threads down to 1 and Max URL/s to somewhere between 0.1 and 0.5 a second. You might need to play around with this to find what works for you; the sketch below shows what that pacing works out to.
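
To put that throttle in context, here’s a rough Python sketch of what Max Threads = 1 at 0.25 URL/s actually means. The Spider handles all of this for you, and a plain GET won’t render the scores, so this is purely illustrative:

import time
import requests

MAX_URLS_PER_SECOND = 0.25
DELAY = 1 / MAX_URLS_PER_SECOND  # one request every 4 seconds

psi_urls = [
    "https://developers.google.com/speed/pagespeed/insights/?url=https://www.screamingfrog.co.uk/",
]

for url in psi_urls:
    # Single-threaded, fixed pacing - the same shape as the Spider's throttle.
    status = requests.get(url, timeout=30).status_code
    print(url, status)
    time.sleep(DELAY)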

Extract

Now that the tool can crawl and render our chosen URLs, we need to tell it what data we actually want to pull out (i.e. those glorious PageSpeed scores).

  • Open up the custom extraction panel (Configuration > Custom > Extraction) and enter the following XPath expressions, depending on which metrics you want to pull (you can sanity-check any of these with the quick sketch after the list below).

Mobile Score

(//div[@class="lh-gauge__percentage"])[2]

Desktop Score

(//div[@class="lh-gauge__percentage"])[3]

Field Data

Mobile First Contentful Paint (FCP)

(//div[@class="metric-value lh-metric__value"]//text())[1]

Desktop First Contentful Paint (FCP)

(//div[@class="metric-value lh-metric__value"]//text())[5]

Mobile First Input Delay (FID)

(//div[@class="metric-value lh-metric__value"]//text())[2]

Desktop First Input Delay (FID)

(//div[@class="metric-value lh-metric__value"]//text())[6]

Lab Data

Mobile First Contentful Paint

(//div[@class="lh-metric__value"]//text())[1]

Desktop First Contentful Paint

(//div[@class="lh-metric__value"]//text())[7]

Mobile First Meaningful Paint

(//div[@class="lh-metric__value"]//text())[4]

Desktop First Meaningful Paint

(//div[@class="lh-metric__value"]//text())[10]

Mobile Speed Index

(//div[@class="lh-metric__value"]//text())[2]

Desktop Speed Index

(//div[@class="lh-metric__value"]//text())[8]

Mobile First CPU Idle

(//div[@class="lh-metric__value"]//text())[5]

Desktop First CPU Idle

(//div[@class="lh-metric__value"]//text())[11]

Mobile Time to Interactive

(//div[@class="lh-metric__value"]//text())[3]

Desktop Time to Interactive

(//div[@class="lh-metric__value"]//text())[9]

Mobile Estimated Input Latency

(//div[@class="lh-metric__value"]//text())[6]

Desktop Estimated Input Latency

(//div[@class="lh-metric__value"]//text())[12]

If done correctly, you should have a nice green tick next to each entry, a bit like this:

(Be sure to add custom labels to each one, set the type to XPath, and change the far-right dropdown from ‘Extract HTML’ to ‘Extract Text’.)

(There are also quite a lot of extractors here, so you may want to split your crawl by mobile and desktop, or take a selection of the metrics you wish to report on.)

Hit OK.
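
If you’d like to sanity-check an XPath before committing to a full crawl, here’s a minimal Python sketch using lxml. psi.html is a hypothetical file: save the rendered DOM from your browser’s DevTools, since the raw HTML of the page won’t contain the scores until the JavaScript has run:

from lxml import html

with open("psi.html", encoding="utf-8") as f:
    tree = html.fromstring(f.read())

# The mobile score XPath from the list above; //text() mirrors
# the Spider's 'Extract Text' behaviour.
mobile_score = tree.xpath('(//div[@class="lh-gauge__percentage"])[2]//text()')
print(mobile_score)  # e.g. ['87'] if the element exists in your saved copy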

Crawl

That’s it, you’re done! Once all the above has been sorted, simply highlight and copy your list of URLs in Excel, switch to the tool and hit Upload > Paste, then sit back and relax, as this will take a while. I’d recommend leaving it running in the background while you scroll through cat videos on YouTube, or whatever your preferred method of procrastination happens to be.

Export & Sort

After a coffee/nap/cat-vid, you should come back to a 100% completed crawl with every page speed score you could hope for.

Navigate over to the custom extraction tab (Custom > Filter > Extraction) and hit Export to download it all into a handy .xls spreadsheet.

Once the export is open in Excel, hit the find-and-replace option and replace https://developers.google.com/speed/pagespeed/insights/?url= with nothing. This will bring back all your URLs in their original order, alongside all their shiny new speed scores for mobile and desktop.
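
If you’d rather script this cleanup, here’s a quick pandas equivalent; the filename and the ‘Address’ column name are assumptions based on a typical Spider export, so adjust to match yours:

import pandas as pd

df = pd.read_excel("custom_extraction.xlsx")  # hypothetical filename

PREFIX = "https://developers.google.com/speed/pagespeed/insights/?url="
df["Address"] = df["Address"].str.replace(PREFIX, "", regex=False)

df.to_excel("psi_scores_clean.xlsx", index=False)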

After a tiny bit of formatting you should end up with a spreadsheet that looks something like this:


Bonus

What I find particularly powerful is the ability to combine this data with other metrics the Spider can pull through in a separate crawl. As list mode exports in the same order it was uploaded, you can run a normal list mode crawl of your original selection of URLs connected to any API, export that, and combine it with your PSI scores.
Essentially, this lets you build an amalgamation of session data, PSI scores, response times and GA trigger times, alongside any other metrics you want!
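
Here’s a minimal sketch of that join in pandas. The filenames and the ‘Address’ column are assumptions based on typical Spider exports; merging on the URL itself is also a little safer than relying on row order:

import pandas as pd

psi = pd.read_excel("psi_scores_clean.xlsx")   # hypothetical filename
crawl = pd.read_excel("list_mode_crawl.xlsx")  # hypothetical filename

# Join on the shared URL column rather than trusting row order.
combined = crawl.merge(psi, on="Address", how="left")
combined.to_excel("combined_metrics.xlsx", index=False)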

Troubleshooting

If set up correctly, this process should be seamless, but occasionally Google might catch wind of what you’re up to and come down to stop your fun with an annoying anti-bot CAPTCHA test.

If this happens, just pause your crawl, load up a PSI page in a browser to solve the CAPTCHA, then jump back into the tool, highlight the URLs that didn’t extract any data, and right click > Re-Spider.

If this continues, the likelihood is you have your crawl speed set too high; lowering it a bit in the options mentioned above should put you back on track.

I’ve also noticed a number of comments reporting the PSI page not rendering properly and nothing being extracted. If this happens, it might be worth clearing to the default config (File > Configuration > Clear to default). Next, make sure the user-agent is set to ScreamingFrog. Finally, ensure you have the following configuration options ticked (Configuration > Spider):

  • Check Images
  • Check CSS
  • Check JavaScript
  • Check SWF
  • Check External Links

If, for any reason, the page is rendering correctly but some scores weren’t extracted, double check the XPaths have been entered correctly and the dropdown is changed to ‘Extract Text’. Secondly, it’s worth checking PSI actually has that data by loading the page in a browser; much of the real-world field data is only available for high-volume pages.

Final Thoughts

What’s great about this is that if you use any other online tools similar to PSI, you can adapt the extraction feature to pull out whatever data you need (however, this won’t work for every tool, and some of Google’s others are a bit less lenient towards bots).

Simply find your chosen metric, right click > Inspect to load up the rendered HTML, then within that panel right click the area with your metric > Copy > Copy XPath, and add it into the extraction settings within the Spider.

Et voilà, you now have access to your very own bulk testing tool. For more details on the scraping potential of the SEO Spider, it’s worth checking out our web scraping run-through here.

I hope this quick guide was helpful. Should you run into any problems, or have any other useful tricks with the extractor function, let us know in the comments below.