How to Scrape Google Search Features Using XPath

Posted 3 September, 2019 by Patrick Langridge in SEO

Google’s search engine results pages (SERPs) have changed a great deal over the last 10 years, with more and more data and information being pulled directly into the results pages themselves. Google search features are a regular occurrence on most SERPs nowadays, some of the most common being featured snippets (aka ‘position zero’), knowledge panels and related questions (aka ‘people also ask’). Data suggests that some features, such as related questions, may appear on nearly 90% of SERPs today – a huge increase over the last few years.

Understanding these features can be powerful for SEO. Reverse engineering why certain features appear for particular query types, and analysing the data or text included in those features, can help inform optimisation decisions. With organic CTR seemingly on the decline, optimising for Google search features is more important than ever, to ensure content is as visible as it possibly can be to search users.

This guide runs through the process of gathering search feature data from the SERPs, to help scale your analysis and optimisation efforts. I’ll demonstrate how to scrape data from the SERPs using XPath in the Screaming Frog SEO Spider, and show just how easy it is to grab a load of relevant and useful data very quickly. This guide focuses on featured snippets and related questions specifically, but the principles remain the same for scraping other features too.

TL;DR

If you’re already an XPath and scraping expert and are just here for the syntax and data type to set up your extraction (perhaps you saw me eloquently explain the process at SEOCamp Paris or Pubcon Las Vegas this year!), here you go (spoiler alert for everyone else!) –

Featured snippet XPath syntax

  • Featured snippet page title (Text) – (//span[@class='S3Uucc'])[1]
  • Featured snippet text paragraph (Text) – (//span[@class="e24Kjd"])[1]
  • Featured snippet bullet point text (Text) – //ul[@class="i8Z77e"]/li
  • Featured snippet numbered list (Text) – //ol[@class="X5LH0c"]/li
  • Featured snippet table (Text) – //table//tr
  • Featured snippet URL (Inner HTML) – (//div[@class="xpdopen"]//a/@href)[2]
  • Featured snippet image source (Text) – (//img[@id="dimg_7"]//@title)

Related questions XPath syntax

  • Related question 1 text (Text) – (//g-accordion-expander//h3)[1]
  • Related question 2 text (Text) – (//g-accordion-expander//h3)[2]
  • Related question 3 text (Text) – (//g-accordion-expander//h3)[3]
  • Related question 4 text (Text) – (//g-accordion-expander//h3)[4]
  • Related question snippet text for all 4 questions (Text) – //g-accordion-expander//span[@class="e24Kjd"]
  • Related question page titles for all 4 questions (Text) – //g-accordion-expander//h3
  • Related question page URLs for all 4 questions (Inner HTML) – //g-accordion-expander//div[@class="r"]//a/@href

You can also get this list in our accompanying Google doc. Back to our regularly scheduled programming for the rest of you… follow these steps to start scraping featured snippets and related questions!

    1) Preparation

    To get started, you’ll need to download and install the SEO Spider software and have a licence to access the custom extraction feature necessary for scraping. I’d also recommend our web scraping and data extraction guide as a useful bit of light reading, just to cover the basics of what we’re getting up to here.

    2) Gather keyword data

    Next you’ll need to find relevant keywords where featured snippets and / or related questions are showing in the SERPs. Most well-known SEO intelligence tools have functionality to filter keywords you rank for (or want to rank for) and where these features show, or you might have your own rank monitoring systems to help. Failing that, simply run a few searches of important and relevant keywords to look for yourself, or grab query data from Google Search Console. Wherever you get your keyword data from, if you have a lot of data and are looking to prune and prioritise your keywords, I’d advise the following –

  • Prioritise keywords where you have a decent ranking position already. Not only is this relevant to winning a featured snippet (almost all featured snippets are taken from pages ranking organically in the top 10 positions, usually top 5), but more generally if Google thinks your page is already relevant to the query, you’ll have a better chance of targeting all types of search features.
  • Certainly consider search volume (the higher the better, right?), but also try and determine the likelihood of a search feature driving clicks too. As with keyword intent in the main organic results, not all search features will drive a significant amount of additional traffic, even if you achieve ‘position zero’. Try to consider objectively the intent behind a particular query, and prioritise keywords which are more likely to drive additional clicks.
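
As a rough illustration of that pruning, here is a minimal Python sketch. The keyword rows, the page-one cut-off and the function name are all hypothetical, and judging how likely a feature is to drive clicks still needs a human eye, so treat this as a starting point rather than a method.

```python
# Hypothetical keyword rows: (keyword, current organic position, monthly search volume).
keywords = [
    ("what is seo", 4, 12000),
    ("seo audit checklist", 8, 2400),
    ("seo", 35, 90000),
    ("how to scrape google", 12, 1300),
]

MAX_POSITION = 10  # assumption: only keep keywords already ranking in the top 10


def prioritise(rows, max_position=MAX_POSITION):
    """Keep keywords ranking within max_position, highest search volume first."""
    shortlisted = [row for row in rows if row[1] <= max_position]
    return sorted(shortlisted, key=lambda row: row[2], reverse=True)


shortlist = prioritise(keywords)
for keyword, position, volume in shortlist:
    print(f"{keyword}: position {position}, volume {volume}")
```

Filtering on position first and only then sorting by volume mirrors the advice above: a high-volume keyword where you rank on page four is dropped before volume is even considered.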

    3) Create a Google search query URL

    We’re going to be crawling Google search query URLs, so we need to feed the SEO Spider a URL to crawl for each keyword gathered. This can either be done in Excel using find and replace and the ‘CONCATENATE’ formula to turn each keyword into a single URL string (replace word spaces with the + symbol, select your Google of choice, then CONCATENATE the cells to create an unbroken string), or you can simply paste your original list of keywords into this handy Google doc with formula included (please make a copy of the doc first).

    google search query string URL

    At the end of the process you should have a list of Google search query URLs which look something like this –

    https://www.google.co.uk/search?q=keyword+one
    https://www.google.co.uk/search?q=keyword+two
    https://www.google.co.uk/search?q=keyword+three
    https://www.google.co.uk/search?q=keyword+four
    https://www.google.co.uk/search?q=keyword+five etc.
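
If you would rather script this step than use a spreadsheet, the same URLs can be built in a few lines of Python. This is only a sketch: the function name and keyword list are made up for illustration, and the base URL should be swapped for your Google of choice.

```python
from urllib.parse import quote_plus

GOOGLE_SEARCH = "https://www.google.co.uk/search?q="  # swap in your Google of choice


def build_search_url(keyword, base=GOOGLE_SEARCH):
    """Turn a keyword into a Google search query URL (spaces become '+')."""
    return base + quote_plus(keyword.strip())


keywords = ["keyword one", "keyword two", "keyword three"]
search_urls = [build_search_url(keyword) for keyword in keywords]
print("\n".join(search_urls))
```

One difference from a plain find and replace is that `quote_plus` also escapes characters such as `?` and `&`, which would otherwise break the query string for question-style keywords.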

    4) Configure the SEO Spider

    Experienced SEO Spider users will know that our tool has a multitude of configuration options to help you gather the important data you need. Crawling Google search query URLs requires a few configurations to work. Within the menu you need to configure as follows –

  • Configuration > Spider > Rendering > JavaScript
  • Configuration > robots.txt > Settings > Ignore robots.txt
  • Configuration > User-Agent > Preset User Agents > Chrome
  • Configuration > Speed > Max Threads = 1 > Max URI/s = 0.5

    These config options ensure that the SEO Spider can access the features and also not trigger a captcha by crawling too fast. Once you’ve set up this config I’d recommend saving it as a custom configuration which you can load up again in future.
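
It is worth knowing what that speed limit means for larger keyword lists. The sketch below is simple arithmetic on the 0.5 URI/s setting; the function name is made up, and the result is a lower bound only, since JavaScript rendering adds time on top.

```python
MAX_URLS_PER_SECOND = 0.5  # the 'Max URI/s' value from the speed configuration


def minimum_crawl_minutes(url_count, urls_per_second=MAX_URLS_PER_SECOND):
    """Lower bound on crawl duration in minutes at a fixed request rate."""
    return url_count / urls_per_second / 60


for count in (50, 500, 5000):
    print(f"{count} URLs: at least {minimum_crawl_minutes(count):.1f} minutes")
```

At 0.5 URI/s the crawl requests one URL every two seconds, so a list of a few thousand keywords is a job to leave running rather than one to wait on.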

    5) Set up your extraction

    Next you need to tell the SEO spider what to extract. For this, go into the ‘Configuration’ menu and select ‘Custom’ and ‘Extraction’ –

    screaming frog seo spider custom extraction

    You should then see a screen like this –

    screaming frog seo spider xpath

    From the ‘Inactive’ drop down menu you need to select ‘XPath’. From the new dropdown which appears on the right hand side, you need to select the type of data you’re looking to extract. This will depend on what data you’re looking to extract from the search results (full list of XPath syntax and data types listed below), so let’s use the example of related questions –

    scraping google related questions

    The above screenshot shows the related questions showing for the search query ‘seo’ in the UK. Let’s say we wanted to know what related questions were showing for the query, to ensure we had content and a page which targeted and answered these questions. If Google thinks they are relevant to the original query, at the very least we should consider that for analysis and potentially for optimisation. In this example we simply want the text of the questions themselves, to help inform us from a content perspective.

    Typically 4 related questions show for a particular query, and each of these 4 questions has its own XPath syntax –

    • Question 1 – (//g-accordion-expander//h3)[1]
    • Question 2 – (//g-accordion-expander//h3)[2]
    • Question 3 – (//g-accordion-expander//h3)[3]
    • Question 4 – (//g-accordion-expander//h3)[4]
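
To see what the positional index is doing, here is a small Python sketch run against a made-up, heavily simplified stand-in for the related questions markup (real SERP HTML is far larger). It uses the standard library's ElementTree, which only supports a subset of XPath and does not accept the parenthesised `(...)[n]` form, so the index is applied in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the 'people also ask' markup.
SAMPLE = """
<div>
  <g-accordion-expander><div><h3>What is SEO?</h3></div></g-accordion-expander>
  <g-accordion-expander><div><h3>How do I do SEO?</h3></div></g-accordion-expander>
  <g-accordion-expander><div><h3>Is SEO free?</h3></div></g-accordion-expander>
  <g-accordion-expander><div><h3>Why is SEO important?</h3></div></g-accordion-expander>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# Collect every matching h3 in document order, then index the list in Python.
questions = [h3.text for h3 in root.findall(".//g-accordion-expander//h3")]

for number, text in enumerate(questions, start=1):
    # Corresponds to (//g-accordion-expander//h3)[number] in full XPath.
    print(f"Question {number}: {text}")
```

The parentheses in the XPath matter: without them, `//g-accordion-expander//h3[1]` would pick the first h3 inside each parent element rather than the first match overall.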

    To find the correct XPath syntax for your desired element, our web scraping guide can help, but we have a full list of the important ones at the end of this article!

    Once you’ve input your syntax, you can also rename the extraction fields to correspond to each extraction (Question 1, Question 2 etc.). For this particular extraction we want the text of the questions themselves, so we need to select ‘Extract Text’ in the data type dropdown menu. You should have a screen something like this –

    screaming frog custom extraction

    If you do, you’re almost there!

    6) Crawl in list mode

    For this task you need to use the SEO Spider in List Mode. In the menu go to Mode > List. Next, return to your list of created Google search query URL strings and copy all URLs. Return to the SEO Spider, hit the ‘Upload’ button and then ‘Paste’. Your list of search query URLs should appear in the window –

    screaming frog list mode

    Hit ‘OK’ and your crawl will begin.

    7) Analyse your results

    To see your extraction you need to navigate to the ‘Custom’ tab in the SEO Spider, and select the ‘Extraction’ filter. Here you should start to see your extraction rolling in. When complete, you should have a nifty looking screen like this –

    screaming frog seo spider custom extraction

    You can see your search query and the four related questions appearing in the SERPs being pulled in alongside it. When complete you can export the data and match up your keywords to your pages, and start to analyse the data and optimise to target the relevant questions.
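
Once exported, matching each row back to its keyword can be scripted. The sketch below assumes a CSV with an ‘Address’ column for the crawled URL and columns named after the extractors; both are assumptions, so adjust the names to whatever your own export contains.

```python
import csv
import io
from urllib.parse import parse_qs, urlparse

# Hypothetical export: the column names are assumptions and depend on how
# the extractors were named, so adjust them to match your own file.
EXPORT = """Address,Question 1,Question 2
https://www.google.co.uk/search?q=keyword+one,What is keyword one?,Why keyword one?
https://www.google.co.uk/search?q=keyword+two,What is keyword two?,
"""


def keyword_from_url(url):
    """Recover the original keyword from the q= parameter of a search URL."""
    return parse_qs(urlparse(url).query).get("q", [""])[0]


questions_by_keyword = {}
for row in csv.DictReader(io.StringIO(EXPORT)):
    # Keep only the non-empty extraction columns for this search URL.
    questions = [v for k, v in row.items() if k.startswith("Question") and v]
    questions_by_keyword[keyword_from_url(row["Address"])] = questions

for keyword, questions in questions_by_keyword.items():
    print(keyword, "->", questions)
```

Keying the results by keyword rather than by URL makes it easier to join them to your existing keyword-to-page mapping.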

    8) Full list of XPath syntax

    As promised, we’ve done a lot of the heavy lifting and have a list of XPath syntax to extract various featured snippet and related question elements from the SERPs –

    Featured snippet XPath syntax

  • Featured snippet page title (Text) – (//span[@class='S3Uucc'])[1]
  • Featured snippet text paragraph (Text) – (//span[@class="e24Kjd"])[1]
  • Featured snippet bullet point text (Text) – //ul[@class="i8Z77e"]/li
  • Featured snippet numbered list (Text) – //ol[@class="X5LH0c"]/li
  • Featured snippet table (Text) – //table//tr
  • Featured snippet URL (Inner HTML) – (//div[@class="xpdopen"]//a/@href)[2]
  • Featured snippet image source (Text) – (//img[@id="dimg_7"]//@title)

    Related questions XPath syntax

  • Related question 1 text (Text) – (//g-accordion-expander//h3)[1]
  • Related question 2 text (Text) – (//g-accordion-expander//h3)[2]
  • Related question 3 text (Text) – (//g-accordion-expander//h3)[3]
  • Related question 4 text (Text) – (//g-accordion-expander//h3)[4]
  • Related question snippet text for all 4 questions (Text) – //g-accordion-expander//span[@class="e24Kjd"]
  • Related question page titles for all 4 questions (Text) – //g-accordion-expander//h3
  • Related question page URLs for all 4 questions (Inner HTML) – //g-accordion-expander//div[@class="r"]//a/@href

    We’ve also included them in our accompanying Google doc for ease.
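
Most of the expressions above match on a class attribute, and it helps to know how strict that match is. The Python sketch below runs against made-up, simplified markup using the standard library's ElementTree, which supports only a subset of XPath; the URLs and the second class name are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real SERP HTML is far larger and changes often.
SAMPLE = """
<div>
  <g-accordion-expander>
    <div class="r"><a href="https://example.com/a"><h3>Page A</h3></a></div>
  </g-accordion-expander>
  <g-accordion-expander>
    <div class="r extra"><a href="https://example.com/b"><h3>Page B</h3></a></div>
  </g-accordion-expander>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# [@class='r'] is an exact string comparison, so class="r extra" is not matched.
# ElementTree cannot select attributes with /@href, so read them via .get().
links = root.findall(".//g-accordion-expander//div[@class='r']//a")
urls = [a.get("href") for a in links]
print(urls)
```

Because the comparison is exact, an expression stops matching as soon as the class value in the page changes at all, so an extraction that suddenly returns nothing is worth re-checking against the live markup.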

    Conclusion

    Hopefully our guide has been useful and can set you on your way to extract all sorts of useful and relevant data from the search results. Let me know how you get on, and if you have any other nifty XPath tips and tricks, please comment below!

    Patrick is Head of SEO at Screaming Frog, leading a team of talented search marketers and content specialists to deliver best in class technical SEO, creative content and cutting edge digital PR for our clients. Away from the office you might find Patrick playing guitar in his band or complaining about his beloved Arsenal Football Club.

    38 Comments

    • This is an Amazing post Patrick!

      Why no mention of Proxies and VPNs? Scraping Google isn’t the best idea if one doesn’t know what he does. I’d at least put a disclaimer to that post.

      Also, anything on scraping Google For Jobs?

      Thanks!

      Reply
      • Patrick Langridge 5 years ago

        Thanks Jean-Christophe!

        Fair point on scraping Google, though I did cover this in the config section in terms of setting certain crawl limits – it’s nothing that all rank tracking tools don’t do either!

        Like the idea of looking at Jobs, I’ll see what I can do…

        Reply
      • Kieran 5 years ago

        ☝️ @Jean yep the first thing I was thinking too, would be very useful to have a section to expand on this. I haven’t worked extensively with Google SERP scraping, but even doing manual incog spot checks a bit quick can trigger the anti-scraping captcha or the 4XX errors.

        Reply
    • In Canada, I crawled the Google For Jobs 3-pack (I had to adapt the XPath because google.ca isn’t exactly the same as google.co.uk).

      You can extract the job title, the employer and the Website where you’ll find the job. This can help getting some rankings.

      For the gjobs title I call the div with the class BjJfJf gsrt cPd5d (sorry SF firewall blocks the code, so I’ll only give the class here). Click on my name to see the post. (sorry for the self promotion, I am just doing this since I can’t put code here.)

      To get the gjobs employer, I use the div with the class gyaQGc

      For the gjobs website source I call the second div with the class LqUjCc

      You can also extract the function value instead of the inner HTML to find out whether the Google For Jobs box shows up.

      To do this, use the count() function.

      And using XPath, call the div that contains the class bkWMgd and that contains a sub-div with the class BjJfJf gsrt cPd5d

      This will give you a 1 if it shows up or a 0 if not.

      Reply
      • Patrick Langridge 5 years ago

        Awesome, thanks JC! Might have to do a follow-up post at some point for Jobs and other features too :)

        Reply
    • Ahmet Çadırcı 5 years ago

      Thanks for the article. It was very useful. I tried it the first time. I really like.

      Reply
    • John 5 years ago

      Is there a configuration to crawl more than the first 4 related questions, as once you click on the query it will keep on expanding? I’ve seen some Python solutions but perhaps this is out of scope?

      Reply
    • Jonas 5 years ago

      Hey Patrick,

      is it possible to export the title and description of a normal search result with that?

      I want to monitor whether google uses the title and/or the description which is defined or generates a description from the content.

      Cheers,
      Jonas

      Reply
      • Patrick Langridge 5 years ago

        Hey Jonas,

        This should do all the page titles –

        //div[3]/div/div/div/div/div/a/h3/div

        Descriptions can be done individually, but I’m trying to find a solution to grab them all. Here’s for the 1st result anyway –

        //div[2]/div/div/div[3]/div/div[1]/div/div/div/div

        Let me know if that works!

        Cheers,
        Patrick

        Reply
        • Bartjan 4 years ago

          Title
          //div[@class="r"]//h3

          URL link
          //div[@class="r"]//a/@href

          Reply
    • Matthew Gehin 5 years ago

      That’s a great post !
      I’ve just extracted all the search queries triggering featured snippets with Ahrefs, and I needed to know what typology of FS it was.
      It helps a lot, thank you =)

      Reply
    • Scott 5 years ago

      Have you tried any of this on the latest Screaming Frog recently? I followed instructions to the letter and could never reproduce your results.

      Reply
      • ES Smith 5 years ago

        It’s not working for me either

        Reply
        • screamingfrog 5 years ago

          Hi Eric,

          Thanks for your message via support. I just wanted to reply here so others could see as well – Make sure you’re using the latest version of the SEO Spider, when performing the above.

          It should then work!

          Thanks,

          Dan

          Reply
          • Roland 4 years ago

            Sorry, but this is not working. Can you please check this?

            Reply
    • Joaquin 4 years ago

      Should this also work to get links on SERPs? Any specific advice about doing this? :)
      Many thanks,
      J.

      Reply
    • F 4 years ago

      Pretty sure the Featured snippet XPath syntax for page title (Text) does not work. :/

      Reply
    • Teddy Sifert 4 years ago

      If you noticed a user in this page’s analytics with some 1,000 sessions and some 1,000 corresponding pageviews, that was me. This tab was open for the past five months and I finally had time to sharpen the saw. Blessings from the USA to you and Dan!

      Reply
    • Deyan Ivanov 4 years ago

      Hey, Patrick!
      Great article! Thank you!
      I’m not a technical guy, but I need to scrape Facebook pages from 200 websites. Can Screaming Frog help me with that? Could you give me more information? I will be thankful!

      Reply
    • Gokhan 4 years ago

      hey Patrick, I did everything as you showed here, but my screaming frog is returning 302 for all google search URLs. I checked them and they look ok.

      Reply
      • Mike 2 years ago

        if you get a 302 it means you have most likely triggered the captcha, you can check the redirect URL given in the crawl

        Reply
    • Hi,

      Great article as always. It would be better if there was a video. at least we could see how it’s done on the screen.

      Reply
    • Jason De Vera 4 years ago

      Interesting! I realized it’s easier to scrape other tld’s of Google DB. I tried scraping Google.com – but i’m receiving no response as a status. Any tips for scraping Google.com DB?

      Reply
    • Luuk 4 years ago

      Just to add, scraping with Regex does work for me but it’s generating too many results.

      Reply
    • Marty 4 years ago

      Hey guys,

      I’m trying to pull the links from the People Also Ask section on the SERPs, but can’t seem to get the formula to work. Can anybody help? :)

      Thanks,

      Marty

      Reply
    • Rob 3 years ago

      I tried to apply all the custom settings suggested here and tried to scrape the first PAA question. I got no results, although the SERP shows several results for People Also Ask.

      Same for any kind of XPath I tried from this guide. Is there any missing info?

      Reply
    • Chris Lever 3 years ago

      Hi Patrick,

      With the incoming Core Web Vitals & Page Experience algorithm update due a few months from now, I’m thinking ahead. Would your methodology above work for scraping the ‘fast page badge’ label? (That’s if Google decides to implement the fast page badge).

      Info on the fast page label – https://blog.chromium.org/2020/08/highlighting-great-user-experiences-on.html#:~:text=Links%20to%20pages%20that%20have,have%20a%20particularly%20good%20experience.

      Thanks,
      Chris

      Reply
    • Fred 3 years ago

      This does not work anymore as of August 2021.

      Reply
    • Dario Guzzeloni 2 years ago

      I use this XPath syntax for Google PAA and it’s working, but it only gives back the first 3 answers, no matter how I set it:
      //div[@class="iDjcJe IX9Lgd wwB5gf"]

      Let me know if anyone finds a better syntax set up.

      Reply
      • Dario Guzzeloni 2 years ago

        Found this other XPath syntax configuration that seems to work for all questions:
        //*[@aria-expanded]/div[2]/span

        Reply
        • Jess Loop 2 years ago

          This is great thank you for sharing. Was curious if there was a way to pull the answer portion and not just the question?

          Reply
    • Erich 8 months ago

      I have tried following this guide but am not getting the results. Has the method to extract related questions from the SERP using SF changed since this page was published?

      Any help will be much appreciated.

      Reply
    • Erich 8 months ago

      For those who are also stuck, I have figured out how to get an XPath that works.

      For each “People also ask” (PAA) accordion:
      1) Load a Google SERP which features the PAA accordions
      2) Go into ‘Inspect’ mode (F12)
      3) Use the ‘Select Element’ tool
      4) Hover over and click the PAA accordion you wish to extract – This will highlight the corresponding html in the inspect window
      5) Go up a few levels in the HTML – as you go up, the accordion should still be highlighted in blue – go to the highest level before the highlighting disappears – this seems to start with
      6) Copy > Copy full XPath
      7) Use this in the custom extraction in screaming frog as per this guide

      Reply
