Posted 3 September, 2019 in SEO

How to Scrape Google Search Features Using XPath

Google’s search engine results pages (SERPs) have changed a great deal over the last 10 years, with more and more data and information being pulled directly into the results pages themselves. Google search features are a regular occurrence on most SERPs nowadays, some of the most common being featured snippets (aka ‘position zero’), knowledge panels and related questions (aka ‘people also ask’). Data suggests that some features, such as related questions, may appear on nearly 90% of SERPs today – a huge increase over the last few years.

Understanding these features can be powerful for SEO. Reverse engineering why certain features appear for particular query types and analysing the data or text included in said features can help inform us in making optimisation decisions. With organic CTR seemingly on the decline, optimising for Google search features is more important than ever, to ensure content is as visible as it possibly can be to search users.

This guide runs through the process of gathering search feature data from the SERPs, to help scale your analysis and optimisation efforts. I’ll demonstrate how to scrape data from the SERPs with the Screaming Frog SEO Spider and XPath, and show just how easy it is to grab a load of relevant and useful data very quickly. This guide focuses on featured snippets and related questions specifically, but the principles remain the same for scraping other features too.

TL;DR

If you’re already an XPath and scraping expert and are just here for the syntax and data type to set up your extraction (perhaps you saw me eloquently explain the process at SEOCamp Paris or Pubcon Las Vegas this year!), here you go (spoiler alert for everyone else!) –

Featured snippet XPath syntax

  • Featured snippet page title (Text) – (//span[@class='S3Uucc'])[1]
  • Featured snippet text paragraph (Text) – (//span[@class="e24Kjd"])[1]
  • Featured snippet bullet point text (Text) – //ul[@class="i8Z77e"]/li
  • Featured snippet numbered list (Text) – //ol[@class="X5LH0c"]/li
  • Featured snippet table (Text) – //table//tr
  • Featured snippet URL (Inner HTML) – (//div[@class="xpdopen"]//a/@href)[2]
  • Featured snippet image source (Text) – (//img[@id="dimg_7"]//@title)

Related questions XPath syntax

  • Related question 1 text (Text) – (//g-accordion-expander//h3)[1]
  • Related question 2 text (Text) – (//g-accordion-expander//h3)[2]
  • Related question 3 text (Text) – (//g-accordion-expander//h3)[3]
  • Related question 4 text (Text) – (//g-accordion-expander//h3)[4]
  • Related question snippet text for all 4 questions (Text) – //g-accordion-expander//span[@class="e24Kjd"]
  • Related question page titles for all 4 questions (Text) – //g-accordion-expander//h3
  • Related question page URLs for all 4 questions (Inner HTML) – //g-accordion-expander//div[@class="r"]//a/@href

You can also get this list in our accompanying Google doc. Back to our regularly scheduled programming for the rest of you… follow these steps to start scraping featured snippets and related questions!

    1) Preparation

    To get started, you’ll need to download and install the SEO Spider software and have a licence to access the custom extraction feature necessary for scraping. I’d also recommend our web scraping and data extraction guide as a useful bit of light reading, just to cover the basics of what we’re getting up to here.

    2) Gather keyword data

    Next you’ll need to find relevant keywords where featured snippets and / or related questions are showing in the SERPs. Most well-known SEO intelligence tools have functionality to filter keywords you rank for (or want to rank for) and where these features show, or you might have your own rank monitoring systems to help. Failing that, simply run a few searches of important and relevant keywords to look for yourself, or grab query data from Google Search Console. Wherever you get your keyword data from, if you have a lot of data and are looking to prune and prioritise your keywords, I’d advise the following –

  • Prioritise keywords where you have a decent ranking position already. Not only is this relevant to winning a featured snippet (almost all featured snippets are taken from pages ranking organically in the top 10 positions, usually top 5), but more generally if Google thinks your page is already relevant to the query, you’ll have a better chance of targeting all types of search features.
  • Certainly consider search volume (the higher the better, right?), but also try and determine the likelihood of a search feature driving clicks too. As with keyword intent in the main organic results, not all search features will drive a significant amount of additional traffic, even if you achieve ‘position zero’. Try to consider objectively the intent behind a particular query, and prioritise keywords which are more likely to drive additional clicks.

    3) Create a Google search query URL

    We’re going to be crawling Google search query URLs, so we need to feed the SEO Spider a list of URLs built from the keyword data gathered. This can either be done in Excel using find and replace and the ‘CONCATENATE’ formula to change each keyword into a single URL string (replace word spaces with a + symbol, select your Google of choice, then CONCATENATE the cells to create an unbroken string), or you can simply paste your original list of keywords into this handy Google doc with formula included (please make a copy of the doc first).

    [Image: Google search query string URL]

    At the end of the process you should have a list of Google search query URLs which look something like this –

    https://www.google.co.uk/search?q=keyword+one
    https://www.google.co.uk/search?q=keyword+two
    https://www.google.co.uk/search?q=keyword+three
    https://www.google.co.uk/search?q=keyword+four
    https://www.google.co.uk/search?q=keyword+five etc.
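
If you’d rather script this step than use a spreadsheet, the same transformation can be sketched in a few lines of Python. This is only an illustration, not part of the SEO Spider: the function name `build_search_urls` and its `google_domain` parameter are made up for this example, and it assumes one keyword per list entry.

```python
from urllib.parse import quote_plus


def build_search_urls(keywords, google_domain="www.google.co.uk"):
    """Turn a list of keywords into Google search query URLs.

    `google_domain` is an assumed parameter; swap in your Google of choice.
    """
    urls = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip blank rows
        # quote_plus replaces spaces with '+' and escapes other special characters
        urls.append(f"https://{google_domain}/search?q={quote_plus(keyword)}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(["keyword one", "keyword two"]):
        print(url)
```

The printed list can then be pasted into the SEO Spider in the same way as the spreadsheet output.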

    4) Configure the SEO Spider

    Experienced SEO Spider users will know that our tool has a multitude of configuration options to help you gather the important data you need. Crawling Google search query URLs requires a few configurations to work. Within the menu you need to configure as follows –

  • Configuration > Spider > Rendering > JavaScript
  • Configuration > robots.txt > Settings > Ignore robots.txt
  • Configuration > User-Agent > Preset User Agents > Chrome
  • Configuration > Speed > Max Threads = 1 > Max URI/s = 0.5

    These config options ensure that the SEO Spider can access the features and also not trigger a captcha by crawling too fast. Once you’ve set up this config I’d recommend saving it as a custom configuration which you can load up again in future.
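
A speed limit of 0.5 URI/s works out at one URL every two seconds, so it’s worth estimating how long a keyword list will take before you start. The snippet below is a rough back-of-the-envelope sketch: the function name is hypothetical, and it ignores JavaScript rendering time, so treat the result as a lower bound.

```python
def estimated_crawl_seconds(url_count, max_uri_per_second=0.5):
    """Lower-bound crawl time in seconds at a fixed URI/s limit."""
    if max_uri_per_second <= 0:
        raise ValueError("max_uri_per_second must be positive")
    return url_count / max_uri_per_second


if __name__ == "__main__":
    seconds = estimated_crawl_seconds(1000)
    print(f"1000 URLs at 0.5 URI/s: at least {seconds / 60:.0f} minutes")
```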

    5) Set up your extraction

    Next you need to tell the SEO Spider what to extract. For this, go into the ‘Configuration’ menu and select ‘Custom’ and ‘Extraction’ –

    [Screenshot: Screaming Frog SEO Spider custom extraction menu]

    You should then see a screen like this –

    [Screenshot: Screaming Frog SEO Spider XPath extraction window]

    From the ‘Inactive’ drop down menu you need to select ‘XPath’. From the new dropdown which appears on the right hand side, you need to select the type of data you’re looking to extract. This will depend on what data you’re looking to extract from the search results (full list of XPath syntax and data types listed below), so let’s use the example of related questions –

    [Screenshot: Google related questions for the query ‘seo’]

    The above screenshot shows the related questions showing for the search query ‘seo’ in the UK. Let’s say we wanted to know what related questions were showing for the query, to ensure we had content and a page which targeted and answered these questions. If Google thinks they are relevant to the original query, at the very least we should consider that for analysis and potentially for optimisation. In this example we simply want the text of the questions themselves, to help inform us from a content perspective.

    Typically 4 related questions show for a particular query, and each of these 4 questions has its own XPath expression –

    • Question 1 – (//g-accordion-expander//h3)[1]
    • Question 2 – (//g-accordion-expander//h3)[2]
    • Question 3 – (//g-accordion-expander//h3)[3]
    • Question 4 – (//g-accordion-expander//h3)[4]

    To find the correct XPath syntax for your desired element, our web scraping guide can help, but we have a full list of the important ones at the end of this article!
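
To see what the indexed expressions are doing, here is a minimal Python sketch run against a made-up fragment of markup. The sample HTML is hypothetical and far simpler than a real SERP. It also uses the standard library’s ElementTree, which only supports a subset of XPath (no `(…)[n]` syntax), so the index is applied in Python instead: `(//g-accordion-expander//h3)[1]` corresponds to the first item in the returned list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; a real results page is far messier.
SAMPLE = """
<html><body>
  <g-accordion-expander><h3>What is SEO?</h3></g-accordion-expander>
  <g-accordion-expander><h3>How does SEO work?</h3></g-accordion-expander>
  <g-accordion-expander><h3>Is SEO free?</h3></g-accordion-expander>
  <g-accordion-expander><h3>How do I learn SEO?</h3></g-accordion-expander>
</body></html>
"""


def related_questions(markup):
    """Return the text of every h3 found inside a g-accordion-expander."""
    root = ET.fromstring(markup.strip())
    headings = root.findall(".//g-accordion-expander//h3")
    return ["".join(h3.itertext()).strip() for h3 in headings]


if __name__ == "__main__":
    # Question n in the XPath list is item n - 1 in the Python list.
    for number, text in enumerate(related_questions(SAMPLE), start=1):
        print(f"Question {number}: {text}")
```

In the SEO Spider itself you don’t need any of this; the sketch is only meant to show why each question gets its own index.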

    Once you’ve input your syntax, you can also rename the extraction fields to correspond to each extraction (Question 1, Question 2 etc.). For this particular extraction we want the text of the questions themselves, so we need to select ‘Extract Text’ in the data type dropdown menu. You should have a screen something like this –

    [Screenshot: Screaming Frog custom extraction configured for four related questions]

    If you do, you’re almost there!

    6) Crawl in list mode

    For this task you need to use the SEO Spider in List Mode. In the menu go Mode > List. Next, return to your list of created Google search query URL strings and copy all URLs. Return to the SEO Spider, hit the ‘Upload’ button and then ‘Paste’. Your list of search query URLs should appear in the window –

    [Screenshot: Screaming Frog list mode upload window]

    Hit ‘OK’ and your crawl will begin.

    7) Analyse your results

    To see your extraction you need to navigate to the ‘Custom’ tab in the SEO Spider, and select the ‘Extraction’ filter. Here you should start to see your extraction rolling in. When complete, you should have a nifty looking screen like this –

    [Screenshot: Screaming Frog SEO Spider custom extraction results]

    You can see your search query and the four related questions appearing in the SERPs being pulled in alongside it. When complete you can export the data and match up your keywords to your pages, and start to analyse the data and optimise to target the relevant questions.
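
Once exported, the data can be matched back to keywords with a short script. The sketch below assumes a CSV export with an `Address` column holding the crawled URL, plus one column per extraction named as you renamed them (`Question 1`, `Question 2` and so on). Those column names are assumptions, so check them against your own export.

```python
import csv
import io
from urllib.parse import parse_qs, urlparse

# Stand-in for an exported file; the column names are assumed, not guaranteed.
SAMPLE_EXPORT = """Address,Question 1,Question 2
https://www.google.co.uk/search?q=keyword+one,What is keyword one?,How does keyword one work?
https://www.google.co.uk/search?q=keyword+two,What is keyword two?,
"""


def questions_by_keyword(csv_text):
    """Map each search keyword to the non-empty questions extracted for it."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Recover the keyword from the q= parameter ('+' is decoded to a space)
        keyword = parse_qs(urlparse(row["Address"]).query).get("q", [""])[0]
        result[keyword] = [
            value for column, value in row.items()
            if column != "Address" and value
        ]
    return result


if __name__ == "__main__":
    for keyword, questions in questions_by_keyword(SAMPLE_EXPORT).items():
        print(keyword, "->", questions)
```

For a real export you would read the file from disk with `open()` rather than the in-memory sample used here.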

    8) Full list of XPath syntax

    As promised, we’ve done a lot of the heavy lifting and have a list of XPath syntax to extract various featured snippet and related question elements from the SERPs –

    Featured snippet XPath syntax

  • Featured snippet page title (Text) – (//span[@class='S3Uucc'])[1]
  • Featured snippet text paragraph (Text) – (//span[@class="e24Kjd"])[1]
  • Featured snippet bullet point text (Text) – //ul[@class="i8Z77e"]/li
  • Featured snippet numbered list (Text) – //ol[@class="X5LH0c"]/li
  • Featured snippet table (Text) – //table//tr
  • Featured snippet URL (Inner HTML) – (//div[@class="xpdopen"]//a/@href)[2]
  • Featured snippet image source (Text) – (//img[@id="dimg_7"]//@title)

    Related questions XPath syntax

  • Related question 1 text (Text) – (//g-accordion-expander//h3)[1]
  • Related question 2 text (Text) – (//g-accordion-expander//h3)[2]
  • Related question 3 text (Text) – (//g-accordion-expander//h3)[3]
  • Related question 4 text (Text) – (//g-accordion-expander//h3)[4]
  • Related question snippet text for all 4 questions (Text) – //g-accordion-expander//span[@class="e24Kjd"]
  • Related question page titles for all 4 questions (Text) – //g-accordion-expander//h3
  • Related question page URLs for all 4 questions (Text) – //g-accordion-expander//div[@class="r"]//a/@href

    We’ve also included them in our accompanying Google doc for ease.
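
The un-indexed expressions return every match rather than a single one. As a rough illustration of how the snippet text and URL expressions pair up per question, here is a Python sketch on invented markup. The HTML below is hypothetical and the class names are simply copied from the list above. ElementTree cannot evaluate `/@href` directly, so the attribute is read with `.get("href")` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup reusing the class names from the list above.
SAMPLE = """<div>
  <g-accordion-expander>
    <h3>What is SEO?</h3>
    <span class="e24Kjd">SEO stands for search engine optimisation.</span>
    <div class="r"><a href="https://example.com/what-is-seo">What is SEO</a></div>
  </g-accordion-expander>
  <g-accordion-expander>
    <h3>Is SEO free?</h3>
    <span class="e24Kjd">Organic clicks are not paid for.</span>
    <div class="r"><a href="https://example.com/is-seo-free">Is SEO free</a></div>
  </g-accordion-expander>
</div>"""


def related_question_rows(markup):
    """Collect the snippet text and link URL for each related question."""
    root = ET.fromstring(markup)
    rows = []
    for expander in root.iter("g-accordion-expander"):
        snippet = expander.find(".//span[@class='e24Kjd']")
        link = expander.find(".//div[@class='r']//a")
        rows.append({
            "snippet": "".join(snippet.itertext()).strip() if snippet is not None else "",
            "url": link.get("href", "") if link is not None else "",
        })
    return rows


if __name__ == "__main__":
    for row in related_question_rows(SAMPLE):
        print(row["url"], "|", row["snippet"])
```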

    Conclusion

    Hopefully our guide has been useful and can set you on your way to extract all sorts of useful and relevant data from the search results. Let me know how you get on, and if you have any other nifty XPath tips and tricks, please comment below!