Custom extraction

Dan Sharp

Posted 13 November, 2019


The custom extraction tab displays the results of the custom extraction configuration. This feature allows you to scrape any data from the HTML of pages in a crawl, and can be configured under ‘Config > Custom > Extraction’.

You’re able to configure up to 100 extractors in the custom extraction configuration. Each extractor takes an XPath, CSSPath or regex expression to scrape the required data. Extraction is performed only against URLs with an HTML content type.
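To illustrate what an extractor does for each URL, here is a minimal Python sketch using only the standard library. It is not how the SEO Spider is implemented. The `Extractor` class, the `run_extractors` function and the `"regex"`/`"xpath"` labels are hypothetical. Two substitutions are assumed: ElementTree's limited XPath subset (which needs well-formed markup) stands in for a full XPath engine, and CSSPath is left out because the standard library has no CSS selector engine. The 100-extractor limit and the HTML-only rule come from the text above.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Limit stated in the guide; enforced here only for illustration.
MAX_EXTRACTORS = 100


@dataclass
class Extractor:
    name: str        # becomes the column heading in the results
    kind: str        # hypothetical labels: "regex" or "xpath"
    expression: str  # the regex pattern or XPath-subset expression


def run_extractors(extractors, content_type, body):
    """Return {extractor name: [matches]} for a single crawled URL."""
    if len(extractors) > MAX_EXTRACTORS:
        raise ValueError(f"at most {MAX_EXTRACTORS} extractors are allowed")
    # Only HTML responses are processed; other content types yield nothing.
    if not content_type.lower().startswith("text/html"):
        return {}
    results = {}
    for ex in extractors:
        if ex.kind == "regex":
            # With one capture group, findall returns the captured text.
            results[ex.name] = re.findall(ex.expression, body)
        elif ex.kind == "xpath":
            # ElementTree understands only a subset of XPath and needs
            # well-formed markup, unlike a real HTML parser.
            root = ET.fromstring(body)
            results[ex.name] = [
                (el.text or "").strip() for el in root.findall(ex.expression)
            ]
        else:
            raise ValueError(f"unsupported extractor kind: {ex.kind!r}")
    return results


page = "<html><body><h1>Blue Widget</h1><p>SKU: AB-123</p></body></html>"
extractors = [
    Extractor("Heading", "xpath", ".//h1"),
    Extractor("SKU", "regex", r"SKU: ([A-Z]{2}-\d{3})"),
]
print(run_extractors(extractors, "text/html; charset=utf-8", page))
# {'Heading': ['Blue Widget'], 'SKU': ['AB-123']}
print(run_extractors(extractors, "application/pdf", page))
# {}
```

Swapping in lxml's `xpath()` method or a CSS selector library would bring the sketch closer to the expressions the configuration accepts, at the cost of a third-party dependency.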

The results appear within the custom extraction tab as outlined below.


Columns

This tab includes the following columns.

  • Address – The URI crawled.
  • Content – The content type of the URI.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
  • [Extractor Name] – Column headings are dynamic and take the name given to each extractor. Each extractor has a separate named column, which contains the data extracted for each URL.
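The column layout above, four fixed columns followed by one column per extractor, can be sketched as a small table builder. This is a hypothetical Python illustration, not the SEO Spider's export format: `build_table`, `to_csv` and the `"extracted"` key are invented names, and joining multiple matches into a single cell is a simplification chosen for this sketch.

```python
import csv
import io

# Fixed columns listed in the guide; extractor columns are appended after them.
FIXED_COLUMNS = ["Address", "Content", "Status Code", "Status"]


def build_table(pages, extractor_names):
    """Build a header and one row per URL, with a column per extractor."""
    header = FIXED_COLUMNS + list(extractor_names)
    rows = []
    for page in pages:
        row = {col: page[col] for col in FIXED_COLUMNS}
        extracted = page.get("extracted", {})
        for name in extractor_names:
            # Simplification: multiple matches are joined into one cell.
            row[name] = "; ".join(extracted.get(name, []))
        rows.append(row)
    return header, rows


def to_csv(header, rows):
    """Serialise the table as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


pages = [
    {"Address": "https://example.com/widget", "Content": "text/html; charset=utf-8",
     "Status Code": 200, "Status": "OK",
     "extracted": {"Price": ["9.99"], "SKU": ["AB-123"]}},
    {"Address": "https://example.com/old", "Content": "text/html; charset=utf-8",
     "Status Code": 404, "Status": "Not Found", "extracted": {}},
]
header, rows = build_table(pages, ["Price", "SKU"])
print(to_csv(header, rows))
```

Because the extractor columns are derived from the list of names, renaming an extractor changes its column heading without touching the fixed columns.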

Filters

This tab includes the following filters.

  • [Extractor Name] – Filters are dynamic and match the names of the extractors and their columns. Each filter shows its extraction column against the crawled URLs.
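How the dynamic filters map onto the extractor columns can be sketched in a few lines of Python. This is a hypothetical illustration: `filter_names` and `apply_filter` are invented names, and the sketch keeps every URL in the filtered view, including those with an empty value, since the text does not say otherwise.

```python
# Fixed columns from the guide; any other column belongs to an extractor.
FIXED_COLUMNS = ["Address", "Content", "Status Code", "Status"]


def filter_names(header):
    """One filter per extractor column, named after the extractor."""
    return [col for col in header if col not in FIXED_COLUMNS]


def apply_filter(header, rows, name):
    """Show the chosen extraction column against each URL."""
    if name not in filter_names(header):
        raise KeyError(f"no extractor named {name!r}")
    return [{"Address": row["Address"], name: row[name]} for row in rows]


header = FIXED_COLUMNS + ["Price", "SKU"]
rows = [
    {"Address": "https://example.com/widget", "Content": "text/html",
     "Status Code": 200, "Status": "OK", "Price": "9.99", "SKU": "AB-123"},
    {"Address": "https://example.com/old", "Content": "text/html",
     "Status Code": 404, "Status": "Not Found", "Price": "", "SKU": ""},
]
print(filter_names(header))
# ['Price', 'SKU']
for row in apply_filter(header, rows, "SKU"):
    print(row)
```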

Please read our tutorial on ‘Web Scraping & Custom Extraction’.

Dan Sharp is founder & Director of Screaming Frog. He has developed search strategies for a variety of clients from international brands to small and medium-sized businesses and designed and managed the build of the innovative SEO Spider software.
