PDF

Dan Sharp

Posted 5 December, 2022 by in

PDF

Configuration > Spider > Extraction > PDF

Store PDF

This allows you to save PDFs to disk during a crawl. They can be bulk exported via ‘Bulk Export > Web > All PDF Documents’, or just the content can be exported as .txt files via ‘Bulk Export > Web > All PDF Content’.

When PDFs are stored, the PDF can be viewed in the ‘Rendered Page’ tab and the text content of the PDF can be viewed in the ‘View Source’ tab and ‘Visible Content’ filter.

Extract PDF Properties

By default the PDF title and keywords will be extracted. These will appear in the ‘Title’ and ‘Meta Keywords’ columns in the Internal tab of the SEO Spider.

Google will convert the PDF to HTML and use the PDF title as the title element and the keywords as meta keywords, although it doesn’t use meta keywords in scoring.

By enabling ‘Extract PDF properties’, the following additional properties will also be extracted.

  • Subject
  • Author
  • Creation Date
  • Modification Date
  • Page Count
  • Word Count

These new columns are displayed in the Internal tab.

Dan Sharp is founder & Director of Screaming Frog. He has developed search strategies for a variety of clients from international brands to small and medium-sized businesses and designed and managed the build of the innovative SEO Spider software.

Comments are closed.

Back to top