PDF

Table of Contents

General

Configuration Options

Spider Crawl Tab

Spider Extraction Tab

Spider Limits Tab

Spider Rendering Tab

Spider Advanced Tab

Spider Preferences Tab

Other Configuration Options

Tabs

PDF

Configuration > Spider > Extraction > PDF

Store PDF

This allows you to save PDFs to disk during a crawl. They can be bulk exported via ‘Bulk Export > Web > All PDF Documents’, or just the content can be exported as .txt files via ‘Bulk Export > Web > All PDF Content’.

When PDFs are stored, the PDF can be viewed in the ‘Rendered Page’ tab and the text content of the PDF can be viewed in the ‘View Source’ tab and ‘Visible Content’ filter.

Extract PDF Properties

By default the PDF title and keywords will be extracted. These will appear in the ‘Title’ and ‘Meta Keywords’ columns in the Internal tab of the SEO Spider.

Google will convert the PDF to HTML and use the PDF title as the title element and the keywords as meta keywords, although it doesn’t use meta keywords in scoring.

By enabling ‘Extract PDF properties’, the following additional properties will also be extracted.

  1. Subject
  2. Author
  3. Creation Date
  4. Modification Date
  5. Page Count
  6. Word Count

These new columns are displayed in the Internal tab.

Join the mailing list for updates, tips & giveaways

Back to top