Resources
Highlighting a URL in the top window will populate this bottom window tab. This tab contains a list of resources found on the URL.
- Type – The type of resource (JavaScript, CSS, Image, etc.).
- From – The current URL selected in the main window.
- To – The resource link found on the above ‘From’ page URL.
- Anchor Text – The anchor or link text used, if any.
- Alt Text – The alt attribute used, if any.
- Follow – ‘True’ means the link is followed. ‘False’ means the link contains a ‘nofollow’, ‘ugc’ or ‘sponsored’ attribute.
- Target – Associated target attributes (_blank, _self, _parent, etc.).
- Rel – Associated link attributes (limited to ‘nofollow’, ‘sponsored’, and ‘ugc’).
- Status Code – The HTTP response code of the ‘To’ URL. The ‘To’ URL needs to have been crawled for data to appear.
- Status – The HTTP header response of the ‘To’ URL. The ‘To’ URL needs to have been crawled for data to appear.
- Path Type – Whether the href attribute of the link is absolute, protocol-relative, root-relative or path-relative.
- Link Path – The XPath detailing the link’s position within the page.
- Link Position – Where the link is located in the code (Head, Nav, Footer, etc.). This can be customised with the Custom Link Position configuration.
- Link Origin – Where the link was found: HTML (only in the raw HTML), Rendered HTML (only in the rendered HTML after JavaScript has been processed), HTML & Rendered HTML (in both the raw and rendered HTML) or Dynamically Loaded (where there is no link, and JavaScript dynamically loads the resource onto the page).
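
To make the ‘Path Type’ and ‘Follow’ columns more concrete, here is a minimal Python sketch of how such values could be derived from a link’s href and rel attributes. It is an illustration based only on the descriptions above, not the SEO Spider’s actual implementation; the function names `classify_path_type` and `is_followed` are hypothetical.

```python
from urllib.parse import urlsplit

# Rel values that the 'Follow' column treats as not followed, per the list above.
NOT_FOLLOWED_RELS = {"nofollow", "ugc", "sponsored"}


def classify_path_type(href: str) -> str:
    """Classify an href as absolute, protocol-relative, root-relative or path-relative.

    Hypothetical helper: the categories come from the 'Path Type' description,
    the detection rules are an assumption.
    """
    href = href.strip()
    if href.startswith("//"):
        # No scheme, but a host is given: //cdn.example.com/app.js
        return "protocol-relative"
    if urlsplit(href).scheme:
        # Scheme present: https://example.com/app.js
        return "absolute"
    if href.startswith("/"):
        # Resolved against the site root: /assets/app.js
        return "root-relative"
    # Resolved against the current page's path: assets/app.js or ../app.js
    return "path-relative"


def is_followed(rel: str) -> bool:
    """Return False if the rel attribute contains nofollow, ugc or sponsored."""
    tokens = set(rel.lower().split())
    return not (tokens & NOT_FOLLOWED_RELS)


if __name__ == "__main__":
    for href in ("https://example.com/a.css", "//cdn.example.com/a.js",
                 "/img/logo.png", "../img/logo.png"):
        print(href, "->", classify_path_type(href))
    print(is_followed("noopener"), is_followed("nofollow noopener"))
```

The rel check compares whole, lowercased tokens rather than substrings, so a value such as ‘noopener’ does not count as ‘nofollow’.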