Site Structure
Table of Contents
General
- Installation
- Installation on Windows
- Installation on macOS
- Installation on Ubuntu
- Installation on Fedora
- Crawling
- Saving, opening, exporting & importing crawls
- Configuration
- Scheduling
- Exporting
- Robots.txt
- User agent
- Memory
- Checking memory allocation
- Cookies
- XML sitemap creation
- Visualisations
- Reports
- Command line interface set-up
- Command line interface
- User Interface
- Search function
- Auto Updates
Configuration Options
Spider Crawl Tab
- Images
- Media
- CSS
- JavaScript
- SWF
- Internal hyperlinks
- External links
- Canonicals
- Pagination (rel next/prev)
- Hreflang
- AMP
- Meta refresh
- iframes
- Mobile alternate
- Check links outside of start folder
- Crawl outside of start folder
- Crawl all subdomains
- Follow internal or external ‘nofollow’
- Crawl linked XML sitemaps
Spider Extraction Tab
Spider Limits Tab
Spider Rendering Tab
Spider Advanced Tab
- Cookie storage
- Ignore non-indexable URLs for Issues
- Ignore paginated URLs for duplicate filters
- Always follow redirects
- Always follow canonicals
- Respect noindex
- Respect canonical
- Respect next/prev
- Respect HSTS policy
- Respect self referencing meta refresh
- Extract images from img srcset attribute
- Crawl fragment identifiers
- Perform HTML validation
- Green hosting carbon calculation
- Assume pages are HTML
- Response timeout
- 5XX response retries
Spider Preferences Tab
Other Configuration Options
- Content area
- Duplicates
- Spelling & grammar
- Robots.txt
- URL rewriting
- CDNs
- Include
- Exclude
- Speed
- User agent
- HTTP header
- Custom search
- Custom extraction
- Custom link positions
- Custom JavaScript
- Google Analytics integration
- Google Search Console integration
- PageSpeed Insights integration
- Majestic
- Ahrefs
- Moz
- Authentication
- Segments
- Crawl analysis
- User Interface
- Language
- Proxy
- Storage mode
- Memory allocation
- Trusted Certificates
- Mode
Tabs
Top Tabs
- Internal
- External
- Security
- Response Codes
- URL
- Page titles
- Meta description
- Meta keywords
- h1
- h2
- Content
- Images
- Canonicals
- Pagination
- Directives
- hreflang
- JavaScript
- Links
- AMP
- Structured data
- Sitemaps
- PageSpeed
- Mobile
- Custom search
- Custom extraction
- Custom JavaScript
- Analytics
- Search Console
- Validation
- Link Metrics
- Change Detection
Lower Window Tabs
Right Side Window Tabs
Site Structure
The site structure tab updates in real-time to provide an aggregated directory tree view of the website. This helps visualise site architecture, and identify where issues are at a glance, such as indexability of different paths.
The top table updates in real-time to show the path, total number of URLs, Indexable and Non-Indexable URLs in each path of the website.
- Path – The URL path of the website crawled.
- URLs – The total number of unique children URLs found within the path.
- Indexable – The total number of unique Indexable children URLs found within the path.
- Non-Indexable – The total number of unique Non-Indexable children URLs found within the path.
You’re able to adjust the ‘view’ of the aggregated Site Structure, to also see ‘Indexability Status’, ‘Response Codes’ and ‘Crawl Depth’ of URLs in each path.
The lower table and graph show the number of URLs at crawl depths between 1-10+ in buckets based upon their response codes.
- Depth (Clicks from Start URL) – Depth of the page from the homepage or start page (number of ‘clicks’ away from the start page).
- Number of URLs – Number of URLs encountered in the crawl that have a particular Depth.
- % of Total – Percentage of URLs in the crawl that have a particular Depth.
‘Crawl Depth’ data for every URL can be found and exported from the ‘Crawl Depth’ column in the ‘Internal’ tab.