SEO Spider Tabs

Internal

The Internal tab combines the data from all other tabs except the External and Custom tabs. It brings together data from the response codes, URI, page titles, meta description, meta keywords, h1, h2, images, meta and canonical tabs, so it can be viewed or exported all together.

  • Address – The URI crawled.
  • Content – The content type of the URI.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
  • Title 1 – The (first) page title.
  • Title 1 Length – The character length of the page title.
  • Title 1 Pixel Width – The pixel width of the page title as described in our pixel width post.
  • Meta Description 1 – The meta description.
  • Meta Description Length 1 – The character length of the meta description.
  • Meta Description Pixel Width – The pixel width of the meta description as described in our pixel width post.
  • Meta Keyword 1 – The meta keywords.
  • Meta Keywords Length – The character length of the meta keywords.
  • h1 – 1 – The first h1 (heading) on the page.
  • h1 – Len-1 – The character length of the h1.
  • h2 – 1 – The first h2 (heading) on the page.
  • h2 – Len-1 – The character length of the h2.
  • Meta Data 1 – Meta robots data.
  • Meta Refresh 1 – Meta refresh data.
  • Canonical – The canonical link element data.
  • Size – Size is in bytes; divide by 1024 to convert to kilobytes. The value is set from the Content-Length header if provided; if not, it’s set to zero. For HTML pages this is updated to the size of the (uncompressed) HTML in bytes.
  • Word Count – This is all ‘words’ inside the body tag, not including HTML markup. Our figures may not exactly match a manual count, as the parser performs certain fix-ups on invalid HTML. Our definition of a word is taking the text and splitting it by spaces.
  • Text Ratio – Number of characters in the <body> of the page / total characters in the page.
  • Level – Depth of the page from the start page (number of ‘clicks’ away from the start page).
  • Inlinks – Number of internal inlinks to the URI. ‘Internal inlinks’ are links pointing to a given URI from the same subdomain that is being crawled.
  • Outlinks – Number of internal outlinks from the URI. ‘Internal outlinks’ are links from a given URI to another URI on the same subdomain that is being crawled.
  • External Outlinks – Number of external outlinks from the URI. ‘External outlinks’ are links from a given URI to another subdomain.
  • Hash – Hash value of the page. This is a duplicate content check. If two hash values match the pages are exactly the same in content.
  • Response Time – Time in seconds to download the URI. More detailed information can be found in our FAQ.
  • Last-Modified – Read from the Last-Modified header in the server’s HTTP response. If the server does not provide this, the value will be empty.
  • Title 2, meta description 2, h1-2, h2-2 etc – The Spider will collect data from the first two elements it encounters in the source code. Hence, h1-2 is data from the second h1 heading on the page.
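
As a rough illustration of the Size, Word Count and Text Ratio columns above, the arithmetic can be sketched in Python. The function names are hypothetical and this is not the SEO Spider’s own code; in particular, str.split() splits on any run of whitespace, which is only an approximation of ‘splitting by spaces’.

```python
def bytes_to_kilobytes(size_bytes: int) -> float:
    """Convert the Size column (bytes) to kilobytes by dividing by 1024."""
    return size_bytes / 1024


def word_count(body_text: str) -> int:
    """Count 'words' in body text that has already had its markup removed.

    Assumption: words are separated by whitespace; empty strings are dropped.
    """
    return len(body_text.split())


def text_ratio(body_chars: int, total_chars: int) -> float:
    """Characters in the <body> divided by total characters in the page."""
    if total_chars == 0:
        return 0.0  # avoid dividing by zero for empty responses
    return body_chars / total_chars
```

For example, a page reported as 2048 bytes is 2 kilobytes, and a page with 50 body characters out of 200 in total has a text ratio of 0.25.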

Filter by –

  • HTML – HTML pages.
  • JavaScript – Any JavaScript.
  • Images – Any images.
  • PDF – Any portable document files.

External

The external tab includes information about external URIs.

  • Address – The external URI address.
  • Content – The content type of the URI.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
  • Level – Depth of the page from the homepage or start page (number of ‘clicks’ away from the start page).
  • Inlinks – Number of links found pointing to the external URI.

Filter by –

  • HTML – HTML pages.
  • JavaScript – Any JavaScript.
  • Images – Any images.
  • PDF – Any portable document files.

Response codes

The response codes tab includes response information from internal and external URIs.

  • Address – The URI crawled.
  • Content – The content type of the URI.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
  • Redirect URI – If the address URI redirects, this column will include the redirect URI target. The status code above will display the type of redirect, 301, 302 etc.

Filter by –

  • No Response – Where we receive no response to our request. This is typically caused by a malformed URI or a connection timeout.
  • Success (2XX) – The URI requested was received, understood, accepted and processed successfully.
  • Redirection (3XX) – A redirection was encountered.
  • Client Error (4XX) – Indicates a problem occurred with the request.
  • Server Error (5XX) – The server failed to fulfil an apparently valid request.

W3.org offers a full list of HTTP status codes, with the exact description of each.
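
The filters above group status codes by their first digit. A minimal sketch of that grouping, assuming a missing response is represented as None (the function name and the ‘Other’ fallback are hypothetical, not part of the SEO Spider):

```python
from typing import Optional


def response_filter(status_code: Optional[int]) -> str:
    """Map an HTTP status code to the name of a response codes filter."""
    if status_code is None:
        # Assumption: None stands for a request that received no response.
        return "No Response"
    if 200 <= status_code <= 299:
        return "Success (2XX)"
    if 300 <= status_code <= 399:
        return "Redirection (3XX)"
    if 400 <= status_code <= 499:
        return "Client Error (4XX)"
    if 500 <= status_code <= 599:
        return "Server Error (5XX)"
    return "Other"  # e.g. 1XX informational codes, which have no filter here
```

A 301 or 302 would both fall under ‘Redirection (3XX)’, while a 404 falls under ‘Client Error (4XX)’.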

URI

The URI tab includes data related to the URLs requested.

  • Address – The URI crawled.
  • Content – The content type of the URI.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
  • Hash – Hash value of the page. This is a duplicate content check. If two hash values match the pages are exactly the same in content.
  • Length – The character length of the URI.
  • Canonical 1 – The canonical link element data.

Filter by –

  • Non ASCII Characters – The URI has characters in it that are not included in the ASCII character encoding scheme.
  • Underscores – The URI has underscores within it which are not always seen as word separators.
  • Duplicate – This is a duplicate content check. It filters for all duplicate pages found via the hash value. If two hash values match the pages are exactly the same in content.
  • Dynamic – The URI could be dynamic in nature (includes parameter characters such as ‘?’ or ‘&’ etc).
  • Over 115 characters – The URI is over 115 characters in length (hence getting fairly long).
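
The URI filters above are simple checks on the address string, plus a hash comparison for duplicates. The following Python sketch shows one way such checks could look. It is illustrative only: the function names are hypothetical, and SHA-256 is an assumption, as this guide does not state which hash algorithm the SEO Spider uses.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def uri_filters(uri: str) -> List[str]:
    """Return the names of the string-based URI filters a URI would match."""
    filters = []
    if not uri.isascii():
        filters.append("Non ASCII Characters")
    if "_" in uri:
        filters.append("Underscores")
    if "?" in uri or "&" in uri:
        filters.append("Dynamic")
    if len(uri) > 115:
        filters.append("Over 115 characters")
    return filters


def duplicate_groups(pages: Dict[str, bytes]) -> List[List[str]]:
    """Group addresses whose page content produces the same hash value."""
    by_hash = defaultdict(list)
    for address, body in pages.items():
        # Assumption: SHA-256 stands in for whatever hash the tool uses.
        by_hash[hashlib.sha256(body).hexdigest()].append(address)
    return [addresses for addresses in by_hash.values() if len(addresses) > 1]
```

Because the hash is calculated over the whole page, two pages only count as duplicates here when their content is exactly the same.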

Page titles

The page title tab includes data related to page titles.

  • Address – The URI crawled.
  • Occurrences – The number of page titles found on the page (maximum we find is 2).
  • Title 1/2 – The page title.
  • Title 1/2 length – The character length of the page title.

Filter by –

  • Missing – Any pages which have a missing page title.
  • Duplicate – Any pages which have duplicate page titles.
  • Over 70 characters – Any pages which have page titles over 70 characters in length.
  • Same as h1 – Any page titles which match their h1.
  • Multiple – Any pages which have multiple page titles.
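
To make the page title filters concrete, here is a small Python sketch that applies them to a set of crawled pages. The Page structure and function name are hypothetical, and the sketch assumes that duplicates are judged on the first title only and that an empty title counts as missing.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Page(NamedTuple):
    titles: List[str]  # every page title found, in source order
    h1: str            # first h1 on the page, or "" if there is none


def title_filters(pages: Dict[str, Page]) -> Dict[str, List[str]]:
    """Return, for each page title filter, the addresses that match it."""
    def first_title(page: Page) -> str:
        return page.titles[0].strip() if page.titles else ""

    # Count how many pages share each non-empty first title.
    counts = Counter(first_title(p) for p in pages.values() if first_title(p))
    names = ("Missing", "Duplicate", "Over 70 characters", "Same as h1", "Multiple")
    result: Dict[str, List[str]] = {name: [] for name in names}

    for address, page in pages.items():
        title = first_title(page)
        if not title:
            result["Missing"].append(address)
            continue
        if counts[title] > 1:
            result["Duplicate"].append(address)
        if len(title) > 70:
            result["Over 70 characters"].append(address)
        if title == page.h1:
            result["Same as h1"].append(address)
        if len(page.titles) > 1:
            result["Multiple"].append(address)
    return result
```

The same pattern would apply to the meta description, h1 and h2 tabs, with the length threshold changed to match each filter.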

Meta description

The meta description tab includes data related to meta descriptions.

  • Address – The URI crawled.
  • Occurrences – The number of meta descriptions found on the page (maximum we find is 2).
  • Meta Description 1/2 – The meta description.
  • Meta Description 1/2 length – The character length of the meta description.

Filter by –

  • Missing – Any pages which have a missing meta description.
  • Duplicate – Any pages which have duplicate meta descriptions.
  • Over 156 characters – Any pages which have meta descriptions over 156 characters in length.
  • Multiple – Any pages which have multiple meta descriptions.

Meta keyword

The meta keywords tab includes data related to meta keywords. PLEASE NOTE – We advise ignoring the meta keywords tag. It is widely ignored by search engines, and Google in particular does not consider it at all when scoring sites for ranking.

  • Address – The URI crawled.
  • Occurrences – The number of meta keywords found on the page (maximum we find is 2).
  • Meta Keyword 1/2 – The meta keywords.
  • Meta Keyword 1/2 length – The character length of the meta keywords.

Filter by –

  • Missing – Any pages which have missing meta keywords.
  • Duplicate – Any pages which have duplicate meta keywords.
  • Multiple – Any pages which have multiple meta keywords.

h1

The h1 tab includes data related to the h1 heading.

  • Address – The URI crawled.
  • Occurrences – The number of h1s found on the page (maximum we find is 2).
  • h1- 1/2 – The h1 data.
  • h1-len- 1/2 – The character length of the h1.

Filter by –

  • Missing – Any pages which have a missing h1.
  • Duplicate – Any pages which have duplicate h1s.
  • Over 70 characters – Any pages which have an h1 over 70 characters in length.
  • Multiple – Any pages which have multiple h1s.

h2

The h2 tab includes data related to the h2 heading.

  • Address – The URI crawled.
  • Occurrences – The number of h2s found on the page (maximum we find is 2).
  • h2- 1/2 – The h2 data.
  • h2-len- 1/2 – The character length of the h2.

Filter by –

  • Missing – Any pages which have a missing h2.
  • Duplicate – Any pages which have duplicate h2s.
  • Over 70 characters – Any pages which have an h2 over 70 characters in length.
  • Multiple – Any pages which have multiple h2s.

Images

The images tab includes data related to any images crawled.

  • Address – The URI crawled.
  • Content – The content type of the image (jpeg, gif, png etc).
  • Size – Size of the image. File size is in bytes; divide by 1024 to convert to kilobytes.

Filter by –

  • Over 100kb – Large images over 100kb in size.
  • Missing Alt Text – Images that are missing alt text. Click the address (URI) of the image and then the ‘image info’ tab in the lower window pane to view which pages the image appears on, and on which of those pages its alt text is missing.
  • Alt Text Over 100 Characters – Images which have at least one instance of alt text over 100 characters in length.
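
Since the same image can appear on several pages with different alt text, the image filters are best thought of as checks across every use of the image. A minimal Python sketch, assuming ‘100kb’ means 100 × 1024 bytes and using a hypothetical ImageUse record for each page the image appears on:

```python
from typing import List, NamedTuple, Optional


class ImageUse(NamedTuple):
    page: str            # URI of the page that embeds the image
    alt: Optional[str]   # alt text on that page, or None if absent


def image_filters(size_bytes: int, uses: List[ImageUse]) -> List[str]:
    """Return the names of the image filters an image would match."""
    filters = []
    if size_bytes > 100 * 1024:  # assumption: 1kb = 1024 bytes
        filters.append("Over 100kb")
    if any(not (use.alt or "").strip() for use in uses):
        filters.append("Missing Alt Text")
    if any(len(use.alt or "") > 100 for use in uses):
        filters.append("Alt Text Over 100 Characters")
    return filters


def pages_missing_alt(uses: List[ImageUse]) -> List[str]:
    """List the pages on which this image has no alt text."""
    return [use.page for use in uses if not (use.alt or "").strip()]
```

The second function mirrors what the ‘image info’ tab lets you check by hand: which of the pages using an image leave its alt text empty.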

Directives

The directives tab includes all information related to meta robots, canonicals and rel=“next” and rel=“prev” link elements crawled by the SEO Spider.

  • Address – The URI crawled.
  • Meta Robots 1/2 etc – Meta robots found on the URI. The Spider will find all instances if there are multiple.
  • Meta Refresh 1/2 etc – Meta Refresh found on the URI. The Spider will find all instances if there are multiple.
  • Canonical Link Element 1/2 etc – Canonical link element data on the URI. The Spider will find all instances if there are multiple.
  • HTTP Canonical 1/2 etc – Canonical issued via HTTP. The Spider will find all instances if there are multiple.
  • X-Robots-Tag 1/2 etc – X-Robots-Tag data. The Spider will find all instances if there are multiple.
  • rel=“next” and rel=“prev” – The SEO Spider collects these HTML link elements designed to indicate the relationship between URLs in a paginated series.

Filter by –

  • Canonical – The URL has a canonical set. This could be self-referencing or point to another URL.
  • Canonicalised – The URL has a canonical set that is different to the URL itself. The URL is ‘canonicalised’ to another location.
  • No Canonical – There’s no canonical present.
  • rel=“next” and rel=“prev”
  • Index
  • Noindex
  • Follow
  • Nofollow
  • None – This does not mean there are no directives in place. It means the meta robots value ‘none’ is being used, which is equivalent to “noindex, nofollow”.
  • NoArchive
  • NoSnippet
  • NoODP
  • NoYDIR
  • NoImageIndex
  • NoTranslate
  • Unavailable_After
  • Refresh
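
Two of the filters above are easy to misread: ‘None’ and the difference between ‘Canonical’ and ‘Canonicalised’. The Python sketch below illustrates both. The function names are hypothetical, and the canonical comparison is a plain string match, which assumes both URLs are already absolute and written the same way.

```python
from typing import List, Optional, Set


def parse_meta_robots(content: str) -> Set[str]:
    """Split a meta robots content value into lower-case directives.

    'none' is expanded, since it is equivalent to "noindex, nofollow".
    """
    directives = {token.strip().lower() for token in content.split(",") if token.strip()}
    if "none" in directives:
        directives |= {"noindex", "nofollow"}
    return directives


def canonical_filters(url: str, canonical: Optional[str]) -> List[str]:
    """Return the canonical filters a URL would appear under."""
    if not canonical:
        return ["No Canonical"]
    filters = ["Canonical"]  # any canonical, self-referencing or not
    if canonical != url:     # assumption: exact string comparison
        filters.append("Canonicalised")
    return filters
```

A self-referencing canonical therefore appears only under ‘Canonical’, while a canonical pointing elsewhere appears under both ‘Canonical’ and ‘Canonicalised’.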

AJAX

The AJAX tab specifically refers to the now deprecated Google AJAX crawling scheme.

If the site uses AJAX but does not have escaped fragment URLs with HTML snapshots, adjust the ‘rendering’ configuration to ‘JavaScript’ to crawl the site. This mode renders pages as a modern browser would, so JavaScript and dynamically generated content can be crawled and indexed. This configuration can be adjusted under ‘Configuration > Spider > Rendering tab > JavaScript’.

The AJAX tab shows both ugly and pretty URLs, with filters for hash fragments. Some AJAX pages may not use hash fragments (such as a homepage), so the ‘fragment’ meta tag can be used to recognise an AJAX page. In the same way as Google, the SEO Spider will then fetch the ugly version of the URL.

  • Pretty URL – The pretty URL of the page.
  • Ugly URL – The ugly URL actually requested.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
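
For reference, the mapping from a pretty URL to an ugly URL under the deprecated AJAX crawling scheme can be sketched as follows. This is an illustration based on our reading of Google’s published scheme, not the SEO Spider’s own code; the function name and the exact set of characters left unescaped are assumptions.

```python
from urllib.parse import quote

# Punctuation left as-is in the fragment. Spaces, control characters,
# '#', '%', '&', '+' and non-ASCII characters are percent-encoded.
SAFE = "!\"$'()*,/:;<=>?@[\\]^`{|}"


def pretty_to_ugly(pretty_url: str) -> str:
    """Convert a pretty (#!) URL to its _escaped_fragment_ form."""
    base, separator, fragment = pretty_url.partition("#!")
    if not separator:
        # No hash fragment: assume the page opted in via the 'fragment'
        # meta tag, so an empty _escaped_fragment_ value is requested.
        fragment = ""
    joiner = "&" if "?" in base else "?"
    return f"{base}{joiner}_escaped_fragment_={quote(fragment, safe=SAFE)}"
```

So a pretty URL such as http://example.com/page#!key=value would be requested as http://example.com/page?_escaped_fragment_=key=value.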

Custom

The custom tab works alongside the custom search and custom extraction features.

The custom search feature allows you to search the source code of HTML pages, while the custom extraction feature allows you to extract any data from the source code using XPath, CSS Path or regex. There are 10 filters under the custom search configuration which relate directly to the filters in the custom report and 10 filters under custom extraction, which relate to the ‘extraction’ filter and columns.

  • Address – The URI crawled.
  • Content – The content type of the URI.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
  • Occurrences – The number of times the search query appears within the source code of the URL.

Filter by –

  • Filter – 1-10 – Shows URIs that either contain or do not contain the query string entered in the relevant custom filter.
  • Filter – Extraction – Shows all data extracted.
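
The behaviour of custom search and regex extraction can be approximated in a few lines of Python. This is only a sketch of the idea with hypothetical function names; it treats the search query as a plain, case-sensitive string, which may differ from how the SEO Spider matches.

```python
import re
from typing import List


def occurrences(source: str, query: str) -> int:
    """Number of times the query appears in the page source."""
    return source.count(query)


def matches_filter(source: str, query: str, contains: bool = True) -> bool:
    """True if the page belongs in a 'contains' or 'does not contain' filter."""
    found = query in source
    return found if contains else not found


def extract(source: str, pattern: str) -> List[str]:
    """Extract every match of a regex from the page source.

    With no capture groups in the pattern, re.findall returns whole matches.
    """
    return re.findall(pattern, source)
```

For example, a ‘does not contain’ filter for an analytics snippet would list every page where matches_filter(source, snippet, contains=False) is true.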

URL info

If you highlight a URI in the top window, this bottom window tab populates. This contains a very brief overview of the URL in question.

  • URL – The URI crawled.
  • Status Code – HTTP response code.
  • Status – The HTTP header response.
  • Content – The content type of the URI.
  • Size – File or web page size.
  • Level – Depth of the page from the homepage or start page (number of ‘clicks’ away from the start page).
  • Inlinks – Number of internal inlinks to the URI.
  • Outlinks – Number of internal outlinks from the URI.
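
Level and Inlinks both follow from the internal link graph. As a sketch, Level can be calculated with a breadth-first search from the start page, and Inlinks by counting links pointing at each URI. The function names and the dictionary representation of links are hypothetical, and counting every link (rather than unique linking pages) is an assumption.

```python
from collections import deque
from typing import Dict, List


def crawl_levels(links: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: number of clicks from the start page to each URI."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest path
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


def inlink_counts(links: Dict[str, List[str]]) -> Dict[str, int]:
    """Count the links pointing at each URI."""
    counts: Dict[str, int] = {}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts
```

Because the search is breadth-first, a page reachable by several routes is given the level of its shortest route from the start page.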

Image info

If you highlight a URI in the top window, this bottom window tab populates. This contains a list of images found on the URI.

  • From – The URI chosen in the top window.
  • To – The image link found on the URI.
  • Alt Text – The alt text used, if any.

SERP snippet

If you highlight a URI in the top window, this bottom window tab populates.

This shows how we believe the SERP snippet may display in the Google search results. Truncation is calculated on pixel width, rather than number of characters. Google changes the SERPs regularly and we have covered some of the changes in previous blog posts, such as Page Title & Meta Description By Pixel Width In SERP Snippet.

In the latest update, in May 2016, Google increased the column width of the organic SERPs from 512px to 600px on desktop, which means titles and description snippets are longer than they were previously.

Our previous research showed Google used to truncate page titles at around 482px on desktop. With the change, the SERP snippet emulator has been updated to match Google’s new truncation point before an ellipsis (…), which for page titles on desktop is around 570px.

You can also now edit page titles and descriptions directly in the interface.

The SEO Spider will by default remember the edits you make to page titles and descriptions, unless you click the ‘reset title and description’ button. This allows you to make as many changes as you like and then export and send to a client or development team.
