URL

Posted 30 November, 2015 by si digital in

The URL tab shows data related to the URLs discovered in a crawl. The filters show common issues discovered for URLs.

Columns

This tab includes the following columns.

Address – The URL crawled.
Content – The content type of the URL.
Status Code – HTTP response code.
Status – The HTTP header response.
Indexability – Whether the URL is Indexable or Non-Indexable.
Indexability Status – The reason why a URL is Non-Indexable. For example, if it’s canonicalised to another URL.
Hash – Hash value of the page. This is a duplicate content check: if two hash values match, the pages are exactly the same in content.
Length – The character length of the URL.
Canonical 1 – The canonical link element data.
URL Encoded Address – The URL actually requested by the SEO Spider. All non-ASCII characters are percent-encoded; see RFC 3986 for further details.
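
As an illustration of the Hash, Length and URL Encoded Address columns, the sketch below shows one way such values could be derived. It is not the SEO Spider's own implementation: the function names are hypothetical, SHA-256 is an assumed stand-in because the hash algorithm is not stated here, and the hostname is assumed to be ASCII already.

```python
import hashlib
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when encoding; "%" is kept so that URLs which are
# already percent-encoded are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=?"


def url_encoded_address(url: str) -> str:
    """Percent-encode non-ASCII characters (as UTF-8 bytes) in a URL.

    Assumption: the hostname is already ASCII, so only the path, query
    and fragment are encoded.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),
    ))


def content_hash(body: bytes) -> str:
    """Hash a response body; identical bodies give identical hashes.

    Assumption: SHA-256 is used purely for illustration.
    """
    return hashlib.sha256(body).hexdigest()


def url_length(url: str) -> int:
    """Character length of the URL, as in the Length column."""
    return len(url)
```

For example, url_encoded_address("https://example.com/café") returns "https://example.com/caf%C3%A9", and two pages are exact duplicates under this check only when content_hash gives the same value for both bodies.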

Filters

This tab includes the following filters that mostly apply to HTML page URLs.

Non ASCII Characters – The URL has characters in it that are not included in the ASCII character-set. Standards outline that URLs can only be sent using the ASCII character-set, and some users may have difficulty with the subtleties of characters outside this range. URLs must be converted into a valid ASCII format by encoding links to the URL with safe characters (made up of % followed by two hexadecimal digits). Today, browsers and search engines are largely able to transform URLs accurately.
Underscores – The HTML page URL has underscores within it, which are not always seen as word separators by search engines. Hyphens are recommended for word separators.
Uppercase – The HTML page URL has uppercase characters within it. URLs are case sensitive, so as best practice URLs should generally be lowercase to avoid any potential mix-ups and duplicate URLs.
Multiple Slashes – The HTML page URL has multiple forward slashes in the path (for example, screamingfrog.co.uk/seo//). This is generally a mistake, and as best practice URLs should only have a single slash between sections of a path to avoid any potential mix-ups and duplicate URLs.
Repetitive Path – The HTML page URL has a path that is repeated in the URL string (for example, screamingfrog.co.uk/services/seo/technical/seo/). In some cases this can be legitimate and logical; however, it often points to poor URL structure and potential improvements. It can also help identify issues with incorrect relative linking, which can cause infinite URLs.
Contains A Space – The HTML page URL has a space in it. Spaces are considered unsafe and could cause the link to be broken when the URL is shared. Hyphens should be used as word separators instead of spaces.
Internal Search – The HTML page URL might be part of the website’s internal search function. Google and other search engines recommend blocking internal search pages from being crawled. To avoid Google indexing the blocked internal search URLs, they should not be discoverable via internal links either.
Parameters – The HTML page URL includes query string parameters, indicated by characters such as ‘?’ or ‘&’. This isn’t an issue for Google or other search engines to crawl, but it’s recommended to limit the number of parameters in a URL, as they can be complicated for users and can be a sign of low value-add URLs.
Broken Bookmark – The HTML page URL has a broken bookmark (also known as a ‘named anchor’, ‘jump link’ or ‘skip link’). Bookmarks link users to a specific part of a webpage by using an ID attribute in the HTML and appending a fragment (#) and the ID name to the URL. When the link is clicked, the page scrolls to the location of the bookmark. While these links can be excellent for users, it’s easy to make mistakes in the set-up, and they often become ‘broken’ over time as pages are updated and IDs are changed or removed. With a broken bookmark the user is still taken to the correct page, but not to the intended section. While Google sees these URLs as the same page (as it ignores anything from the # onwards), it can use named anchors for ‘jump to’ links in its search results for the ranking page. Please see our guide on how to find broken bookmarks.
GA Tracking Parameters – The URL contains Google Analytics tracking parameters. In addition to creating duplicate pages that must be crawled, using tracking parameters on internal links can overwrite the original session data. utm= parameters strip the original source of traffic and start a new session with the specified attributes. _ga= and _gl= parameters are used for cross-domain linking and identify a specific user; including them on links prevents a unique user ID from being assigned.
Over 115 characters – The HTML page URL is over 115 characters in length. This is not necessarily an issue; however, research has shown that users prefer shorter, concise URL strings.
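
Several of the filters above depend only on the URL string, so they can be approximated with simple checks. The sketch below is a rough illustration, not the SEO Spider's actual logic. The function name url_filters is hypothetical, and the exact rules are assumptions: underscores are checked in the path only, a path counts as repetitive when any segment occurs twice, an encoded space (%20) counts as a space, and any parameter name starting with "utm" is treated as a GA tracking parameter. Internal Search and Broken Bookmark are omitted because they need more than the URL string.

```python
import re
from urllib.parse import parse_qsl, urlsplit

MAX_URL_LENGTH = 115          # threshold named by the "Over 115 characters" filter
GA_PARAM_NAMES = {"_ga", "_gl"}


def url_filters(url: str) -> set[str]:
    """Return the names of the URL-based filters that a URL would match."""
    parts = urlsplit(url)
    # Drop percent-escapes so hex digits such as %C3 are not seen as uppercase.
    unescaped = re.sub(r"%[0-9A-Fa-f]{2}", "", url)
    segments = [s for s in parts.path.split("/") if s]
    names = {n for n, _ in parse_qsl(parts.query, keep_blank_values=True)}

    matched = set()
    if not url.isascii():
        matched.add("Non ASCII Characters")
    if "_" in parts.path:                      # assumption: path only
        matched.add("Underscores")
    if any(ch.isupper() for ch in unescaped):
        matched.add("Uppercase")
    if "//" in parts.path:
        matched.add("Multiple Slashes")
    if len(set(segments)) < len(segments):     # assumption: any repeated segment
        matched.add("Repetitive Path")
    if " " in url or "%20" in url:             # assumption: encoded spaces count
        matched.add("Contains A Space")
    if parts.query:
        matched.add("Parameters")
    if any(n in GA_PARAM_NAMES or n.startswith("utm") for n in names):
        matched.add("GA Tracking Parameters")
    if len(url) > MAX_URL_LENGTH:
        matched.add("Over 115 characters")
    return matched
```

With these assumptions, url_filters("https://screamingfrog.co.uk/seo//") matches only Multiple Slashes, and url_filters("https://screamingfrog.co.uk/services/seo/technical/seo/") matches only Repetitive Path.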