How To Debug Invalid HTML Elements In The Head
Introduction To Invalid HTML Elements In The <Head>
This tutorial explains how to use the Screaming Frog SEO Spider to identify invalid HTML elements in the <head>, view which metadata might be adversely affected, and debug what’s actually causing the issue.
- What Are Invalid HTML Elements In The <Head>?
- How To Identify Invalid HTML Elements Breaking The <Head>
- How To Debug Invalid HTML Elements Breaking The <Head>
First, let’s quickly summarise what we mean by invalid HTML elements in the <head>.
What Are Invalid HTML Elements In The <Head>?
Using valid HTML for a page’s metadata ensures search engines, such as Google, are able to use it as intended.
While search engines will try to understand markup even when there are errors, some fix-ups performed within the <head> can prevent the metadata from being used.
If invalid HTML elements are used within the <head>, Google will assume the head should be closed, and start the <body> – in a similar way to a browser, like Chrome.
This means any metadata that appears after the invalid HTML element could be ignored by Google.
As per the HTML standard, the <head> element must only contain the following elements:
- title
- meta
- link
- script
- style
- base
- noscript
- template
Some of the most common elements that appear in the <head> and cause issues include:
- noscript containing an img (more on this later!)
The basic rule is that any <body> element that precedes a <head> element can render it useless.
While it’s best to avoid using a non-head element in the <head>, it won’t be an issue for search engines if it comes after the intended metadata. The <head> will simply close after the metadata.
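The rule above can be approximated outside of any crawler with a few lines of Python. This is a minimal sketch, not how the SEO Spider or Googlebot works internally: it uses the standard library's `html.parser`, which does not build a browser-style DOM, and the class and function names (`HeadBreakFinder`, `find_head_breaker`) are made up for illustration. It reports the first element between the opening and closing <head> tags that is not in the list of valid head elements, along with its line number. It also treats anything nested inside <template> as if it were directly in the <head>, so it can over-report.

```python
from html.parser import HTMLParser

# Elements the article lists as valid inside <head>.
HEAD_ELEMENTS = {"title", "meta", "link", "script", "style", "base", "noscript", "template"}


class HeadBreakFinder(HTMLParser):
    """Find the first element inside <head> that is not a valid head element."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.breaker = None  # (tag, line number) of the first invalid element

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and self.breaker is None and tag not in HEAD_ELEMENTS:
            # getpos() returns (line, column) of the tag currently being parsed.
            self.breaker = (tag, self.getpos()[0])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_head_breaker(html):
    """Return (tag, line) for the first invalid element in <head>, or None."""
    finder = HeadBreakFinder()
    finder.feed(html)
    finder.close()
    return finder.breaker


sample = """<html>
<head>
<title>Example</title>
<svg></svg>
<link rel="canonical" href="https://example.com/">
</head>
<body></body>
</html>"""

print(find_head_breaker(sample))  # ('svg', 4)
```

In this sample the canonical comes after the SVG, so it is the kind of metadata that could be ignored. If the SVG came after the canonical instead, the same element would still be reported, but per the note above it would not be a problem for search engines.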
Not Just ‘In The <Head>’, But Preceding
The issue of invalid HTML elements in the <head> is documented by Google, and is well known among SEOs.
However, it’s not just elements ‘in the <head>’ that are potentially problematic. <body> elements preceding the <head> element itself can also impact all metadata in the <head>.
For example, a stray <div> tag preceding the opening <html> element will mean Google automatically opens and closes an empty <head> element, meaning all metadata will be in the <body> and potentially ignored.
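The stray <div> case can be illustrated with a small check that records which non-head elements appear before the opening <html> tag, and which appear after <html> but before <head>. This is a rough sketch under simplifying assumptions: it only looks at the order of start tags in the raw HTML using Python's `html.parser`, and the class name `PreHeadChecker` is hypothetical. On a page with no <head> tag at all, every body element will end up in `before_head`.

```python
from html.parser import HTMLParser

# Elements the article lists as valid inside <head>.
HEAD_ELEMENTS = {"title", "meta", "link", "script", "style", "base", "noscript", "template"}


class PreHeadChecker(HTMLParser):
    """Collect non-head elements that appear before <html> or before <head>."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.seen_head = False
        self.before_html = []  # non-head tags found before the opening <html>
        self.before_head = []  # non-head tags found after <html> but before <head>

    def handle_starttag(self, tag, attrs):
        if self.seen_head:
            return  # only the markup preceding <head> is of interest here
        if tag == "head":
            self.seen_head = True
        elif tag == "html":
            self.seen_html = True
        elif tag not in HEAD_ELEMENTS:
            target = self.before_head if self.seen_html else self.before_html
            target.append(tag)


stray_div = "<div></div>\n<html>\n<head><title>Example</title></head>\n<body></body>\n</html>"

checker = PreHeadChecker()
checker.feed(stray_div)
checker.close()
print(checker.before_html)  # ['div']
print(checker.before_head)  # []
```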
How To Identify Invalid HTML Elements Breaking The <Head>
Finding invalid HTML elements in the <head>, or preceding it, at scale across a website is difficult, which is where the SEO Spider can do the heavy lifting.
It will flag any pages with invalid HTML elements that could be problematic, and any metadata such as titles, canonicals, or meta robots that are outside of the <head>.
To identify invalid HTML elements in or preceding the <head>, just follow this process.
1) Crawl the Site
Input the website address into the URL bar, and hit ‘Start’.
The SEO Spider analyses pages in real time, so you can start reviewing the data straight away, or wait until the crawl reaches 100%.
2) View The ‘Validation’ Tab
Click the ‘Validation’ tab, which has a number of filters that help identify potential issues.
Use the right-hand ‘Overview’ tab to see the number of URLs with each issue within the filters.
This tab includes the following filters related to potential issues that can occur from invalid HTML.
- Invalid HTML Elements In <head> – Pages with invalid HTML elements within the <head>. When an invalid element is used in the <head>, Google assumes the end of the <head> element and ignores any elements that appear after the invalid element. This means critical <head> elements that appear after the invalid element will not be seen. The <head> element as per the HTML standard is reserved for title, meta, link, script, style, base, noscript and template elements only.
- <body> Element Preceding <html> – Pages that have a body element preceding the opening html element. Browsers and Googlebot will automatically assume the start of the body and generate an empty head element before it. This means the intended head element below and its metadata will be seen in the body and ignored.
- <head> Not First In <html> Element – Pages with an HTML element that precedes the <head> element in the HTML. The <head> should be the first element in the <html> element. Browsers and Googlebot will automatically generate a <head> element if it’s not first in the HTML. While ideally <head> elements would be in the <head>, if a valid <head> element is first in the <html> it will be considered as part of the generated <head>. However, if non-<head> elements such as <p>, <body>, <img> etc. are used before the intended <head> element and its metadata, then Google assumes the end of the <head> element. This means the intended <head> element and its metadata may only be seen in the <body> and ignored.
- Missing <head> Tag – Pages missing a <head> element within the HTML. The <head> element is a container for metadata about the page, that’s placed between the <html> and <body> tags. Metadata is used to define the page title, character set, styles, scripts, viewport and other data that are critical to the page. Browsers and Googlebot will automatically generate a <head> element if it’s omitted in the markup. However, it may not contain meaningful metadata for the page and this should not be relied upon.
- Multiple <head> Tags – Pages with multiple <head> elements in the HTML. There should only be one <head> element in the HTML which contains all critical metadata for the document. Browsers and Googlebot will combine metadata from subsequent <head> elements if they are both before the <body>, however, this should not be relied upon and is open to potential mix-ups. Any <head> tags after the <body> starts will be ignored.
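For a quick spot check of the last two filters on a single page, counting the opening <head> tags in the raw HTML is enough. The sketch below is only an approximation of what those filters describe, not the SEO Spider's own logic. The names `HeadTagCounter` and `head_tag_status` are invented for this example.

```python
from html.parser import HTMLParser


class HeadTagCounter(HTMLParser):
    """Record the line number of every opening <head> tag in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.head_lines = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.head_lines.append(self.getpos()[0])


def head_tag_status(html):
    """Return 'missing', 'ok' or 'multiple' based on the number of <head> tags."""
    counter = HeadTagCounter()
    counter.feed(html)
    counter.close()
    count = len(counter.head_lines)
    if count == 0:
        return "missing"
    if count == 1:
        return "ok"
    return "multiple"


print(head_tag_status("<html><body><p>No head here</p></body></html>"))
print(head_tag_status("<html><head></head><head></head><body></body></html>"))
```

The first call should report a missing <head> and the second multiple <head> tags. Text inside <script> and <style> is not parsed as markup by `html.parser`, so a "<head>" string inside inline JavaScript is not counted.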
Remember, if a non-head element comes after intended metadata, it’s not a problem for Google. But how do you know if it’s impacting any critical SEO and metadata elements?
3) View Metadata Tabs
If any of the ‘Validation’ filters are flagged above, then it’s possible that intended metadata might be outside the <head>.
To better understand and identify this issue, there are ‘Outside <head>’ filters for key elements in the following tabs – Page Titles, Meta Descriptions, Canonicals, Directives and Hreflang.
Data can be viewed under each tab and filter, or if you’re using the right-hand ‘Issues’ tab, all validation issues will be flagged with ‘high priority’ for critical elements.
While the SEO Spider will flag any elements ‘Outside <head>’ where appropriate, testing has shown Google will consider titles and directives, such as ‘noindex’, in the <body>. However, this shouldn’t be relied upon: Google isn’t consistent with this behaviour, and it will ignore canonicals and hreflang outside the <head>.
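To make the idea of ‘Outside <head>’ metadata concrete, here is a simplified model in Python. It assumes an implied <head> is open from the start of the document and closes at the first non-head element or at the closing </head> tag, whichever comes first, and then lists the key metadata that appears after that point. The names (`key_metadata`, `OutsideHeadFinder`) are hypothetical, and the model is far cruder than a real HTML parser. For instance, it would treat metadata in a second <head> as outside, and it would report a <title> inside an inline SVG in the <body>.

```python
from html.parser import HTMLParser

# Elements the article lists as valid inside <head>.
HEAD_ELEMENTS = {"title", "meta", "link", "script", "style", "base", "noscript", "template"}


def key_metadata(tag, attrs):
    """Return a label for SEO-critical metadata, or None for anything else."""
    name = (attrs.get("name") or "").lower()
    rel = (attrs.get("rel") or "").lower().split()
    if tag == "title":
        return "title"
    if tag == "meta" and name == "description":
        return "meta description"
    if tag == "meta" and name == "robots":
        return "meta robots"
    if tag == "link" and "canonical" in rel:
        return "canonical"
    if tag == "link" and "alternate" in rel and attrs.get("hreflang"):
        return "hreflang"
    return None


class OutsideHeadFinder(HTMLParser):
    """List key metadata that appears once the head is treated as closed."""

    def __init__(self):
        super().__init__()
        self.in_head = True  # assumption: an implied <head> is open from the start
        self.outside = []    # (label, line number) pairs

    def handle_starttag(self, tag, attrs):
        if self.in_head and tag in ("html", "head"):
            return
        if self.in_head and tag not in HEAD_ELEMENTS:
            self.in_head = False  # first non-head element closes the head
        if not self.in_head:
            label = key_metadata(tag, dict(attrs))
            if label:
                self.outside.append((label, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


page = """<html>
<head>
<title>Example</title>
<meta name="description" content="An example page">
<div></div>
<link rel="canonical" href="https://example.com/">
<link rel="alternate" hreflang="de" href="https://example.com/de/">
</head>
<body></body>
</html>"""

finder = OutsideHeadFinder()
finder.feed(page)
finder.close()
print(finder.outside)  # [('canonical', 6), ('hreflang', 7)]
```

The title and meta description come before the <div>, so they stay inside the head in this model, while the canonical and hreflang after it are listed as outside.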
If you have a broken <head>, the next step for the curious SEOs amongst you is to debug it.
How To Debug Invalid HTML Elements Breaking The <Head>
If you discover URLs flagged under the ‘Invalid HTML Elements In <head>’ filter or another related issue, then you can analyse the HTML to debug.
Scanning the raw HTML for invalid HTML elements between the opening and closing <head> element can be tricky, particularly if the <head> element is big. This example has over 600 lines in the <head>.
However, it is possible. In this case, there’s an SVG element on line 70 which shouldn’t be there, based upon the valid HTML elements that we outlined earlier.
Reviewing the rendered HTML is easier. Chrome (like Googlebot) will assume the <head> should close and the <body> should open when it encounters a <body> element. So you can immediately see it’s the SVG, without having to wade through the raw HTML for elements that shouldn’t be there.
You can also see in this case that the canonical link element, which appears after the SVG in the raw HTML (but inside the <head> element), is actually considered ‘outside the <head>’ and in the <body> in the rendered HTML.
From our testing, Googlebot typically parses in the same way as Chrome, with exceptions around iframes, which Google likes to inline.
However, you can verify Google’s own behaviour by using the URL Inspection tool in Search Console, or the Mobile Friendly Test tool (soon to be killed), and reviewing the rendered HTML.
As you can see here, the <body> was opened due to the SVG in the same way as Chrome. Further down in the rendered HTML is the canonical outside the <head>.
One case is harder to spot. If a <noscript> tag in the <head> includes an invalid HTML element, such as an img, it can close the <head>, and viewing the rendered HTML in the URL Inspection tool will not reveal the broken <head> element.
One way to see the issue is using Google’s URL Inspection ‘Page Indexing’, which shows the ‘User-declared Canonical’ after indexing.
The rendered HTML on the right-hand side appears fine, but we can see the ‘User-declared Canonical’ is ‘N/A’ as it’s being ignored due to the <noscript> tag containing an image above it, silently closing the <head>.
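Because this case does not show up in the rendered HTML, it can help to check the raw HTML directly. The sketch below flags any element inside a <noscript> in the <head> other than link, style and meta, which are the only elements the HTML standard allows there when scripting is disabled. It is an illustrative check only: the class name `NoscriptHeadChecker` is made up, and it relies on Python's `html.parser` parsing <noscript> content as ordinary markup, which is its default behaviour.

```python
from html.parser import HTMLParser

# Per the HTML standard, a <noscript> in the <head> may only contain these
# elements when scripting is disabled.
NOSCRIPT_HEAD_CHILDREN = {"link", "style", "meta"}


class NoscriptHeadChecker(HTMLParser):
    """Flag elements inside a head-level <noscript> that are not allowed there."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_noscript = False
        self.problems = []  # (tag, line number) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "noscript" and self.in_head:
            self.in_noscript = True
        elif self.in_noscript and tag not in NOSCRIPT_HEAD_CHILDREN:
            self.problems.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.in_noscript = False
        elif tag == "head":
            self.in_head = False
            self.in_noscript = False


tracking_pixel = """<html>
<head>
<title>Example</title>
<noscript><img src="https://example.com/pixel.gif" alt=""></noscript>
<link rel="canonical" href="https://example.com/">
</head>
<body></body>
</html>"""

noscript_checker = NoscriptHeadChecker()
noscript_checker.feed(tracking_pixel)
noscript_checker.close()
print(noscript_checker.problems)  # [('img', 4)]
```

Here the canonical on line 5 follows the flagged image, which matches the situation described above where the ‘User-declared Canonical’ shows as ‘N/A’.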
This tutorial will hopefully help you find invalid HTML elements in the <head>, discover which metadata might be adversely affected, and debug what specifically is causing the issue.
The offending HTML can then be shared with the boss, client or developer for extra kudos and a warm fuzzy feeling.
If you fell asleep halfway through, none of this makes sense, or you’re just struggling to debug an issue, then alternatively, please contact us via support and we can help.