
How To Debug Invalid HTML Elements In The Head

Introduction To Invalid HTML Elements In The <Head>

This tutorial explains how to use the Screaming Frog SEO Spider to identify invalid HTML elements in the <head>, see which metadata might be adversely affected, and debug what’s actually causing the issue.

First, let’s quickly summarise what we mean by invalid HTML elements in the <head>.


What Are Invalid HTML Elements In The <Head>?

Using valid HTML for a page’s metadata ensures search engines, such as Google, are able to use it as intended.

While search engines will try to understand markup even when there are errors, some of the fix-ups they perform within the <head> can stop metadata from being used.

If invalid HTML elements are used within the <head>, Google will assume the head should be closed, and start the <body> – in a similar way to a browser, like Chrome.

This means any metadata that appears after the invalid HTML element could be ignored by Google.

Breaking The Head

The <head> element must only contain the following elements as per the HTML standard:

  • title
  • meta
  • link
  • script
  • style
  • base
  • noscript
  • template

Some of the most common elements that appear in the <head> and cause issues include –

  • iframe
  • img
  • svg
  • div
  • noscript containing an img (more on this later!)

The basic rule is that any element belonging in the <body> that precedes a metadata element in the <head> can render that metadata useless.

While it’s best to avoid using a non-head element in the <head>, it won’t be an issue for search engines if it comes after the intended metadata. The <head> will simply close after the metadata.
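As a rough illustration of this rule, the Python sketch below walks a page’s start tags and reports the first one inside the <head> that isn’t on the valid list. It relies on the standard library’s html.parser, which is a simple tokenizer rather than a full HTML5 parser, so it only approximates what a browser or Googlebot does (it doesn’t treat content nested inside a template specially, for example), and it is not how the SEO Spider works internally. The names HeadBreakFinder and find_head_breaker, and the sample markup, are made up for this example.

```python
from html.parser import HTMLParser

# Elements allowed in the <head>, as listed above.
VALID_HEAD_ELEMENTS = {
    "title", "meta", "link", "script", "style", "base", "noscript", "template",
}


class HeadBreakFinder(HTMLParser):
    """Hypothetical helper: records the first invalid start tag inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.breaker = None  # (tag name, line number) once found

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and self.breaker is None and tag not in VALID_HEAD_ELEMENTS:
            # getpos() returns (line, column) of the tag being handled.
            self.breaker = (tag, self.getpos()[0])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_head_breaker(html):
    """Return (tag, line) for the first invalid element in <head>, or None."""
    finder = HeadBreakFinder()
    finder.feed(html)
    finder.close()
    return finder.breaker


# Made-up page: the <svg> sits above the canonical link element.
sample = """<html>
<head>
<title>Example</title>
<svg width="1" height="1"></svg>
<link rel="canonical" href="https://example.com/">
</head>
<body><p>Hello</p></body>
</html>"""

print(find_head_breaker(sample))
```

Anything listed after the reported element in the <head> is what could end up being treated as part of the <body>.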

Not Just ‘In The <Head>’, But Preceding

The issue of invalid HTML elements in the <head> is documented by Google, and is known and recited by many SEOs.

However, it’s not just elements ‘in the <head>’ that are potentially problematic – <body> elements preceding the <head> element itself can also impact all metadata in the <head>.

For example, a stray <div> tag preceding the opening <html> element will mean Google automatically opens and closes an empty <head> element, meaning all metadata will be in the <body> and potentially ignored.
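The stray <div> scenario can be sketched in Python as follows. This is a simplified check built on the standard library’s html.parser, not a reproduction of Google’s parsing or of the SEO Spider’s logic: it reports the first non-head element that appears before the <head> tag and says whether it came before or after the opening <html> tag. PreHeadScanner and find_stray_element are hypothetical names, and pages with no <head> tag at all are deliberately left out of scope.

```python
from html.parser import HTMLParser

VALID_HEAD_ELEMENTS = {
    "title", "meta", "link", "script", "style", "base", "noscript", "template",
}


class PreHeadScanner(HTMLParser):
    """Hypothetical helper: finds a body-type element ahead of the <head> tag."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.seen_head = False
        self.stray = None  # (tag name, position description)

    def handle_starttag(self, tag, attrs):
        if self.seen_head:
            return  # only tags ahead of <head> matter for this check
        if tag == "head":
            self.seen_head = True
        elif tag == "html":
            self.seen_html = True
        elif self.stray is None and tag not in VALID_HEAD_ELEMENTS:
            position = "before <head>" if self.seen_html else "before <html>"
            self.stray = (tag, position)


def find_stray_element(html):
    """Return (tag, position) if a non-head element precedes <head>, else None."""
    scanner = PreHeadScanner()
    scanner.feed(html)
    scanner.close()
    # Only report when a <head> tag actually follows the stray element.
    return scanner.stray if scanner.seen_head else None


stray_div = (
    '<!DOCTYPE html>\n<div class="banner"></div>\n'
    "<html><head><title>Example</title></head><body></body></html>"
)
print(find_stray_element(stray_div))
```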


How To Identify Invalid HTML Elements Breaking The <Head>

Finding invalid HTML elements in the <head>, or preceding it, at scale across a website is difficult, which is where the SEO Spider can do the heavy lifting.

It will flag any pages with invalid HTML elements that could be problematic, and any metadata such as titles, canonicals, or meta robots that are outside of the <head>.

To identify invalid HTML elements in or preceding the <head>, just follow this process.


1) Crawl the Site

Input the website address into the URL bar, and hit ‘Start’.

Crawl A Site To Find Invalid HTML Elements In The Head

The SEO Spider analyses in real-time, so you can start analysing data, or wait until the crawl reaches 100%.


2) View The ‘Validation’ Tab

Click the ‘Validation’ tab, which has a number of filters that help identify potential issues.

Validation tab

Use the right-hand ‘Overview’ tab to see the number of URLs with each issue within the filters.

Validation Tab

This tab includes the following filters related to potential issues that can occur from invalid HTML.

  • Invalid HTML Elements In <head> – Pages with invalid HTML elements within the <head>. When an invalid element is used in the <head>, Google assumes the end of the <head> element and ignores any elements that appear after the invalid element. This means critical <head> elements that appear after the invalid element will not be seen. The <head> element as per the HTML standard is reserved for title, meta, link, script, style, base, noscript and template elements only.
  • <body> Element Preceding <html> – Pages that have a body element preceding the opening html element. Browsers and Googlebot will automatically assume the start of the body and generate an empty head element before it. This means the intended head element below and its metadata will be seen in the body and ignored.
  • <head> Not First In <html> Element – Pages with an HTML element that precedes the <head> element in the HTML. The <head> should be the first element in the <html> element. Browsers and Googlebot will automatically generate a <head> element if it’s not first in the HTML. While ideally <head> elements would be in the <head>, if a valid <head> element is first in the <html> it will be considered as part of the generated <head>. However, if non-<head> elements such as <p>, <body>, <img> etc are used before the intended <head> element and its metadata, then Google assumes the end of the <head> element. This means the intended <head> element and its metadata may only be seen in the <body> and ignored.
  • Missing <head> Tag – Pages missing a <head> element within the HTML. The <head> element is a container for metadata about the page, that’s placed between the <html> and <body> tags. Metadata is used to define the page title, character set, styles, scripts, viewport and other data that are critical to the page. Browsers and Googlebot will automatically generate a <head> element if it’s omitted in the markup, however it may not contain meaningful metadata for the page and this should not be relied upon.
  • Multiple <head> Tags – Pages with multiple <head> elements in the HTML. There should only be one <head> element in the HTML which contains all critical metadata for the document. Browsers and Googlebot will combine metadata from subsequent <head> elements if they are both before the <body>, however, this should not be relied upon and is open to potential mix-ups. Any <head> tags after the <body> starts will be ignored.
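
The last two filters come down to counting <head> tags in the markup, which can be approximated with a few lines of Python. This is only a sketch using the standard library’s html.parser on the raw HTML; HeadTagCounter and head_tag_status are hypothetical names, and the SEO Spider’s own checks may well differ.

```python
from html.parser import HTMLParser


class HeadTagCounter(HTMLParser):
    """Hypothetical helper: counts opening <head> tags in the markup."""

    def __init__(self):
        super().__init__()
        self.head_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.head_tags += 1


def head_tag_status(html):
    """Classify a page as 'missing', 'single' or 'multiple' by <head> tag count."""
    counter = HeadTagCounter()
    counter.feed(html)
    counter.close()
    if counter.head_tags == 0:
        return "missing"
    if counter.head_tags == 1:
        return "single"
    return "multiple"


print(head_tag_status("<html><body><p>No head here</p></body></html>"))
print(head_tag_status("<html><head></head><head></head><body></body></html>"))
```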

Remember, if a non-head element comes after intended metadata, it’s not a problem for Google. But how do you know if it’s impacting any critical SEO and metadata elements?


3) View Metadata Tabs

If any of the ‘Validation’ filters are flagged above, then it’s possible that intended metadata might be outside the <head>.

To better understand and identify this issue, there are ‘Outside <head>’ filters for key elements in the following tabs – Page Titles, Meta Descriptions, Canonicals, Directives and Hreflang.

Page Titles Tab, Outside Head

Data can be viewed under each tab and filter, or if you’re using the right-hand ‘Issues’ tab, all validation issues will be flagged with ‘high priority’ for critical elements.

Issues tab with validation related issues

While the SEO Spider will flag any elements ‘Outside <head>’ where appropriate, testing has shown Google will consider titles and directives, such as ‘noindex’, in the <body>. However, this shouldn’t be relied upon: Google isn’t consistent with this behaviour, and it will ignore canonicals and hreflang.
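
To make the ‘Outside <head>’ idea more concrete, here is a minimal Python sketch that lists which of these metadata types appear after the first invalid element in the <head>. It only covers that one cause (an invalid element inside the <head>), it uses the standard library’s html.parser rather than a real browser parser, and the names metadata_kind and OutsideHeadFinder are invented for the example, so treat it as an approximation rather than the SEO Spider’s method.

```python
from html.parser import HTMLParser

VALID_HEAD_ELEMENTS = {
    "title", "meta", "link", "script", "style", "base", "noscript", "template",
}


def metadata_kind(tag, attrs):
    """Map a start tag to one of the metadata types named above, or None."""
    a = {name: (value or "").lower() for name, value in attrs}
    if tag == "title":
        return "page title"
    if tag == "meta" and a.get("name") == "description":
        return "meta description"
    if tag == "meta" and a.get("name") == "robots":
        return "directive"
    if tag == "link" and a.get("rel") == "canonical":
        return "canonical"
    if tag == "link" and a.get("rel") == "alternate" and "hreflang" in a:
        return "hreflang"
    return None


class OutsideHeadFinder(HTMLParser):
    """Hypothetical helper: collects metadata found after the <head> is broken."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.broken = False
        self.outside = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if self.in_head and tag not in VALID_HEAD_ELEMENTS:
            self.broken = True  # everything after this point is at risk
        kind = metadata_kind(tag, attrs)
        if kind and self.broken:
            self.outside.append(kind)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


page = """<html>
<head>
<title>Example</title>
<meta name="description" content="An example page">
<svg width="1" height="1"></svg>
<link rel="canonical" href="https://example.com/">
<meta name="robots" content="noindex">
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/">
</head>
<body></body>
</html>"""

finder = OutsideHeadFinder()
finder.feed(page)
finder.close()
print(finder.outside)
```

In this made-up page the title and meta description sit above the <svg>, so only the canonical, the robots directive and the hreflang link are reported.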

If you have a broken <head>, the next step for the curious SEOs amongst you is to debug it.


How To Debug Invalid HTML Elements Breaking The <Head>

If you discover URLs flagged under the ‘Invalid HTML Elements In <head>’ filter or another related issue, then you can analyse the HTML to debug.

Raw HTML

A right click and ‘view page source’ in Chrome will show you the raw HTML before JavaScript, where you can view the contents of the <head> element.

Right-click view page source in Chrome

Scanning the raw HTML for invalid HTML elements between the opening and closing <head> element can be tricky, particularly if the <head> element is big. This example has over 600 lines in the <head>.

Raw HTML Head Element

However, it is possible. In this case there’s an SVG element on line 70, which shouldn’t be there based upon the valid HTML elements that we outlined earlier.

SVG in the head, what's it doing there?!

One of the issues with just looking at the raw HTML is that you can miss elements that are dynamically inserted by JavaScript.

Rendered HTML

A more efficient and reliable approach to identifying invalid HTML elements in the <head> is using right-click ‘inspect element’ in Chrome, which shows the rendered HTML after JavaScript has been processed.

This is useful for two reasons. First, the first element in the <body> will generally be the invalid HTML element. Second, it takes into account pesky JavaScript, which can dynamically insert invalid HTML elements into the <head> that you wouldn’t spot if you’d only analysed the raw HTML.

Chrome (like Googlebot), will assume the <head> should close and the <body> should open when it encounters a <body> element. So immediately you can see it’s the SVG, without having to bother wading through the raw HTML for elements that shouldn’t be there.

You can also see in this case, that the canonical link element which appears after the SVG in the raw HTML (but inside the <head> element), is actually considered ‘outside the <head>’ and in the <body> in the rendered HTML.

Canonical outside the Head

Googlebot

From our testing, Googlebot typically parses in the same way as Chrome, with exceptions around iframes, which it likes to inline.

However, you can verify Google’s own behaviour by using the URL Inspection tool in Search Console, or the Mobile Friendly Test tool (soon to be killed) and reviewing the rendered HTML.

Googlebot rendered HTML

As you can see here, the <body> was opened due to the SVG in the same way as Chrome. Further down in the rendered HTML is the canonical outside the <head>.

<noscript> Caveat

While typically we think of Google rendering everything today, there are some <noscript> edge cases, which require you to disable JavaScript when testing with inspect element.

If a <noscript> tag includes an invalid HTML element, it can close the <head>. Using the URL Inspection Tool to view the rendered HTML will not reveal the broken <head> element either.

One way to see the issue is using Google’s URL Inspection ‘Page Indexing’, which shows the ‘User-declared Canonical’ after indexing.

Canonical below noscript tag

The rendered HTML on the right-hand side appears fine, but we can see the ‘User-declared Canonical’ is ‘N/A’ as it’s being ignored due to the <noscript> tag containing an image above it, silently closing the <head>.
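
Because this case doesn’t show up in the rendered HTML, it can help to check the raw HTML directly. The Python sketch below flags any element inside a <noscript> in the <head> other than link, style and meta, which are the only elements the HTML standard allows there. It is an approximation built on the standard library’s html.parser (which, by default, parses the contents of noscript as ordinary markup), and NoscriptHeadChecker and check_noscript are hypothetical names for this example only.

```python
from html.parser import HTMLParser

# Elements a <noscript> may contain while it sits in the <head>.
NOSCRIPT_HEAD_CHILDREN = {"link", "style", "meta"}


class NoscriptHeadChecker(HTMLParser):
    """Hypothetical helper: finds invalid elements inside a <head> <noscript>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_noscript = False
        self.offenders = []  # (tag name, line number) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "noscript":
            # Only track <noscript> blocks that open inside the <head>.
            self.in_noscript = self.in_head
        elif self.in_noscript and tag not in NOSCRIPT_HEAD_CHILDREN:
            self.offenders.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "noscript":
            self.in_noscript = False


def check_noscript(html):
    """Return the invalid elements found inside <noscript> in the <head>."""
    checker = NoscriptHeadChecker()
    checker.feed(html)
    checker.close()
    return checker.offenders


# Made-up page: a tracking pixel in <noscript> above the canonical.
pixel_page = """<html>
<head>
<title>Example</title>
<noscript><img src="https://example.com/pixel.gif" alt=""></noscript>
<link rel="canonical" href="https://example.com/">
</head>
<body></body>
</html>"""

print(check_noscript(pixel_page))
```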


Summary

This tutorial will hopefully help you find invalid HTML elements in the <head>, discover which metadata might be adversely affected, and debug what specifically is causing the issue.

The offending HTML can then be shared with the boss, client or developer for extra kudos and a warm fuzzy feeling.

If you fell asleep halfway through, none of this makes sense, or you’re just struggling to debug an issue, then alternatively, please contact us via support and we can help.
