Like Google we use Chrome for our web rendering service (WRS) and keep this updated to be as close to ‘evergreen’. The exact version used in the SEO Spider can be viewed within the app (‘Help > Debug’ on the ‘Chrome Version’ line).
This guide contains 4 sections –
- All the resources of a page (JS, CSS, imagery) need to be available to be crawled, rendered and indexed.
- They don’t click around like a user and load additional events after the render (a click, a hover or a scroll for example).
- The rendered page snapshot is taken when network activity is determined to have stopped, or over a time threshold. There is a risk if a page takes a very long time to render it might be skipped and elements won’t be seen and indexed.
- Typically Google will render all pages, however they will not queue pages for rendering if they have a ‘noindex’ in the initial HTTP response or static HTML.
- Google’s rendering is separate to indexing. Google initially crawls the static HTML of a website, and defers rendering until it has resource. Only then will it discover further content and links available in the rendered HTML. Historically this could take a week, but Google have made significant improvements to the point that the median time is now down to just 5 seconds.
- Hydration and hybrid rendering (also called ‘Isomorphic’) is where rendering can take place on the server-side for the initial page load and HTML, and client-side for non critical elements and pages afterwards.
Alternatively, a workaround is to use dynamic rendering. This can be useful when changes can’t be made to the front-end codebase. Dynamic rendering means switching between client-side rendered for users and pre-rendered content for specific user agents (in this case, the search engines). This means crawlers will be served a static HTML version of the web page for crawling and indexing.
Dynamic rendering is seen as a stop-gap, rather than a long-term strategy as it doesn’t have the user experience or performance benefits that some of the above solutions. If you have this set-up, then you can test this by switching the user-agent to Googlebot within the SEO Spider (‘Config > User-Agent’).
Google have a two-phase indexing process, where by they initially crawl and index the static HTML, and then return later when resources are available to render the page and crawl and index content and links in the rendered HTML.
The median time between crawling and rendering is 5 seconds – however, it is dependent on resource availability and therefore it can be longer, which is problematic for websites that rely on timely content (such as publishers).
If for some reason the render takes longer, then elements in the original response (such as meta data and canonicals) can be used for the page, until Google gets around to rendering it when resources are available. All pages will be rendered unless they have a robots meta tag or header instructing Googlebot not to index the page. So the initial HTML response also needs to be consistent.
While often these are not a problem, relying on client-side rendering can mean that a mistake or simple oversight is extremely costly as it impacts the indexing of pages.
This is a start point for many, and you can just go ahead and start a crawl of a website with default settings using the Screaming Frog SEO Spider.
You’ll also find that the page has no hyperlinks in the lower ‘outlinks’ tab, as they are not being rendered and therefore can’t be seen.
How do you find these with ease?
The SEO Spider will then crawl both the original and rendered HTML to identify pages that have content or links only available client-side and report other key dependencies.
You can also find pages with links that are only in the rendered HTML in a similar way.
This should really be the first step. One of the simplest ways to find out about a website is to speak to the client and the development team and ask the question.
Pretty sensible questions, and you might just get a useful answer.
Typically it’s also useful to disable cookies and CSS during an audit as well to diagnose for other crawling issues that can be experienced.
Audit View Source
Is there any text content, or HTML? Often there are signs and hints to JS frameworks and libraries used. Are you able to see the content and hyperlinks within the HTML source code?
If you run a search and can’t find content or links within the source, then they will be dynamically generated in the DOM and will only be viewable in the rendered code.
If the body is empty like the above example, it’s a pretty clear indication.
Audit The Rendered Source
You can often see the JS Framework name in the rendered code, like ‘React’ in the example below.
By clicking on the opening HTML element, then ‘copy > outerHTML’ you can compare the rendered source code, against the original source.
Toolbars & Plugins
These are not always accurate, but can provide some valuable hints, without much work.
- Auditing with known client-side dependencies.
- During large site deployments.
2) Configure User-Agent & Window Size
The default viewport for rendering is set to Googlebot Smartphone, as Google primarily crawls and indexes pages with their smartphone agent for mobile-first indexing.
This will mean you’ll see a mobile sized screenshot in the lower ‘rendered page’ tab.
3) Check Resources & External Links
If resources are on a different subdomain, or a separate root domain, then ‘check external links‘ should be ticked, otherwise they won’t be crawled and hence rendered either.
This is the default configuration in the SEO Spider, so you can simply click ‘File > Default Config > Clear Default Configuration’ to revert to this set-up.
4) Crawl The Website
Now type or paste in the website you wish to crawl in the ‘enter url to spider’ box and hit ‘Start’.
The crawling experience is different to a standard crawl, as it can take time for anything to appear in the UI to start with, then all of a sudden lots of URLs appear together at once. This is due to the SEO Spider waiting for all the resources to be fetched to render a page before the data is displayed.
You’re able to filter by the following SEO related items –
6) Monitor Blocked Resources
If key resources which impact the render are blocked, then unblock them to crawl (or allow them using the custom robots.txt for the crawl). You can test different scenarios using both the exclude and custom robots.txt features.
The individual blocked resources can also be viewed under ‘Response Codes > Blocked Resource’.
They can be exported in bulk including the source pages via the ‘Bulk Export > Response Codes > Blocked Resource Inlinks’ report.
7) View Rendered Pages
Viewing the rendered page can be useful when analysing what a modern search bot is able to see and is particularly useful when performing a review in staging, where you can’t use Google’s own URL Inspection Tool in Google Search Console.
If you spot any problems in the rendered page screen shots and it isn’t due to blocked resources, you may need to consider adjusting the AJAX timeout, or digging deeper into the rendered HTML source code for further analysis.
8) Compare Raw & Rendered HTML & Visible Content
This then populates the lower window ‘view source’ pane, to enable you to compare the differences, and be confident that critical content or links are present within the DOM. Click ‘Show Differences’ to see a diff.
10) Adjust The AJAX Timeout
The 5 second timeout is generally fine for most websites, and Googlebot is more flexible as they adapt based upon how long a page takes to load content, considering network activity and they perform a lot of caching. However, Google obviously won’t wait forever, so content that you want to be crawled and indexed, needs to be available quickly, or it simply won’t be seen.
It’s worth noting that a crawl by our software will often be more resource intensive than a regular Google crawl over time. This might mean that the site response times are typically slower, and the AJAX timeout requires adjustment.
You’ll know this might need to be adjusted if the site fails to crawl properly, ‘response times’ in the ‘Internal’ tab are longer than 5 seconds, or web pages don’t appear to have loaded and rendered correctly in the ‘rendered page’ tab.
While we have performed plenty of research internally and worked hard to mimic Google’s own rendering capabilities, a crawler is still only ever a simulation of real search engine bot behaviour.
- Core Principles of JS SEO – From Justin Briggs.
- Progressive Web Apps Fundamentals Guide – From Builtvisible.
- Crawling JS Rich Sites – From Onely.