How To Crawl JavaScript Websites

Introduction To Crawling JavaScript

Historically search engine bots such as Googlebot were not able to crawl and index content created dynamically using JavaScript and were only able to see what was in the static HTML source code.

However, Google in particular has evolved, deprecating their old AJAX crawling scheme guidelines of escaped-fragment #! URLs and HTML snapshots in October ’15, and are generally able to render and understand web pages like a modern-day browser.

While Google are generally able to render pages, in 2019 they updated their guidelines to recommend server-side rendering, pre-rendering or dynamic rendering rather than relying on client-side JavaScript. Google explained it’s “difficult to process JavaScript and not all search engine crawlers are able to process it successfully or immediately”.

However, JavaScript usage is up, and adoption of Google’s own JavaScript MVW framework AngularJS, other frameworks like React, single page applications (SPAs) and progressive web apps (PWAs) is on the rise.

It’s become essential today to be able to read the DOM after JavaScript has come into play and constructed the web page, and to understand the differences from the original response HTML, when crawling and evaluating websites.

Traditional website and SEO crawlers that are used to scan website links and content were only able to crawl the static response HTML, until we launched the first ever JavaScript rendering functionality into our Screaming Frog SEO Spider software, so JavaScript is executed and the DOM is read.

We integrated the Chromium project library for our rendering engine to emulate Google as closely as possible.

Screaming Frog SEO Spider

In 2019 Google updated their web rendering service (WRS) which was previously based on Chrome 41 to be ‘evergreen’ and use the latest, stable version of Chrome. As of writing, this is Chrome 74, which supports 1,000 more features than Chrome 41.

The SEO Spider uses a slightly earlier version of Chrome, version 64 at the time of writing, but we recommend viewing the exact version within the app by clicking ‘Help > Debug’ and scrolling down to the ‘Chrome Version’ line.

Hence, while rendering will obviously be similar, it won’t be exactly the same, as there might be some small differences in supported features (there are arguments that the exact version of Chrome itself won’t be exactly the same, either). However, generally, the WRS supports the same web platform features and capabilities as the Chrome version it uses, and you can compare the differences between Chrome 74 and 64.

This guide contains the following 3 sections. Click and jump to a relevant section, or continue reading.

  1. Why You Shouldn’t Crawl Blindly With JavaScript Enabled
  2. How To Identify JavaScript
  3. How To Crawl JavaScript Websites

If you already understand the basic principles of JavaScript and just want to crawl a JavaScript website, skip straight to our guide on configuring the Screaming Frog SEO Spider tool to crawl JavaScript sites. Or, read on.

Why You Shouldn’t Crawl Blindly With JavaScript Enabled

While it’s essential in auditing today, we recommend utilising JavaScript crawling selectively when required, and only keeping it enabled by default after careful consideration.

You don’t have to identify whether the site itself is using JavaScript. You can just go ahead and crawl with JavaScript rendering enabled, and sites that use JavaScript will be crawled. However, you should take care, as there are a couple of big problems with blindly crawling with JavaScript enabled.

First of all, JavaScript crawling is slower and more intensive for the server, as all resources (whether JavaScript, CSS, images etc.) need to be fetched to render each web page. This won’t be an issue for smaller websites, but for a large website with many thousands or more pages, this can make a huge difference.

If your site doesn’t rely on JavaScript to dynamically manipulate a web page significantly, then there’s often no need to waste time and resources.

More importantly, if you’re auditing a website you should know how it’s built and not put all your faith in any tool. JavaScript frameworks can be quite different to one another, and the SEO implications are very different to a traditional HTML site.

Plenty of sites are still using the old AJAX crawling scheme as well, which requires a unique set-up, and this is very different to relying purely on rendering JavaScript for crawling, indexing and scoring.

Core JavaScript Principles

While Google can typically crawl and index JavaScript, there are some core principles and limitations that need to be understood.

  1. All the resources of a page (JS, CSS, imagery) need to be available to be crawled, rendered and indexed.
  2. Google still require clean, unique URLs for a page, and links to be in proper HTML anchor tags (you can offer a static link, as well as calling a JavaScript function).
  3. They don’t click around like a user and load additional events after the render (a click, a hover or a scroll, for example).
  4. The rendered page snapshot is estimated to be taken at around 5 seconds, although in reality we believe this adapts based upon responses. However, there is a risk that if a page takes too long to render, elements won’t be seen and indexed.
  5. Finally, Google’s rendering is separate to indexing. Google initially crawls the static HTML of a website, and defers rendering until it has resources. Only then will it discover further content and links available in the rendered HTML. This can take from a few days to a week.
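Principle 2 above, proper HTML anchor tags with static URLs, can be sketched with a small check. The `isCrawlableLink` helper below is purely illustrative, not part of any library:

```javascript
// Sketch: only a real <a> element with a resolvable href is a crawlable link.
// A span with an onclick handler, or an anchor with href="javascript:void(0)",
// relies on JavaScript execution and may not be followed by crawlers.
function isCrawlableLink(tagName, href) {
  if (tagName.toLowerCase() !== 'a') return false; // must be an anchor
  if (!href) return false;                         // must have an href
  return !href.trim().toLowerCase().startsWith('javascript:');
}

// Crawlable: a static href, with an optional JS enhancement on top, e.g.
// <a href="/products/42" onclick="openProduct(42); return false;">Product</a>
console.log(isCrawlableLink('a', '/products/42'));       // true
console.log(isCrawlableLink('span', null));              // false
console.log(isCrawlableLink('a', 'javascript:void(0)')); // false
```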

It’s essential you know these things with JavaScript SEO, as you live and die by the render in rankings.

Google Advice On JavaScript & Rendering Strategy

It’s important to remember that Google strongly advise against relying purely on JavaScript and recommend developing with progressive enhancement, building the site’s structure and navigation using only HTML and then improving the site’s appearance and interface with AJAX.

If you’re using a JavaScript framework, rather than relying on a fully client-side rendered approach, Google recommend using server-side rendering, pre-rendering or hybrid rendering which can improve performance for users and search engine crawlers.

Server-side rendering (SSR) and pre-rendering execute the page’s JavaScript and deliver a rendered initial HTML version of the page to both users and search engines.
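As a minimal sketch of the server-side approach (the data and markup here are hypothetical), the server executes the templating itself and sends fully-formed HTML, so the initial response already contains the content and links:

```javascript
// Sketch: server-side rendering produces complete HTML before the response
// is sent, so crawlers see content and links in the raw HTML source.
function renderProductPage(products) {
  const items = products
    .map(p => `<li><a href="/products/${p.id}">${p.name}</a></li>`)
    .join('');
  return `<html><body><ul>${items}</ul></body></html>`;
}

const html = renderProductPage([{ id: 1, name: 'Widget' }]);
console.log(html.includes('<a href="/products/1">Widget</a>')); // true
```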

Hybrid rendering (sometimes referred to as ‘Isomorphic’) is where rendering takes place on the server-side for the initial page load and HTML, and on the client-side for non-critical elements and pages afterwards.

Many JavaScript frameworks such as React or Angular Universal allow for server-side and hybrid rendering.

Alternatively, a workaround to help crawlers is to use dynamic rendering. This can be particularly useful when changes can’t be made to the front-end codebase. Dynamic rendering means switching between client-side rendered for users and pre-rendered content for specific user agents (in this case, the search engines). This means crawlers will be served a static HTML version of the web page for crawling and indexing.
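A dynamic rendering set-up can be sketched as a simple user-agent switch on the server. The bot patterns and function name below are illustrative only, not an exhaustive or official list:

```javascript
// Sketch: serve pre-rendered HTML to known crawlers, and the normal
// client-side app shell to everyone else. The pattern list is illustrative.
const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider/i;

function shouldServePrerendered(userAgent) {
  return BOT_PATTERN.test(userAgent || '');
}

const googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
console.log(shouldServePrerendered(googlebot));                              // true
console.log(shouldServePrerendered('Mozilla/5.0 (Windows NT 10.0) Chrome/74.0')); // false
```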

Dynamic rendering is seen as a stop-gap, rather than a long-term strategy, as it doesn’t have the user experience or performance benefits of some of the above solutions. If you already have this set-up, then you can test the functionality by switching the user-agent to Googlebot within the SEO Spider.

Google also have a very useful progressive web app checklist, which covers some essential requirements for crawling and indexing of PWAs, such as using the history API instead of page fragment identifiers.
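The history API point can be sketched as follows: each view gets a real, crawlable path rather than a fragment. The route format and parser below are hypothetical:

```javascript
// Sketch: with the history API, each view has a real path (e.g. /products/42)
// rather than a fragment (/#/products/42), which crawlers typically ignore.
// In the browser you would call history.pushState(state, '', path) on
// navigation; the pure routing logic is shown here.
function parseRoute(path) {
  const match = path.match(/^\/products\/(\d+)$/);
  return match ? { view: 'product', id: match[1] } : { view: 'home' };
}

console.log(parseRoute('/products/42')); // { view: 'product', id: '42' }
console.log(parseRoute('/'));            // { view: 'home' }
```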

JavaScript Indexing Complications

Even though Google are generally able to crawl and index JavaScript, there are further considerations.

Google have a two-phase indexing process, whereby they initially crawl and index the static HTML, and then return later when resources are available to render the page and crawl and index content and links in the rendered HTML.

Google render queue

This means the crawling and indexing process is much slower, so if you rely on timely content (such as a publisher), a client-side approach is not a sensible option. It also means that elements in the original response (such as meta data and canonicals) can be used for the page, until Google gets around to rendering it when resources are available.

Other search engines like Bing struggle to render and index JavaScript at scale, and due to the fragility of JavaScript, it’s fairly easy to experience errors that hinder the rendering and indexing of content. Feature detection should be used, and errors should be handled gracefully with a fallback.
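Feature detection with a graceful fallback can be sketched like this (the feature and fallback behaviour are illustrative):

```javascript
// Sketch: detect a feature before using it, and fall back gracefully so a
// missing API doesn't throw and break rendering for crawlers or old browsers.
function lazyLoadStrategy(globalObj) {
  if ('IntersectionObserver' in globalObj) {
    return 'lazy';  // observe images and load them as they scroll into view
  }
  return 'eager';   // fall back to loading everything up front
}

// In the browser, globalObj would be window; a plain object stands in here.
console.log(lazyLoadStrategy({ IntersectionObserver: function () {} })); // 'lazy'
console.log(lazyLoadStrategy({}));                                       // 'eager'
```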

For further reading on JavaScript SEO, I highly recommend Justin Briggs’ guide on the core principles, and SEO guides on progressive web apps, Angular.JS, Universal Angular 2.0 and React.JS by Builtvisible. I also recommend Bartosz Góralewicz’s piece on crawling JavaScript, which touches upon some of the points in this guide.

The purpose of this guide is not actually to go into lots of detail about JavaScript SEO, but more specifically, how to identify and crawl JavaScript websites with a client-side approach using our Screaming Frog SEO Spider software.

How To Identify JavaScript Sites

Identifying a site built using a JavaScript framework can be pretty simple, however, identifying sections, pages or just smaller elements which are dynamically adapted using JavaScript can be far more challenging.

There are a number of ways you’ll know whether the site is built using a JavaScript framework.


Crawl The Site

This is a start point for many, and you can just go ahead and start a crawl of a website with the standard configuration. By default, the SEO Spider will crawl using the ‘old AJAX crawling scheme’, which means JavaScript is disabled, but the old AJAX crawling scheme will be adhered to if set up correctly by the website.

If the site uses JavaScript and is set up with escaped-fragment (#!) URLs and HTML snapshots as per Google’s old AJAX crawling scheme, then it will be crawled and URLs will appear under the ‘AJAX’ tab in the SEO Spider. This tab only includes pages using the old AJAX crawling scheme specifically, not every page that uses AJAX.

Old Ajax Crawling Scheme

The AJAX tab shows both ugly and pretty versions of URLs, and like Google, the SEO Spider fetches the ugly version of the URL and maps the pre-rendered HTML snapshot to the pretty URL. Some AJAX sites or pages may not use hash fragments, so the meta fragment tag can be used to recognise an AJAX page for crawlers.
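The mapping between pretty and ugly URLs in the old scheme can be sketched as follows (a simplified version of the documented transformation):

```javascript
// Sketch of the old AJAX crawling scheme mapping: a 'pretty' URL containing
// '#!' maps to an 'ugly' URL where the fragment is moved into the
// _escaped_fragment_ query parameter, which the server answers with a
// pre-rendered HTML snapshot.
function prettyToUgly(url) {
  const i = url.indexOf('#!');
  if (i === -1) return url; // not an AJAX-scheme URL
  const base = url.slice(0, i);
  const fragment = url.slice(i + 2);
  const sep = base.includes('?') ? '&' : '?';
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}

console.log(prettyToUgly('https://example.com/#!/products/42'));
// 'https://example.com/?_escaped_fragment_=%2Fproducts%2F42'
```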

If the site is built using JavaScript but doesn’t adhere to the old crawling scheme or pre-render pages, then you may find only the homepage is crawled with a 200 OK response and perhaps a couple of JavaScript and CSS files, but not much else.

Crawling JavaScript without rendering

You’ll probably also find that the page has virtually no ‘outlinks’ in the tab at the bottom of the tool, as they are not being rendered and hence can’t be seen.

JavaScript Framework Outlinks

In the example screen shot above, the ‘outlinks’ tab in the SEO Spider shows JS and CSS files on the page only.

Client Q&A

This should really be the first step. One of the simplest ways to find out about a website is to speak to the client and the development team and ask: what’s the site built in? What CMS is it using, or is it bespoke?

Pretty sensible questions and you might just get a useful answer.

Disable JavaScript

You can turn JavaScript off in your browser and view what content is available. This is possible in Chrome using the built-in developer tools, or if you use Firefox, the Web Developer toolbar plugin has the same functionality. Is content available with JavaScript turned off? You may just see a blank page.

JavaScript disabled

Typically it’s also useful to disable cookies and CSS during an audit as well, to diagnose other crawling issues that can be experienced.

Audit The Source Code

A simple one: right click and view the raw HTML source code. Is there actually much text and HTML content? Often there are signs and hints of the JS frameworks and libraries used. Are you able to see the content and hyperlinks rendered in your browser within the HTML source code?

You’re viewing the code before it’s processed by the browser, which is what the SEO Spider will crawl when not in JavaScript rendering mode.

If you run a search for content or hyperlinks and can’t find them within the source, then they are being dynamically generated in the DOM and will only be viewable in the rendered code.
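This check can also be scripted for a batch of phrases. A minimal sketch, assuming you’ve already saved the raw and rendered HTML to strings (the function and sample markup are hypothetical):

```javascript
// Sketch: check which phrases appear in the raw HTML versus only in the
// rendered HTML. Anything in the second list is generated client-side.
function classifyPhrases(rawHtml, renderedHtml, phrases) {
  const inRaw = [];
  const renderedOnly = [];
  for (const phrase of phrases) {
    if (rawHtml.includes(phrase)) inRaw.push(phrase);
    else if (renderedHtml.includes(phrase)) renderedOnly.push(phrase);
  }
  return { inRaw, renderedOnly };
}

const raw = '<html><body><div id="root"></div></body></html>';
const rendered = '<html><body><div id="root"><h1>Widgets</h1></div></body></html>';
console.log(classifyPhrases(raw, rendered, ['Widgets']));
// { inRaw: [], renderedOnly: ['Widgets'] }
```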

source code of a JS site

If the body is pretty much empty like the above example, it’s a pretty clear indication.

Audit The Rendered Code

How different is the rendered code to the static HTML source? By right clicking and using ‘inspect element’ in Chrome, you can view the rendered HTML. You can often see the JS Framework name in the rendered code, like ‘React’ in the example below.

Rendered HTML source code

You will find that the content and hyperlinks are in the rendered code, but not the original HTML source code. This is what the SEO Spider will see, when in JavaScript rendering mode.

By clicking on the opening HTML element, then ‘copy > outerHTML’, you can compare the rendered source code against the original source.

Toolbars & Plugins

Various toolbars and plugins such as the BuiltWith toolbar, Wappalyzer and the JS Library Detector for Chrome can help identify the technologies and frameworks being utilised on a web page at a glance.

These are not always accurate, but can provide some valuable hints, without much work.

Manual Auditing Is Still Required

These points should help you identify sites that are built using a JS framework fairly easily. However, further analysis is always recommended to discover JavaScript elements, with a manual inspection of page templates, auditing different content areas and elements which might require user interaction.

We see lots of e-commerce websites relying on JavaScript to load products onto category pages, which is often missed by webmasters and SEOs until they realise product pages are not being crawled in standard (non-rendering) crawls.

Additionally, you can support a manual audit by crawling a selection of templates and pages from across the website, with JavaScript both disabled and enabled, and analysing any differences in elements and content. Sometimes websites use variables for elements like titles, meta tags or canonicals, which are extremely difficult to pick up by the eye only.

I recommend reading Justin Briggs’s guide to auditing JavaScript for SEO, which goes into far more practical detail about this analysis phase.

How To Crawl JavaScript Using The SEO Spider

Once you have identified the JavaScript you want to crawl, next you’ll need to configure the SEO Spider into JavaScript rendering mode to be able to crawl it.

The following 8 steps should help you configure a crawl for most cases encountered.

1) Configure Rendering To ‘JavaScript’

To crawl a JavaScript website, open up the SEO Spider, click ‘Configuration > Spider > Rendering’ and change ‘Rendering’ to ‘JavaScript’.

Crawl With JavaScript Rendering

2) Check Resources & External Links

Ensure resources such as images, CSS and JS are ticked under ‘Configuration > Spider’.

If resources are on a different subdomain, or a separate root domain, then ‘check external links‘ should be ticked, otherwise they won’t be crawled and hence rendered either.

Check Resources For JavaScript Rendering

This is the default configuration in the SEO Spider, so you can simply click ‘File > Default Config > Clear Default Configuration’ to revert to this set-up.

3) Configure User-Agent & Window Size

You can configure both the user-agent under ‘Configuration > HTTP Header > User-Agent’ and window size by clicking ‘Configuration > Spider > Rendering’ in JavaScript rendering mode to your own requirements.

This is an optional step; the window size is set to Googlebot’s desktop dimensions in the standard configuration. Google are expected to move to a mobile-first index soon, hence if you’re performing a mobile audit you can configure the SEO Spider to mimic Googlebot for Smartphones.

Configure user-agent & window size

4) Crawl The Website

Now type or paste in the website you wish to crawl in the ‘enter url to spider’ box and hit ‘Start’.

Crawl a JavaScript framework website

The crawling experience is quite different to a standard crawl, as it can take time for anything to appear in the UI to start with, then all of a sudden lots of URLs appear together at once. This is because the SEO Spider waits for all the resources of a page to be fetched and the page to be rendered, before the data is displayed.

5) Monitor Blocked Resources

Keep an eye on anything appearing under the ‘Blocked Resource’ filter within the ‘Response Codes’ tab. You can glance at the right-hand overview pane, rather than clicking on the tab specifically. If JavaScript, CSS or images are blocked via robots.txt, don’t respond, or error, then this will impact rendering, crawling and indexing.

Monitor blocked resources

Blocked resources can also be viewed for each page within the ‘Rendered Page’ tab, adjacent to the rendered screen shot in the lower window pane. In severe cases, if a JavaScript site blocks JS resources completely, then the site simply won’t crawl.

Blocked Resources JavaScript Crawling

If key resources which impact the render are blocked, then unblock them to crawl (or allow them using the custom robots.txt for the crawl). You can test different scenarios using both the exclude and custom robots.txt features.
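As a simplified example (the paths here are hypothetical), an `Allow` directive in robots.txt can unblock rendering-critical resources while keeping other areas disallowed:

```
User-agent: *
Disallow: /private/
# Ensure rendering-critical resources remain crawlable
Allow: /assets/js/
Allow: /assets/css/
```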

The pages this impacts and the individual blocked resources can also be exported in bulk via the ‘Bulk Export > Response Codes > Blocked Resource Inlinks’ report.

Export blocked resources

6) View Rendered Pages

You can view the rendered page the SEO Spider crawled in the ‘Rendered Page’ tab which dynamically appears at the bottom of the user interface when crawling in JavaScript rendering mode. This populates the lower window pane when selecting URLs in the top window.

Viewing the rendered page is vital when analysing what a modern search bot is able to see and is particularly useful when performing a review in staging, where you can’t use Google’s own Fetch & Render in Search Console.

If you have adjusted the user-agent and viewport to Googlebot Smartphone, you can see exactly how every page renders on mobile for example.

View the rendered pages

If you spot any problems in the rendered page screen shots and it isn’t due to blocked resources, you may need to consider adjusting the AJAX timeout, or digging deeper into the rendered HTML source code for further analysis.

7) Compare Raw & Rendered HTML

You may wish to store and view the HTML and rendered HTML within the SEO Spider when working with JavaScript. This can be set up under ‘Configuration > Spider > Advanced’ by ticking the ‘store HTML’ & ‘store rendered HTML’ options.

This then populates the lower window ‘view source’ pane, to enable you to compare the differences, and be confident that critical content or links are present within the DOM.

View Source Tab Closer Up!

This is super useful for a variety of scenarios, such as debugging the differences between what is seen in a browser and in the SEO Spider, or just when analysing how JavaScript has been rendered, and whether certain elements are within the code.

8) Adjust The AJAX Timeout

Based upon the responses of your crawl, you can choose when the snapshot of the rendered page is taken by adjusting the ‘AJAX timeout‘ which is set to 5 seconds, under ‘Configuration > Spider > Rendering’ in JavaScript rendering mode.

JavaScript Rendering AJAX timeout

Previous internal testing indicated that Googlebot takes their snapshot of the rendered page at 5 seconds, which many in the industry concurred with when we discussed it more publicly in 2016.

In reality, we believe Google is more flexible than the above and things like caching play a part. However, Google obviously won’t wait forever, so content that you want to be crawled and indexed, needs to be available quickly, or it simply won’t be seen. We’ve seen cases of misfiring JS causing the render to load much later, and entire websites plummeting in rankings due to pages suddenly being indexed and scored with virtually no content.

It’s worth noting that a crawl by our software will often be more resource intensive than a regular Google crawl over time. This might mean that the site response times are typically slower, and the AJAX timeout requires adjustment.

You’ll know this might need to be adjusted if the site fails to crawl properly, ‘response times’ in the ‘Internal’ tab are longer than 5 seconds, or web pages don’t appear to have loaded and rendered correctly in the ‘rendered page’ tab.

Closing Thoughts

The guide above should help you identify JavaScript websites and crawl them efficiently using the Screaming Frog SEO Spider tool in JavaScript rendering mode.

While we have performed plenty of research internally and worked hard to mimic Google’s own rendering capabilities, a crawler is still only ever a simulation of real search engine bot behaviour.

We highly recommend using log file analysis and Google’s own URL Inspection Tool, or downloading and using the relevant version of Chrome to fully understand what they are able to crawl, render and index, alongside a JavaScript crawler.

If you experience any problems when crawling JavaScript, or encounter any differences between how we render and crawl, and Google, we’d love to hear from you. Please get in touch with our support team directly.
