Authentication

Table of Contents

General

Configuration Options

Spider Crawl Tab

Spider Extraction Tab

Spider Limits Tab

Spider Rendering Tab

Spider Advanced Tab

Spider Preferences Tab

Other Configuration Options

Tabs

Authentication

Configuration > Authentication

The SEO Spider supports two forms of authentication, standards based which includes basic and digest authentication, and web forms based authentication.

Check out our video guide on how to crawl behind a login, or carry on reading below.

Basic & Digest Authentication

There is no set-up required for basic and digest authentication, it is detected automatically during a crawl of a page which requires a login. If you visit the website and your browser gives you a pop-up requesting a username and password, that will be basic or digest authentication. If the login screen is contained in the page itself, this will be a web form authentication, which is discussed in the next section.

Often sites in development will also be blocked via robots.txt as well, so make sure this is not the case or use the ‘ignore robot.txt configuration‘. Then simply insert the staging site URL, crawl and a pop-up box will appear, just like it does in a web browser, asking for a username and password.

basic authentication

Enter your credentials and the crawl will continue as normal.

Alternatively, you can pre-enter login credentials via ‘Config > Authentication’ and clicking ‘Add’ on the Standards Based tab.

Add Standards Based Authentication Details

Then input the URL, username and password.

Standards Based Authentication

When entered in the authentication config, they will be remembered until they are deleted.

This feature does not require a licence key. Try to following pages to see how authentication works in your browser, or in the SEO Spider.

Web Form Authentication

There are other web forms and areas which require you to login with cookies for authentication to be able to view or crawl it. The SEO Spider allows users to log in to these web forms within the SEO Spider’s built in Chromium browser, and then crawl it. This feature requires a licence to use it.

To log in, navigate to ‘Configuration > Authentication’ then switch to the ‘Forms Based’ tab, click the ‘Add’ button, enter the URL for the site you want to crawl, and a browser will pop up allowing you to log in.

Web Forms Authentication

Please read our guide on crawling web form password protected sites in our user guide, before using this feature. Some website’s may also require JavaScript rendering to be enabled when logged in to be able to crawl it.

Please note – This is a very powerful feature, and should therefore be used responsibly. The SEO Spider clicks every link on a page; when you’re logged in that may include links to log you out, create posts, install plugins, or even delete data.

Authentication Profiles

The authentication profiles tab allows you to export an authentication configuration to be used with scheduling, or command line.

This means it’s possible for the SEO Spider to login to standards and web forms based authentication for automated crawls.

Authentication Profiles

When you have authenticated via standards based or web forms authentication in the user interface, you can visit the ‘Profiles’ tab, and export an .seospiderauthconfig file.

This can be supplied in scheduling via the ‘start options’ tab, or using the ‘auth-config’ argument for the command line as outlined in the CLI options.

Authentication Profiles In Scheduling

Please note – We can’t guarantee that automated web forms authentication will always work, as some websites will expire login tokens or have 2FA etc. Exporting or saving a default authentication profile will store an encrypted version of your authentication credentials on disk using AES-256 Galois/Counter Mode.

Troubleshooting

  • Forms based authentication uses the configured User Agent. If you are unable to login, perhaps try this as Chrome or another browser.

Join the mailing list for updates, tips & giveaways

Back to top