URL rewriting
Configuration > URL Rewriting
The URL rewriting feature allows you to rewrite URLs on the fly. For the majority of cases, the ‘remove parameters’ and common options (under ‘options’) will suffice. However, we do also offer an advanced regex replace feature which provides further control.
URL rewriting is only applied to URLs discovered in the course of crawling a website, not URLs that are entered as the start of a crawl in ‘Spider’ mode, or as part of a set of URLs in ‘List’ mode.
Remove Parameters
This feature allows you to automatically remove parameters from URLs. This is extremely useful for websites with session IDs, Google Analytics tracking or lots of parameters which you wish to remove.
For example, if the website has session IDs which make the URLs appear something like ‘example.com/?sid=random-string-of-characters’, you can remove the session ID by adding ‘sid’ (without the quotation marks) to the ‘parameters’ field in the ‘remove parameters’ tab.
The SEO Spider will then automatically strip the session ID from the URL. You can test to see how a URL will be rewritten by our SEO Spider under the ‘test’ tab.
This feature can also be used for removing Google Analytics tracking parameters. For example, you can include the following under ‘remove parameters’:
utm_source
utm_medium
utm_campaign
This will strip the standard tracking parameters from URLs.
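As a rough illustration of what removing parameters does to a URL, the following Python sketch strips named query parameters and leaves the rest untouched. It is only a model of the behaviour described above, not the SEO Spider's own implementation; the function name `remove_parameters` is hypothetical, and the sketch may normalise the encoding of the remaining values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_parameters(url, names):
    """Return url without any query parameter whose name is in names (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every name/value pair except the ones listed for removal.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


tracking = {"utm_source", "utm_medium", "utm_campaign"}
print(remove_parameters("https://example.com/?sid=abc123&page=2", {"sid"}))
print(remove_parameters(
    "https://example.com/page?utm_source=news&utm_medium=email&utm_campaign=spring&id=7",
    tracking,
))
```

When every parameter is removed, the sketch also drops the trailing ‘?’, so the URL is left as the plain path.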
Regex Replace
This advanced feature runs against each URL found during a crawl or in list mode. It replaces each substring of a URL that matches the regex with the given replace string. The “Regex Replace” feature can be tested in the “Test” tab of the “URL Rewriting” configuration window.
Examples are:
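1) Changing all links from HTTP to HTTPS
Regex: ^http://
Replace: https://
Anchoring the regex to the start of the URL and including ‘://’ stops it from also matching URLs that are already HTTPS, which would otherwise become ‘httpss’.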
2) Changing all links to example.com to point to example.co.uk
Regex: example\.com
Replace: example.co.uk
3) Rewriting all links containing page=number to use a fixed number, e.g.
www.example.com/page.php?page=1
www.example.com/page.php?page=2
www.example.com/page.php?page=3
www.example.com/page.php?page=4
To make all of these go to www.example.com/page.php?page=1:
Regex: page=\d+
Replace: page=1
4) Removing www. from any URL by using an empty ‘Replace’. If you want to remove a query string parameter, please use the “Remove Parameters” feature instead, as regex is not the correct tool for that job.
Regex: www\.
Replace:
5) Stripping all parameters
Regex: \?.*
Replace:
6) Changing links for only subdomains of example.com from HTTP to HTTPS
Regex: http://(.*example\.com)
Replace: https://$1
7) Removing the hash fragment (the # and everything after it) in JavaScript rendering mode
Regex: #.*
Replace:
8) Adding parameters to URLs
Regex: $
Replace: ?parameter=value
This will add ‘?parameter=value’ to the end of any URL encountered.
In situations where the site already has parameters, this requires more complicated expressions for the parameter to be added correctly:
Regex: (.*?\?.*)
Replace: $1&parameter=value
Regex: (^((?!\?).)*$)
Replace: $1?parameter=value
These must be entered in the order above. If they are reversed, a URL without a query string first gains ‘?parameter=value’ and is then matched by the other expression, which appends ‘&parameter=value’ as well.
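The effect of rule order can be checked outside the tool with a short Python sketch. It assumes the rules are applied one after another, each replacing every match in the URL, as described for this feature; the `rewrite` helper and the `ADD_PARAMETER` list are hypothetical names. Python's `re` module writes the back-reference as `\1` where the ‘Replace’ field above uses `$1`, and the two regex flavours can differ in other details, so the ‘Test’ tab remains the place to confirm real behaviour.

```python
import re


def rewrite(url, rules):
    """Apply (regex, replacement) pairs in order; each replaces every match (hypothetical helper)."""
    for pattern, replacement in rules:
        url = re.sub(pattern, replacement, url)
    return url


# The two expressions from example 8, in the order given above.
ADD_PARAMETER = [
    (r"(.*?\?.*)", r"\1&parameter=value"),      # URL already has a query string
    (r"(^((?!\?).)*$)", r"\1?parameter=value"),  # URL has no query string
]

print(rewrite("http://example.com/page", ADD_PARAMETER))
print(rewrite("http://example.com/page?id=1", ADD_PARAMETER))

# Reversing the order adds the parameter twice to a URL without a query string.
print(rewrite("http://example.com/page", ADD_PARAMETER[::-1]))

# A single rule replaces every matching substring, as in example 3.
print(rewrite("www.example.com/page.php?page=3", [(r"page=\d+", "page=1")]))
```

Running the reversed list on a URL with no query string shows the parameter appended once with ‘?’ and again with ‘&’, which is why the order matters.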
Options
Common options are listed in this section. The ‘lowercase discovered URLs’ option does exactly that: it converts all crawled URLs to lowercase, which can be useful for websites with case sensitivity issues in URLs.
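To see why lowercasing helps, the sketch below treats two case variants of the same address as one URL once they are lowercased. It assumes the whole URL string is lowercased, which is a simplification for illustration; `lowercase_url` is a hypothetical name and the URLs are made up.

```python
def lowercase_url(url):
    """Lowercase the entire URL string (a simplifying assumption for this sketch)."""
    return url.lower()


discovered = [
    "https://example.com/About-Us",
    "https://example.com/about-us",
]

# Both variants collapse to a single entry after lowercasing.
unique = {lowercase_url(url) for url in discovered}
print(unique)
```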