How To Bypass Geo IP Redirection In A Crawl
I wanted to put together this rapid-fire guide after Charlie Whitworth asked on Twitter whether anyone knew a way to get around country-specific IP redirection when crawling a site with the Screaming Frog SEO Spider.
— Charlie Whitworth (@WhitworthSEO) September 25, 2017
We’ve all experienced it: you visit a website in another country and are immediately redirected to your local version without being asked. Google do it for their own search engines (although they recommend you don’t, as it can be an irritating user experience), and plenty of international brands do the same. It can be helpful, but it can also be rather unhelpful, particularly for an SEO analysing international sites.
Viewing a website’s version for a region other than your own often requires finding a link to that country version and setting it as your preference (which sets a cookie in your browser).
Historically, crawling a website that automatically redirects by country IP required a proxy with an IP address in the target country, or occasionally a parameter in the URL string that bypasses the redirect. Other sites redirect to different versions based on the Accept-Language HTTP header rather than the IP address, which you can easily adjust in the SEO Spider.
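Sites that choose a version from Accept-Language usually read the header’s language tags and their quality (q) weights. The Python sketch below is a simplified, hypothetical version of that server-side choice: the `SUBDOMAINS` table and the example.com URLs are assumptions, and real sites (and full RFC 4647 matching) are more involved.

```python
from typing import Dict, List, Tuple

# Hypothetical language-to-version table; real sites use their own rules.
SUBDOMAINS: Dict[str, str] = {
    "en-us": "https://www.example.com/",
    "pt-pt": "https://pt.example.com/",
    "pt": "https://pt.example.com/",
    "de": "https://de.example.com/",
}
DEFAULT_VERSION = "https://www.example.com/"


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language header into (tag, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        if not tag.strip():
            continue
        q = 1.0  # a tag without a q value has the default weight of 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal weights keep their header order
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_version(header: str) -> str:
    """Pick a site version: exact tag first, then its primary language."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means the client refuses this language
        if tag in SUBDOMAINS:
            return SUBDOMAINS[tag]
        primary = tag.split("-")[0]
        if primary in SUBDOMAINS:
            return SUBDOMAINS[primary]
    return DEFAULT_VERSION
```

With this toy logic, `choose_version("pt-PT,pt;q=0.9,en;q=0.8")` returns the pt. URL and `choose_version("en-US,en;q=0.9")` returns the www. URL, which is why changing the Accept-Language value a crawler sends can change the version it is served on such sites.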
However, I wanted to share an easy way to crawl sites that redirect based on location, using the relatively new and scarily powerful web forms authentication feature, which allows you to log in to anything and crawl it.
Crawling A Site With Geo IP Redirection
GoDaddy is an example of a site that auto-redirects by location. It uses country-level subdomains, with the US version of the website on www. So let’s say I want to crawl the US website from outside the US. This is what happens.
The www. homepage immediately 302 redirects to https://pt.godaddy.com/, the Portuguese subdomain, because I’m currently on holiday in Portugal (writing riveting blog posts like this). The site won’t let me crawl the US version on www.; it redirects, and that’s that. As a user, though, you can choose your location, and this is where forms-based authentication can help.
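The redirect itself is decided on the server. The Python sketch below is a hypothetical model of that kind of logic, not GoDaddy’s actual code: the example.com hostnames, the `COUNTRY_HOSTS` table and the `market` cookie name are all assumptions. It shows why a request from Portugal is bounced with a 302, and why a stored preference cookie changes the answer.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical country-to-host table (US on www., as described above).
COUNTRY_HOSTS = {
    "US": "www.example.com",
    "PT": "pt.example.com",
    "DE": "de.example.com",
}


def geo_redirect(host: str, visitor_country: str,
                 cookies: Mapping[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a homepage request to `host`.

    `visitor_country` stands for the country looked up from the client IP.
    A stored preference cookie (hypothetical name 'market') takes priority.
    """
    preferred = cookies.get("market", visitor_country).upper()
    target = COUNTRY_HOSTS.get(preferred, COUNTRY_HOSTS["US"])
    if host == target:
        return 200, None  # already on the right version: serve the page
    return 302, f"https://{target}/"  # bounce to the "local" version
```

In this model, a request for www. from Portugal with no cookie gets `(302, "https://pt.example.com/")`, while the same request carrying `market=us` gets `(200, None)`.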
1) Click ‘Configuration > Authentication > Forms Based’
Then hit ‘add’, and the URL of the site you’ve attempted to crawl will auto-populate (www.godaddy.com in this example).
Our inbuilt browser window will then appear, and you’ll see that the www. version of the site has again redirected to your local version, just as it did in the crawl.
I can see the Portuguese subdomain, but still want to crawl the US site.
2) Set The Location You Want To Crawl In The Inbuilt Browser
Now all you need to do is set the preferred version of the site you wish to crawl. GoDaddy has a country menu, which makes this simple.
I simply click on the ‘United States’ link to go to the www.godaddy.com subdomain, which displays the correct location and sets a cookie within the browser and the SEO Spider.
Then click ‘OK’ within the browser window.
3) Now Crawl The Set Location
Now start the crawl again. With the cookie set, you’ll be able to crawl the version of the site for your preferred location.
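Steps 2 and 3 work because the crawler keeps the cookie set in the inbuilt browser and sends it with every later request. The SEO Spider does this internally; purely as a rough illustration (not its implementation), here is a minimal Python sketch using the standard library’s `http.cookiejar` and `urllib.request`. It assumes a throwaway local server in which `/` stands in for the US homepage, `/pt/` for the Portuguese subdomain and `/set-market?m=us` for the ‘United States’ link, with a hypothetical `market` cookie holding the preference.

```python
import http.server
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


class ToyGeoSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a geo-redirecting site (not GoDaddy's code)."""

    def do_GET(self):
        cookie = self.headers.get("Cookie") or ""
        if self.path.startswith("/set-market"):
            # Plays the 'United States' link: store the preference in a cookie.
            self.send_response(200)
            self.send_header("Set-Cookie", "market=us; Path=/")
        elif self.path == "/" and "market=us" not in cookie:
            # No preference cookie: bounce the visitor to the "local" version.
            self.send_response(302)
            self.send_header("Location", "/pt/")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report redirects instead of following them, as a crawler logs a 302."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch(opener, url):
    """Return (status, Location header) for url without following redirects."""
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:  # urllib raises on an unfollowed 3xx
        status, location = err.code, err.headers.get("Location")
        err.close()
        return status, location


def demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), ToyGeoSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    # The cookie processor stores Set-Cookie values and replays them on every
    # later request; an empty ProxyHandler keeps local requests off any proxy.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(CookieJar()),
        NoRedirect(),
    )
    try:
        before = fetch(opener, base + "/")        # redirected, like the first crawl
        fetch(opener, base + "/set-market?m=us")  # like clicking 'United States'
        after = fetch(opener, base + "/")         # cookie sent, no redirect
    finally:
        server.shutdown()
        server.server_close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

Because redirects are reported rather than followed, the first request shows the 302 a crawler would log, and the last shows a 200 once the stored cookie is replayed.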
This beats using a proxy, which can be slow and annoying to set up. Enjoy.