How To Use The SEO Spider For Broken Link Building
If you’ve not heard of ‘broken link building’ before, it’s essentially a tactic that involves letting a webmaster know about broken links on their site and suggesting an alternative resource (perhaps your own site or a particular piece of content, alongside any others).
There are a couple of ways that link builders approach this, which include –
- Collecting a big list of ‘prospects’, such as resource pages or pages around a particular content theme or search phrase, then checking these pages for broken links.
- Picking a single site and checking the entirety of it for relevant resource pages and broken links (and potentially creating content that will allow you to suggest your own site).
I don’t want to dig too deep into the entire process, as you can read a fantastic guide by Russ Jones on Moz. However, as we get asked this question an awful lot, I wanted to explain how you can use the Screaming Frog SEO Spider tool to help scale the process, in particular for the first method listed above.
1) Upload Your Target URLs
When you have your list of relevant prospects you wish to check for broken links, fire up the Screaming Frog SEO Spider & switch the mode from ‘spider’ to ‘list’ and upload them.
2) Adjust Search Depth To 1
By default in list mode the crawl depth is essentially ‘0’, so only the URLs in your list will be crawled. However, we need to crawl the outbound URLs from the list, so adjust the search depth to ‘1’ in the spider configuration. You can also ‘untick’ crawling of images, CSS, JS etc.
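To make the depth setting concrete, here is a minimal Python sketch of a depth-limited crawl. It does not fetch anything: `LINK_GRAPH` is a hypothetical, hard-coded link graph standing in for real pages, and `crawl_with_depth` is an illustrative helper, not part of the SEO Spider.

```python
from collections import deque


def crawl_with_depth(seeds, outlinks, max_depth):
    """Return the URLs that would be requested for a given crawl depth.

    `outlinks` maps each URL to the URLs it links to. It is a hypothetical
    in-memory stand-in for fetching and parsing real pages.
    """
    requested = set()
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in requested:
            continue
        requested.add(url)
        # Follow this page's links only while we are below the depth limit.
        if depth < max_depth:
            for link in outlinks.get(url, []):
                queue.append((link, depth + 1))
    return requested


# Hypothetical example data: one seed page linking to two other pages.
LINK_GRAPH = {
    "https://seed.example/resources": [
        "https://live.example/",
        "https://gone.example/old-page",
    ],
    "https://live.example/": ["https://deeper.example/"],
}
SEEDS = ["https://seed.example/resources"]

print(sorted(crawl_with_depth(SEEDS, LINK_GRAPH, 0)))
print(sorted(crawl_with_depth(SEEDS, LINK_GRAPH, 1)))
```

With a depth of 0 only the seed page is requested. With a depth of 1 its two outlinks are requested too, while `https://deeper.example/` is known only as a link on `https://live.example/` and is never requested itself.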
3) Start The Crawl
Now hit the ‘start’ button, let the SEO Spider crawl the URLs, reach 100% and come to a stop.
4) Click On ‘Advanced Export’ & ‘All Out Links’
This report will contain all links in the original seed list, as well as their outbound links.
5) Open Up In Excel & Filter The Status Code For 4XX
The uploaded seed list URLs are the source URLs in column B, while their outbound links, which we want to check for broken links, are the destination URLs in column C. If you filter the ‘status code’ column, you may see some ‘404’ broken links.
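If you would rather script this filter than do it in Excel, a minimal Python sketch using the standard `csv` module might look like the following. The column headers `Source`, `Destination` and `Status Code` are assumptions based on the columns described above, so check the header row of your own export and adjust them. The sample rows are made up.

```python
import csv
import io

# Made-up sample of an 'All Out Links' export. The header names are
# assumptions; match them to the header row of your own file.
SAMPLE_EXPORT = """Source,Destination,Status Code
https://blog.example/post-1,https://good.example/,200
https://blog.example/post-1,https://gone.example/page,404
https://blog.example/post-2,https://moved.example/,301
https://blog.example/post-2,https://forbidden.example/,403
"""


def broken_links(rows):
    """Return (source, destination, status) for every 4XX destination."""
    found = []
    for row in rows:
        status = row["Status Code"].strip()
        # Blank codes (uncrawled outlinks of outlinks) are not digits, so skip them.
        if status.isdigit() and 400 <= int(status) < 500:
            found.append((row["Source"], row["Destination"], int(status)))
    return found


rows = csv.DictReader(io.StringIO(SAMPLE_EXPORT))
for source, destination, status in broken_links(rows):
    print(status, destination, "linked from", source)
```

For a real export you would swap the `io.StringIO` for something like `open("all_outlinks.csv", newline="", encoding="utf-8")`, where the filename is hypothetical.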
Here’s a quick screenshot of a dozen blog URLs I uploaded from our website and a few well-known search marketing blogs.
So that’s it, you have a list of broken links against their sources for your broken link building. You can stop reading now, but just checking for 4XX errors will mean you miss out on further opportunities to explore. There are a couple of other points to consider –
- You may notice the ‘source URL’ column contains more URLs than just the original seed list. This is because the crawl depth was set at ‘1’, meaning the outbound links from the seed list are included as sources as well (as they have outbound links too!). However, their own outbound links are not actually crawled for response codes (as crawl depth is ‘1’, remember), so these are merely noted in the ‘destination URL’ column without any response codes. So URLs without response codes are quite normal: they are the outlinks of the outlinks from the original seed list.
- The other thing to note is that URLs might not return a 404 error immediately. Quite often a URL will 302 (or 301, or 303) once or multiple times before reaching a final 404 error.
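The first of these points can be checked with a small script: rows whose source is one of your seed URLs were crawled, while the rest come from the seed pages’ outlinks and carry no response code. This minimal Python sketch assumes the same hypothetical `Source`/`Destination`/`Status Code` headers and made-up sample data; `split_by_level` is an illustrative helper.

```python
import csv
import io

# Made-up export sample; header names are assumptions.
SAMPLE_EXPORT = """Source,Destination,Status Code
https://blog.example/post-1,https://good.example/,200
https://blog.example/post-1,https://gone.example/page,404
https://good.example/,https://twitter.com/example,
"""

# The URLs originally uploaded in list mode.
SEEDS = {"https://blog.example/post-1"}


def split_by_level(rows, seeds):
    """Split rows into links found on seed pages and links found one level deeper."""
    from_seeds, deeper = [], []
    for row in rows:
        if row["Source"] in seeds:
            from_seeds.append(row)
        else:
            # Outlinks of outlinks: listed as destinations but never requested.
            deeper.append(row)
    return from_seeds, deeper


from_seeds, deeper = split_by_level(csv.DictReader(io.StringIO(SAMPLE_EXPORT)), SEEDS)
print(len(from_seeds), "links from seed pages")
print(len(deeper), "outlinks of outlinks, e.g.", deeper[0]["Destination"])
```

The `deeper` rows are the ones with an empty status code, so they can be set aside before hunting for broken links.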
Hence, for 3XX responses, I’d recommend auditing them as follows.
1) Filter For 3XX Responses In The ‘Destination URL’ Column
Then clean the ‘destination URLs’ list a little, as it will undoubtedly contain links to Twitter, Google Plus, Facebook, LinkedIn, login URLs etc., which all redirect. Run a quick filter on this column and mass delete all the rubbish from the list.
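The 3XX filter and clean-up can also be scripted. This minimal Python sketch keeps redirecting destinations and drops obvious noise. The `NOISE_HOSTS` and `NOISE_PATH_WORDS` lists are hypothetical starting points rather than an exhaustive list, and the `Source`/`Destination`/`Status Code` keys are assumed header names.

```python
from urllib.parse import urlparse

# Hypothetical noise lists; extend them with whatever clutters your export.
NOISE_HOSTS = {"twitter.com", "plus.google.com", "facebook.com", "linkedin.com"}
NOISE_PATH_WORDS = ("login", "signin")


def is_noise(url):
    """True for the listed social hosts and for login-style paths."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host in NOISE_HOSTS:
        return True
    return any(word in parsed.path.lower() for word in NOISE_PATH_WORDS)


def redirects_to_audit(rows):
    """Keep rows whose destination returned a 3XX and is not obvious noise."""
    keep = []
    for row in rows:
        status = row["Status Code"].strip()
        if status.isdigit() and 300 <= int(status) < 400 and not is_noise(row["Destination"]):
            keep.append(row)
    return keep


# Made-up rows in the shape csv.DictReader would produce.
ROWS = [
    {"Source": "https://blog.example/post-2", "Destination": "https://moved.example/", "Status Code": "301"},
    {"Source": "https://blog.example/post-2", "Destination": "https://www.twitter.com/share", "Status Code": "301"},
    {"Source": "https://blog.example/post-3", "Destination": "https://site.example/login?next=/", "Status Code": "302"},
    {"Source": "https://blog.example/post-3", "Destination": "https://ok.example/", "Status Code": "200"},
]

for row in redirects_to_audit(ROWS):
    print(row["Source"], "->", row["Destination"], row["Status Code"])
```

Only the `https://moved.example/` row survives. Keeping the whole row, not just the destination, is what lets you match results back to the source page later.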
2) Save This New 3XX List
You’ll need this list later to match each redirecting destination URL back to its originating source URL. This is what was left in my list after cleaning up, which we need to audit.
3) Now Audit Those Redirects
Follow the process outlined in the ‘How To Audit Redirects’ guide: save the ‘destination URLs’ into a new list and crawl them through to their final target URL using the ‘always follow redirects’ option, to discover any broken links. The redirect chains report will provide a complete view of these!
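The SEO Spider follows these chains for you. Purely to illustrate what a redirect chain looks like, here is a minimal Python sketch that walks chains through a hypothetical, hard-coded table of responses instead of making real HTTP requests. `resolve_chain`, `RESPONSES` and the 10-hop limit are all illustrative assumptions.

```python
def resolve_chain(url, responses, max_hops=10):
    """Follow 3XX hops from `url` and return the list of (url, status) hops.

    `responses` maps URL -> (status_code, redirect_target_or_None) and stands
    in for real HTTP requests.
    """
    chain = []
    while True:
        if any(seen == url for seen, _ in chain) or len(chain) >= max_hops:
            raise ValueError(f"redirect loop or too many hops at {url}")
        status, location = responses[url]
        chain.append((url, status))
        if 300 <= status < 400 and location:
            url = location
        else:
            # Non-redirect response (or a redirect with no target): chain ends.
            return chain


# Hypothetical responses: a 301 to a 404, and a 302 to a 200.
RESPONSES = {
    "https://moved.example/": (301, "https://moved.example/new-home"),
    "https://moved.example/new-home": (404, None),
    "https://old.example/guide": (302, "https://old.example/"),
    "https://old.example/": (200, None),
}

for start in ("https://moved.example/", "https://old.example/guide"):
    chain = resolve_chain(start, RESPONSES)
    print(" -> ".join(f"{url} [{status}]" for url, status in chain))
```

The status at the end of each chain is what matters for broken link building; the first hop alone would only ever show a 3XX.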
4) Match The 4XX Errors Discovered Against Your Saved 3XX List Source URLs
So the ‘redirect chains’ report may contain 4XX errors you would have missed if you hadn’t audited the 3XX responses. For example, here are a couple more I discovered using this method –
The above contains a URL which 301s to a 404, and another which is a soft 404 (a 302 to a 200 response). With this report you can match the ‘address’ URLs in column A back to the ‘destination URLs’, and in turn their ‘source URLs’, in your saved 3XX list. If you look at the images, both of the above in this example come from the same blog post.
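The matching step is essentially a lookup join. This minimal Python sketch assumes you have turned the redirect chains report into a hypothetical `FINAL_STATUS` mapping (redirecting address to the status at the end of its chain) and kept the saved 3XX list as rows with assumed `Source` and `Destination` keys. All data here is made up.

```python
def match_broken_redirects(saved_3xx, final_status):
    """Join redirect-audit results back to the pages that link to them.

    saved_3xx: rows from the saved 3XX list, as dicts with "Source" and
        "Destination" keys (header names are assumptions).
    final_status: maps each redirecting destination URL (the 'address' in the
        redirect chains report) to the status code at the end of its chain.
    Returns (source, destination, final_status) for chains ending in a 4XX.
    """
    matches = []
    for row in saved_3xx:
        status = final_status.get(row["Destination"])
        if status is not None and 400 <= status < 500:
            matches.append((row["Source"], row["Destination"], status))
    return matches


# Made-up data for illustration.
SAVED_3XX = [
    {"Source": "https://blog.example/post-2", "Destination": "https://moved.example/"},
    {"Source": "https://blog.example/post-2", "Destination": "https://old.example/guide"},
]
FINAL_STATUS = {"https://moved.example/": 404, "https://old.example/guide": 200}

for source, destination, status in match_broken_redirects(SAVED_3XX, FINAL_STATUS):
    print(f"{source} links to {destination}, which ends in a {status}")
```

Note that a soft 404 such as the 302 to a 200 above will not be caught by status codes alone; those final pages still need a quick manual look.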
Hopefully the above process helps make broken link building more efficient. Please just let us know if you have any questions in the comments as usual.