Posted 27 January, 2014 by screamingfrog in Screaming Frog SEO Spider
How To Use The SEO Spider For Broken Link Building
If you’ve not heard of ‘broken link building’ before, it’s essentially a tactic that involves letting a webmaster know about broken links on their site and suggesting an alternative resource (perhaps your own site or a particular piece of content, alongside any others).
There are a couple of ways link builders approach this:
- Collecting a big list of ‘prospects’, such as resource pages or pages around a particular content theme or search phrase, and then checking these pages for broken links.
- Picking a single site and checking the entirety of it for relevant resource pages and broken links (and potentially creating content that will allow you to suggest your own site).
I don’t want to dig too deep into the entire process; you can read a fantastic guide over here on Moz by Russ Jones. However, as we get asked this question an awful lot, I wanted to explain how you can use the Screaming Frog SEO Spider tool to help scale the process, in particular for the first method listed above.
1) Upload Your Target URLs
When you have your list of relevant prospects you wish to check for broken links, fire up the Screaming Frog SEO Spider & switch the mode from ‘spider’ to ‘list’ and upload them.
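If you build the prospect list yourself, it can help to tidy it before uploading. The replies further down note that list mode needs URLs that include http or https, and that a .txt file is fine. Here is a minimal Python sketch along those lines. Skipping '#' comment lines, prefixing scheme-less entries with 'http://' (an https-only site would simply redirect) and the 'prospects.txt' file name are all assumptions, not SEO Spider requirements.

```python
def prepare_url_list(raw_lines):
    """Tidy prospect URLs before uploading them in list mode.

    Assumptions: blank lines and lines starting with '#' are skipped,
    entries without a scheme get 'http://' prefixed, and exact duplicates
    are dropped while keeping the original order.
    """
    seen = set()
    cleaned = []
    for line in raw_lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        if not url.lower().startswith(("http://", "https://")):
            url = "http://" + url  # list mode needs an explicit scheme
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def write_url_list(urls, path="prospects.txt"):
    """Write one URL per line to a .txt file (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")
```

For example, prepare_url_list(["example.com/resources", "https://example.org/links", "", "https://example.org/links"]) returns ['http://example.com/resources', 'https://example.org/links'], which write_url_list can then save for upload.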
2) Adjust Search Depth To 1
By default in list mode the crawl depth is essentially ‘0’, so only the URLs in your list will be crawled. However, we need to crawl the outbound links from those URLs too, so adjust the search depth to ‘1’ in the spider configuration. You can also ‘untick’ crawling of images, CSS, JS etc.
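To make the depth setting a little more concrete, here is a toy breadth-first model in Python. It is only an illustration, not how the SEO Spider is implemented: the 'outlinks' dictionary stands in for fetching and parsing each page, and the function name is hypothetical. At depth 0 only the seeds are requested; at depth 1 their outlinks are requested too, while links found on those pages are recorded but never requested.

```python
from collections import deque


def crawl_with_depth(seeds, outlinks, max_depth):
    """Toy breadth-first crawl showing what a depth limit does.

    seeds: start URLs (depth 0).
    outlinks: dict of URL -> URLs it links to; stands in for fetching
        and parsing a page.
    max_depth: deepest level whose URLs are actually requested.
    Returns (fetched, recorded): URLs that would get a response code,
    and every link discovered on a fetched page.
    """
    depth = {url: 0 for url in seeds}  # also removes duplicate seeds
    queue = deque(depth)
    fetched, recorded = [], set()
    while queue:
        url = queue.popleft()
        fetched.append(url)
        for link in outlinks.get(url, []):
            recorded.add(link)
            # Only queue links that are new and still within the depth limit.
            if link not in depth and depth[url] + 1 <= max_depth:
                depth[link] = depth[url] + 1
                queue.append(link)
    return fetched, recorded
```

With one seed linking to two pages, one of which links on to a third, max_depth=0 requests only the seed, while max_depth=1 requests the seed and both of its outlinks; the third page is recorded but has no response code.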
3) Start The Crawl
Now hit the ‘start’ button, let the SEO Spider crawl the URLs, reach 100% and come to a stop.
4) Click On ‘Advanced Export’ & ‘All Out Links’
This report will contain all links in the original seed list, as well as their outbound links.
5) Open Up In Excel & Filter The Status Code for 4XX
The uploaded seed URLs are the source URLs in column B, while their outbound links, which we want to check for broken links, are the destination URLs in column C. If you filter the ‘status code’ column for 4XX, you may see some ‘404’ broken links.
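If Excel struggles with a large export, the same 4XX filter can be scripted. This Python sketch assumes the ‘All Out Links’ report was exported as CSV with a single header row containing ‘Source’, ‘Destination’ and ‘Status Code’ columns; those header names are assumptions, so check them in your own export and adjust the parameters if they differ.

```python
import csv


def broken_links(csv_lines, source_col="Source", dest_col="Destination",
                 status_col="Status Code"):
    """Return (source, destination, status) for every 4XX outlink.

    csv_lines: any iterable of CSV text lines, e.g. an open file.
    The column names are assumptions about the export's header row.
    """
    broken = []
    for row in csv.DictReader(csv_lines):
        status = (row.get(status_col) or "").strip()
        # Rows with an empty status were never requested, so skip them.
        if status.isdigit() and 400 <= int(status) < 500:
            broken.append((row[source_col], row[dest_col], int(status)))
    return broken


# Usage (hypothetical file name):
# with open("all_outlinks.csv", newline="", encoding="utf-8") as fh:
#     for source, dest, code in broken_links(fh):
#         print(code, dest, "linked from", source)
```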
Here’s a quick screenshot of a dozen blog URLs I uploaded from our website and a few well-known search marketing blogs.
So that’s it: you have a list of broken links against their sources for your broken link building. You can stop reading now, but just checking for 4XX errors means you will miss out on further opportunities. There are a couple of other points to consider:
- You may notice the ‘source URL’ column contains more URLs than just the original seed list. This is because the crawl depth was set to ‘1’, so the outbound links from the seed list are included as sources as well (they have outbound links too!). However, their own ‘outbound links’ are not actually crawled for response codes (remember, crawl depth is ‘1’), so these are merely noted in the ‘destination URL’ column without any response code. URLs without a response code are therefore quite normal: they are the outlinks of the outlinks of the original seed list.
- The other thing to note is that URLs might not return a 404 error immediately. Quite often a URL will redirect (301, 302 or 303) once or several times before finally reaching a 404 error.
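Both points can be checked in bulk. This Python sketch assumes the export's status column is headed ‘Status Code’ (an assumption to verify against your own file) and splits rows into four hypothetical groups: an empty status (the uncrawled outlinks of outlinks described above), 3XX, 4XX and everything else.

```python
import csv


def bucket_by_status(csv_lines, status_col="Status Code"):
    """Group exported outlink rows by response class.

    'not_crawled' holds rows with an empty status: links found on depth-1
    pages that were recorded but never requested.
    """
    buckets = {"not_crawled": [], "3xx": [], "4xx": [], "other": []}
    for row in csv.DictReader(csv_lines):
        status = (row.get(status_col) or "").strip()
        if not status:
            key = "not_crawled"
        elif status.startswith("3"):
            key = "3xx"
        elif status.startswith("4"):
            key = "4xx"
        else:
            key = "other"  # 2XX, 5XX and anything else
        buckets[key].append(row)
    return buckets
```

The ‘3xx’ group is the one worth auditing further rather than discarding.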
I’d therefore recommend auditing 3XX responses as follows.
1) Filter For 3XX Responses In The ‘Destination URL’ Column
Then clean up the ‘destination URLs’ list a little, as it will undoubtedly contain links to Twitter, Google Plus, Facebook, LinkedIn, login URLs and so on, which all redirect. Run a quick filter on this column and mass delete all the rubbish from the list.
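The clean-up can also be scripted. This Python sketch keeps 3XX rows and drops destinations whose host belongs to a hand-picked set of social networks, or whose URL contains ‘login’ or ‘signin’. The host list, keywords and column names are all assumptions to tune for your own data, and plain substring matching is crude enough that it may occasionally drop a legitimate URL.

```python
import csv
from urllib.parse import urlsplit

# Hypothetical noise lists: hosts and keywords that commonly redirect.
NOISE_HOSTS = ("twitter.com", "plus.google.com", "facebook.com", "linkedin.com")
NOISE_WORDS = ("login", "signin")


def is_noise(url):
    """True if the URL is on a noise host (or its subdomain) or looks like a login."""
    host = urlsplit(url).hostname or ""
    if any(host == h or host.endswith("." + h) for h in NOISE_HOSTS):
        return True
    return any(word in url.lower() for word in NOISE_WORDS)


def clean_redirects(csv_lines, source_col="Source", dest_col="Destination",
                    status_col="Status Code"):
    """Return (source, destination, status) for 3XX outlinks worth auditing."""
    kept = []
    for row in csv.DictReader(csv_lines):
        status = (row.get(status_col) or "").strip()
        if status.startswith("3") and not is_noise(row[dest_col]):
            kept.append((row[source_col], row[dest_col], status))
    return kept
```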
2) Save This New 3XX List
You’ll need this list later to match each redirecting destination URL back to its originating source URL. This is what was left in my list after cleaning up, which we need to audit.
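A small Python sketch for this step: given (source, destination, status) tuples for the 3XX links you kept, it saves the full mapping to a CSV for matching later, plus a de-duplicated .txt of destinations ready to upload in list mode. Both file names are hypothetical.

```python
import csv


def unique_destinations(rows):
    """Destination URLs in first-seen order, without duplicates."""
    return list(dict.fromkeys(dest for _, dest, _ in rows))


def save_redirect_list(rows, mapping_path="redirects_3xx.csv",
                       list_path="redirects_3xx.txt"):
    """Save the source->destination mapping and a list-mode upload file."""
    with open(mapping_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Source", "Destination", "Status Code"])
        writer.writerows(rows)
    destinations = unique_destinations(rows)
    with open(list_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(destinations) + "\n")
    return destinations
```

Keeping every source in the CSV matters because one redirecting URL can be linked from several pages, while the upload list only needs each destination once.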
3) Now Audit Those Redirects
Follow the process outlined in the ‘How To Audit Redirects’ guide: save the ‘destination URLs’ into a new list and crawl them through to their final target URL using the ‘always follow redirects’ option to discover any broken links. The redirect chains report will provide a complete view of these!
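The SEO Spider does this for you, but if you want to spot-check a handful of URLs outside the tool, the idea behind following a redirect chain can be sketched in Python with the standard library. This is an illustration, not the SEO Spider's implementation: it issues one GET per hop, stops after a hypothetical limit of 10 hops to avoid redirect loops, and reports 0 when no response is received. Passing a different fetch function lets you try it without network access.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3XX code


OPENER = urllib.request.build_opener(NoRedirect)


def fetch_once(url, timeout=10):
    """Request a URL once; return (status, Location header or None)."""
    try:
        with OPENER.open(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:  # 3XX, 4XX and 5XX end up here
        return err.code, err.headers.get("Location")
    except OSError:  # DNS failure, refused connection, timeout, etc.
        return 0, None


def follow_chain(url, fetch=fetch_once, max_hops=10):
    """Follow redirects hop by hop; return a list of (url, status) pairs."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if not (300 <= status < 400 and location):
            break  # final response reached (or a redirect with no target)
        url = urljoin(url, location)  # Location may be relative
    return chain
```

A chain whose last status is a 4XX is a broken link hiding behind a redirect.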
4) Match The 4XX Errors Discovered Against Your Saved 3XX List Source URLs
So the ‘redirect chains’ report may contain 4XX errors you would have missed if you hadn’t audited the 3XX responses. For example, here are a couple more I discovered using this method:
The above contains a URL which 301s to a 404, and another which is a soft 404 (a 302 to a 200 response). With this report you can match the ‘address’ URLs in column A back to the ‘destination URLs’, and in turn to the ‘source URLs’, in your saved 3XX list. If you look at the images, both of the above in this example come from the same blog post.
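The matching step is essentially a join on the redirecting URL. In this Python sketch, chain results are (starting URL, final status) pairs, which you would take from the ‘address’ column and the final response code in the redirect chains report (exact column names are not assumed here), and the saved 3XX list is (source, destination, status) tuples. The function name is hypothetical. Only chains ending in a 4XX are reported, so a soft 404 like the 302 to a 200 above still has to be spotted by eye.

```python
from collections import defaultdict


def match_to_sources(chain_results, saved_rows):
    """Map 4XX redirect chains back to the pages that link to them.

    chain_results: (start_url, final_status) pairs from the redirect audit,
        with final_status as an int.
    saved_rows: (source, destination, status) tuples from the saved 3XX list.
    Returns (source, destination, final_status) for each broken match.
    """
    sources_by_dest = defaultdict(list)
    for source, dest, _ in saved_rows:
        sources_by_dest[dest].append(source)
    matches = []
    for start_url, final_status in chain_results:
        if 400 <= final_status < 500:
            # One redirecting URL may be linked from several source pages.
            for source in sources_by_dest.get(start_url, []):
                matches.append((source, start_url, final_status))
    return matches
```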
Hopefully the above process helps make broken link building more efficient. Please just let us know if you have any questions in the comments as usual.
Update
If you’re just looking to discover broken links on a single website, read our How To Find Broken Links Using The SEO Spider guide.
Great tutorial, thanks a lot Dan! I used this method, but Screaming Frog takes a long time to export all out links. It usually takes 3-4 hours for me on Windows 8.1 (64-bit, i7, 8GB RAM).
So it would be a good idea to run this at night and collect the results the next morning :)
Update: it's only the Excel export that takes a long time. Exporting to CSV and then importing into Excel is faster :)
Hey SEO Martin,
Yup, you’re spot on about the Excel export speed issue. This will be a lot faster in our next release, as we’ve just made a couple of improvements… :-)
Cheers.
Dan
Hey, thanks a lot for such a valuable article, but many of the websites are not getting crawled. What should I do in that case?
Hi John,
No problem. When you say many websites are not getting crawled, I am not entirely sure what you mean. What are the response codes, and what happens when the SEO Spider attempts to crawl them?
Are they simply ‘no responses’, or are they 5XXs etc.? If they are ‘no response’, it might mean that it’s a malformed URL, or perhaps the HTTP response took too long (you’ll see a connection timeout message), which you can adjust in the configuration.
My advice would be to take a look; it’s usually pretty obvious why. There’s more in our FAQ on:
Connection timeouts – https://www.screamingfrog.co.uk/seo-spider/faq/#39
Connection errors – https://www.screamingfrog.co.uk/seo-spider/faq/#18
Connection refused – https://www.screamingfrog.co.uk/seo-spider/faq/#17
403 responses – https://www.screamingfrog.co.uk/seo-spider/faq/#19
Hope that helps!
Dan
Hello,
I want to check the broken links on my site so I can fix them. How can I use Screaming Frog to do that?
Hi Alejandra,
You can just run a crawl of your website, then look under the ‘Response Codes’ tab and ‘4XX Client Error’ filter to see any ‘404’ broken links.
To view the ‘source’ URLs (where they are linked from) on your website, you can use the ‘in links’ tab at the bottom which populates the lower window pane.
Hope this helps.
Dan
And if I want to check broken outlinks from just one domain, I can use the regular spider mode (not list) and not limit the search depth to 1, right? As I understand it, a search depth of 1 will only crawl the outlinks from the homepage, as that’s how it looked for me.
Hey Joe,
Exactly right! I actually just updated the post to mention this as well.
We wrote a guide recently on how to find broken links using the SEO Spider over here –
https://www.screamingfrog.co.uk/broken-link-checker/
Cheers.
Dan
Hello,
Can your tool crawl a website completely and then export external broken links?
Thanks !
Hi Lee,
Yes, you can. We have a guide over here –
https://www.screamingfrog.co.uk/broken-link-checker/
Thanks,
Dan
This is amazing! I’ve been looking for a tool like this for hours!
Thank you Dan!
I can’t import a list of URLs from a .txt file.
Hi Ninh,
Make sure you’re using http or https in your URLs when you upload – https://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/#15
Thanks,
Dan
OK, it works now. I removed all the links in the include list.
Hi,
What is the suggested number of URLs to upload at one time?
Can I upload 10,000 URLs at once, or more than that?
Should I be using proxies if I will be doing multiple runs of 10,000 URLs?
Thanks
Hi Anuj,
10k URLs will be fine. How much you can crawl is dependent on how much memory you allocate – https://www.screamingfrog.co.uk/seo-spider/faq/#29
You won’t need to use proxies.
Thanks,
Dan
What kind of file should we be using as a list for the crawl? TXT, CSV, XLS?
Hi Chris,
A .txt or .csv file is fine – https://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/#15
Thanks,
Dan
Great tutorial. This is the best tool! I’ve been looking for a tool like the SEO Spider for 5 hours!
Thanks
Hi,
I’ve loaded the 3k list, followed the above tips and done the export. Is there now any way to crawl the contact data of the sites, so that we can mass contact all of them about the dead links instead of one by one? I hope you understand my point.
Thanks
Hi Waseem,
You can collect email addresses using the custom extraction feature, we have an example over here – https://www.screamingfrog.co.uk/web-scraping/
Cheers.
Dan
Hi SF,
Great, I’ve learned a very useful trick for scraping email addresses from a page. But what if the site owner hasn’t disclosed an email address openly, like you have in your footer ‘Contact Us’? Instead, they only have a contact “form” page on their website. I think pulling the data from WHOIS would be the only option; is there any solution for that too?
Thanks
Hi Waseem,
No solution for that, I’m afraid; the SEO Spider wasn’t really built for that purpose. I believe a tool called URL Profiler might do this though, so you may wish to check it out!
Cheers.
Dan
Great, thank you so much Dan.
Hey Dan,
Great post, and broken link building is a great way to get good content. So my question is this: if you stumble on a broken link, do you copy the content or rewrite it? Which approach do you support?
Are there any videos on finding broken links with Screaming Frog? :)
Great post, thank you.