How To Audit Backlinks In The SEO Spider
There are plenty of reasons you might want to audit backlinks to a website: to check the links are still live and passing link value, to confirm they’ve been removed or nofollowed after a link clean-up, or to gather more data on the links than Google Search Console provides.
We previously wrote a guide back in 2011 on using the custom search feature to audit backlinks at scale, by simply searching for the presence of the link within the HTML of the website.
Since that time, we’ve released our custom extraction feature, which refines this process further and provides more detail. Rather than just checking for the presence of a link, you can now collect all links and anchor text, and analyse whether they are passing value, have been blocked by robots.txt, or have been nofollowed via a link attribute, meta robots tag or X-Robots-Tag HTTP header.
Please note, the custom extraction feature outlined below is only available to licensed users. The steps to auditing links using custom extraction are as follows.
1) Configure XPath Custom Extraction
Open up the SEO Spider. In the top level menu, click on ‘Configuration > Custom > Extraction’ and input the XPath below, replacing ‘screamingfrog.co.uk’ with the domain you’re auditing.
The XPath in text form for ease of copying and pasting into the custom extractors –
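The exact expressions aren’t reproduced here, but a plausible set of three extractors looks like this (an illustrative sketch only, with ‘screamingfrog.co.uk’ as the example domain; the first returns the link URL, the second the anchor text, and the third the rel attribute) –

```
//a[contains(@href, 'screamingfrog.co.uk')]/@href
//a[contains(@href, 'screamingfrog.co.uk')]
//a[contains(@href, 'screamingfrog.co.uk')]/@rel
```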
This XPath will collect every link, its anchor text and its link attribute for links to screamingfrog.co.uk from the backlinks being audited. So if a backlink page contains multiple links to the domain you’re auditing, all of the data will be collected.
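To illustrate what this kind of extraction actually gathers, here’s a minimal Python sketch using only the standard library’s `html.parser` (the `BacklinkParser` class and the sample HTML are invented for this example; it is not how the SEO Spider works internally):

```python
from html.parser import HTMLParser

class BacklinkParser(HTMLParser):
    """Collects href, anchor text and rel for every link to a target domain."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain
        self.links = []          # (href, anchor, rel) tuples
        self._current = None     # attrs of the <a> currently open
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if self.domain in (attrs.get("href") or ""):
                self._current = attrs
                self._anchor = []

    def handle_data(self, data):
        if self._current is not None:
            self._anchor.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append((
                self._current.get("href"),
                "".join(self._anchor).strip(),
                self._current.get("rel", ""),   # e.g. 'nofollow'
            ))
            self._current = None

html = '<p><a href="https://www.screamingfrog.co.uk/" rel="nofollow">SEO Spider</a></p>'
parser = BacklinkParser("screamingfrog.co.uk")
parser.feed(html)
print(parser.links)  # [('https://www.screamingfrog.co.uk/', 'SEO Spider', 'nofollow')]
```

Each tuple mirrors one row of link, anchor and attribute data the custom extractors would return for a crawled backlink.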
2) Switch To List Mode
Next, change the mode to ‘list’, by clicking on ‘Mode > List’ from the top level menu.
This will allow you to upload your backlinks into the SEO Spider, but don’t do this just yet!
3) View URLs Blocked By Robots.txt
When switching to list mode, the SEO Spider assumes you want to crawl every URL in the list and automatically applies the ‘ignore robots.txt’ configuration. However, in this scenario you may find it useful to know whether a URL is blocked by robots.txt when auditing backlinks. If so, navigate to ‘Configuration > Spider > Basic tab’, un-tick ‘Ignore robots.txt’ and tick ‘Show Internal URLs Blocked by robots.txt’.
You’ll then be able to view exactly which URLs are blocked by robots.txt in the crawl.
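For context, the kind of robots.txt check involved here can be approximated with Python’s built-in `urllib.robotparser` (the rules and URLs below are invented purely for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt from a linking site.
rules = """
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A backlink on a disallowed path would be reported as blocked.
print(rp.can_fetch("Screaming Frog SEO Spider", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("Screaming Frog SEO Spider", "https://example.com/blog/post.html"))     # True
```

A backlink URL that fails this check would show as blocked by robots.txt in the crawl.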
4) Upload Your Backlinks
When you have gathered the list of backlinks you wish to audit, upload them by clicking on the ‘Upload List’ button and choosing your preference for uploading the URLs. The ‘Paste’ functionality makes this super quick.
Please note, you must upload the absolute URL including protocol (http:// or https://) in list mode.
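If your backlink export contains bare domains or scheme-less URLs, a quick way to add the missing protocol before pasting is a helper like this (a sketch; the `ensure_absolute` function is hypothetical and assumes `https://` as the default scheme):

```python
from urllib.parse import urlparse

def ensure_absolute(url, default_scheme="https"):
    """Prepend a scheme if the URL doesn't already have one."""
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default_scheme}://{url}"

urls = ["www.example.com/page", "http://example.org/post", "https://example.net/"]
print([ensure_absolute(u) for u in urls])
# ['https://www.example.com/page', 'http://example.org/post', 'https://example.net/']
```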
5) Start The Crawl
When you’ve uploaded the URLs, the SEO Spider will show a ‘reading file’ dialog box and confirm the number of URLs found. Click ‘OK’ and the crawl will start immediately.
You’ll then start to see the backlinks being crawled in real-time in the ‘Internal’ tab.
6) Review The Extracted Data
View the ‘Custom’ tab and ‘Extraction’ filter to see the data extracted from the backlinks audit.
You can drag and drop the columns into your preferred order. If some external sites link to the domain you’re auditing more than once, you’ll see multiple numbered columns for link, anchor and attribute, so they can be matched up.
This view shows whether the URLs exist, their status (including whether they’re blocked by robots.txt), the link, the anchor text and the link attribute. It doesn’t show whether a URL has a ‘nofollow’ from a meta robots tag or HTTP header; that can be seen under the ‘Internal’ tab, where all the custom extraction data is automatically appended as well for convenience.
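Those two page-level nofollow checks amount to looking for ‘nofollow’ in the X-Robots-Tag header and in the meta robots tag, which can be sketched like this (the `is_nofollowed` function and regex are illustrative, not the SEO Spider’s implementation):

```python
import re

def is_nofollowed(html, headers):
    """Return True if the page nofollows its links via meta robots or X-Robots-Tag."""
    # X-Robots-Tag HTTP header, e.g. 'X-Robots-Tag: noindex, nofollow'
    if "nofollow" in headers.get("X-Robots-Tag", "").lower():
        return True
    # <meta name="robots" content="..., nofollow"> in the HTML
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html, re.IGNORECASE)
    return bool(meta and "nofollow" in meta.group(1).lower())

page = '<head><meta name="robots" content="index, nofollow"></head>'
print(is_nofollowed(page, {}))                                    # True
print(is_nofollowed("<p>hi</p>", {"X-Robots-Tag": "nofollow"}))   # True
print(is_nofollowed("<p>hi</p>", {}))                             # False
```

A link that looks followed in the ‘Custom’ tab could still pass no value if either of these page-level signals is present.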
The SEO Spider wasn’t built with this purpose in mind, so it’s not a perfect solution. However, the SEO Spider gets used in so many different ways, and this should make the process of checking backlinks a little less tedious.