How To Audit Backlinks In The SEO Spider

Posted 7 September, 2016 by Dan Sharp in Screaming Frog SEO Spider

There are plenty of reasons you may wish to audit backlinks to a website: to check the links are still live and passing link value, to confirm they’ve been removed or nofollowed after a link clean-up, or to gather data on the links that Google Search Console doesn’t provide.

Back in 2011, we wrote a guide on using the custom search feature to audit backlinks at scale by simply searching for the presence of the link within the HTML of the website.

Since that time, we’ve released our custom extraction feature, which allows this process to be refined further and provide more detail. Rather than just checking for the presence of the link, you can now collect every link and its anchor text, and analyse whether the links are passing value or have been blocked by robots.txt or nofollowed via link attribute, meta tag or X-Robots-Tag header.

Please note, the custom extraction feature outlined below is only available to licensed users. The steps for auditing links using custom extraction are as follows.

1) Configure XPath Custom Extraction

Open up the SEO Spider. In the top level menu, click on ‘Configuration > Custom > Extraction’ and enter the XPath below, replacing ‘screamingfrog.co.uk’ with the domain you’re auditing.

The XPath in text form for ease of copying and pasting into the custom extractors –

//a[contains(@href, 'screamingfrog.co.uk')]/@href

//a[contains(@href, 'screamingfrog.co.uk')]

//a[contains(@href, 'screamingfrog.co.uk')]/@rel

These three expressions collect, respectively, the URL, the anchor text and the rel attribute of every link to screamingfrog.co.uk found on the backlinks being audited. So if a backlink contains multiple links to the domain you’re auditing, the data for all of them will be collected.
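To make it clearer what the three extractors gather, here is a minimal Python sketch that mimics them outside the SEO Spider. It is an illustration only, not how the SEO Spider works internally: it uses Python’s built-in html.parser rather than an XPath engine, and the class `BacklinkExtractor`, the function `extract_backlinks` and the sample HTML are all made up for this example.

```python
from html.parser import HTMLParser


class BacklinkExtractor(HTMLParser):
    """Collect href, anchor text and rel for links whose href contains a domain."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain
        self.links = []       # one dict per matching <a> element
        self._current = None  # the matching link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        # Plain substring test, mirroring contains(@href, '...') in the XPath.
        if self.domain in href:
            self._current = {"href": href, "anchor": "", "rel": attributes.get("rel")}

    def handle_data(self, data):
        # Accumulate text only while inside a matching link.
        if self._current is not None:
            self._current["anchor"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["anchor"] = self._current["anchor"].strip()
            self.links.append(self._current)
            self._current = None


def extract_backlinks(html, domain):
    """Return one dict (href, anchor, rel) per link to the domain (hypothetical helper)."""
    parser = BacklinkExtractor(domain)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical HTML from a page that links to the audited domain twice.
sample_html = """
<p>Try the <a href="https://www.screamingfrog.co.uk/seo-spider/">SEO Spider</a> or read the
<a rel="nofollow" href="https://www.screamingfrog.co.uk/blog/">blog</a>.
<a href="https://example.com/">Unrelated</a></p>
"""

for link in extract_backlinks(sample_html, "screamingfrog.co.uk"):
    print(link)
```

Each dictionary corresponds to one link, so a page with two links to the domain yields two entries, and a missing rel attribute is reported as None. Because the match is a plain substring test, any href that merely includes the domain text will match too.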

2) Switch To List Mode

Next, change the mode to ‘list’, by clicking on ‘Mode > List’ from the top level menu.

This will allow you to upload your backlinks into the SEO Spider, but don’t do this just yet!

3) View URLs Blocked By Robots.txt

When switching to list mode, the SEO Spider assumes you want to crawl every URL in the list and automatically applies the ‘ignore robots.txt’ configuration. However, when auditing backlinks, you may find it useful to know if a URL is blocked by robots.txt. If so, navigate to ‘Configuration > Spider > Basic tab’, un-tick ‘Ignore robots.txt’ and tick ‘Show Internal URLs Blocked by robots.txt’. In more recent versions of the SEO Spider, the ‘Ignore robots.txt’ setting is found under ‘Config > Robots.txt > Settings’ instead.

You’ll then be able to view exactly which URLs are blocked by robots.txt in the crawl.
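If you want to sanity-check a single URL by hand, the same kind of test can be sketched with Python’s standard urllib.robotparser module. This is only a rough illustration under a few assumptions: the robots.txt content and URLs are invented, the helper name `blocked_by_robots` is hypothetical, and the wildcard user-agent is used rather than whichever user-agent your crawl is configured with.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules disallow the URL for the user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt from a site hosting one of the backlinks.
robots_txt = """
User-agent: *
Disallow: /private/
"""

print(blocked_by_robots(robots_txt, "https://example.com/private/links.html"))
print(blocked_by_robots(robots_txt, "https://example.com/resources/"))
```

Here the first URL sits under the disallowed directory and is reported as blocked, while the second is not. A link on a blocked page can’t be crawled, so it’s worth knowing about when judging whether the link can pass value.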

4) Upload Your Backlinks

When you have gathered the list of backlinks you wish to audit, upload them by clicking on the ‘Upload List’ button and choosing your preference for uploading the URLs. The ‘Paste’ functionality makes this super quick.

Please note, you must upload the absolute URL including protocol (http:// or https://) in list mode.
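If your backlink list comes from an export, it can be worth checking it for entries without a protocol before uploading. The following is a small sketch of such a check using Python’s standard urllib.parse module; the function name `split_backlink_list` and the sample URLs are hypothetical, and the test is deliberately simple (an http or https scheme plus a host).

```python
from urllib.parse import urlsplit


def split_backlink_list(urls):
    """Separate absolute http(s) URLs from entries lacking a protocol or host."""
    valid, invalid = [], []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines in the list
        parts = urlsplit(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid


# Hypothetical entries from a backlink export.
candidates = [
    "https://example.com/resources/",
    "example.org/links.html",
    "http://blog.example.net/post?id=7",
    "/relative/path",
]
valid, invalid = split_backlink_list(candidates)
print("Ready to upload:", valid)
print("Needs a protocol:", invalid)
```

Entries in the second list would need http:// or https:// added before they can be used in list mode.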

5) Start The Crawl

When you’ve uploaded the URLs, the SEO Spider will show a reading file dialog box and confirm the number of URLs found. Next, click ‘OK’ and the crawl will start immediately.

You’ll then start to see the backlinks being crawled in real-time in the ‘Internal’ tab.

6) Review The Extracted Data

View the ‘Custom’ tab and ‘Extraction’ filter to see the data extracted from the backlinks audit.

You can drag and drop the columns to your preference. If some external sites link to the domain you’re auditing more than once, you will see multiple numbered columns for link, anchor and attribute, so each link can be matched up with its anchor text and attribute.

This view shows whether the URLs exist, their status (including whether they are blocked by robots.txt), the link, anchor and the link attribute. It doesn’t show whether the URL has a ‘nofollow’ from a meta tag or HTTP header. That can be seen under the ‘Internal’ tab, where all the custom extraction data is also automatically appended for convenience.
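For anyone unsure what a page-level ‘nofollow’ looks like, here is a minimal Python sketch that checks the two places it can appear: a meta robots tag in the HTML and an X-Robots-Tag HTTP header. It is a simplified illustration, not the SEO Spider’s own logic. The names `MetaRobotsParser` and `page_level_nofollow` and the sample data are invented, ‘none’ is treated as implying nofollow, and user-agent-specific directives (such as a ‘googlebot’ meta tag or a prefixed header value) are ignored.

```python
from html.parser import HTMLParser


def parse_directives(value):
    """Split a comma-separated robots value into lower-cased directives."""
    return {d.strip().lower() for d in value.split(",") if d.strip()}


class MetaRobotsParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            self.directives |= parse_directives(attributes.get("content") or "")


def page_level_nofollow(html, headers):
    """Report whether nofollow is set via meta robots or the X-Robots-Tag header."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()

    header_directives = set()
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            header_directives |= parse_directives(value)

    nofollow_values = {"nofollow", "none"}  # 'none' means noindex, nofollow
    return {
        "meta": bool(parser.directives & nofollow_values),
        "header": bool(header_directives & nofollow_values),
    }


# Hypothetical response from a page containing a backlink.
sample_html = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
sample_headers = {"Content-Type": "text/html", "X-Robots-Tag": "noarchive"}

print(page_level_nofollow(sample_html, sample_headers))
```

For this sample, the meta tag sets nofollow while the header does not. Either one applies to every link on the page, regardless of the rel attribute on the individual link.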

The SEO Spider wasn’t built with this purpose in mind, so it’s not a perfect solution. However, it gets used in so many different ways, and this should make the process of checking backlinks a little less tedious.

Dan Sharp is founder & Director of Screaming Frog. He has developed search strategies for a variety of clients from international brands to small and medium-sized businesses and designed and managed the build of the innovative SEO Spider software.

44 Comments

  • Gerald Buckley 8 years ago

    Absolutely love this. While there may be other tools you’d deem “less tedious” this is precisely the method we were looking for with one notable API integration I think has made it to the product roadmap. We’ll see :) In any event, while there are other methods out there which will poll for the presence of a link, etc. The raw power the custom extractor represents and the utility of it dwarfs the other methods IMHO. Keep on keeping on guys. You’re doing fabulously well!

    • screamingfrog 8 years ago

      Hey Gerald,

      Thanks for the kind comments and very cool to hear you’re enjoying the power of the extractors :-)

      Cheers.

      Dan

  • April 8 years ago

    I’m attempting to follow these steps but I am never seeing any results from step 1. There is nothing in my Custom tab, extraction filter. Any assistance is appreciated.

  • izhak agam 8 years ago

    Very nice, a significant improvement in this update.

  • Dennis 8 years ago

    Well,

    This is a really good functionality of the tool. Thanks for the write up.

    I have a question, or rather, a request for updated versions of SFSS.
    When you crawl a website and click an URL you get the different tabs with information. Like inlinks and outlinks. Can those rows have an additional column in front, with the row number? This makes it easy to see the number of in- and outlinks :)

    Best,

    Dennis

    • screamingfrog 8 years ago

      Hey Dennis,

      Apologies for the delayed reply. I see where you’re coming from, thanks for the suggestion!

      Cheers.

      Dan

      • Dennis 8 years ago

        Hi Dan,

        Thanks for replying.
        If this suggestion is possible to built in, great. Otherwise, still love the tool :)

        Cheers

  • Moumita Ghosh 8 years ago

    Hello,

    Excellent article indeed. From next time I will use this tool for auditing backlinks. Thanks so much sharing this outstanding article.

    Thanks,

    Moumita Ghosh

  • Alisha 8 years ago

    Thanks for sharing this information. Its really helpful for people like me :)

  • Shifa Chottani 8 years ago

    Screaming frog at its best.. This feature has now impressed me. Would surely try out your premium version now..

  • dimitris 8 years ago

    That’s an awesome feature and a really well written and detailed guide to boot. Well done guys, colour me impressed!

  • John 8 years ago

    Thanks Dan these days we need to make sure our backlink profile is as good as it can be.

  • SEODominator 8 years ago

    Although I found the guide really useful, I think that steps two and three should be consolidated in a single step. In my opinion, the way the guide is currently structured, makes the whole process seem a bit more complicated than it is and that may put some users off.

  • shane 8 years ago

    hi, i followed these steps and everything is fine, but when i check with other tools i can find the nofollow links, but screaming frog doesn’t display any nofollow links… is there any solution for that?

  • Pavel 8 years ago

    hi.
    1. How to get all internal links from list of pages.
    Example:
    10.000 categories
    Every have 1000 subcats
    How to get all subcats? Program can’t parse this number of pages.
    Good idea is get all links from this 10K pages.

    Custom extraction not work:Error, too big size of cell, over 32K.
    2. How to get list of uncrawled links (remaining links) - please can you add an export of this data?

    • screamingfrog 8 years ago

      Hi Pavel,

      Thanks for the comment.

      1) You don’t need to use custom extraction to get data for internal links, it will do that by default anyway. Have a look at some of the ‘bulk export’ options, like all inlinks / outlinks etc. Any specific queries, you can pop through to support here – https://www.screamingfrog.co.uk/seo-spider/support/

      2) That’s a good idea, and one we are considering introducing in an update. Thanks for the feedback.

      Cheers.

      Dan

  • Riddhi Sharma 8 years ago

    Is there any trial version available?
    If yes, kindly share me the details.

  • Mitch 8 years ago

    This is such a great feature. Screaming Frog makes it so easy for me to analyze my site, my clients sites, and competitors sites. I’ll keep my eyes open for any and all updates!

  • Ober 8 years ago

    Java crashes on crawling 2Mln+ urls. Tested on extremely fast server PC. :(
    Looks it need some optimization in algo. Or maybe change engine.

    It’s abnormal when the text info of a few million pages cannot be saved on terabyte disks. :) Please help.

    • screamingfrog 8 years ago

      Hi Ober,

      This isn’t a surprise, the SEO Spider is optimal for crawling sub 500k URLs really.

      Hence my advice is to upload the URLs in batches!

      Cheers.

      Dan

  • perfectseonism 8 years ago

    This is actually a great tool, especially for people who have pretty old domains and some of their backlinks have been removed or are no follow. In general it’s useful to have it for regular backlinks check and monetization, The tool is user friendly and the guide is well written and pretty explanatory. Good job.

  • Roei 7 years ago

    Just love how easy it is to use the spider to check for links. I used to get things like that all the time, you start to lose your position in Google, and soon enough when you check the links you find that hundreds of links have been deleted.
    Thanks for the article :)

  • SEOgirl 7 years ago

    A day without using ScreamingFrog is a day lost! Thank you for guidance on using extractions, awesome feature.

  • Bartek 7 years ago

    Thanks, a lot for this instructions. I just downloaded screaming frog and this article help me to do first backlinks analysis.

  • Emmerey Rose 7 years ago

    Awesome tutorial! Very helpful. I was wondering, does the tool have free trial version? Would love to try it out.

  • Yagle Mallorca 7 years ago

    Thanks!. Awesome tutorial! Very helpful.

  • dialashop 7 years ago

    Very interesting article. I find it hard to manually check backlinks. Tried Seo Spider and it seems very useful. Cheers.

  • Moe 7 years ago

    Thanks for the awesome tool so I can audit my backlink profile, I have 6 different files to look through and your audit tool will help save on time, and it’s free, yes i love free, keep up the good work.

  • Eric Arellano 6 years ago

    Screaming frog is a very helpful tool.
    As a SEO I use this software almost 80% of my time when I need to check URLs.
    One of the best options.

  • Patrizio Racco 6 years ago

    Hi,
    Is it possible to somehow evaluate the quality of the backlinks through Screamingfrog?
    In other words, is it possible to integrate with Majestic or Moz?
    Regards,
    Pat

  • James Riley 5 years ago

    Wow am I late to this party (years late in fact)!

    I love this post and it really got me thinking, is there anything that can be done to make the XPath a bit more accurate?

    The problem with XPath like:

    //a[contains(@href, 'screamingfrog.co.uk')]/@href

    … is that it would match to backlinks like this one:

    [http/https]://mycoolalexascraper/screamingfrog.co.uk/results

    You know those spammy scraper sites that just link to everyone!? And they have your domain within their own URL structure, as they list your ‘results’ for your website. As such, using ‘contains’ is a bit dangerous

    Really the “screamingfrog.co.uk” part has to be very near the beginning of the href attribute value

    You could do something like this:

    //a[starts-with(@href, 'https://screamingfrog.co.uk')]/@href | //a[starts-with(@href, 'http://screamingfrog.co.uk')]/@href | //a[starts-with(@href, 'https://www.screamingfrog.co.uk')]/@href | //a[starts-with(@href, 'http://www.screamingfrog.co.uk')]/@href

    That covers the four main possible URL permutations. Your link can start with HTTP/domain, HTTPS/domain, HTTP/WWW/domain or HTTPS/WWW/domain. Just because you have one canonical structure, that doesn’t mean that spam-hat webmasters will use it when they link to you…

    The pipes make or statements (sort of, I think it’s kind of a node separator) which allow one single line of XPath to match multiple patterns which is quite handy, but I’m also pretty sure it could be written in a much more elegant way

    What do you think?

  • inventateq 4 years ago

    Is this backlinks feature available only in paid version? cant i get sample backlinks report in free version before i purchase the versions?

    • screamingfrog 4 years ago

      The SEO Spider doesn’t crawl the whole web to provide backlink data.

      You’ll need to use a backlink analyser to have your list of backlinks (Ahrefs, Moz Link Explorer, Majestic).

      Thanks,

      Dan

    • Organik Monster 4 years ago

      I have used many tools so far, nobody has the features like screaming frog that too in a single license/ No monthly fee. But if you add the backlink audit tool such moz, ahref, you will be the king of all SEO tools.

  • Alf Stegmeyer 4 years ago

    Screamingfrog has saved my ass.

  • richard mclachlan 4 years ago

    Does not work, if you are check links, that go to a page with a permanent 301. So if you have changed your url structure…. shows the link as being missing.

  • Dan Auerbach 3 years ago

    Indispensable tool for our recent site move. Tried several others. None as flexible especially these features. Gets better with every version. Love it!!

  • Allison 2 years ago

    Hey, SF the ignore robots.txt is no longer in Config > Spider > Basic Although there is no Basic tab so what is the alternative in the latest version?

    • screamingfrog 2 years ago

      Hi Allison,

      It’s under ‘Config > Robots.txt > Settings’ now :-)

      Cheers,

      Dan
