Check Images – Untick this box if you do not want to crawl images. (Please note, we check the link but don’t crawl the content).
Check CSS – Untick this box if you do not want to crawl CSS.
Check JavaScript – Untick this box if you do not want to crawl JavaScript.
Check External Links – Untick this box if you do not want to crawl any external links.
Follow Internal or External ‘nofollow’ – By default the spider will not crawl internal or external links with the ‘nofollow’ attribute or external links from pages with the meta nofollow tag. If you would like the spider to crawl these, tick the relevant option.
Ignore robots.txt – By default the spider obeys the robots.txt protocol and will not crawl a site if it is disallowed via robots.txt. This option lets you ignore the protocol, which is the responsibility of the user. With it enabled, the SEO spider will not even download the robots.txt file, so ALL robots directives are ignored.
Limit Search Total – The free version of the software will crawl up to 500 URIs. A licensed version removes this limit, but you can enter any number here for greater control over the number of pages you wish to crawl.
Limit Search Depth – You can choose how deep the spider crawls a site (in terms of links away from your chosen start point).
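The effect of the Limit Search Total and Limit Search Depth options can be illustrated with a minimal sketch. It assumes a breadth-first crawl over a hypothetical in-memory link graph; the `crawl` function, its parameters and the `site` data are illustrative and not part of the SEO spider itself.

```python
from collections import deque

def crawl(start, links, max_total=500, max_depth=None):
    """Breadth-first walk over a hypothetical link graph with two limits.

    links maps each URL to the URLs it links to (illustrative data only).
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_total:
        url, depth = queue.popleft()
        order.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order

site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}
print(crawl("/", site, max_depth=1))  # start page plus pages one link away
print(crawl("/", site, max_total=2))  # stops after two URIs
```

Depth here is counted in links away from the start point, matching the description above.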
This feature allows you to control which URL paths the SEO spider will crawl using regex. It narrows the default search by only crawling the URLs that match the regex, which is particularly useful for larger sites or sites with less intuitive URL structures. The page that you start the crawl from must have an outbound link which matches the regex for this feature to work. If no URL linked from the start page matches the regex, the SEO spider will not crawl anything.
This allows you to list files and paths to exclude from crawling. This feature used robots.txt syntax in the past, but from version 1.80 it uses regex for greater control. For example:
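A rough sketch of the include behaviour, assuming the regex must match the whole URL (the tool's exact matching rules are not stated here). The `filter_included` helper and the sample URLs are hypothetical.

```python
import re

def filter_included(outlinks, include_patterns):
    """Keep only links that fully match at least one include regex."""
    compiled = [re.compile(p) for p in include_patterns]
    return [u for u in outlinks if any(c.fullmatch(u) for c in compiled)]

# Hypothetical outbound links found on the start page.
start_page_links = [
    "http://www.example.com/blog/post-1",
    "http://www.example.com/shop/item-9",
    "http://www.example.com/about",
]

# Only the blog link survives this include rule.
print(filter_included(start_page_links, [r"http://www\.example\.com/blog/.*"]))

# No link matches, so there would be nothing left to crawl.
print(filter_included(start_page_links, [r"http://www\.example\.com/news/.*"]))
```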
http://www.example.com/do-not-crawl-this-page.html
http://www.example.com/do-not-crawl-this-folder/.*
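To see how the two exclude patterns above behave, here is a small sketch that tests URLs against them. It assumes each pattern has to match the entire URL, which is why the folder rule ends in `.*`; the `is_excluded` helper is illustrative only.

```python
import re

# The two patterns from the example above, used verbatim.
EXCLUDE = [
    r"http://www.example.com/do-not-crawl-this-page.html",
    r"http://www.example.com/do-not-crawl-this-folder/.*",
]

def is_excluded(url, patterns=EXCLUDE):
    """Return True if the whole URL matches any exclude regex."""
    return any(re.fullmatch(p, url) for p in patterns)

for url in [
    "http://www.example.com/do-not-crawl-this-page.html",
    "http://www.example.com/do-not-crawl-this-folder/page.html",
    "http://www.example.com/keep-this-page.html",
]:
    print(url, is_excluded(url))
```

Note that an unescaped `.` in a regex matches any character, so a stricter pattern would escape the dots (`www\.example\.com`).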
This feature allows you to control the speed of the spider, either by number of concurrent threads or by URLs requested per second. Increasing the number of threads can significantly increase the speed of the SEO spider and the load on the site being crawled, so please use it responsibly.
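The URLs-per-second setting amounts to a minimum gap between requests. The following sketch only computes that schedule and makes no network calls; `request_schedule` is a hypothetical helper, not something the tool exposes.

```python
def request_schedule(num_urls, urls_per_second):
    """Return the earliest start time (in seconds) for each request."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    interval = 1.0 / urls_per_second  # minimum gap between two requests
    return [i * interval for i in range(num_urls)]

# At 2 URLs per second, five requests are spread over two seconds.
print(request_schedule(5, 2))
```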
The user-agent switcher has inbuilt preset user agents for Googlebot, Bingbot, Yahoo! Slurp, various browsers and more. This feature also has a custom user-agent setting which allows you to specify your own user agent.
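In HTTP terms, switching user agent means sending a different `User-Agent` request header. A minimal sketch using Python's standard library follows; the preset strings are the publicly documented Googlebot and Bingbot user agents and may differ from the tool's own presets, and `build_request` is a hypothetical helper.

```python
from urllib.request import Request

# Illustrative presets; the tool's built-in strings may differ.
USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}

def build_request(url, preset=None, custom=None):
    """Build a request carrying a preset or custom User-Agent header."""
    user_agent = custom if custom else USER_AGENTS[preset]
    return Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.example.com/", preset="googlebot")
print(req.get_header("User-agent"))  # urllib stores the key as "User-agent"
```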
The spider allows you to find anything you want in the source code of a website. The custom query string search feature checks the source code of every page you crawl for the text you specify. There are five filters in total under the ‘custom’ configuration menu, each of which lets you input a query and find pages that either ‘contain’ or ‘do not contain’ it. You cannot ‘scrape’ or extract data from HTML elements using this feature at the moment.
The pages that do or do not contain your queries can be found in the ‘custom’ tab, under the filter number that matches the one in your configuration. You might choose ‘contains’ for text like ‘Out of Stock’ to find any pages which have this on them. When searching for something like Google Analytics code, it makes more sense to choose the ‘does not contain’ filter to find pages that do not include the code (rather than listing all those that do!).

For example, with filter 1 set to ‘contains’ for ‘out of stock’ and filter 2 set to ‘does not contain’ for the Analytics UA number, any pages with ‘out of stock’ on them would appear in the custom tab under filter 1. Any pages on which the spider could not find the Analytics UA number would be listed under filter 2.
Please remember – the custom search checks the HTML source code of a website, which might not be the text that is rendered in your browser. Please ensure you are searching for the query exactly as it appears in the source code.
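The ‘contains’ and ‘does not contain’ filters can be sketched as plain substring checks against each page's raw HTML source. This is an assumption-laden illustration: the `custom_search` function, the sample pages and the `UA-0000000-1` placeholder are hypothetical, and whether the tool's matching is case sensitive is not stated here.

```python
def custom_search(pages, filters):
    """Group page URLs by filter number.

    pages:   {url: html_source}
    filters: list of (mode, query), mode is 'contains' or 'does not contain'
    """
    results = {}
    for number, (mode, query) in enumerate(filters, start=1):
        if mode == "contains":
            hits = [url for url, html in pages.items() if query in html]
        elif mode == "does not contain":
            hits = [url for url, html in pages.items() if query not in html]
        else:
            raise ValueError(f"unknown mode: {mode}")
        results[number] = hits
    return results

pages = {
    "http://www.example.com/a": "<p>out of stock</p><script>var id = 'UA-0000000-1';</script>",
    "http://www.example.com/b": "<p>In stock</p>",
}
filters = [("contains", "out of stock"), ("does not contain", "UA-0000000-1")]
print(custom_search(pages, filters))
```

Because the check runs on the source, markup such as `out&nbsp;of stock` would not match the query `out of stock` even though a browser renders the same words.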
This feature allows you to use a proxy with the SEO spider by specifying the address and port.
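As a rough equivalent outside the tool, an address and port can be turned into proxy settings for Python's standard `urllib`. The helper names and the `127.0.0.1:8080` values are placeholders, and no request is sent when this runs.

```python
from urllib.request import ProxyHandler, build_opener

def proxy_settings(address, port):
    """Map an address and port to urllib-style proxy settings."""
    proxy_url = f"http://{address}:{port}"
    return {"http": proxy_url, "https": proxy_url}

def make_opener(address, port):
    """Build an opener that would route requests through the proxy."""
    return build_opener(ProxyHandler(proxy_settings(address, port)))

opener = make_opener("127.0.0.1", 8080)  # placeholder address and port
print(proxy_settings("127.0.0.1", 8080))
```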
The default ‘mode’ is spider. Simply enter the URL of your choice and click ‘start’ to crawl the website. Alternatively, switch to ‘list’ mode to upload a list of URLs to the spider or crawl an .xml file. Click ‘select file’ and browse to the file which contains the list of URLs, or to the .xml file. Please remember to choose the correct file type when you upload – a .txt, .csv, .xml or unicode text (.csv) file.
The only requirement is that links are proper hyperlinks including the ‘http://’ (this does not apply to .xml files). If you upload a list of URLs without the ‘http://’, the spider will not find them. No other formatting is required: the spider will find any URLs regardless of other text contained in the file, which columns they are in or how they are spaced. For example, you can directly upload an AdWords download and all URLs will be found automatically.
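The way list mode picks URLs out of arbitrary text can be approximated with a regex that looks for the scheme prefix. This is only a sketch: the pattern, the `extract_urls` helper and the sample file contents are assumptions, and matching `https://` as well as `http://` goes beyond what the text above states.

```python
import re

# Match from the scheme up to the next whitespace, comma, quote or bracket.
URL_PATTERN = re.compile(r"https?://[^\s,\"'<>]+")

def extract_urls(text):
    """Return every scheme-prefixed URL found anywhere in the text."""
    return URL_PATTERN.findall(text)

# Hypothetical export with URLs mixed in among other columns and notes.
sample = (
    "Campaign,Destination URL,Clicks\n"
    "Brand,http://www.example.com/landing-page,120\n"
    "Generic,www.example.com/no-scheme,45\n"
    "Notes: see http://www.example.com/blog/post for details\n"
)
print(extract_urls(sample))
```

The entry without ‘http://’ is skipped, mirroring the requirement described above.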