How To Use Custom Search
Introduction To Custom Search
The SEO Spider allows you to find anything you want in the HTML or text of a website using its custom search feature.
This can be helpful when verifying analytics tags or discovering which pages have certain words or phrases, such as an old brand name, ‘out of stock’ or key phrases for internal linking opportunities.
You’re able to configure up to 100 search filters using custom search. Each filter lets you input text or regex, find pages using either a ‘contains’ or ‘does not contain’ match against your chosen input, and report the number of occurrences.
This tutorial walks you through how to use the feature, common scenarios and more advanced searches.
1) Add Custom Search Filters
Click ‘Config > Custom > Search’ from the top-level menu to open the custom search configuration.
Then click ‘Add’ (in the bottom right) to set up a custom search filter.
A custom search filter will appear. You’re able to add up to 100 separate filters in a crawl.
2) Input Your Search
Now enter your search in the ‘Enter Search Query’ box and adjust each search filter’s options.
From left to right, you can name the search filter, select ‘contains’ or ‘does not contain’, choose ‘text’ or ‘regex’, input your search query, and choose where the search is performed (HTML, page text, an element, an XPath and more).
The example above shows a search for ‘Out of stock’ across any page’s text and a search for any pages that do not contain a Google Tag Manager tracking code in the HTML head element of a page.
When the filters are set up, you can click ‘OK’ and run a crawl to perform the search.
3) Crawl The Website
Type or copy in the website you wish to crawl in the ‘Enter URL to spider’ box and hit ‘Start’.
Wait until the crawl finishes and reaches 100%, or watch in real-time as the custom search tab filters populate.
4) View Data In The Custom Search Tab & Filters
Click on the Custom Search tab to view the results of your custom search in real-time. By default, data from all searches is shown together in the tab, but the filters can be used to refine the view to show a single search filter at a time.
The ‘contains’ filter will show the number of occurrences of the search, while a ‘does not contain’ search will either return ‘Contains’ or ‘Does Not Contain’.
In this search, there are 2 pages with ‘Out of stock’ text, each containing the word just once – while the GTM code was not found on any of the 10 pages.
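The difference between the two filter types can be sketched in a few lines of Python. This is only an illustration of the reporting behaviour described above, not how the SEO Spider is implemented. The function names, the page data and the ‘GTM-XXXXXX’ placeholder are all assumptions made for the example, and the matching here is a plain case sensitive substring check.

```python
def contains_result(page_text, query):
    """A 'contains' filter reports how many times the query occurs."""
    return page_text.count(query)


def does_not_contain_result(page_html, query):
    """A 'does not contain' filter reports a label rather than a count."""
    return "Contains" if query in page_html else "Does Not Contain"


# Hypothetical pages: URL -> (page text, HTML head)
pages = {
    "/widget": ("Blue widget. Out of stock", "<head><title>Widget</title></head>"),
    "/gadget": ("Red gadget. In stock", "<head><title>Gadget</title></head>"),
}

stock_counts = {
    url: contains_result(text, "Out of stock") for url, (text, _) in pages.items()
}
gtm_labels = {
    url: does_not_contain_result(head, "GTM-XXXXXX") for url, (_, head) in pages.items()
}

print(stock_counts)
print(gtm_labels)
```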
These numbers can also be seen in the right-hand ‘Overview’ pane, which updates the filter counts in real-time.
Export custom search data by clicking the ‘export’ button, which works alongside the filters and your current view.
You can also export ‘inlinks’ (the source pages that link) to custom search filters via ‘Bulk Export > Custom Search > Filter X Inlinks’.
Advanced Search Filter Options
Custom search becomes really powerful when you combine filters and adjust the search filter configuration, in particular by using regex and choosing where the search is performed.
If you need to perform a case sensitive search when searching for ‘text’, click on the arrows to the right side of the box to expand the text area and choose ‘case sensitive’.
‘Regex’ is case sensitive by default. To make it case insensitive, use (?i) before the word. For example, (?i)optimisation would match against ‘optimisation’ and ‘OPTIMISATION’, or even ‘OpTiMiSaTiOn’.
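If you’d like to test a case insensitive pattern before adding it as a filter, a quick check in Python can help. This is a minimal sketch using Python’s own `re` module, which may differ in some details from the regex engine in the SEO Spider. The sample strings and variable names are just for illustration.

```python
import re

# Regex is case sensitive unless the (?i) inline flag is placed at the start.
case_sensitive = re.compile(r"optimisation")
case_insensitive = re.compile(r"(?i)optimisation")

samples = ["optimisation", "OPTIMISATION", "OpTiMiSaTiOn"]

sensitive_hits = [s for s in samples if case_sensitive.search(s)]
insensitive_hits = [s for s in samples if case_insensitive.search(s)]

print(sensitive_hits)    # only the lowercase spelling matches
print(insensitive_hits)  # all three spellings match
```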
Case sensitivity can be particularly useful when searching for misspellings of brand names, or acronyms etc.
Exact & Multiple Words
You can choose to search using regular text, or for more advanced uses you can switch to regex.
For example, using regular expressions you can match exact words with a pattern such as \bword\b.
This would match a particular word (‘word’ in this case), as \b matches word boundaries.
This can be useful when searching for words or phrases that can be in other words, like ‘pr’ (which will appear in ‘promotion’, ‘pre-rendering’ and more on our site!).
Without using word boundaries ‘pr’ is found 12 times on our digital PR page. With an exact, case sensitive match it’s actually 0.
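The effect of word boundaries is easy to see on a short piece of text. The sketch below uses Python’s `re` module and an invented sentence (not taken from our site) to compare a plain search for ‘pr’ with a whole-word search.

```python
import re

# Invented sample text for illustration only.
text = "Our PR team shares pr tips on promotion, pre-rendering and press outreach."

# Plain search: counts 'pr' wherever it appears, including inside other words.
substring_count = len(re.findall(r"pr", text))

# Whole-word search: \b requires a word boundary on both sides of 'pr'.
whole_word_count = len(re.findall(r"\bpr\b", text))

print(substring_count, whole_word_count)
```

Both patterns are case sensitive here, so the uppercase ‘PR’ is not counted by either. Adding (?i) to the front of the second pattern would pick it up as well.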
You can also combine words together in a search. For example, if you wanted to find any pages with the words ‘natural’, ‘organic’ or ‘free’, you could combine the words in a single filter using a pipe, such as natural|organic|free.
This will count every instance of each of the words. For example, our ‘search engine optimisation’ page has ‘organic’ 3 times and ‘natural’ and ‘free’ once each, to make 5 in total.
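The way a piped pattern adds up can be sketched as follows. The page text is invented to give the same 3 + 1 + 1 split described above, and the per-word breakdown is an extra for illustration, as a single filter reports only the combined total.

```python
import re
from collections import Counter

# Invented page text for illustration only.
page_text = (
    "Organic search brings free traffic. Organic results look natural, "
    "and organic growth compounds."
)

# The pipe means 'or', so every occurrence of any of the words is counted.
pattern = re.compile(r"(?i)natural|organic|free")
matches = pattern.findall(page_text)

total = len(matches)
per_word = Counter(match.lower() for match in matches)

print(total)
print(per_word)
```

Without word boundaries, this pattern would also count ‘free’ inside a longer word such as ‘freedom’, so \b can be combined with the pipe where exact words matter.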
You’re able to click on the heading to sort by occurrences as shown in the example.
You’re able to combine filters and view them together at the same time. So if you wanted to search for any page that contains one word but does not contain another, use multiple filters and view them together in the custom search tab.
In this example, you can see that there are no pages where the words ‘crawler’ and ‘best’ are not both used. This is appropriate!
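Reading two filters side by side is the same as intersecting two sets of pages. Here is a rough sketch of that logic with made-up URLs and text, where the helper name `count_occurrences` is an assumption for the example.

```python
import re

# Made-up pages for illustration: URL -> page text
pages = {
    "/a": "The best crawler for audits.",
    "/b": "A crawler for large sites.",
    "/c": "The best tool for the job.",
}


def count_occurrences(pattern, text):
    """Number of times the pattern is found in the text."""
    return len(re.findall(pattern, text))


# Filter 1: 'contains' crawler. Filter 2: 'does not contain' best.
contains_crawler = {
    url for url, text in pages.items() if count_occurrences(r"crawler", text) > 0
}
lacks_best = {
    url for url, text in pages.items() if count_occurrences(r"best", text) == 0
}

# Pages satisfying both filters at once.
flagged = sorted(contains_crawler & lacks_best)
print(flagged)
```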
You’re then able to refine exactly where the custom search is performed.
The 7 options available give you control over where you search:
- HTML – The full HTML of the web page.
- Page Text – The text of web pages, excluding any HTML.
- Page Text No Anchors – The text of web pages, excluding any HTML or any text contained within HTML anchor tags (also known as A Elements). This can be helpful when searching for words that are also included in link text within menus, which can cause every page to be flagged to contain the search otherwise.
- HTML Head – The HTML head of the web page.
- HTML Body – The HTML body of the web page, which can include both HTML and page text.
- XPath – You’re able to supply an XPath to specify the location in the HTML where the search is performed. For example, if you wanted to run the search only against text contained in h3 headings, you could supply //h3.
- Content Area – You can specify the content area used for word count, near duplicate content analysis and spelling and grammar checks – which can also be selected for custom search. By default this includes text contained within the body HTML element, excluding both the nav and footer elements to focus on the main content of the page. HTML elements, classes and IDs can be excluded and included, as per the content area guide.
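To show how the choice of location changes the count, the sketch below searches the same small page in three ways: all page text, page text without anchors, and text inside h3 headings only. It uses Python’s standard `xml.etree.ElementTree`, which only supports a limited subset of XPath and needs well-formed markup, so it is a simplification rather than a picture of how the SEO Spider parses real pages. The markup and helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup for illustration only.
html = """<html><body>
<nav><a href="/pricing">Pricing</a></nav>
<h1>Pricing overview</h1>
<h3>Licence pricing</h3>
<p>See the <a href="/pricing">pricing</a> page for details.</p>
<h3>Support</h3>
</body></html>"""

root = ET.fromstring(html)


def text_of(element):
    """All text inside an element, including text inside links."""
    return "".join(element.itertext())


def text_without_anchors(element):
    """All text inside an element, skipping anything inside <a> tags."""
    parts = []
    if element.tag != "a":
        parts.append(element.text or "")
        for child in element:
            parts.append(text_without_anchors(child))
            parts.append(child.tail or "")  # text after the child is kept
    return "".join(parts)


query = "pricing"
page_text_hits = text_of(root).lower().count(query)
no_anchor_hits = text_without_anchors(root).lower().count(query)
# './/h3' is the ElementTree spelling of the XPath //h3.
h3_hits = sum(text_of(h3).lower().count(query) for h3 in root.iterfind(".//h3"))

print(page_text_hits, no_anchor_hits, h3_hits)
```

The menu link and the in-paragraph link account for the gap between the first two counts, which is the situation ‘Page Text No Anchors’ is designed for.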
Choosing where to search is often very powerful. A good example of this is finding where we misspell ‘Screaming Frog’ as ‘Screaming frog’, without a capital ‘F’ on our own website.
Running a case sensitive search with ‘Page Text’ brings back 7 occurrences on our broken links blog post.
However, when checking the page the misspellings are within the ‘comments’ section of the blog post, rather than in the main blog body.
To exclude this comments section from the custom search, you can right-click in a browser and ‘view source’ of the HTML and search for the appropriate ‘comments’ section in the HTML.
This shows an HTML ID of ‘comments’, which can be used for exclusion.
The ‘comments’ ID can then be excluded in ‘Content Area’ under ‘Configuration > Content > Area’.
The comments section then won’t be analysed for custom search, and re-running the search shows there are 0 occurrences on this page.
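The idea of excluding an element by its ID can be sketched with Python’s standard `html.parser`. This is a simplified illustration under stated assumptions: the class name, helper and markup are invented, the tags inside the excluded element are assumed to be properly closed, and it says nothing about how the content area is implemented in the SEO Spider.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextOutsideId(HTMLParser):
    """Collects page text, skipping everything inside the element with a given ID."""

    def __init__(self, excluded_id=None):
        super().__init__()
        self.excluded_id = excluded_id
        self.skip_depth = 0  # greater than 0 while inside the excluded element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
        elif self.excluded_id is not None and dict(attrs).get("id") == self.excluded_id:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def page_text(html, excluded_id=None):
    parser = TextOutsideId(excluded_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Invented markup for illustration only.
html = (
    "<article><p>Screaming Frog finds broken links.</p></article>"
    '<div id="comments"><p>I love Screaming frog!</p>'
    "<p>Screaming frog rocks<br>truly</p></div>"
    "<footer>Contact us</footer>"
)

with_comments = page_text(html).count("Screaming frog")
without_comments = page_text(html, excluded_id="comments").count("Screaming frog")
print(with_comments, without_comments)
```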
Your custom search can also span multiple lines of HTML. This means it can be used to find full blocks of code in the HTML, such as Google Analytics tracking codes (other analytics platforms are available).
Click on the arrows to the right side of the search query box to expand the text area, where you can input an entire GTM container snippet, for example.
This means you don’t need to reduce searches to single lines or words of a tracking tag. You can verify the whole snippet.
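As a rough illustration of checking for a whole multi-line snippet, the sketch below compares an exact match with a whitespace-tolerant one. The snippet is a shortened placeholder rather than a real GTM container, and the function names are invented. How strictly the SEO Spider treats whitespace and indentation is not covered here, so the whitespace-normalised variant is purely an assumption for comparison.

```python
# Placeholder multi-line snippet for illustration, not a real tracking tag.
snippet = """<script>
  window.dataLayer = window.dataLayer || [];
</script>"""


def contains_exact(html, expected):
    """True only if the snippet appears character for character."""
    return expected in html


def contains_normalised(html, expected):
    """True if the snippet appears once runs of whitespace are collapsed."""
    def collapse(value):
        return " ".join(value.split())

    return collapse(expected) in collapse(html)


page_same = "<head>\n" + snippet + "\n</head>"
page_reindented = (
    "<head>\n<script>\n\twindow.dataLayer = window.dataLayer || [];\n</script>\n</head>"
)
page_missing = "<head>\n<title>No tag here</title>\n</head>"

for name, page in [
    ("same", page_same),
    ("reindented", page_reindented),
    ("missing", page_missing),
]:
    print(name, contains_exact(page, snippet), contains_normalised(page, snippet))
```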
Analyse With Crawl Data
Custom search filter data is automatically appended to the ‘Internal’ tab, which combines all internal data in a crawl.
This means you can match the custom searches against other crawl data for more insight.
Finally, it’s worth reiterating that custom search doesn’t ‘scrape’ or extract data, it only searches.
To extract content, you’ll need to use custom extraction instead.
The guide above should illustrate how to use the SEO Spider to find words, phrases, tracking tags or any snippets of text across pages on your website.
If you have any further queries, feedback or suggestions to improve custom search in the SEO Spider then just get in touch with our team via support.