You can now automatically verify search engine bots, either when uploading a log file or retrospectively after you have uploaded log files to a project.
When uploading logs, you’ll be given the opportunity to tick the ‘verify bots’ option.
If you have already imported log files, or would like to verify search engine bots retrospectively, then you can do so under the ‘Project > Verify Bots’ menu.
Search engine bots are often spoofed by other bots or crawlers, including our own SEO Spider software when emulating requests from specific search engine user-agents. Hence, when analysing logs, it’s important to know which events are genuine and which can be discounted.
The Log File Analyser will verify all major search engine bots according to their individual guidelines. For example, for Googlebot verification, the Log File Analyser will perform a reverse DNS lookup, verify the matching domain name and then run a forward DNS using the host command to verify it’s the same original requesting IP.
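The double-lookup check described above can be sketched in a few lines of Python using the standard `socket` module. This is only an illustration of the reverse-then-forward DNS approach, not the Log File Analyser’s actual implementation, and the accepted domain suffixes are the ones Google documents for Googlebot:

```python
import socket

def verify_googlebot(ip):
    """Sketch of Googlebot verification: reverse DNS, domain check,
    then forward DNS back to the original requesting IP."""
    try:
        # Reverse DNS lookup: IP -> host name
        host, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False  # no PTR record, so the IP can't be verified
    # The host name must belong to a Google-owned domain
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Forward DNS lookup: host name -> IPs
        _, _, ips = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return False
    # Genuine only if the forward lookup resolves back to the same IP
    return ip in ips
```

An IP that reverse-resolves to anything other than a Google domain (a local address, for instance) fails the check immediately.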
After validation, you can use the ‘verification status’ filter to view log events that are verified, spoofed, or that errored during verification.
If you find all events are being marked as spoofed, there are a few things to check:
- Is the Remote Host being read? Check the Remote Host value associated with the events marked as spoofed: click on one of the events and look at the Remote Host value in the lower window pane. Remote Host is not a mandatory field, so if it wasn’t available in the imported log file, it won’t be possible to verify the event.
- Do the Remote Host values look correct? If the Remote Host values all come from a single IP, or a small selection of IPs (head over to the IP tab to see unique IPs), then they are likely from a load balancer. You’ll need to have the log format adjusted by the site administrator or hosting provider to include the real IP address. Before doing this, double check that the real IP is not already in the log file: open the log file in a text editor and inspect a few of the lines. Is there more than one IP address on each line? If so, please send the first few lines of the log, or the Log File Analyser debug logs (‘Help > Debug > Save Logs’), to our support team so we can make sure this isn’t a parsing issue.
- Verify manually: For Googlebot, the Log File Analyser verifies events exactly as Google recommends. Try this yourself; if you get different results, please let us know. If not, go ahead and request that the real IP is added to the log.
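When inspecting log lines by hand for the load-balancer case above, a simple IPv4 pattern match makes it easy to spot lines that carry more than one address. A minimal sketch (the sample log line and field layout are hypothetical):

```python
import re

# Naive IPv4 pattern: four dot-separated groups of 1-3 digits
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def ips_per_line(line):
    """Return all IPv4 addresses found on a single log line."""
    return IP_RE.findall(line)

# Hypothetical line: a load-balancer IP first, with the real client
# IP appended at the end in an X-Forwarded-For style field.
line = '10.0.0.5 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 123 "66.249.66.1"'
print(ips_per_line(line))  # ['10.0.0.5', '66.249.66.1']
```

Two or more addresses per line suggests the real client IP is present but in a field the parser isn’t reading as the Remote Host.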
Similar to the date range, you can switch user-agent using the drop-down filter in the top right of the application. For an ‘All Bots’ project, the Log File Analyser will by default import data for the following search bot user-agents:
- All Googlebots – This includes Googlebot, Googlebot Smartphone and Googlebot Mobile.
- Googlebot Mobile
- Googlebot Smartphone
We plan on making this configurable soon. Switching the user-agent will update the data for all tabs, not just the tab you’re on.
In the top right-hand side of the application, you can change the date range of your view across the project. There are three preset date ranges, the last day, the last 7 days or the last 30 days, as well as an option for a custom date range.
You can also skip backwards and forwards with dates using the arrows at the side. This will update the date range for all tabs, not just the tab you’re on.
You’re able to view the log file import history of a project by clicking on ‘Project > Import History’ via the top level menu.
This allows you to view the first and last events from the log files, as well as the import date, number of events contained within the log file, site URL provided, log file format and the file name.
By clicking on the individual import rows, you can also delete imports from the history, if you’ve accidentally imported incorrect logs.