If you need more than a check of a single URL or a whole website, you can configure the Atomseo crawler with advanced settings.
You can include and exclude URLs, specific folders, files, etc.
You can also switch between user agents and set the crawling speed. It is also possible to exclude external links from crawling.
How to Use It
Option ‘Include’
Allows you to specify which URLs to check, for example only a specific folder, subfolder, page type, or images.
To check all pages under a folder, use a pattern such as http://www.example.com/info/*
It will check only pages starting with http://www.example.com/info/, e.g. http://www.example.com/info/page1, http://www.example.com/info/page2
To check every page under a subfolder, you can use http://www.example.com/*/subfolder/*
In this case, results are provided for every page under /subfolder/, regardless of the parent folder, which can be any folder.
For example, this matches http://www.example.com/folder1/subfolder/page1 http://www.example.com/folder1/subfolder/page2
and
http://www.example.com/folder2/subfolder/page1 http://www.example.com/folder2/subfolder/page2
To include only specific file types in the search, you can use
http://www.example.com/*jpg (searches only for .jpg files)
http://www.example.com/*pdf (searches only for .pdf files)
If you want to crawl pages with specific parameters, e.g. ?utm, ?price, use
http://www.example.com/?utm=* or http://www.example.com/?price=*
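The wildcard patterns above can be illustrated with a short Python sketch. This is only a sketch, not Atomseo's actual matching code: it assumes that * stands for any run of characters and that a pattern must match the whole URL, and the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    # Escape every literal piece, then join the pieces with ".*"
    # so that each "*" in the pattern matches any run of characters.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


def matches(pattern: str, url: str) -> bool:
    # fullmatch: the entire URL has to fit the pattern.
    return pattern_to_regex(pattern).fullmatch(url) is not None


if __name__ == "__main__":
    checks = [
        ("http://www.example.com/info/*", "http://www.example.com/info/page1"),
        ("http://www.example.com/*/subfolder/*", "http://www.example.com/folder2/subfolder/page1"),
        ("http://www.example.com/*jpg", "http://www.example.com/img/logo.jpg"),
        ("http://www.example.com/?utm=*", "http://www.example.com/?utm=news"),
        ("http://www.example.com/info/*", "http://www.example.com/about"),
    ]
    for pattern, url in checks:
        print(pattern, url, matches(pattern, url))
```

The "?" in a pattern such as ?utm=* is treated here as a literal character, which is why the sketch escapes the pattern itself instead of relying on shell-style globbing.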
Option ‘Exclude’
If you need to check the whole website but want to exclude certain pages and/or save your crawling budget, you can configure your search.
For example, if you need to skip a single page, add its full address
http://www.example.com/url-not-for-scanning
If you need to avoid folders or subfolders, just use
http://www.example.com/login/*
This skips all pages under the /login/ folder.
You can also use any of the examples given above in the Include section.
Important! You can use any number of exclude and include rules. Add them one per line in the appropriate section. Make sure each rule is written correctly, otherwise you will get incorrect results. If you have any doubts, write to us at info@atomseo.com
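How a list of include and exclude rules might combine can be sketched in Python. This is an illustration under stated assumptions, not Atomseo's implementation: it assumes * matches any run of characters, that an exclude rule wins over an include rule, and that an empty include list allows every URL. All function names are hypothetical.

```python
import re


def parse_rules(text: str) -> list[str]:
    # One rule per line; blank lines are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]


def matches(pattern: str, url: str) -> bool:
    # Each "*" matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, url) is not None


def should_crawl(url: str, include: list[str], exclude: list[str]) -> bool:
    # Assumption: exclusion takes priority over inclusion.
    if any(matches(rule, url) for rule in exclude):
        return False
    # Assumption: with no include rules, every non-excluded URL is crawled.
    return not include or any(matches(rule, url) for rule in include)


if __name__ == "__main__":
    include = parse_rules("http://www.example.com/info/*\nhttp://www.example.com/*pdf")
    exclude = parse_rules("http://www.example.com/info/private/*")
    for url in (
        "http://www.example.com/info/page1",
        "http://www.example.com/info/private/notes",
        "http://www.example.com/about",
    ):
        print(url, should_crawl(url, include, exclude))
```

Running a few known URLs through a check like this before starting a crawl is one way to catch a mistyped rule early.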
Option ‘User-Agent’
By default, Atomseo uses this User-Agent: Mozilla/5.0 (compatible; Atomseobot/2.0; +http://https://error404.atomseo.com/)
However, it also has built-in preset user agents for various browsers. This allows you to switch between them quickly when required.
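At the HTTP level, switching the user agent means sending a different User-Agent request header. The Python sketch below shows this with the standard library. It only builds request objects and sends nothing. The helper name and the "ExampleBrowser/1.0" value are hypothetical, and the default string is copied as quoted above.

```python
from urllib.request import Request

# Default User-Agent string as quoted in the text above.
DEFAULT_UA = "Mozilla/5.0 (compatible; Atomseobot/2.0; +http://https://error404.atomseo.com/)"


def build_request(url: str, user_agent: str = DEFAULT_UA) -> Request:
    # Only constructs the request; no network traffic happens here.
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    default_req = build_request("http://www.example.com/")
    # Hypothetical placeholder standing in for a browser preset.
    browser_req = build_request("http://www.example.com/", "ExampleBrowser/1.0")
    # urllib stores header names capitalized as "User-agent".
    print(default_req.get_header("User-agent"))
    print(browser_req.get_header("User-agent"))
```

Some sites serve different content or apply different blocking rules depending on this header, which is why checking a site under more than one user agent can give different results.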
Custom HTTP Headers
You can add any custom header in the format [HttpHeaderName]: [HttpHeaderValue]
This may be useful for passing anti-spam protection or accessing hidden pages.
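The [HttpHeaderName]: [HttpHeaderValue] format can be read line by line, as in the following Python sketch. It is an illustration only: the function name and the sample header names and values are hypothetical, and it assumes one header per line.

```python
def parse_headers(text: str) -> dict[str, str]:
    headers: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"Expected 'Name: Value', got: {line!r}")
        headers[name.strip()] = value.strip()
    return headers


if __name__ == "__main__":
    # Hypothetical header names and values, for illustration only.
    sample = "X-Example-Token: abc123\nAccept-Language: en"
    print(parse_headers(sample))
```

Splitting on the first colon only keeps values such as URLs intact, and a line without a colon is reported as an error instead of being silently dropped.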
You can add any number of custom HTTP headers, for example: