Updated on 2025-07-16 GMT+08:00

Known Bot Detection

Known bot detection is the first step. It compares the user agent (UA) keywords carried in user requests against the UA signature database in bot protection. If a request comes from a known bot (known client), the request is handled based on the configured protective action.

Drawing on open-source UA signature intelligence from the Internet and WAF's own anti-crawler UA signature library, WAF can detect 10 types of known bots.
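The detection flow described above (UA keyword lookup, then a configured protective action) can be sketched as follows. This is a minimal illustrative sketch, not WAF's actual implementation: the signature keywords, categories, and actions are assumed examples.

```python
# Hypothetical sketch of known-bot detection by UA keyword matching.
# The signature entries and actions below are illustrative assumptions,
# not the real WAF UA signature database or policy.

# Tiny stand-in for the UA signature database: keyword -> bot category.
UA_SIGNATURES = {
    "googlebot": "Search engine bots",
    "sqlmap": "Online scanners",
    "scrapy": "Web crawlers",
    "feedfetcher": "RSS feed readers",
}

# Assumed protective action configured per category.
ACTIONS = {
    "Online scanners": "block",
    "Web crawlers": "log",
}

def detect_known_bot(user_agent: str):
    """Return (category, action) if the UA matches a known-bot signature,
    otherwise (None, "allow")."""
    ua = user_agent.lower()
    for keyword, category in UA_SIGNATURES.items():
        if keyword in ua:
            # Fall back to "allow" when no action is configured
            # for the matched category.
            return category, ACTIONS.get(category, "allow")
    return None, "allow"
```

For example, a request carrying `sqlmap/1.7` in its UA would be classified as an online scanner and blocked, while an unrecognized UA such as `curl/8.0` would pass through unchanged.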

| Type | Description |
|------|-------------|
| Search engine bots | Search engines use web crawlers to aggregate and index online content (such as web pages, images, and other file types) and provide search results in real time. |
| Online scanners | An online scanner scans Internet-facing assets for viruses or for vulnerabilities caused by configuration errors or programming defects; attackers can exploit such weak points to launch attacks. Typical scanners include Nmap, sqlmap, and WPSec. |
| Web crawlers | Popular crawler tools or services on the Internet, often used to fetch web pages and extract their content. Scrapy, Pyspider, and Prerender are typical examples. |
| Website development and monitoring bots | Bots that help web developers monitor the status of their websites, for example by checking the availability of links and domain names, connection and page load times for requests from different geographical locations, and DNS resolution issues. |
| Business analysis and marketing bots | Bots used by business analysis and marketing services to evaluate website content, conduct audience and competitor analysis, support online advertising and marketing campaigns, and optimize website or web page rankings in search engine results. |
| News and social media bots | News and social media platforms let users browse trending news, share ideas, and interact online, and many enterprises run pages on these platforms to engage consumers about products or services. Some companies use bots to collect data from these platforms for insights into media trends and products. |
| Screenshot bots | Bots that provide website screenshot services, including full-length captures of online content such as website and social network posts, news articles, and forum and blog posts. |
| Academic and research bots | Bots used by universities and companies to collect data from websites for academic or research purposes, including reference search, semantic analysis, and specialized search engines. |
| RSS feed readers | RSS publishes content in a standard XML web feed format. Some Internet services use bots to aggregate information from RSS feeds. |
| Online archivers | Bots used by organizations such as Wikipedia to periodically crawl and archive copies of valuable online information and content. These web archiving services are similar to search engines, but the data they provide is not up to date; they are mainly used for research. |