Configuring Bot Protection Rules to Defend Against Bot Behavior

WAF bot protection runs four checks in sequence: known bot detection, signature-based request detection, bot behavior detection, and proactive feature detection. With this layered detection, WAF can identify and manage bot behavior in website traffic, reducing risks such as data leakage and performance deterioration caused by bot attacks.

To enable this function, submit a service ticket.

Function

WAF bot protection provides the following functions.

If you enable bot protection, WAF protects all URLs of the protected domain name by default. You can specify protected objects for bot protection rules if you want WAF to protect specific service scenarios, such as login and registration.

The following table lists the conditions that can be used to specify protected objects for bot protection rules.

**Table 1** Condition list
Field	Field Description	Subfield	Logic	Content
Path	The path of a resource requested by the client. A path is part of a URL.	--	The following logical relationships are supported: Include, Exclude, Equal to, Not equal to, Prefix is, Prefix is not, Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, and Prefix is not any value. NOTE: If the logical relationship is Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, or Prefix is not any value, you can select an existing reference table for Content. For details about how to add and manage a reference table, see Creating a Reference Table to Configure Protection Indicators in Batches.	Enter the path to be protected. Configuration description: The path does not contain a domain name and supports only exact match. So, the path to be protected must be the same as the path you configure. If the path to be protected is /admin, set Path to /admin. If Path is set to /, all paths of the website are protected. The path content cannot contain the following special characters: (<>*)
Method	The request method.	--	The following logical relationships are supported: Equal to and Not equal to.	Enter the request method, for example, GET, POST, PUT, DELETE, or PATCH.
Cookie	The cookie in the request.	Custom subfield. Length: 1 to 2,048 characters.	The following logical relationships are supported: Include, Exclude, Equal to, Not equal to, Prefix is, Prefix is not, Suffix is, Suffix is not, Has, Does not have, Equal to any value, Not equal to any value, and Exclude any value. NOTE: If the logical operator is Equal to any value, Not equal to any value, or Exclude any value, you can select an existing reference table for Content. For details about how to add and manage a reference table, see Creating a Reference Table to Configure Protection Indicators in Batches.	Enter the cookie value of the request, for example, jsessionid.
Header	The request header content.			Enter the request header content, for example, text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8.
Params	The query parameter in the URL. The query parameter is the content following the question mark (?).			Enter the query parameter, for example, 201901150929.
Referer	The source of the access request.	--		Enter the request access source. For example, if the protected path is /admin/xxx and you do not want visitors to be able to access the page from www.test.com, set Content for Referer to http://www.test.com.

Known bot detection is the first step. It compares the user agent (UA) keywords carried in user requests with the UA signature database in bot protection. If a request is from a known bot (known client), the request will be handled based on the configured protective action.

Based on the open-source UA signature intelligence on the Internet and the UA signature library of WAF for anti-crawler protection, WAF can detect 10 types of known bots.

Type	Description
Search engine bots	Search engines use web crawlers to aggregate and index online content (such as web pages, images, and other file types). They provide search results in real time.
Online scanners	An online scanner typically scans assets on the Internet for viruses or vulnerabilities that are caused by configuration errors or programming defects and exploits such weak points to launch attacks. Typical scanners include Nmap, sqlmap, and WPSec.
Web crawlers	Popular crawler tools or services on the Internet. They are often used to capture any web page and extract content to meet user requirements. Scrapy, Pyspider, and Prerender are typical ones.
Website development and monitoring bots	Some companies use bots to provide services that help web developers monitor the status of their sites. These bots can check the availability of links and domain names, connection and web page loading times for requests from different geographical locations, DNS resolution issues, and more.
Business analysis and marketing bots	A company offering business analysis and marketing services utilizes bots to evaluate website content, conduct audience and competitor analysis, support online advertising and marketing campaigns, and optimize website or web page rankings in search engine results.
News and social media bots	News and social media platforms allow users to browse hot news, share ideas, and interact with each other online. Many enterprises' marketing strategies include operating pages on these websites and interacting with consumers about products or services. Some companies use robots to collect data from these platforms for insights into media trends and products, enriching network experience.
Screenshot bots	Some companies use bots to provide website screenshot services. These bots can take complete long-screen screenshots of online content such as posts on websites and social networks, news, and posts on forums and blogs.
Academic and research bots	Some universities and companies use bots to collect data from various websites for academic or research purposes, including reference search, semantic analysis, and specific types of search engines.
RSS feed reader	RSS uses the standard XML web feed format to publish content. Some Internet services use bots to aggregate information from RSS feeds.
Online archiver	Some organizations such as Wikipedia use bots to periodically crawl and archive valuable online information and content copies. These web archiving services are very similar to search engines, but the data provided is not up-to-date. They are mainly used for research.
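The known bot check described above can be pictured as a keyword lookup against a signature table. The following is a minimal sketch, not WAF's actual implementation: the table `KNOWN_BOT_SIGNATURES` and the function `match_known_bot` are hypothetical names, and the keywords are limited to the tools named in the table above.

```python
from typing import Optional

# Hypothetical signature table: UA keyword (lowercase) -> known bot type.
# The tool names come from the table above; the real signature database
# used by WAF is an assumption here and is far larger.
KNOWN_BOT_SIGNATURES = {
    "nmap": "Online scanners",
    "sqlmap": "Online scanners",
    "wpsec": "Online scanners",
    "scrapy": "Web crawlers",
    "pyspider": "Web crawlers",
    "prerender": "Web crawlers",
}


def match_known_bot(user_agent: str) -> Optional[str]:
    """Return the known bot type whose keyword appears in the UA, or None."""
    ua = user_agent.lower()
    for keyword, bot_type in KNOWN_BOT_SIGNATURES.items():
        if keyword in ua:
            return bot_type
    return None
```

A request whose User-Agent matches no keyword returns `None` here and would move on to the next detection step.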

Signature-based request detection is the second step. This approach inspects the HTTP request header features in user requests and matches them against signatures of abnormal request headers, impersonators of known bots, mainstream development frameworks and HTTP libraries, and automation programs. If a request matches a bot signature, the request will be handled based on the configured protective action.

Type	Description
Abnormal request header	A request is considered abnormal if it has no User-Agent header or if its User-Agent header is empty.
Impersonators of known bots	If this function is enabled, the system checks whether the source IP address of a known bot request is its valid client IP address to prevent spoofing.
Development frameworks and HTTP libraries	Requests that carry the signature of a mainstream development framework or HTTP library, including: aiohttp, Apache-HttpClient, Apache-HttpAsyncClient, Commons-HttpClient, HttpComponents, PhantomJS, CakePHP, curl, Jetty, wget, http-kit, python-requests, Ruby, WebClient, WinHttpRequest, HttpUrlConnection, OxfordCloudService, http_request2, PEAR HTTPRequest, Python-urllib, RestSharp, Mojolicious (Perl), PHP, libwww-perl, okhttp, HTMLParser, Go-http-client, axios, Dispatch, LibVLC, node-superagent, curb, Needle, IPWorks, lwp-trivial, Custom-AsyncHttpClient, Convertify, AsyncHttpClient, Embed PHP Library, Apache Synapse, node-fetch, electron-fetch, asynchttp, Dolphin http client, EventMachine HttpClient, httpunit, Zend_Http_Client, Python-httplib2, spray-can, http_requester, AndroidDownloadManager, bluefish, Java, git, and Prerender.cloud.
Automation program	The service can detect automation programs with crawler behavior characteristics but unclear purposes.
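As an illustration of two of the signature types above, the sketch below flags a missing or empty User-Agent and a User-Agent that names an HTTP library. It is a simplified assumption of how such a check could look: `classify_by_signature` and `HTTP_LIBRARY_TOKENS` are hypothetical names, only a few library tokens from the table are included, and impersonator and automation program checks are left out.

```python
from typing import Mapping, Optional

# A few library tokens taken from the table above (lowercased); not the full list.
HTTP_LIBRARY_TOKENS = ("python-requests", "curl", "wget", "okhttp", "go-http-client")


def classify_by_signature(headers: Mapping[str, str]) -> Optional[str]:
    """Return the matching signature type, or None if nothing matches."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").strip()
    if not user_agent:
        # Missing or empty User-Agent.
        return "Abnormal request header"
    ua = user_agent.lower()
    if any(token in ua for token in HTTP_LIBRARY_TOKENS):
        return "Development frameworks and HTTP libraries"
    return None
```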

Bot behavior detection is the third step. WAF uses an AI protection engine to analyze requests and learn from them automatically, and then handles attack behavior based on the configured behavior detection score and protective action.

You can set three score ranges for bot behavior detection. Score range: 0 to 100. A score closer to 0 indicates that the request feature is more like a normal request, and a score closer to 100 indicates that the request feature is more like a bot.
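The relationship between score ranges and protective actions can be sketched as a simple range lookup. The boundaries and the action assigned to each range below are hypothetical examples, not WAF defaults; `SCORE_RULES` and `action_for_score` are illustrative names.

```python
from typing import List, Tuple

# Hypothetical configuration: three inclusive score ranges and their actions.
# The boundaries (30 and 70) are assumptions chosen for illustration only.
SCORE_RULES: List[Tuple[int, int, str]] = [
    (0, 30, "Allow"),
    (31, 70, "JS Challenge"),
    (71, 100, "Block"),
]


def action_for_score(score: int) -> str:
    """Return the protective action configured for a behavior detection score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for low, high, action in SCORE_RULES:
        if low <= score <= high:
            return action
    raise ValueError("no configured range covers this score")
```

Keeping the three ranges contiguous, so that every score from 0 to 100 falls into exactly one of them, avoids requests that match no action.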

Currently, proactive feature detection can check traffic of websites that are connected to WAF in cloud CNAME access mode or dedicated mode. Currently, proactive feature detection is not supported in cloud load balance access mode.
Currently, proactive feature detection supports only web browser services. Before enabling this function, ensure that the protected object is a browser client. Alternatively, configure matching conditions to ensure that the resources can be accessed only by web browsers. Otherwise, mobile app access may be affected.
Currently, proactive feature detection supports only origin server HTML pages that are encoded using UTF-8. To keep normal access to origin servers unaffected, ensure that UTF-8 is used for encoding HTML packets and Content-Type is HTML for origin servers before enabling this function.

Proactive feature detection is the fourth step. The system injects JavaScript code into HTML responses to monitor and verify the client browser's runtime environment, including keyboard and mouse interaction behaviors. In this way, the system can distinguish requests sent by tools from normal requests, and handle bot attacks based on the interaction confidence and protective action.

Interaction confidence indicates the frequency of interactions (such as keyboard and mouse operations) generated by a client within a period of time. A lower confidence level indicates a lower frequency of client interactions and a higher probability that the client is an automation program. A higher confidence level indicates a higher probability that the client initiates a normal operation.

The confidence levels are as follows:

Skip: Do not detect interactions. For example, if you enable proactive feature detection rules and set the interaction confidence to Skip, the system does not count the keyboard and mouse interactions of the client. By contrast, if the interaction confidence is set to Low, a client that generates no keyboard or mouse interactions will be blocked.
High: More than 10 interactions are generated within 600 seconds.
Medium: More than 5 interactions are generated within 600 seconds.
Low: More than 0 interactions are generated within 600 seconds.

If a client accesses the protected website for the first time or does not send any requests to the website within 600 seconds, the request is allowed.
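The thresholds above can be read as a mapping from the number of interactions observed in a 600-second window to a confidence level. The sketch below shows one such reading. The function name `interaction_confidence` is hypothetical, and returning `None` for a client with no interactions is an assumption, because the levels above do not name that case.

```python
from typing import Optional

WINDOW_SECONDS = 600  # observation window stated in the level definitions above


def interaction_confidence(interactions_in_window: int) -> Optional[str]:
    """Map an interaction count within WINDOW_SECONDS to a confidence level."""
    if interactions_in_window < 0:
        raise ValueError("interaction count cannot be negative")
    if interactions_in_window > 10:
        return "High"
    if interactions_in_window > 5:
        return "Medium"
    if interactions_in_window > 0:
        return "Low"
    # No interactions at all: below every level (assumption).
    return None
```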

Constraints

This function is supported only when Cloud Mode - CNAME access is used.
This function is supported by the standard, professional, and enterprise editions for cloud mode. If you buy the standard, professional, or enterprise edition for cloud mode, the bot protection edition is automatically adapted to the standard, professional, or professional edition, respectively. The bot protection service has the same required duration as the WAF edition you buy.

Prerequisites

You have connected the website you want to protect to WAF. For details, see Connecting Your Website to WAF.
At least one protection rule has been configured for the domain name. For details, see Configuring Protection Policies.
You have created a policy, and this policy has not been shared with others.
This function cannot be configured in shared policies.

Configuring a Bot Protection Rule

Log in to the WAF console.
In the upper left corner, select a region or project.
(Optional) If you have enabled the enterprise project function, in the upper part of the navigation pane on the left, select your enterprise project from the Filter by enterprise project drop-down list. Then, WAF will display the related security data in the enterprise project on the page.
In the navigation pane on the left, click Policies.
Click the name of the target policy to go to the protection rule configuration page.

Before configuring protection rules, ensure that the target protection policy has been applied to a domain name. A protection policy can be applied to multiple protected domain names, but a protected domain name can have only one protection policy.
Click the Bot Protection configuration box and enable bot protection.

On the Custom Protected Objects tab, click Add Protected Object Feature, set Field, Subfield, Logic, and Content, and click Save.

The following table lists the conditions that can be matched by protected objects in bot rules.

**Table 2** Condition list
Field	Field Description	Subfield	Logic	Content
Path	The path of a resource requested by the client. A path is part of a URL.	--	The following logical relationships are supported: Include, Exclude, Equal to, Not equal to, Prefix is, Prefix is not, Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, and Prefix is not any value. NOTE: If the logical relationship is Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, or Prefix is not any value, you can select an existing reference table for Content. For details about how to add and manage a reference table, see Creating a Reference Table to Configure Protection Indicators in Batches.	Enter the path to be protected. Configuration description: The path does not contain a domain name and supports only exact match. So, the path to be protected must be the same as the path you configure. If the path to be protected is /admin, set Path to /admin. If Path is set to /, all paths of the website are protected. The path content cannot contain the following special characters: (<>*)
Method	The request method.	--	The following logical relationships are supported: Equal to and Not equal to.	Enter the request method, for example, GET, POST, PUT, DELETE, or PATCH.
Cookie	The cookie in the request.	Custom subfield. Length: 1 to 2,048 characters.	The following logical relationships are supported: Include, Exclude, Equal to, Not equal to, Prefix is, Prefix is not, Suffix is, Suffix is not, Has, Does not have, Equal to any value, Not equal to any value, and Exclude any value. NOTE: If the logical operator is Equal to any value, Not equal to any value, or Exclude any value, you can select an existing reference table for Content. For details about how to add and manage a reference table, see Creating a Reference Table to Configure Protection Indicators in Batches.	Enter the cookie value of the request, for example, jsessionid.
Header	The request header content.			Enter the request header content, for example, text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8.
Params	The query parameter in the URL. The query parameter is the content following the question mark (?).			Enter the query parameter, for example, 201901150929.
Referer	The source of the access request.	--		Enter the request access source. For example, if the protected path is /admin/xxx and you do not want visitors to be able to access the page from www.test.com, set Content for Referer to http://www.test.com.

To add more protected objects, click Add Rule. A maximum of 10 protected objects can be added.
If there are multiple rules, the AND operator is used. The feature cannot be matched unless all rules are met.
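The AND relationship between rules can be sketched as follows. This is an illustrative model only: the `Condition` class, the `OPERATORS` table, and `matches_protected_object` are hypothetical names, only a subset of the logical relationships in Table 2 is covered, and the request is reduced to a plain dictionary of field values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Subset of the logical relationships from Table 2, modeled as predicates
# over (actual request value, configured content).
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "Include": lambda value, content: content in value,
    "Exclude": lambda value, content: content not in value,
    "Equal to": lambda value, content: value == content,
    "Not equal to": lambda value, content: value != content,
    "Prefix is": lambda value, content: value.startswith(content),
    "Prefix is not": lambda value, content: not value.startswith(content),
}


@dataclass
class Condition:
    field_name: str  # for example "Path" or "Method"
    logic: str       # a key of OPERATORS
    content: str     # the configured content to compare against


def matches_protected_object(request: Dict[str, str], conditions: List[Condition]) -> bool:
    """Return True only if every condition is met (AND semantics)."""
    return all(
        OPERATORS[c.logic](request.get(c.field_name, ""), c.content)
        for c in conditions
    )
```

For example, a protected object with the conditions `Path` prefix is `/login` and `Method` equal to `POST` matches a `POST` to `/login/submit`, but not a `GET` to the same path.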

After the preceding configurations are complete, you can view, modify, or delete the configured rules in the protected object feature list.

On the Known bots, Signature-based requests, Proactive feature detection, or Bot behavior card, configure rules.

Click the Known bots card and toggle Status on.
- Figure 1 shows the default configurations after you enable this function.
  Figure 1 Known bots
- Expand the protection rule by clicking the icon on its left to view the details about and features of the protection rule.
Enable or disable a specific rule and configure protective actions based on your service requirements.
The protective actions are as follows:
- Log only: WAF only logs requests that match the features.
- JS Challenge: After identifying the feature, WAF returns a segment of JavaScript code that can be automatically executed by a normal browser to the client. If the client properly executes the JavaScript code, WAF allows all requests from the client within a period of time (30 minutes by default). During this period, no verification is required. If the client fails to execute the code, WAF blocks the requests.
  
  If the referer in the request is different from the current host, the JS challenge does not work.
- Block: WAF blocks requests that match the features.
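The JS Challenge pass period can be pictured as a per-client timestamp check. The sketch below only illustrates the 30-minute window described above. How WAF actually identifies a client and stores the verification result is not described here, so `ChallengeTracker` and the `client_id` key are assumptions.

```python
from typing import Dict

PASS_WINDOW_SECONDS = 30 * 60  # default pass period mentioned above


class ChallengeTracker:
    """Remembers when each client last passed the JavaScript challenge."""

    def __init__(self) -> None:
        self._passed_at: Dict[str, float] = {}

    def record_pass(self, client_id: str, now: float) -> None:
        # Called after the client has executed the JavaScript code correctly.
        self._passed_at[client_id] = now

    def needs_challenge(self, client_id: str, now: float) -> bool:
        # A client is challenged if it never passed or its pass period expired.
        passed = self._passed_at.get(client_id)
        return passed is None or now - passed >= PASS_WINDOW_SECONDS
```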

Click the Signature-based requests card and toggle Status on.
- Figure 2 shows the default configurations after you enable this function.
  Figure 2 Signature-based requests
- Expand the protection rule by clicking the icon on its left to view the details about the protection rule.
Enable or disable a specific rule and configure protective actions based on your service requirements.
The protective actions are as follows:
- Log only: WAF only logs requests that match the features.
- JS Challenge: After identifying the feature, WAF returns a segment of JavaScript code that can be automatically executed by a normal browser to the client. If the client properly executes the JavaScript code, WAF allows all requests from the client within a period of time (30 minutes by default). During this period, no verification is required. If the client fails to execute the code, WAF blocks the requests.
  
  If the referer in the request is different from the current host, the JS challenge does not work.
- Block: WAF blocks requests that match the features.

Click the Bot behavior card and enable AI-based behavior detection.

Figure 3 shows the default configurations after you enable this function.
Figure 3 Bot behavior
Set three behavior detection score ranges based on service requirements. Score range: 0 to 100. A score closer to 0 indicates that the request feature is more like a normal request, and a score closer to 100 indicates that the request feature is more like a bot.
Configure a protective action for each score range.
The protective actions are as follows:
- Allow: WAF allows requests that match the features to pass.
- Log only: WAF only logs requests that match the features.
- JS Challenge: After identifying the feature, WAF returns a segment of JavaScript code that can be automatically executed by a normal browser to the client. If the client properly executes the JavaScript code, WAF allows all requests from the client within a period of time (30 minutes by default). During this period, no verification is required. If the client fails to execute the code, WAF blocks the requests.
  
  If the referer in the request is different from the current host, the JS challenge does not work.
- Block: WAF blocks requests that match the features.

Currently, proactive feature detection can check traffic of websites that are connected to WAF in cloud CNAME access mode or dedicated mode. Currently, proactive feature detection is not supported in cloud load balance access mode.
Currently, proactive feature detection supports only web browser services. Before enabling this function, ensure that the protected object is a browser client. Alternatively, configure matching conditions to ensure that the resources can be accessed only by web browsers. Otherwise, mobile app access may be affected.
Currently, proactive feature detection supports only origin server HTML pages that are encoded using UTF-8. To keep normal access to origin servers unaffected, ensure that UTF-8 is used for encoding HTML packets and Content-Type is HTML for origin servers before enabling this function.

Click the Proactive feature detection card and toggle Status on.

Figure 4 shows the default configurations after you enable this function.
Figure 4 Proactive feature detection
Set Confidence for detection based on actual service requirements.

A lower confidence level indicates a lower frequency of client interactions and a higher probability that the client is an automation program. A higher confidence level indicates a higher probability that the client initiates a normal operation. Proactive feature detection supports the following confidence levels:
- Skip: Do not detect interactions. For example, if you enable proactive feature detection rules and set the interaction confidence to Skip, the system does not count the keyboard and mouse interactions of the client. By contrast, if the interaction confidence is set to Low, a client that generates no keyboard or mouse interactions will be blocked.
- High: More than 10 interactions are generated within 600 seconds.
- Medium: More than 5 interactions are generated within 600 seconds.
- Low: More than 0 interactions are generated within 600 seconds.
Configure a protective action for each confidence level.
The protective actions are as follows:
- Log only: WAF only logs requests that match the features.
- Block: WAF blocks requests that match the features.

Protection Verification

To verify that WAF is protecting your domain name (www.example.com) according to the default settings (with Protective Action set to Block), take the following steps:

Clear the browser cache and enter the domain name in the address bar to check whether the website is accessible.
- If the website is inaccessible, connect the website domain name to WAF by referring to Connecting Your Website to WAF with Cloud Mode - CNAME Access.
- If the website is accessible, go to step 2.
Simulate a bot behavior.
Return to the WAF console. In the navigation pane on the left, click Events. On the displayed page, check event logs.
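One simple way to simulate bot behavior in step 2 is to send a request whose User-Agent names an HTTP library from the signature-based request list. The sketch below uses only the Python standard library. The helper names `build_bot_request` and `send`, and the version string in the User-Agent, are illustrative assumptions. Run it only against a domain name you own.

```python
import urllib.error
import urllib.request


def build_bot_request(url: str, user_agent: str = "python-requests/2.31.0") -> urllib.request.Request:
    """Build a GET request that announces itself as an HTTP library."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="GET")


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code of the response."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses (4xx/5xx) are raised as exceptions; return the code.
        return err.code
```

Calling `send(build_bot_request("https://www.example.com/"))` issues the request. The status code returned for a blocked request depends on your WAF configuration, so confirm the result in the event logs rather than relying on the status code alone.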

Related Operations

Bot Protection Statistics: You can view bot protection statistics, including traffic distribution, action distribution, traffic trends, bot score distribution, and top event source statistics.
Querying a Protection Event: Click Details in the Operation column of the target event to view event details.
You can disable Known bots, Signature-based requests, Bot behavior, or Proactive feature detection to disable all rules under that detection. Your settings will be retained even if you disable the corresponding detection.