Configuring Anti-Crawler Rules to Prevent Crawler Attacks

Web crawlers make network information collection and query easy, but they also introduce the following negative impacts:

Web crawlers can consume excessive server bandwidth and increase server load, because they use specific policies to browse as much high-value information on a website as possible.
Bad actors may use web crawlers to launch DoS attacks against websites. As a result, websites may fail to provide normal services due to resource exhaustion.
Bad actors may use web crawlers to steal mission-critical data on your websites, which will damage your economic interests.

WAF provides three anti-crawler policies to comprehensively mitigate crawler attacks against your websites: bot detection by identifying the User-Agent, website anti-crawler protection by checking browser validity, and CC attack protection by limiting the access frequency.

Prerequisites

The domain name has been connected to WAF.

Enabling Robot Detection to Identify User-Agent

If you enable robot detection, WAF can detect and block threats such as malicious crawlers, scanners, and web shells.

Log in to the management console.
Click in the upper left corner of the management console and select a region or project.
Click in the upper left corner and choose Web Application Firewall (Dedicated) under Security.
In the navigation pane on the left, choose Website Settings.
In the Policy column of the row containing the domain name, click the number to go to the Policies page.
Ensure that Basic Web Protection is enabled.
Click Advanced Settings. On the Protection Status page, enable General Check and Webshell Detection.
In the Anti-Crawler configuration area, toggle on the function. Then, click Configure Anti-Crawler.
On the Feature Library page, enable protection functions based on your business needs.

If WAF detects that a malicious crawler or scanner is crawling your website, WAF immediately blocks it and logs the event. You can view the crawler protection logs on the Events page.
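
The following Python sketch illustrates the general idea of User-Agent-based detection: match the User-Agent header against a signature list, then block and log on a hit. It is not the WAF implementation. The signature list BOT_SIGNATURES, the function names, and the log format are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical signature list for illustration only. A real feature
# library is much larger and is maintained by the WAF service.
BOT_SIGNATURES = {
    "scanner": re.compile(r"sqlmap|nikto|nmap", re.IGNORECASE),
    "script_tool": re.compile(r"python-requests|curl|wget", re.IGNORECASE),
}


def classify_user_agent(user_agent):
    """Return the name of the first matching category, or None."""
    for category, pattern in BOT_SIGNATURES.items():
        if pattern.search(user_agent):
            return category
    return None


def handle_request(headers, event_log):
    """Return 403 and record an event if the User-Agent matches a signature."""
    user_agent = headers.get("User-Agent", "")
    category = classify_user_agent(user_agent)
    if category is not None:
        event_log.append(
            {"action": "block", "category": category, "user_agent": user_agent}
        )
        return 403
    return 200
```

A User-Agent header is supplied by the client and is easy to forge, which is one reason this check is typically combined with the browser validity check and rate limiting described in the following sections.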

Enabling Anti-Crawler Protection to Verify Browser Validity

If you enable anti-crawler protection, WAF dynamically analyzes website service models and accurately identifies crawler behavior based on data risk control and bot identification approaches.

Log in to the management console.
Click in the upper left corner of the management console and select a region or project.
Click in the upper left corner and choose Web Application Firewall (Dedicated) under Security.
In the navigation pane on the left, choose Website Settings.
In the Policy column of the row containing the domain name, click the number to go to the Policies page.
In the Anti-Crawler configuration area, toggle on the function if needed. Then, click Configure Anti-Crawler.

Configure a JavaScript-based anti-crawler rule by referring to Table 1.

Two protection modes are provided: Protect all requests and Protect specified requests.

To protect all paths except a specified path
Set Protection Mode to Protect all paths. Then, click Exclude Path, configure the paths to exclude from protection, and click Confirm.

To protect a specified path only
Set Protection Mode to Protect specified requests, click Add Rule, configure the request rule, and click Confirm.

**Table 1** Parameters of a JavaScript-based anti-crawler protection rule
Parameter	Description	Example Value
Rule Name	Name of the rule	wafjs
Path	A part of the URL, not including the domain name. A URL defines the address of a web page. The basic URL format is Protocol name://Domain name or IP address[:Port]/[Path/.../File name]. For example, if the URL is http://www.example.com/admin, set Path to /admin. NOTE: The path does not support regular expressions. The path cannot contain two or more consecutive slashes, for example, ///admin. If you enter ///admin, WAF converts /// to /.	/admin
Logic	Select a logical relationship from the drop-down list.	Include
Rule Description	A brief description of the rule.	None
Effective Date	Immediate	Immediate
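
The following Python sketch shows how the Path and Logic parameters and the two protection modes could interact. It is an illustration under stated assumptions, not the WAF implementation: the function names and the mode strings "all" and "specified" are hypothetical, and only the Include logic from the example value is modeled.

```python
import re
from urllib.parse import urlsplit


def normalize_path(url_or_path):
    """Extract the path from a URL and collapse consecutive slashes into one."""
    if "://" in url_or_path:
        path = urlsplit(url_or_path).path
    else:
        path = url_or_path
    return re.sub(r"/{2,}", "/", path) or "/"


def rule_matches(request_url, rule_path):
    """Include logic: the request path contains the rule path (no regex)."""
    return normalize_path(rule_path) in normalize_path(request_url)


def is_protected(request_url, mode, rule_paths):
    """Decide whether the anti-crawler check applies to a request.

    mode "all": protect everything except requests that hit a rule path.
    mode "specified": protect only requests that hit a rule path.
    """
    hit = any(rule_matches(request_url, path) for path in rule_paths)
    if mode == "all":
        return not hit
    if mode == "specified":
        return hit
    raise ValueError("mode must be 'all' or 'specified'")
```

For example, with the rule path /admin, is_protected("http://www.example.com/admin/users", "specified", ["/admin"]) returns True, and the same request in mode "all" returns False because /admin is treated as an excluded path.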

Configuring CC Attack Protection to Limit Access Frequency

A CC attack protection rule limits the access frequency to a specific path (URL) based on a specific IP address, cookie, or Referer field, mitigating the impact of CC attacks on web services.

Log in to the management console.
Click in the upper left corner of the management console and select a region or project.
Click in the upper left corner and choose Web Application Firewall (Dedicated) under Security.
In the navigation pane on the left, choose Website Settings.
In the Policy column of the row containing the target domain name, click the number of enabled protection rules. On the displayed Policies page, keep the Status toggle on for CC Attack Protection.
In the upper left corner above the CC Attack Protection rule list, click Add Rule. The following example adds an IP address-based rate limiting rule that uses human-machine verification, as shown in Figure 1.

Figure 1 Per IP address

If the number of access requests exceeds the configured rate limit, visitors must enter a verification code to continue accessing the website.
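
The following Python sketch illustrates per-IP rate limiting with a verification step, using a sliding time window. It is a minimal illustration, not the WAF implementation. The class PerIpRateLimiter, its parameters, and the choice to reset the counter after a successful verification are all assumptions made for this example.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window request counter keyed by client IP address."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit            # max requests allowed per window
        self.window = window_seconds  # window length in seconds
        self.clock = clock            # injectable clock, useful for testing
        self.hits = defaultdict(deque)

    def check(self, ip):
        """Record one request and return "allow" or "verify"."""
        now = self.clock()
        timestamps = self.hits[ip]
        # Drop requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        timestamps.append(now)
        if len(timestamps) > self.limit:
            return "verify"  # visitor must pass a verification code
        return "allow"

    def mark_verified(self, ip):
        """Assumption: passing verification resets the counter for this IP."""
        self.hits[ip].clear()
```

With limit=2 and window_seconds=10, the third request from the same IP address within 10 seconds returns "verify", while requests from other IP addresses are still allowed.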