Configuring Anti-Crawler Rules to Prevent Crawler Attacks
Web crawlers make it easier to collect and query information on the Internet, but they also have the following negative impacts:
- Web crawlers consume excessive server bandwidth and increase server load because they use specific policies to browse as much high-value information on a website as possible.
- Bad actors may use web crawlers to launch DoS attacks against websites. As a result, websites may fail to provide normal services due to resource exhaustion.
- Bad actors may use web crawlers to steal mission-critical data from your websites, which damages your business interests.
WAF provides three anti-crawler policies: robot detection (identifying the User-Agent), website anti-crawler protection (checking browser validity), and CC attack protection (limiting the access frequency). Together, they comprehensively mitigate crawler attacks against your websites.
Prerequisites
The domain name has been connected to WAF.
Enabling Robot Detection to Identify User-Agent
If you enable robot detection, WAF can detect and block threats such as malicious crawlers, scanners, and web shells.
- Log in to the management console.
- In the upper left corner of the management console, select a region or project.
- In the upper left corner, choose Web Application Firewall under Security & Compliance.
- In the navigation pane on the left, choose Website Settings.
- Locate the row that contains the target domain name and click Configure Policy in the Policy column. On the displayed page, confirm that Basic Web Protection is enabled (its Status toggle is turned on). Figure 1 shows an example.
- Click Advanced Settings. On the Protection Status page, enable General Check and Webshell Detection.
- In the Anti-Crawler configuration area, enable anti-crawler protection using the toggle on the right, as shown in Figure 2, and then click Configure Anti-Crawler.
- On the Feature Library page, enable all robot detection features, as shown in Figure 3.
If WAF detects that a malicious crawler or scanner is crawling your website, WAF immediately blocks it and logs the event. You can view the crawler protection logs on the Events page.
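Conceptually, robot detection of this kind compares the User-Agent header of each request against a library of known crawler and scanner signatures. The following minimal Python sketch illustrates that idea only; the signature list and function are hypothetical examples, not WAF's actual feature library.
```python
import re

# Hypothetical signature list for illustration; not WAF's actual feature library.
SCANNER_SIGNATURES = [
    r"sqlmap",           # SQL injection scanner
    r"nikto",            # web server vulnerability scanner
    r"python-requests",  # scripted HTTP client
    r"wget",             # command-line downloader
]

def is_suspicious_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent matches a known crawler/scanner signature."""
    ua = user_agent.lower()
    return any(re.search(pattern, ua) for pattern in SCANNER_SIGNATURES)

# A request from a scanner would be blocked and logged; a normal browser passes.
print(is_suspicious_user_agent("sqlmap/1.7.2#stable (http://sqlmap.org)"))    # True
print(is_suspicious_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```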
Enabling Anti-Crawler Protection to Verify Browser Validity
If you enable anti-crawler protection, WAF dynamically analyzes website service models and accurately identifies crawler behavior based on data risk control and bot identification approaches.
- Log in to the management console.
- In the upper left corner of the management console, select a region or project.
- In the upper left corner, choose Web Application Firewall under Security & Compliance.
- In the navigation pane on the left, choose Website Settings.
- In the Policy column of the row containing the target domain name, click Configure Policy to go to the configuration page.
- In the Anti-Crawler configuration area, enable anti-crawler protection using the toggle on the right, as shown in Figure 4, and then click Configure Anti-Crawler.
- Select the JavaScript tab and configure Status and Protective Action.
JavaScript anti-crawler is disabled by default. To enable it, turn on the Status toggle and click Confirm in the displayed dialog box. When the toggle is turned on, JavaScript anti-crawler is enabled.
- Any browser used to access a website protected by anti-crawler rules must have cookies enabled and support JavaScript.
- If your service is connected to a CDN, exercise caution when using the JavaScript anti-crawler function, because CDN caching may affect its performance and page accessibility.
- Configure a JavaScript-based anti-crawler rule by referring to Table 1.
Two protective actions are provided: Protect all paths and Protect a specified path.
- To protect all paths except a specified path, select Protect all paths.
- To protect a specified path only, select Protect a specified path. In the upper left corner of the page, click Add Path. In the displayed dialog box, configure the required parameters and click OK.
Figure 6 Add Path
Table 1 Parameters of a JavaScript-based anti-crawler protection rule

| Parameter | Description | Example Value |
| --- | --- | --- |
| Rule Name | Name of the rule. | wafjs |
| Path | Part of the URL, not including the domain name. A URL defines the address of a web page in the format Protocol name://Domain name or IP address[:Port]/[Path/.../File name]. For example, if the URL is http://www.example.com/admin, set Path to /admin. NOTE: The path does not support regular expressions and cannot contain two or more consecutive slashes (for example, ///admin; WAF converts /// to /). | /admin |
| Logic | Logical relationship selected from the drop-down list. | Include |
| Rule Description | A brief description of the rule. | None |
If you enable anti-crawler, web visitors can only access web pages through a browser.
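Conceptually, a JavaScript challenge verifies browser validity as follows: when a request hits a protected path, the client first receives a script it must execute; a real browser runs the script, stores a verification cookie, and retries the request, while a crawler that cannot run JavaScript or accept cookies never passes. The minimal Python sketch below illustrates this flow together with the Include path logic from Table 1; the cookie name and function are hypothetical and are not part of WAF.
```python
# A sketch of the JavaScript challenge idea, assuming a hypothetical cookie name
# and the "Include" path logic; this is not WAF's actual mechanism or API.
CHALLENGE_COOKIE = "js_challenge_token"   # hypothetical verification cookie

def handle_request(path, cookies, protected_paths):
    # "Include" logic: a rule matches when its configured path is contained in
    # the requested path (for example, a /admin rule matches /admin/login).
    if not any(rule_path in path for rule_path in protected_paths):
        return "serve page"                  # path not covered by the rule
    if cookies.get(CHALLENGE_COOKIE):
        return "serve page"                  # browser already passed the challenge
    return "return JavaScript challenge"     # client must run JS to get the cookie

# A scripted crawler without JavaScript or cookie support never passes:
print(handle_request("/admin/login", {}, ["/admin"]))                            # challenge
print(handle_request("/admin/login", {"js_challenge_token": "ok"}, ["/admin"]))  # serve page
```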
Configuring CC Attack Protection to Limit Access Frequency
A CC attack protection rule limits access to a specific path (URL) based on IP address, cookie, or referer, mitigating the impact of CC attacks on web services.
- Log in to the management console.
- In the upper left corner of the management console, select a region or project.
- In the upper left corner, choose Web Application Firewall under Security & Compliance.
- In the navigation pane on the left, choose Website Settings.
- Locate the row that contains the target domain name and click Configure Policy in the Policy column. On the displayed page, confirm that CC Attack Protection is enabled (its Status toggle is turned on). Figure 7 shows an example.
- In the upper left corner of the CC Attack Protection page, click Add Rule. The following uses an IP address-based rate limiting rule with human-machine verification as an example, as shown in Figure 8.
If the number of access requests exceeds the configured rate limit, the visitors are required to enter a verification code to continue the access.
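The following minimal Python sketch illustrates the rate-limiting idea behind such a rule: requests are counted per source IP in a fixed time window, and once the limit is exceeded, subsequent requests are answered with human-machine verification instead of being served. The limit, window, and function names are hypothetical values for illustration, not WAF defaults; configure the actual limits in the WAF console.
```python
import time
from collections import defaultdict

# Hypothetical values for illustration only.
RATE_LIMIT = 10       # maximum requests allowed per window, per source IP
WINDOW_SECONDS = 60   # length of the fixed time window, in seconds

_counters = defaultdict(lambda: [0.0, 0])   # ip -> [window_start, request_count]

def check_request(ip, now=None):
    """Allow the request or require human-machine verification for this IP."""
    now = time.time() if now is None else now
    window_start, count = _counters[ip]
    if now - window_start >= WINDOW_SECONDS:
        _counters[ip] = [now, 1]              # start a new counting window
        return "allow"
    _counters[ip][1] = count + 1
    if _counters[ip][1] > RATE_LIMIT:
        return "require verification code"    # human-machine verification
    return "allow"

# The 11th request within one minute from the same IP triggers verification.
for i in range(11):
    result = check_request("198.51.100.7", now=1000.0 + i)
print(result)   # require verification code
```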