Updated on 2024-02-29 GMT+08:00

Configuring Anti-Crawler Rules to Prevent Crawler Attacks

Web crawlers make it easy to collect and query information on the Internet, but they can also have the following negative impacts:

  • Web crawlers can consume excessive server bandwidth and increase server load because they use specific policies to browse as much high-value information on a website as possible.
  • Bad actors may use web crawlers to launch DoS attacks against websites. As a result, websites may fail to provide normal services due to resource exhaustion.
  • Bad actors may use web crawlers to steal mission-critical data from your websites, which may damage your economic interests.

WAF provides three anti-crawler policies to comprehensively mitigate crawler attacks against your websites: bot detection by identifying the User-Agent, website anti-crawler by checking browser validity, and CC attack protection by limiting the access frequency.

Prerequisites

The domain name has been connected to WAF.

Enabling Robot Detection to Identify User-Agent

If you enable robot detection, WAF can detect and block threats such as malicious crawlers, scanners, and web shells.

  1. Log in to the management console.
  2. In the upper left corner of the management console, select a region or project.
  3. Choose Security > Web Application Firewall to go to the Dashboard page.
  4. In the navigation pane on the left, choose Website Settings.
  5. In the Policy column of the row containing the domain name, click the number to go to the Policies page.
  6. Ensure that Basic Web Protection is enabled.

    Figure 1 Basic Web Protection configuration area

  7. Click Advanced Settings. On the Protection Status page, enable General Check and Webshell Detection.
  8. In the Anti-Crawler configuration area, toggle on the function, and then click Configure Bot Mitigation.

    Figure 2 Anti-Crawler configuration area

  9. On the Feature Library page, enable protection functions based on your business needs.

    Figure 3 Feature Library

If WAF detects that a malicious crawler or scanner is crawling your website, WAF immediately blocks it and logs the event. You can view the crawler protection logs on the Events page.
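
The following is a minimal sketch of User-Agent-based detection, added only to illustrate the idea behind this policy. The category names and signature patterns are hypothetical and are not taken from the WAF feature library, whose actual rules are not described in this document.

```python
import re

# Hypothetical signature lists used for illustration only.
SCANNER_PATTERNS = [r"sqlmap", r"nikto", r"nmap"]
SCRIPT_TOOL_PATTERNS = [r"python-requests", r"curl/", r"scrapy"]
SEARCH_ENGINE_PATTERNS = [r"googlebot", r"bingbot", r"baiduspider"]


def classify_user_agent(user_agent: str) -> str:
    """Return a coarse category for a User-Agent header value."""
    ua = (user_agent or "").lower()
    if not ua:
        return "empty"
    for category, patterns in (
        ("scanner", SCANNER_PATTERNS),
        ("script_tool", SCRIPT_TOOL_PATTERNS),
        ("search_engine", SEARCH_ENGINE_PATTERNS),
    ):
        if any(re.search(pattern, ua) for pattern in patterns):
            return category
    return "other"


def should_block(user_agent: str,
                 blocked=frozenset({"scanner", "script_tool"})) -> bool:
    """Block the request if its category is in the enabled (blocked) set."""
    return classify_user_agent(user_agent) in blocked
```

A User-Agent header is easy to forge, so a check of this kind only catches clients that do not disguise themselves. This is one reason to combine it with the other two policies described below.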

Enabling Anti-Crawler Protection to Verify Browser Validity

If you enable anti-crawler protection, WAF dynamically analyzes website service models and accurately identifies crawler behavior based on data risk control and bot identification approaches.

  1. Log in to the management console.
  2. In the upper left corner of the management console, select a region or project.
  3. Choose Security > Web Application Firewall to go to the Dashboard page.
  4. In the navigation pane on the left, choose Website Settings.
  5. In the Policy column of the row containing the domain name, click the number to go to the Policies page.
  6. In the Anti-Crawler configuration area, toggle on the function if needed. Then, click Configure Bot Mitigation.

    Figure 4 Anti-Crawler configuration area

  7. Select the JavaScript tab and change Status if needed.

    JavaScript anti-crawler is disabled by default. To enable it, click the Status toggle, and then click Confirm in the displayed dialog box.

    • Any browser used to access a website protected by anti-crawler rules must have cookies enabled and must support JavaScript.
    • If your service is connected to CDN, exercise caution when using the JS anti-crawler function.

      CDN caching may impact JS anti-crawler performance and page accessibility.

  8. Configure a JavaScript-based anti-crawler rule by referring to Table 1.

    Two protection modes are provided: Protect all requests and Protect specified requests.

    • To protect all paths except a specified path

      Set Protection Mode to Protect all paths. Then, click Exclude Path, configure the paths to exclude, and click Confirm.

      Figure 5 Exclude Rule
    • To protect a specified path only

      Set Protection Mode to Protect specified requests, click Add Rule, configure the request rule, and click Confirm.

      Figure 6 Add Rule
    Table 1 Parameters of a JavaScript-based anti-crawler protection rule

    | Parameter | Description | Example Value |
    |---|---|---|
    | Rule Name | Name of the rule | wafjs |
    | Path | A part of the URL, not including the domain name. A URL is used to define the address of a web page. The basic URL format is as follows: Protocol name://Domain name or IP address[:Port]/[Path/.../File name]. For example, if the URL is http://www.example.com/admin, set Path to /admin. NOTE: The path does not support regular expressions. The path cannot contain two or more consecutive slashes, for example, ///admin. If you enter ///admin, WAF converts /// to /. | /admin |
    | Logic | Select a logical relationship from the drop-down list. | Include |
    | Rule Description | A brief description of the rule. | None |
    | Effective Date | Immediate | Immediate |
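
The Path rules in Table 1 can be sketched as follows. This is an illustration under stated assumptions, not WAF code: the function names are hypothetical, and treating the Include logic as a substring match is an assumption, because this document does not define how each logical relationship is evaluated.

```python
import re
from urllib.parse import urlsplit


def normalize_path(path: str) -> str:
    """Collapse runs of consecutive slashes into one slash (/// becomes /)."""
    return re.sub(r"/{2,}", "/", path)


def rule_path(url: str) -> str:
    """Return the value to enter in Path: the URL path without the
    protocol, domain name, port, or query string."""
    path = urlsplit(url).path or "/"
    return normalize_path(path)


def path_includes(request_path: str, rule_value: str) -> bool:
    """Assumed meaning of the Include logic: the request path contains
    the rule value as a plain substring (no regular expressions)."""
    return normalize_path(rule_value) in normalize_path(request_path)
```

For example, rule_path("http://www.example.com/admin") returns "/admin", which matches the example value in Table 1.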

If you enable anti-crawler, web visitors can only access web pages through a browser.
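
To show why cookies and JavaScript are required, the following sketch models a generic cookie-based JavaScript challenge. It is not a description of WAF internals: the secret key, the cookie name js_token, the token format, and the validity period are all hypothetical. In this pattern, a client without a valid token receives a challenge page whose script stores the token in a cookie and reloads the page; a client that cannot run JavaScript or keep cookies never gets past the challenge.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # hypothetical server-side key
TOKEN_TTL = 300          # hypothetical token lifetime in seconds


def _sign(client_ip: str, ts: int) -> str:
    message = f"{client_ip}|{ts}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_token(client_ip: str, now=None) -> str:
    """Create the token that the challenge script stores in a cookie."""
    ts = int(time.time() if now is None else now)
    return f"{ts}.{_sign(client_ip, ts)}"


def verify_token(client_ip: str, token: str, now=None) -> bool:
    """Check the signature, the bound IP address, and the token age."""
    now = time.time() if now is None else now
    try:
        ts_text, signature = token.split(".", 1)
        ts = int(ts_text)
    except (AttributeError, ValueError):
        return False
    expected = _sign(client_ip, ts)
    same = hmac.compare_digest(signature.encode(), expected.encode())
    return same and 0 <= now - ts <= TOKEN_TTL


def handle_request(client_ip: str, cookies: dict, now=None):
    """Return ("pass", None), or ("challenge", token) when the client
    must first run the challenge script and come back with the cookie."""
    if verify_token(client_ip, cookies.get("js_token", ""), now):
        return ("pass", None)
    return ("challenge", issue_token(client_ip, now))
```

In a design like this, a cache that serves one visitor's challenge page to other visitors would hand them a token that fails verification, which is consistent with the caution about CDN caching above.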

Configuring CC Attack Protection to Limit Access Frequency

A CC attack protection rule limits how often a specific IP address, cookie, or Referer can access a specific path (URL), mitigating the impact of CC attacks on web services.

  1. Log in to the management console.
  2. In the upper left corner of the management console, select a region or project.
  3. Choose Security > Web Application Firewall to go to the Dashboard page.
  4. In the navigation pane on the left, choose Website Settings.
  5. In the Policy column of the row containing the target domain name, click the number of enabled protection rules. On the displayed Policies page, keep the Status toggle on for CC Attack Protection.

    Figure 7 CC Attack Protection configuration area

  6. In the upper left corner above the CC Attack Protection rule list, click Add Rule. The following example adds an IP address-based rate limiting rule that requires human-machine verification when the rate limit is exceeded, as shown in Figure 8.

    Figure 8 Per IP address

    If the number of access requests exceeds the configured rate limit, visitors must enter a verification code to continue accessing the website.
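
The behavior of a per-IP rate limit with verification can be sketched as follows. This is a simplified sliding-window model under assumed parameters; the class name, the limit, and the window length are hypothetical, and the counting method that WAF actually uses is not described in this document.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Count requests per IP address within a sliding time window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # IP address -> request timestamps

    def check(self, ip: str, now: float) -> str:
        """Return "allow", or "verify" once the IP exceeds the limit."""
        hits = self._hits[ip]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)
        return "verify" if len(hits) > self.limit else "allow"

    def mark_verified(self, ip: str) -> None:
        """Reset the counter after the visitor passes verification."""
        self._hits[ip].clear()
```

With limit=3 and window_seconds=60, the fourth request from the same IP address within 60 seconds returns "verify", while requests from other IP addresses are still allowed.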