Updated on 2024-04-10 GMT+08:00

Configuring Anti-Crawler Rules to Prevent Crawler Attacks

Web crawlers make it easy to collect and query information on the Internet, but they also have the following negative impacts:

  • Web crawlers can consume excessive server bandwidth and increase server load because they follow specific policies to browse as much high-value information on a website as possible.
  • Bad actors may use web crawlers to launch DoS attacks against websites. As a result, websites may fail to provide normal services due to resource exhaustion.
  • Bad actors may use web crawlers to steal mission-critical data from your websites, which can damage your economic interests.

WAF provides three anti-crawler policies to comprehensively mitigate crawler attacks against your websites: bot detection by identifying the User-Agent, website anti-crawler protection by checking browser validity, and CC attack protection by limiting the access frequency.

Prerequisites

The domain name has been connected to WAF.

Enabling Robot Detection to Identify User-Agent

If you enable robot detection, WAF can detect and block threats such as malicious crawlers, scanners, and web shells.

  1. Log in to the management console.
  2. Click the icon in the upper left corner of the management console and select a region or project.
  3. Click the icon in the upper left corner and choose Web Application Firewall under Security & Compliance.
  4. In the navigation pane on the left, choose Website Settings.
  5. In the Policy column of the row containing the domain name, click the number to go to the Policies page.
  6. Ensure that Basic Web Protection is enabled.

    Figure 1 Basic Web Protection configuration area

  7. On the Protection Status page, enable General Check and Webshell Detection.
  8. Click the Anti-Crawler configuration area and toggle it on.


  9. On the Feature Library page, enable protection functions based on your business needs.

    Figure 2 Feature Library

If WAF detects that a malicious crawler or scanner is crawling your website, WAF immediately blocks it and logs the event. You can view the crawler protection logs on the Events page.
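
To illustrate the idea of identifying crawlers and scanners by User-Agent, the following is a minimal sketch. It is not WAF's actual feature library: the pattern lists and the function names `classify_user_agent` and `should_block` are assumptions made for this example, and real detection relies on far richer signatures.

```python
import re

# Hypothetical signature lists for illustration only; not WAF's feature library.
KNOWN_SCANNER_PATTERNS = [r"sqlmap", r"nikto", r"nmap"]
KNOWN_CRAWLER_PATTERNS = [r"python-requests", r"scrapy", r"curl"]


def classify_user_agent(user_agent: str) -> str:
    """Return 'scanner', 'crawler', or 'unknown' for a User-Agent string."""
    ua = (user_agent or "").lower()
    if not ua:
        # Assumption: an empty User-Agent is treated as a crawler in this sketch.
        return "crawler"
    for pattern in KNOWN_SCANNER_PATTERNS:
        if re.search(pattern, ua):
            return "scanner"
    for pattern in KNOWN_CRAWLER_PATTERNS:
        if re.search(pattern, ua):
            return "crawler"
    return "unknown"


def should_block(user_agent: str) -> bool:
    """Block anything that was classified as a scanner or crawler."""
    return classify_user_agent(user_agent) != "unknown"
```

Because the User-Agent header is set by the client and can be forged, a check like this only catches tools that identify themselves, which is why the browser validity check and rate limiting described in the following sections are used alongside it.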

Enabling Anti-Crawler Protection to Verify Browser Validity

If you enable anti-crawler protection, WAF dynamically analyzes website service models and accurately identifies crawler behavior based on data risk control and bot identification approaches.

  1. Log in to the management console.
  2. Click the icon in the upper left corner of the management console and select a region or project.
  3. Click the icon in the upper left corner and choose Web Application Firewall under Security & Compliance.
  4. In the navigation pane on the left, choose Website Settings.
  5. In the Policy column of the row containing the domain name, click the number to go to the Policies page.
  6. Click the Anti-Crawler configuration area and toggle it on.


  7. Select the JavaScript tab and change Status if needed.

    JavaScript anti-crawler is disabled by default. To enable it, click the Status toggle and then click OK in the displayed dialog box.

    Protective Action: Select Block, Verification code, or Log only.

    Verification code: If the JavaScript challenge fails, a verification code is required. Requests will be blocked unless the visitor enters a correct verification code.

    • Cookies must be enabled and JavaScript supported by any browser used to access a website protected by anti-crawler protection rules.
    • If your service is connected to CDN, exercise caution when using the JS anti-crawler function.

      CDN caching may impact JS anti-crawler performance and page accessibility.

  8. Configure a JavaScript-based anti-crawler rule by referring to Table 1.

    Two protection modes are provided: Protect all requests and Protect specified requests.

    • To protect all requests except requests that hit a specified rule
      Set Protection Mode to Protect all requests. Then, click Exclude Rule, configure the request exclusion rule, and click Confirm.
      Figure 3 Exclude Rule
    • To protect a specified request only

      Set Protection Mode to Protect specified requests, click Add Rule, configure the request rule, and click Confirm.

      Figure 4 Add Rule

    Table 1 Parameters of a JavaScript-based anti-crawler protection rule

    | Parameter | Description | Example Value |
    |---|---|---|
    | Rule Name | Name of the rule. | waf |
    | Rule Description | A brief description of the rule. This parameter is optional. | - |
    | Effective Date | Time the rule takes effect. | Immediate |
    | Condition List | Parameters for configuring a condition: Field (select the field you want to protect from the drop-down list; currently, only Path and User Agent are included), Subfield, Logic (select a logical relationship from the drop-down list), Content (enter or select the content that matches the condition), and Case sensitive (can be configured if Path is selected for Field; if you enable it, the system matches the case-sensitive path). NOTE: If you set Logic to Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, Prefix is not any of them, Suffix is any value, or Suffix is not any of them, you need to select a reference table. | Path Include /admin |
    | Priority | Rule priority. If you have added multiple rules, rules are matched by priority. The smaller the value you set, the higher the priority. | 5 |
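
The way Protection Mode, the condition list, and Priority in Table 1 interact can be sketched as follows. This is an illustrative model only, not WAF's implementation: the `Rule` class, the `first_match` and `is_challenged` functions, and the mode strings `"all"` and `"specified"` are hypothetical names, only the Include logic is modeled, and the case-sensitivity switch is applied to any field for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    field: str          # "path" or "user_agent"
    content: str        # value to look for; only the Include logic is modeled
    priority: int       # smaller value = higher priority
    case_sensitive: bool = False

    def matches(self, request: dict) -> bool:
        value = request.get(self.field, "")
        needle = self.content
        if not self.case_sensitive:
            value, needle = value.lower(), needle.lower()
        return needle in value


def first_match(rules, request):
    """Return the highest-priority rule that matches, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(request):
            return rule
    return None


def is_challenged(mode: str, rules, request: dict) -> bool:
    """mode 'all': rules are exclusions; mode 'specified': rules select requests."""
    hit = first_match(rules, request) is not None
    if mode == "all":
        return not hit
    if mode == "specified":
        return hit
    raise ValueError(f"unknown mode: {mode}")
```

With the example values from Table 1 (`Rule("waf", "path", "/admin", 5)`), a request to `/admin/login` is challenged in `"specified"` mode and skipped in `"all"` mode, where the same rule acts as an exclusion.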

After JavaScript anti-crawler protection is enabled, visitors can access the protected web pages only through a browser that has cookies enabled and supports JavaScript.
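
The reason a browser with cookies and JavaScript is required can be illustrated with a generic cookie-based JavaScript challenge. The sketch below is an assumption about how such checks commonly work, not a description of WAF internals: `SECRET_KEY`, the `js_challenge` cookie name, and all function names are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"example-secret"  # hypothetical; a real deployment keeps this private


def issue_token(client_ip: str) -> str:
    """Derive a token bound to the client IP address."""
    return hmac.new(SECRET_KEY, client_ip.encode(), hashlib.sha256).hexdigest()


def challenge_page(client_ip: str) -> str:
    """Return a page whose script stores the token in a cookie and reloads."""
    token = issue_token(client_ip)
    return (
        "<script>"
        f"document.cookie = 'js_challenge={token}; path=/';"
        "location.reload();"
        "</script>"
    )


def handle_request(client_ip: str, cookies: dict) -> str:
    """Return 'pass' if the cookie proves the script ran, else 'challenge'."""
    presented = cookies.get("js_challenge", "")
    expected = issue_token(client_ip)
    # Compare as bytes in constant time to avoid leaking the token.
    if hmac.compare_digest(presented.encode(), expected.encode()):
        return "pass"
    return "challenge"
```

A client that cannot execute the script or store the cookie never presents a valid token, so it keeps receiving the challenge. This also hints at why CDN caching needs care: a cached challenge page would carry a token issued for a different visitor.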

Configuring CC Attack Protection to Limit Access Frequency

A CC attack protection rule limits how often a visitor, identified by IP address, cookie, or Referer, can access a specific path (URL), mitigating the impact of CC attacks on web services.

  1. Log in to the management console.
  2. Click the icon in the upper left corner of the management console and select a region or project.
  3. Click the icon in the upper left corner and choose Web Application Firewall under Security & Compliance.
  4. In the navigation pane on the left, choose Website Settings.
  5. In the Policy column of the row containing the target domain name, click the number of enabled protection rules. On the displayed Policies page, keep the Status toggle for CC Attack Protection turned on.

    Figure 5 CC Attack Protection configuration area

  6. In the upper left corner above the CC Attack Protection rule list, click Add Rule. The following example adds an IP address-based rate limiting rule that uses human-machine verification as the protective action, as shown in Figure 6.

    Figure 6 Per IP address

    If the number of access requests exceeds the configured rate limit, visitors must enter a verification code to continue accessing the website.
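
The behavior of a per-IP rate limit with human-machine verification can be sketched with a simple sliding-window counter. This is an illustration under stated assumptions rather than WAF's algorithm: the `PerIpRateLimiter` class, its parameters, and the `"allow"` and `"verify"` return values are hypothetical names chosen for this example.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window request counter keyed by client IP address."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> request timestamps

    def check(self, client_ip: str, now: float) -> str:
        """Record a request and return 'allow' or 'verify'."""
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            # Over the limit: the visitor must pass a verification code.
            return "verify"
        return "allow"
```

For example, with `PerIpRateLimiter(max_requests=2, window_seconds=10.0)`, a third request from the same IP address within 10 seconds returns `"verify"`, while requests from other IP addresses are counted separately.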