Updated on 2025-08-19 GMT+08:00

Configuring Anti-Crawler Rules

You can configure website anti-crawler protection rules to protect against search engines, scanners, script tools, and other crawlers, and use JavaScript to create custom anti-crawler protection rules.

Prerequisites

Constraints

  • Cookies must be enabled and JavaScript supported by any browser used to access a website protected by anti-crawler protection rules.
  • If your service is connected to CDN, exercise caution when using the JS anti-crawler function.

    CDN caching may impact JS anti-crawler performance and page accessibility.

  • The JavaScript anti-crawler function is unavailable for pay-per-use WAF instances.
  • This function is not supported in the standard edition.
  • JS anti-crawler protection is not supported in Cloud - Load balancer WAF.
  • If JavaScript anti-crawler event logs cannot be viewed, see Why Are There No Protection Logs for Some Requests Blocked by WAF JavaScript Anti-Crawler Rules?
  • The protective action for website anti-crawler JavaScript challenge is Log only, and that for JavaScript authentication is Verification code. If a visitor fails the JavaScript authentication, a verification code is required for access. Requests will be forwarded as long as the visitor enters a valid verification code.
  • WAF JavaScript-based anti-crawler rules only check GET requests and do not check POST requests.

Configuring an Anti-Crawler Rule

  1. Log in to the WAF console.
  2. Click in the upper left corner and select a region or project.
  3. (Optional) If you have enabled the enterprise project function, in the upper part of the navigation pane on the left, select your enterprise project from the Filter by enterprise project drop-down list. Then, WAF will display the related security data in the enterprise project on the page.
  4. In the navigation pane on the left, click Policies.
  5. Click the name of the target policy to go to the protection rule configuration page.

    Before configuring protection rules, ensure that the target protection policy has been applied to a domain name. A protection policy can be applied to multiple protected domain names, but a protected domain name can have only one protection policy.

  6. In the Anti-Crawler configuration area, toggle on the function if needed.

    : enabled.

  7. On the Feature Library tab, configure the protective action for attacks that match the feature library rule and enable necessary detection types. Table 1 describes detection types. Figure 3 shows an example.

    Figure 3 Feature Library
    • Protective Action: response action when a request matches the rule.
      • Block: WAF blocks and logs detected attacks.

        Enabling this feature may have the following impacts:

        • Blocking requests of search engines may affect your website SEO.
        • Blocking scripts may block some applications because those applications may trigger anti-crawler rules if their user-agent field is not modified.
      • Log only (default): WAF only logs detected attacks.
    • Detection type: WAF enables Scanner by default. You can configure detection types based on your service requirements.
      Table 1 Anti-crawler detection features

      Type

      Description

      Remarks

      Search Engine

      This rule is used to block web crawlers, such as Googlebot and Baiduspider, from collecting content from your site.

      • If you enable this rule, WAF detects and blocks search engine crawlers.
      • If you disable this, WAF does not block POST requests from Googlebot or Baiduspider. If you want to block POST requests from Baiduspider, use the configuration described in Configuration Example: Search Engine.

      Scanner

      This rule is used to block scanners, such as OpenVAS and Nmap. A scanner scans for vulnerabilities, viruses, and other jobs.

      After you enable this rule, WAF detects and blocks scanner crawlers.

      Script Tool

      This rule is used to block script tools. A script tool is often used to execute automatic tasks and program scripts, such as HttpClient, OkHttp, and Python programs.

      If you enable this rule, WAF detects and blocks the execution of automatic tasks and program scripts.

      If your application uses scripts such as HttpClient, OkHttp, and Python, disable Script Tool. Otherwise, WAF will identify such script tools as crawlers and block the application.

      Other

      This rule is used to block crawlers used for other purposes, such as site monitoring, using access proxies, and web page analysis.

      To avoid being blocked by WAF, crawlers may use a large number of IP address proxies.

      If you enable this rule, WAF detects and blocks crawlers that are used for various purposes.

  8. On the JavaScript tab, configure a JS anti-crawler rule.

    • Cookies must be enabled and JavaScript supported by any browser used to access a website protected by anti-crawler protection rules.
    • If your service is connected to CDN, exercise caution when using the JS anti-crawler function.

      CDN caching may impact JS anti-crawler performance and page accessibility.

    Figure 4 JavaScript
    • Status: Enable or disable JavaScript.
      • : enabled.
      • (default): disabled.
    • Protective Action: response action when a request matches the rule.
      • Block: Requests that hit the rule will be blocked, and a block response page will be returned to the client that initiates the requests.
      • Log only: Requests that hit the rule will be logged but not be blocked.
      • Verification code: If the JavaScript challenge fails, a verification code is required. Requests will be blocked unless the visitor enters a correct verification code.
    • Protection Mode: Select the protection mode for JavaScript anti-crawler.
      • Protect all requests: All requests except request matching excluded rules are protected. In other words, requests that hit enabled rules are not protected. Click Exclude Rule and add the rule to be excluded. Table 2 describes parameters.
      • Protect specified requests: Only specified requests are protected. If you set Path to / or disabled the rule, no request paths are protected. Click Protect specified requests and add a rule. Table 2 describes parameters.
      Table 2 Parameters of a JavaScript-based anti-crawler protection rule

      Parameter

      Description

      Example Value

      Rule Name

      Name of the rule.

      waf

      Rule Description (Optional)

      Description of the rule.

      -

      Condition List

      Request features to be matched by the rule. If a request matches the features, WAF handles the request according to the configured rule.

      • At least one condition is required for the rule to take effect. If multiple conditions are configured, the rule takes effect only when all conditions are met.
      • Click Add Condition to add a condition. You can add up to 30 conditions.

      Condition parameter description:

      • Field: Select the field to be checked from the drop-down list. Currently, only Path and User Agent are included.
      • Subfield: You do not need to set subfields for Path and User Agent fields.
      • Logic: Select the desired logical relationship from the drop-down list.
      • Content: Enter or select the content that matches the condition.

        If you set Logic to Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, Prefix is not any of them, Suffix is any value, or Suffix is not any of them, you need to select a reference table.

      • Case-Sensitive: This parameter can be configured if Path is selected for Field. If you enable this, the system matches the case-sensitive path. It helps the system accurately identify and handle various crawler requests, improving the accuracy and effectiveness of anti-crawler policies.

      Path for Field, Include for Logic, and /admin for Content

      Application Schedule

      The rule takes effect immediately after being added.

      Immediate

      Priority

      Rule priority. If you have added multiple rules, rules are matched by priority. The smaller the value you set, the higher the priority.

      5

      After completing the preceding configurations, you can:

      • Check the rule status: In the protection rule list, check the rule you added. Rule Status is Enabled by default.
      • Disable the rule: If you do not want the rule to take effect, click Disable in the Operation column of the rule.
      • Delete or modify the rule: Click Delete or Modify in the Operation column of the rule.

  9. Configure a JavaScript-based anti-crawler rule by referring to Table 3.

    Two protective actions are provided: Protect all requests and Protect specified requests.

    • To protect all requests except requests that hit a specified rule
      Set Protection Mode to Protect all requests. Then, click Exclude Rule, configure the request exclusion rule, and click OK.
      Figure 5 Exclude Rule
    • To protect a specified request only

      Set Protection Mode to Protect specified requests, click Add Rule, configure the request rule, and click OK.

      Figure 6 Configuring request rules
    Table 3 Parameters of a JavaScript-based anti-crawler protection rule

    Parameter

    Description

    Example Value

    Rule Name

    Name of the rule.

    waf

    Rule Description

    A brief description of the rule. This parameter is optional.

    -

    Effective Date

    Time the rule takes effect.

    Immediate

    Condition List

    Condition parameter description:

    • Field: Select the field you want to protect from the drop-down list. Currently, only Path and User Agent are included.
    • Subfield
    • Logic: Select a logical relationship from the drop-down list.
      NOTE:

      If you set Logic to Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, Prefix is not any of them, Suffix is any value, or Suffix is not any of them, you need to select a reference table.

    • Content: Enter or select the content that matches the condition.
    • Case-Sensitive: This parameter can be configured if Path is selected for Field. If you enable this, the system matches the case-sensitive path. It helps the system accurately identify and handle various crawler requests, improving the accuracy and effectiveness of anti-crawler policies.

    Path Include /admin

    Priority

    Rule priority. If you have added multiple rules, rules are matched by priority. The smaller the value you set, the higher the priority.

    5

    After completing the preceding configurations, you can:

    • Check the rule status: In the protection rule list, check the rule you added. Rule Status is Enabled by default.
    • Disable the rule: If you do not want the rule to take effect, click Disable in the Operation column of the rule.
    • Delete or modify the rule: Click Delete or Modify in the Operation column of the rule.

Configuration Example: Logging Script Crawlers Only

You can take the following steps to verify that WAF is protecting your website domain name (www.example.com) against an anti-crawler rule.

  1. Execute a JavaScript tool to crawl web page content.
  2. On the Feature Library tab, enable Script Tool and select Log only for Protective Action. (If WAF detects an attack, it logs the attack only.)

    Figure 7 Enabling Script Tool

  3. Enable anti-crawler protection.

    Figure 8 Enabling Anti-Crawler protection

  4. In the navigation pane on the left, choose Events to go to the Events page.

    Figure 9 Viewing Events - Script crawlers

Configuration Example: Search Engine

To allow the search engine of Baidu or Google and block the POST request of Baidu:

  1. Set Status of Search Engine to by referring to 6.
  2. Configure a precise protection rule by referring to Configuring Custom Precise Protection Rules.

    Figure 10 Blocking POST requests