Updated on 2023-09-21 GMT+08:00

Configuring Anti-Crawler Rules

You can configure website anti-crawler protection rules to protect against search engines, scanners, script tools, and other crawlers, and use JavaScript to create custom anti-crawler protection rules.

If you have enabled enterprise projects, ensure that you have all operation permissions for the project in which your WAF instance is located. You can then select the project from the Enterprise Project drop-down list and configure protection policies for the domain names in the project.

Prerequisites

The website to be protected has been added to WAF.

Constraints

  • Cookies must be enabled and JavaScript supported by any browser used to access a website protected by anti-crawler protection rules.
  • If your service is connected to CDN, exercise caution when using the JS anti-crawler function.

    CDN caching may impact JS anti-crawler performance and page accessibility.

  • The JavaScript anti-crawler function is unavailable for pay-per-use WAF instances.
  • This function is unavailable in the standard edition (formerly the professional edition).
  • Currently, the JavaScript anti-crawler is supported only in the CN-Hong Kong and AP-Bangkok regions.
  • If no blocking logs are available after you enable the JavaScript-based anti-crawler rules, handle the issue by referring to Why No Logs Are Found for Some Requests Blocked by WAF After Anti-Crawler Is Enabled?
  • WAF only logs JavaScript challenge and JavaScript authentication events. No other protective actions can be configured for JavaScript challenge and authentication.
  • WAF JavaScript-based anti-crawler rules only check GET requests and do not check POST requests.

How JavaScript Anti-Crawler Protection Works

Figure 1 shows how JavaScript anti-crawler detection works, including the JavaScript challenge (steps 1 and 2) and JavaScript authentication (step 3).

Figure 1 JavaScript anti-crawler protection process

If JavaScript anti-crawler is enabled when a client sends a request, WAF returns a piece of JavaScript code to the client.

  • If the client is a normal browser, the received JavaScript code triggers it to automatically send the request to WAF again. WAF then forwards the request to the origin server. This process is called JavaScript verification.
  • If the client is a crawler, it cannot execute the received JavaScript code and therefore does not send the request to WAF again. The client fails JavaScript authentication.
  • If a crawler fabricates a WAF authentication request and sends it to WAF, WAF blocks the request. The client fails JavaScript authentication.

By collecting statistics on the number of JavaScript challenge and authentication responses, the system calculates how many requests the JavaScript anti-crawler rule defends against. In Figure 2, the JavaScript anti-crawler has logged 18 events: 16 are JavaScript challenge responses and 2 are JavaScript authentication responses. Others indicates the number of WAF authentication requests fabricated by the crawler.

Figure 2 Parameters of a JavaScript anti-crawler protection rule

WAF only logs JavaScript challenge and JavaScript authentication events. No other protective actions can be configured for JavaScript challenge and authentication.
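
As a rough illustration of why a plain script client fails the JavaScript challenge described above, consider the following minimal sketch. It uses the Python requests library and a hypothetical protected address (https://www.example.com/); it is not part of the WAF configuration itself.

```python
# Minimal sketch: a plain HTTP client cannot pass the JavaScript challenge.
# The URL is hypothetical; replace it with a domain name protected by WAF.
import requests

resp = requests.get("https://www.example.com/")

# With JavaScript anti-crawler enabled, WAF answers this first request with a
# page containing JavaScript code instead of the origin content. A browser
# would execute that code and automatically resend the request (JS
# verification); this script cannot, so the request never reaches the origin
# server and the event is counted as a JavaScript challenge.
print(resp.status_code)
print(resp.text[:200])
```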

Procedure

  1. Log in to the management console.
  2. Click the icon in the upper left corner of the management console and select a region or project.
  3. Click the icon in the upper left corner and choose Web Application Firewall under Security & Compliance.
  4. In the navigation pane, choose Website Settings.
  5. In the Policy column of the row containing the target domain name, click Configure Policy.
  6. In the Anti-Crawler configuration area, enable anti-crawler protection using the toggle on the right, as shown in Figure 3. After you enable this function, click Configure Bot Mitigation.

    Figure 3 Anti-Crawler configuration area

  7. Select the Feature Library tab and enable the protection by referring to Table 1. Figure 4 shows an example.

    A feature-based anti-crawler rule has two protective actions:
    • Block

      WAF blocks and logs detected attacks.

    • Log only

      Detected attacks are logged only. This is the default protective action.

    Scanner is enabled by default, but you can enable other protection types if needed.
    Figure 4 Feature Library

    Table 1 Anti-crawler detection features

    • Search Engine
      Description: This rule is used to block web crawlers, such as Googlebot and Baiduspider, from collecting content from your site.
      Remarks: If you enable this rule, WAF detects and blocks search engine crawlers.
      NOTE: If Search Engine is not enabled, WAF does not block POST requests from Googlebot or Baiduspider. If you want to block POST requests from Baiduspider, use the configuration described in Configuration Example - Search Engine.

    • Scanner
      Description: This rule is used to block scanners, such as OpenVAS and Nmap, which scan for vulnerabilities, viruses, and other weaknesses.
      Remarks: After you enable this rule, WAF detects and blocks scanner crawlers.

    • Script Tool
      Description: This rule is used to block script tools, which are often used to execute automated tasks and program scripts, such as HttpClient, OkHttp, and Python programs.
      Remarks: If you enable this rule, WAF detects and blocks the execution of automated tasks and program scripts.
      NOTE: If your application uses script tools such as HttpClient, OkHttp, or Python, disable Script Tool. Otherwise, WAF will identify such script tools as crawlers and block the application.

    • Other
      Description: This rule is used to block crawlers used for other purposes, such as site monitoring, access proxies, and web page analysis.
      Remarks: If you enable this rule, WAF detects and blocks crawlers used for these other purposes.
      NOTE: To avoid being blocked by WAF, crawlers may use a large number of IP address proxies.
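
Feature-based detection of script tools is commonly tied to request characteristics such as the User-Agent header; whether the WAF feature library matches on this exact header is an assumption made here for illustration. The sketch below prints the default User-Agent that the Python requests library sends, the kind of signature the Script Tool type is meant to catch.

```python
# Sketch: show the default User-Agent a common script tool (python-requests)
# sends. Treating this header as the detection signal is an assumption for
# illustration only; WAF's feature library internals are not documented here.
import requests

session = requests.Session()
print(session.headers.get("User-Agent"))  # e.g. "python-requests/2.31.0"

# If a legitimate in-house script like this is blocked after Script Tool is
# enabled, disable Script Tool for the domain name as recommended above,
# rather than disguising the script as a browser.
```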

  8. Select the JavaScript tab and configure Status and Protective Action.

    JavaScript anti-crawler is disabled by default. To enable it, click the toggle on the right and then click Confirm in the displayed dialog box. When the toggle is turned on, JavaScript anti-crawler is enabled.

    • Cookies must be enabled and JavaScript supported by any browser used to access a website protected by anti-crawler protection rules.
    • If your service is connected to CDN, exercise caution when using the JS anti-crawler function.

      CDN caching may impact JS anti-crawler performance and page accessibility.

  9. Configure a JavaScript-based anti-crawler rule by referring to Table 2.

    Two protective actions are provided: Protect all paths and Protect a specified path.

    • To protect all paths except a specified path

      Select Protect all paths. Then, in the upper left corner of the page, click Exclude Path. Configure the required parameters in the displayed dialog box and click OK.

      Figure 5 Exclude Path
    • To protect a specified path only

      Select Protect a specified path. In the upper left corner of the page, click Add Path. In the displayed dialog box, configure required parameters and click OK.

      Figure 6 Add Path
    Table 2 Parameters of a JavaScript-based anti-crawler protection rule

    • Rule Name
      Description: Name of the rule.
      Example Value: wafjs

    • Path
      Description: A part of the URL, not including the domain name.
      A URL is used to define the address of a web page. The basic URL format is as follows:
      Protocol name://Domain name or IP address[:Port]/[Path/.../File name]
      For example, if the URL is http://www.example.com/admin, set Path to /admin.
      NOTE:
      • The path does not support regular expressions.
      • The path cannot contain two or more consecutive slashes, for example, ///admin. If you enter ///admin, WAF converts /// to /.
      Example Value: /admin

    • Logic
      Description: Select a logical relationship from the drop-down list.
      Example Value: Include

    • Rule Description
      Description: A brief description of the rule.
      Example Value: None
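
The relationship between a full URL, the Path value, and the Include logic can be illustrated with a short sketch. This is illustrative only and does not reproduce WAF's internal matching.

```python
# Illustrative sketch of how the Path value relates to a full URL.
from urllib.parse import urlparse
import re

url = "http://www.example.com/admin/login"
request_path = urlparse(url).path      # "/admin/login" (domain name excluded)

rule_path = "/admin"                   # Path value configured in the rule
# With Logic set to "Include", the rule matches when the configured path
# appears in the request path.
print(rule_path in request_path)       # True

# Consecutive slashes are not allowed in the rule; WAF converts them to one,
# which is equivalent to the following normalization.
print(re.sub(r"/{2,}", "/", "///admin"))   # "/admin"
```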

Other Operations

  • To disable a rule, click Disable in the Operation column of the rule. The default Rule Status is Enabled.
  • To modify a rule, click Modify in the row containing the rule.
  • To delete a rule, click Delete in the row containing the rule.

Configuration Example - Logging Script Crawlers Only

To verify that the anti-crawler rule configured for domain name www.example.com takes effect:

  1. Run a script tool, such as a JavaScript-based crawler, to crawl web page content (a sample crawler is sketched after this procedure).
  2. On the Feature Library tab, enable Script Tool and select Log only for Protective Action. (If WAF detects an attack, it logs the attack only.)

    Figure 7 Enabling Script Tool

  3. Enable anti-crawler protection.

    Figure 8 Anti-Crawler configuration area

  4. In the navigation pane on the left, choose Events to go to the Events page.

    Figure 9 Viewing Events - Script crawlers
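
For step 1, any simple script crawler will do. The sketch below uses Python's requests library as a stand-in for the script tools listed in Table 1; www.example.com stands in for your protected domain name and the paths are illustrative.

```python
# Sample script crawler for the test in step 1 (domain name and paths are
# illustrative; replace them with pages on your protected website).
import requests

for path in ("/", "/index.html", "/news"):
    resp = requests.get("https://www.example.com" + path, timeout=5)
    # Because Protective Action is "Log only", these requests are not blocked;
    # WAF records them as Script Tool events, which appear on the Events page.
    print(path, resp.status_code)
```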

Configuration Example - Search Engine

The following shows how to allow the search engines of Baidu and Google while blocking POST requests from Baidu.

  1. Enable Search Engine by referring to the instructions in 6.
  2. Configure a precise protection rule by referring to Configuring a Precise Protection Rule, as shown in Figure 10.

    Figure 10 Blocking POST requests
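
To check the result, you could send a GET request and a POST request that identify themselves as Baiduspider, for example with the sketch below. The User-Agent string and URL are illustrative, and the expected outcomes assume the precise protection rule from Figure 10 is in place.

```python
# Sketch for verifying the configuration: GET should be allowed, POST blocked.
# The User-Agent value and URL are illustrative.
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Baiduspider/2.0; "
                  "+http://www.baidu.com/search/spider.html)"
}
base = "https://www.example.com/"

print("GET:", requests.get(base, headers=headers).status_code)    # expected: allowed
print("POST:", requests.post(base, headers=headers).status_code)  # expected: blocked by the precise protection rule
```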