Configuring a BOT Protection Rule

Updated on 2025-03-04 GMT+08:00

You can configure BOT protection rules to protect against search engines, scanners, script tools, and other crawlers, and use JavaScript to create custom anti-crawler protection rules.

Prerequisites

A protected website has been added. For details, see Adding a Website to EdgeSec.

Constraints

  • Cookies must be enabled and JavaScript supported by any browser used to access a website protected by anti-crawler protection rules.
  • It takes several minutes for a new rule to take effect. After the rule takes effect, protection events triggered by the rule will be displayed on the Events page.
  • If your service is connected to CDN, exercise caution when using this function.

    CDN caching may impact Anti-Crawler performance and page accessibility.

How JavaScript Anti-Crawler Protection Works

Figure 1 shows how JavaScript anti-crawler detection works, which includes JavaScript challenges (step 1 and step 2) and JavaScript authentication (step 3).

Figure 1 JavaScript Anti-Crawler protection process

If JavaScript anti-crawler is enabled when a client sends a request, EdgeSec returns a piece of JavaScript code to the client.

  • If the client is a normal browser, the received JavaScript code executes and automatically resends the request to EdgeSec, which then forwards it to the origin server. This process is called JavaScript verification.
  • If the client is a crawler, it cannot execute the received JavaScript code and never resends the request to EdgeSec. The client fails JavaScript authentication.
  • If a crawler fabricates an EdgeSec authentication request and sends it to EdgeSec, EdgeSec blocks the request. The client fails JavaScript authentication.

By collecting statistics on the number of JavaScript challenge and authentication responses, the system calculates how many requests the JavaScript anti-crawler defends against. In Figure 2, the JavaScript anti-crawler has logged 18 events: 16 are JavaScript challenge responses and 2 are JavaScript authentication responses. Others indicates the number of EdgeSec authentication requests fabricated by crawlers.

Figure 2 Parameters of a JavaScript anti-crawler protection rule
NOTICE:

EdgeSec only logs JavaScript challenge and JavaScript authentication events. No other protective actions can be configured for JavaScript challenge and authentication.
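
The round trip described above can be illustrated with a toy model. The following is a minimal, self-contained Python sketch under stated assumptions: the token name, cookie handling, and decision logic are illustrative stand-ins, not EdgeSec internals.

    # Toy model of the JS-challenge round trip. The token name and the
    # validation logic are illustrative assumptions, not EdgeSec internals.
    CHALLENGE_JS = "document.cookie = 'js_token={}'; location.reload();"

    def edge_handle(cookies, issued_tokens):
        """Stand-in for the edge node's decision on an incoming request."""
        token = cookies.get("js_token")
        if token is None:
            # Steps 1-2: no token yet, so answer with a JavaScript challenge.
            return "challenge", CHALLENGE_JS.format("abc123")
        if token in issued_tokens:
            # Step 3: the token was issued by us, so forward to the origin.
            return "forward", "origin response"
        # Fabricated token: the request fails JavaScript authentication.
        return "block", None

    issued = {"abc123"}
    print(edge_handle({}, issued))                      # browser: gets challenge JS
    print(edge_handle({"js_token": "abc123"}, issued))  # re-request: forwarded
    print(edge_handle({"js_token": "forged"}, issued))  # forged request: blocked

A client that cannot execute JavaScript never acquires a valid token, which is why script tools stall at the challenge step.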

Procedure

  1. Log in to the management console.
  2. Click the service list icon in the upper left corner of the page and choose Content Delivery & Edge Computing > CDN and Security.
  3. In the navigation pane on the left, choose Edge Security > Website Settings. The Website Settings page is displayed.
  4. In the Policy column of the row containing the domain name, click the number to go to the Policies page.

    Figure 3 Website list

  5. In the BOT Management configuration area, change Status as required by referring to Figure 4, and click Configure BOT Mitigation.

    Figure 4 BOT management configuration

  6. On the Bot Management tab, enable the following detection types based on your service requirements. For details about the detection items, see Table 1 and Table 2.

    • Known bot

      Select a detection item, click the toggle next to it, and set the protection action. The protection action can be Allow, Log only, JS challenge, or Block. You can set or modify the protection action after you enable a rule.

      You can select multiple BOT names and click Enable, Disable, or Select All Across Pages to enable or disable protection for them in batches. You can also select multiple BOT names and click Setting Protection Action in Batches to set protection actions in batches.
      Table 1 Known BOT detection items

      • Web Search Engine Bots: Search engines use web crawlers to aggregate and index online content (such as web pages, images, and other file types) and provide search results in real time.
      • Web Scanners: Virus and vulnerability scanners detect viruses or vulnerabilities caused by configuration errors or programming defects in network assets. Typical scanners include Nmap, sqlmap, and WPSec.
      • Web Scrapers: Crawler tools or services, such as Scrapy, pyspider, and Prerender, capture web pages and extract the content users need.
      • Site Monitoring or Web Development Bots: These bots help web developers monitor the performance of their sites. Apart from DNS resolution errors and other issues, they also check link and domain availability, connection speeds, and web page load times from different geographical locations.
      • SEO or BIOR Marketing Bots: SEO helps websites or web pages rank higher in search engine results. SEO companies often use bots to analyze website content, audience, and competitiveness for online advertising and marketing.
      • News, Social Media or Blog Bots: News and social media platforms offer users trending news and interaction. Companies also use these platforms to engage with consumers about their products or services as part of their marketing efforts, and may use bots to collect data from these platforms for insights.
      • Website Screenshot Creator: These bots capture full-page screenshots of online content, including website posts, social media updates, news articles, forum discussions, and blogs.
      • Academic or Research Bots: Universities and companies use bots to collect data from a range of websites for academic or research purposes. This data collection involves reference searches, semantic analysis, and specialized search engines.
      • RSS Feed Reader Bots: RSS feed reader services use bots to periodically poll websites' feeds for new or updated content and deliver it to subscribers.
      • Web Archiver Bots: Wikipedia and other organizations use bots to regularly crawl and archive valuable online information and content copies from the web. These archives are similar to search engine results, but older, and are mainly used for research.

    • Request feature detection

      Select a detection item, click the toggle next to it, and set the protection action. The protection action can be Allow, Log only, JS challenge, or Block. You can set or modify the protection action after you enable a rule.

      You can select multiple BOT names and click Enable, Disable, or Select All Across Pages to enable or disable protection for them in batches. You can also select multiple BOT names and click Setting Protection Action in Batches to set protection actions in batches.

      Table 2 Request feature detection items

      • HTTP request header detection: --
      • Development framework and HTTP library: Popular development frameworks and HTTP libraries include Apache HttpComponents, OkHttp, Python requests, and Go HTTP client. A sketch of the default request features such libraries expose follows this step's detection types.
      • Other: --

    • BOT behavior detection

      The AI behavior detection button is enabled by default. You can set the behavior detection score and protection action, and click Save to modify the protection rule.
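
    As a hedged illustration of request feature detection, the sketch below prints the default headers the Python requests library attaches to every request; clients that do not override the User-Agent (for example, python-requests/<version>) are straightforward to fingerprint. This is a general observation about HTTP libraries, not EdgeSec's matching logic.

        # Inspect the default request features the Python requests library sends.
        # A detection engine can fingerprint a client from headers like these.
        import requests

        headers = requests.utils.default_headers()
        print(headers["User-Agent"])   # e.g. "python-requests/2.31.0"
        print(dict(headers))           # Accept, Accept-Encoding, Connection, ...

    Overriding such headers is trivial, which is why request feature detection is combined with JS challenges and behavior detection rather than used alone.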

  7. Select the JavaScript tab and configure Status and Protective Action.

    JavaScript anti-crawler is disabled by default. To enable it, click the toggle and then click OK in the displayed dialog box.

    A JS anti-crawler rule provides three protective actions:

    • Block: If a JavaScript challenge fails, the system immediately blocks the request and records the failure.
    • Log only: If a JavaScript challenge fails, the system records the failure but does not block the request.
    • Verification code: If a JavaScript challenge fails, the client is prompted to enter a verification code for further verification.
    NOTICE:
    • Cookies must be enabled and JavaScript supported by any browser used to access a website protected by anti-crawler protection rules.
    • If your service is connected to CDN, exercise caution when using the JS anti-crawler function.

      CDN caching may impact JS anti-crawler performance and page accessibility.

  8. Configure a JavaScript-based anti-crawler rule by referring to Table 3.

    Two protection modes are provided: Protect all requests and Protect specified requests.

    • To protect all requests except requests that hit a specified rule
      Set Protection Mode to Protect all requests. Then, click Exclude Rule, configure the request exclusion rule, and click Confirm.
      Figure 5 Exclude Path
    • To protect a specified request only

      Set Protection Mode to Protect specified requests, click Add Rule, configure the request rule, and click Confirm.

      Figure 6 Add Rule
    Table 3 Parameters of a JavaScript-based anti-crawler protection rule

    • Rule Name: Name of the rule. Example value: EdgeSec
    • Rule Description: A brief description of the rule. This parameter is optional. Example value: -
    • Effective Date: Time when the rule takes effect. Example value: Immediate
    • Condition List: Parameters for configuring a condition are as follows (example value: Path Include /admin). A sketch of how such a condition could be evaluated follows this table.
      • Field: Select the field you want to protect from the drop-down list. Currently, only Path and User Agent are included.
      • Subfield
      • Logic: Select a logical relationship from the drop-down list.
        NOTE: If you select Include any value, Exclude any value, Equal to any value, Not equal to any value, Prefix is any value, Prefix is not any of them, Suffix is any value, or Suffix is not any of them, a reference table must be selected for Content. For details about reference tables, see Creating a Reference Table.
      • Content: Enter or select the content that matches the condition.
    • Priority: Rule priority. If you have added multiple rules, they are matched by priority; the smaller the value, the higher the priority. Example value: 5
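
    As a rough illustration only (condition evaluation happens inside EdgeSec, and its exact semantics are not specified here), the Python snippet below mimics how a Path Include /admin condition and rule priorities could be evaluated; all names are hypothetical.

        # Toy evaluation of anti-crawler rule conditions; illustrative only.
        # Rules: (priority, field, logic, content). Lower priority value wins.
        rules = [
            (5, "path", "include", "/admin"),
            (10, "user_agent", "prefix", "curl/"),
        ]

        def first_match(request, rules):
            """Return the matching rule with the smallest priority value, if any."""
            checks = {
                "include": lambda value, content: content in value,
                "prefix": lambda value, content: value.startswith(content),
            }
            for prio, field, logic, content in sorted(rules):
                if checks[logic](request[field], content):
                    return (prio, field, logic, content)
            return None

        req = {"path": "/admin/login", "user_agent": "Mozilla/5.0"}
        print(first_match(req, rules))  # -> (5, 'path', 'include', '/admin')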

Other Operations

  • To modify a rule, click Modify in the row containing the rule.
  • To delete a rule, click Delete in the row containing the rule.

Configuration Example - Logging Script Crawlers Only

To verify that EdgeSec protects the domain name www.example.com with an anti-crawler rule:

  1. Execute a JavaScript tool to crawl web page content. (A minimal stand-in crawl script is sketched after this procedure.)
  2. On the Feature Library tab, enable Script Tool and select Log only for Protective Action. (If EdgeSec detects an attack, it logs the attack only.)

    Figure 7 Enabling Script Tool

  3. Enable anti-crawler protection.

    Figure 8 BOT management configuration

  4. In the navigation pane on the left, choose Events to go to the Events page.

    Figure 9 Viewing Events - Script crawlers
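
For step 1, a crawl script along the lines of the sketch below can generate the events shown in step 4. Python requests is used here as a stand-in for a script tool; the target URL is a placeholder, and like any client that cannot execute JavaScript, it exhibits the behavior the Script Tool feature flags.

    # Minimal stand-in for a script crawler used to test the Log only action.
    # It fetches a page and extracts links but cannot execute JavaScript.
    import re
    import requests

    resp = requests.get("https://www.example.com/")   # placeholder protected domain
    print(resp.status_code)
    links = re.findall(r'href="([^"]+)"', resp.text)  # crude link extraction
    print(links[:10])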

Configuration Example - Search Engine

The following shows how to allow the Baidu and Google search engines while blocking POST requests from Baidu. A verification sketch follows the steps.

  1. Set Status of Search Engine to enabled by referring to the instructions in step 5 of the Procedure.
  2. Configure a precise protection rule by referring to Configuring a Precise Protection Rule.

    Figure 10 Blocking POST requests
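
To verify the configuration, send a GET and a POST with a Baiduspider-style User-Agent and compare the responses. The sketch below uses a placeholder domain; the exact status codes depend on the blocking action configured.

    # Verify the search-engine policy: GET allowed, POST blocked for Baidu.
    # The domain is a placeholder; the User-Agent mimics Baiduspider.
    import requests

    UA = {"User-Agent": "Mozilla/5.0 (compatible; Baiduspider/2.0; "
                        "+http://www.baidu.com/search/spider.html)"}
    BASE = "https://www.example.com/"

    print("GET:", requests.get(BASE, headers=UA).status_code)   # expected: allowed
    print("POST:", requests.post(BASE, headers=UA,
                                 data={"probe": "1"}).status_code)  # expected: blocked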
