
Configuring HDFS DiskBalancer

Updated on 2024-11-29 GMT+08:00

Scenario

DiskBalancer is an online disk balancer that balances data across the disks of running DataNodes based on various metrics. It works in a similar way to the HDFS Balancer. The difference is that the HDFS Balancer balances data between DataNodes, whereas HDFS DiskBalancer balances data among the disks of a single DataNode.

Data may be unevenly distributed among disks if a large number of files have been deleted from a long-running cluster, or if the disk capacity of a node in the cluster has been expanded. Uneven data distribution can degrade concurrent HDFS read/write performance, or cause service failures due to inappropriate HDFS write policies. In this case, balance the data density among the disks on a node to prevent heterogeneous small disks from becoming the node's performance bottleneck.

Configuration Description

Go to the All Configurations page of HDFS and enter a parameter name in the search box by referring to Modifying Cluster Service Configuration Parameters.

Table 1 Parameter description

| Parameter | Description | Default Value |
|---|---|---|
| dfs.disk.balancer.auto.enabled | Whether to enable the HDFS DiskBalancer function. The default value is false, indicating that the function is disabled. | false |
| dfs.disk.balancer.auto.cron.expression | CRON expression that controls the start time of the disk balancing operation. This parameter takes effect only when dfs.disk.balancer.auto.enabled is set to true. The default value 0 1 * * 6 indicates that the DiskBalancer check is executed at 01:00 every Saturday. For details about the CRON fields, see Table 2. | 0 1 * * 6 |
| dfs.disk.balancer.max.disk.throughputInMBperSec | Maximum disk bandwidth, in MB/s, that disk data balancing can use. Set this parameter based on the actual disks in the cluster. | 10 |
| dfs.disk.balancer.max.disk.errors | Maximum number of errors allowed for a single move operation. If this threshold is exceeded, the move fails. | 5 |
| dfs.disk.balancer.block.tolerance.percent | Allowed deviation, as a percentage, between a disk's data volume and its ideal value during balancing. For example, if the ideal data volume of each disk is 1 TB and this parameter is set to 10, a target disk is considered balanced once its data volume reaches 900 GB. Value range: 1 to 100. | 10 |
| dfs.disk.balancer.plan.threshold.percent | Allowed data density difference between two disks during disk data balancing. If the absolute data density difference between any two disks exceeds this threshold, data balancing is required. Value range: 1 to 100. | 10 |
| dfs.disk.balancer.top.nodes.number | Number of top N nodes in the cluster whose disk data needs to be balanced. | 5 |

To use this function, set dfs.disk.balancer.auto.enabled to true and configure a proper CRON expression. Set other parameters based on the cluster status.
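On a self-managed HDFS deployment, the parameters in Table 1 map to hdfs-site.xml properties on the DataNodes. The fragment below is a sketch that enables the function while keeping the default schedule and bandwidth cap; on MRS, set these values on the All Configurations page instead of editing files directly.

```xml
<!-- hdfs-site.xml fragment (illustrative): enable automatic
     DiskBalancer, keep the default schedule (01:00 every Saturday)
     and the default 10 MB/s disk bandwidth cap. -->
<property>
  <name>dfs.disk.balancer.auto.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.disk.balancer.auto.cron.expression</name>
  <value>0 1 * * 6</value>
</property>
<property>
  <name>dfs.disk.balancer.max.disk.throughputInMBperSec</name>
  <value>10</value>
</property>
```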

Table 2 CRON expressions

| Column | Description |
|---|---|
| 1 | Minute. The value ranges from 0 to 59. |
| 2 | Hour. The value ranges from 0 to 23. |
| 3 | Date. The value ranges from 1 to 31. |
| 4 | Month. The value ranges from 1 to 12. |
| 5 | Week. The value ranges from 0 to 6. 0 indicates Sunday. |
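As a quick sanity check, the five columns of Table 2 can be read off an expression with plain shell word splitting (a sketch; `set -f` keeps the literal `*` fields from being glob-expanded):

```shell
# Split the default DiskBalancer schedule "0 1 * * 6" into the
# five CRON columns described in Table 2 (weekday 0 = Sunday).
expr='0 1 * * 6'
set -f            # disable globbing so the literal '*' fields survive
set -- $expr      # word-split into the five positional parameters
set +f
minute=$1 hour=$2 day=$3 month=$4 weekday=$5
echo "minute=$minute hour=$hour day=$day month=$month weekday=$weekday"
```

For the default value, this prints `minute=0 hour=1 day=* month=* weekday=6`, i.e. 01:00 every Saturday.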

Use Restrictions

  1. Data can only be moved between disks of the same type. For example, data can only be moved between SSDs or between DISKs.
  2. Enabling this function occupies disk I/O resources and network bandwidth resources of involved nodes. Enable this function in off-peak hours.
  3. The top nodes specified by the dfs.disk.balancer.top.nodes.number parameter are recalculated frequently, which is costly. Therefore, set this parameter to a small value.
  4. Commands for using the DiskBalancer function on the HDFS client are as follows:
    Table 3 DiskBalancer commands

    | Syntax | Description |
    |---|---|
    | hdfs diskbalancer -report -top <N> | Queries the top N nodes in the cluster that require disk data balancing. N must be an integer greater than 0. |
    | hdfs diskbalancer -plan <Hostname or IP address> | Generates a JSON plan file for the specified DataNode. The file describes the source disks, target disks, and blocks to be moved. This command can also specify other parameters, such as the network bandwidth. |
    | hdfs diskbalancer -query <Hostname:$dfs.datanode.ipc.port> | Queries the running status of the DiskBalancer task on the node. The default IPC port of the cluster is 9867. |
    | hdfs diskbalancer -execute <planfile> | Executes the plan. planfile is the JSON file generated by the -plan command. Use the absolute path. |
    | hdfs diskbalancer -cancel <planfile> | Cancels the running planfile. Use the absolute path. |

NOTE:
  • The user running these commands on the client must have the supergroup permission. You can use the system user hdfs of the HDFS service (contact the system administrator for the initial password), or create a user with the supergroup permission in the cluster and run the commands as that user.
  • Table 3 lists only the command formats and usage. To view the other parameters supported by each command, run hdfs diskbalancer -help <command>.
  • When troubleshooting performance problems during cluster O&M, check the cluster event information for HDFS disk balancing events. If such events exist, check whether DiskBalancer is enabled in the cluster.
  • After the automatic DiskBalancer function is enabled, an ongoing task stops only after the current round of data balancing is complete; it cannot be canceled while balancing is in progress.
  • You can manually specify nodes for data balancing on the client.
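A typical manual run on the HDFS client follows the report → plan → execute → query sequence from Table 3. The sketch below uses a dry-run wrapper so the commands are only printed; on a real cluster, unset DRY_RUN so the hdfs client commands actually execute. The host name "datanode-host" and the plan file path are placeholders.

```shell
# Dry-run sketch of a manual DiskBalancer workflow (requires the
# HDFS client and supergroup permission on a real cluster).
DRY_RUN=1
run() {
  # With DRY_RUN set, print the command instead of executing it.
  if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

# 1. List the top N nodes that need disk balancing.
run hdfs diskbalancer -report -top 5

# 2. Generate a JSON plan for one of the reported DataNodes.
run hdfs diskbalancer -plan datanode-host

# 3. Execute the plan, using the absolute path printed by step 2
#    (the path below is a placeholder).
run hdfs diskbalancer -execute /tmp/datanode-host.plan.json

# 4. Query progress on the node (9867 is the default IPC port).
run hdfs diskbalancer -query datanode-host:9867
```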
