Decommissioning and Recommissioning an Instance

Updated on 2024-11-29 GMT+08:00

Scenario

Some role instances provide services to external systems in distributed, parallel mode. Each service independently records whether each of its instances is available. Therefore, you need to use FusionInsight Manager to recommission or decommission these instances to change their running status.

Some instances do not support the recommissioning and decommissioning functions.

NOTE:
The following roles support decommissioning and recommissioning: HDFS DataNode, Yarn NodeManager, Elasticsearch EsNodeN, ClickHouse ClickHouseServer, IoTDB IoTDBServer, Doris BE, and HBase RegionServer.
  • By default, decommissioning cannot be performed if the number of DataNodes is less than or equal to the number of HDFS replicas. For example, if the number of HDFS replicas is three and there are fewer than four DataNodes in the system, decommissioning cannot be performed. In this case, an error is reported and FusionInsight Manager is forced to exit the decommissioning 30 minutes after it attempts the operation.
  • You can enable quick decommissioning before decommissioning DataNodes. In this case, once the number of available copies of a block reaches the value of dfs.namenode.decommission.force.replication.min, the system decommissions the nodes while adding HDFS copies at the same time. If data is written during quick decommissioning, it may be lost, so exercise caution when performing this operation. The following parameters control quick decommissioning; you can search for and view them on the HDFS configuration page on FusionInsight Manager (a client-side check of their values is sketched after these notes).

    dfs.namenode.decommission.force.enabled: whether to enable quick decommissioning for DataNodes. Setting this parameter to true enables the function.

    dfs.namenode.decommission.force.replication.min: minimum number of available copies of a block required for DataNode quick decommissioning. The value ranges from 1 to 3.

  • During MapReduce task execution, files with 10 replicas are generated. Therefore, if the number of DataNode instances is less than 10, decommissioning cannot be performed.
  • If the number of DataNode racks (determined by the racks configured for the DataNodes) is greater than 1 before the decommissioning, but decommissioning some DataNodes would leave the remaining DataNodes on only one rack, the decommissioning will fail. Therefore, before decommissioning DataNode instances, evaluate the impact of the decommissioning on the number of racks and adjust the set of DataNodes to be decommissioned accordingly.
  • If multiple DataNodes that each store a large volume of data are decommissioned at the same time, they may fail to be decommissioned due to timeout. To avoid this problem, decommission one DataNode at a time, performing multiple decommissioning operations if necessary.
  • Before decommissioning ClickHouseServer, perform the pre-decommissioning check. The restrictions on decommissioning or recommissioning are as follows:
    • Cluster scale

      If a cluster has only one shard, the instance nodes cannot be decommissioned.

      Multiple instance nodes in the same shard must be decommissioned or recommissioned at the same time.

      The cluster shard information can be queried by running the following SQL statement: select cluster, shard_num, replica_num, host_name from system.clusters;

    • Cluster storage space

      Before decommissioning, ensure that the disk space of the non-decommissioned nodes is sufficient to store the data of all decommissioned nodes. In addition, the non-decommissioned nodes must retain about 10% redundant storage space after decommissioning to ensure that the remaining instances can run properly.

    • Cluster status

      If any ClickHouseServer instance node in the cluster is faulty, whether it is among the nodes to be decommissioned or not, none of the instance nodes can be decommissioned.

    • Database

      If a database exists only on an instance node to be decommissioned, the instance node cannot be decommissioned. You need to create the database on all ClickHouseServer instance nodes in the cluster.

      Do not create, delete, or rename a database during the decommissioning process.

    • Local non-replication table

      If a local non-replication table exists only on an instance node to be decommissioned, the instance node cannot be decommissioned. You need to create a local non-replication table with the same name on a node that is not to be decommissioned.

      For example, the current cluster has two shards, shard 1 has two nodes A and B, and shard 2 has two nodes C and D. The non-replication table test was created without the ON CLUSTER keyword and exists only on node A.

      In this case, nodes A and B in shard 1 cannot be decommissioned. You need to create the table test on node C or D in shard 2 before decommissioning A and B (a sketch of this step follows these notes).

    • Replication table

      If a replication table exists only on some instance nodes in a cluster, the instance nodes cannot be decommissioned. Before decommissioning, you need to manually create the replication table on all instance nodes in the cluster where it does not exist.

      For example, the current cluster has two shards, shard 1 has two nodes A and B, and shard 2 has two nodes C and D. The replication table test was created without the ON CLUSTER keyword and exists only on nodes A and B.

      In this case, nodes A and B in shard 1 cannot be decommissioned. You need to create the table test on nodes C and D in shard 2 before decommissioning A and B.

    • Distributed table

      Decommissioning does not support automatic migration of distributed tables. You are advised to recreate distributed tables on non-decommissioned nodes before decommissioning. Not recreating them does not affect the decommissioning itself, but may affect subsequent service operations.

    • Materialized view

      Decommissioning does not support automatic migration of materialized views. You are advised to recreate materialized views on non-decommissioned nodes before decommissioning. If a materialized view on a node to be decommissioned does not specify an aggregation table but uses an implicit inner table, the node cannot be decommissioned.

    • Configuration synchronization

      Before and after decommissioning or recommissioning, you need to synchronize the configuration to ensure data consistency.

    • Detached data

      If a table on a node to be decommissioned has been detached and data still exists in its detached directory, the node cannot be decommissioned. You need to attach the data in the detached directory before decommissioning.

    • Distributed table writes

      Before decommissioning, check whether any services on the service side write to distributed tables. If such services exist, stop them before decommissioning. Otherwise, the decommissioning process will loop and fail.

    • Tables and views

      Do not create, delete, or rename tables or views during decommissioning.

  • If the number of IoTDBServers is less than or equal to the number of region copies configured for the cluster (3 by default), decommissioning cannot be performed.
  • Decommissioning or recommissioning constraints for Doris BE nodes
    • After decommissioning, the number of remaining healthy BE nodes must be no less than the number of replicas of any table. Otherwise, the decommissioning will fail.
    • BE node storage space

      Before decommissioning, the disk space of the non-decommissioned BE nodes in the cluster must be sufficient to store the data of all BE nodes to be decommissioned. In addition, each non-decommissioned BE node must retain about 10% free storage space after decommissioning to ensure that the remaining instances can run properly.
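
The quick decommissioning parameters mentioned in the DataNode notes above can also be read from an HDFS client. This is a minimal sketch, assuming the client configuration files have been updated with the values set on FusionInsight Manager; hdfs getconf reads the client-side configuration and reports an error if a key is absent there.

  source bigdata_env                                                        #Configure client environment variables.
  hdfs getconf -confKey dfs.namenode.decommission.force.enabled             #Prints true if quick decommissioning is enabled.
  hdfs getconf -confKey dfs.namenode.decommission.force.replication.min     #Prints the minimum number of available copies (1 to 3).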

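As an illustration of the local non-replication table and replication table restrictions above, the following sketch uses the example nodes A to D and table test. The host names and the table schema are placeholder assumptions; obtain the real definition with SHOW CREATE TABLE on a node that has the table and replay it on the nodes where it is missing. On a security cluster, additional connection options (user, password, TLS) may be required.

  clickhouse client --host nodeA --query "SHOW CREATE TABLE test"                                        #Print the exact table definition (nodeA is a placeholder host name).
  clickhouse client --host nodeC --query "CREATE TABLE test (id UInt32) ENGINE = MergeTree ORDER BY id"  #Assumed schema; use the definition printed above instead.
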
Procedure

  1. Perform a health check on the DataNodes before decommissioning them:

    1. Log in to the client installation node as a client user and switch to the client installation directory.
    2. For a security cluster, use user hdfs for permission authentication.
      source bigdata_env               #Configure client environment variables.
      kinit hdfs                       #Configure kinit authentication.
      Password for hdfs@HADOOP.COM:    #Enter the login password of user hdfs.
    3. Run the hdfs fsck / -list-corruptfileblocks command, and check the returned result.
      • If "has 0 CORRUPT files" is displayed, go to 2.
      • If the result does not contain "has 0 CORRUPT files" and the name of the damaged file is returned, go to 1.d.
    4. Run the hdfs dfs -rm <name of the damaged file> command to delete the damaged file.
      NOTE:

      Deleting a file or folder is a high-risk operation. Ensure that the file or folder is no longer required before performing this operation.

  2. Log in to FusionInsight Manager.
  3. Choose Cluster > Services.
  4. Click the specified service name on the service management page. On the displayed page, click the Instance tab.
  5. Select the specified role instance to be decommissioned.
  6. Select Decommission or Recommission from the More drop-down list.

    In the displayed dialog box, enter the password of the current login user and click OK.

    Select I confirm to decommission these instances and accept the consequence of service performance deterioration and click OK to perform the corresponding operation. (For DataNode decommissioning, a client-side way to track progress is sketched after this procedure.)
    NOTE:

    During instance decommissioning, if the service corresponding to the instance is restarted in the cluster from another browser, FusionInsight Manager displays a message indicating that the decommissioning has stopped, but the operating status of the instance is displayed as Started. In this case, the instance has actually been decommissioned in the background. You need to decommission the instance again to synchronize its operating status.

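For HDFS DataNode decommissioning, progress can also be watched from a client. This is a minimal sketch, assuming the same client setup and hdfs authentication as in step 1; each DataNode entry in the report includes a Decommission Status field (Normal, Decommission in progress, or Decommissioned).

  source bigdata_env               #Configure client environment variables.
  kinit hdfs                       #Configure kinit authentication (security cluster).
  hdfs dfsadmin -report            #Check the Decommission Status field of each DataNode.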