High-Risk Operations Overview

Updated at: Mar 25, 2021 GMT+08:00

Forbidden Operations

Table 1 lists forbidden operations during cluster operation and maintenance.

Table 1 Forbidden operations

Category

Risk

Deleting ZooKeeper data directories

HDFS, Yarn, HBase, and Hive depend on ZooKeeper, which stores their metadata. This operation adversely affects the normal operation of these components.

Switching between active and standby JDBCServer nodes frequently

This operation may interrupt services.

Deleting Phoenix system tables and data (SYSTEM.CATALOG, SYSTEM.STATS, SYSTEM.SEQUENCE, and SYSTEM.FUNCTION)

This operation will cause service operation failures.

Manually modifying data in the Hive metadata database (hivemeta)

This operation may cause Hive metadata parsing errors. As a result, Hive cannot provide services.

Changing permission on the Hive private file directory hdfs:///tmp/hive-scratch

This operation may make Hive services unavailable.

Changing broker.id in the Kafka configuration file

This operation may cause invalid node data.

Modifying the host name of nodes

Instances and upper-layer components on the host cannot provide services properly. The fault cannot be rectified.

Reinstalling the OS of a node

This operation will cause exceptions, leaving the MRS cluster in an abnormal state.

Using private images

This operation will cause exceptions, leaving the MRS cluster in an abnormal state.

The following tables list the high-risk operations during the operation and maintenance of each component.

High-Risk Operations on a Cluster

Table 2 High-Risk Operations on a Cluster

Operation

Risk

Risk Level

Workaround

Check Item

Modifying the directories or file permissions of user omm without authorization

This operation will lead to MRS service unavailability.

▲▲▲▲▲

Do not perform this operation.

Check whether the MRS cluster service is available.

Binding an EIP

This operation exposes the Master node where MRS Manager of the cluster resides to the public network, increasing the risk of network attacks from the Internet.

▲▲▲▲▲

Ensure that the bound EIP is a trusted public IP address.

None

Enabling security group rules for port 22 of a cluster

This operation increases the risk of attacks exploiting vulnerabilities on port 22.

▲▲▲▲▲

Configure a security group rule for port 22 to allow only trusted IP addresses to access the port. You are not advised to configure an inbound rule that allows 0.0.0.0/0 to access the port.

None

Deleting a cluster or deleting cluster data

Data will be lost.

▲▲▲▲▲

Before deleting the data, confirm the necessity of the operation and ensure that the data has been backed up.

None
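
The backup-then-verify pattern behind this workaround can be sketched locally. On a live cluster the copy would be made with `hdfs dfs -cp` (cluster paths below are illustrative); the sketch uses local directories so it can run anywhere, as a minimal illustration rather than a complete backup procedure.

```shell
# On a live cluster a backup might look like (illustrative paths, not run here):
#   hdfs dfs -cp /user/data /backup/data-before-deletion
# Local sketch of the same backup-then-verify pattern:
SRC=/tmp/demo-data
BAK=/tmp/demo-backup
rm -rf "$BAK"
mkdir -p "$SRC" && echo "payload" > "$SRC/file1"
cp -r "$SRC" "$BAK"
# Only proceed with the deletion once the copy verifies cleanly.
diff -r "$SRC" "$BAK" && echo "backup verified"
```

The point of the final `diff -r` is that an unverified backup is no backup: the deletion should be blocked unless the comparison succeeds.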

Scaling in a cluster

Data will be lost.

▲▲▲▲▲

Before scaling in the cluster, confirm the necessity of the operation and ensure that the data has been backed up.

None

Detaching or formatting a data disk

Data will be lost.

▲▲▲▲▲

Before performing this operation, confirm the necessity of the operation and ensure that the data has been backed up.

None

Manager High-Risk Operations

Table 3 Manager High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Change the OMS password.

This operation will restart all processes of OMSServer, which has adverse impact on cluster maintenance and management.

▲▲▲

Before the change, confirm that the operation is needed. Ensure that no other maintenance or management operations are performed at the same time.

Check whether all alarms are cleared and whether cluster maintenance and management operations are normal.

Import the certificate.

This operation will restart OMS processes and the entire cluster, which has adverse impact on cluster maintenance and management and services.

▲▲▲

Before the change, confirm that the operation is needed. Ensure that no other maintenance or management operations are performed at the same time.

Check whether all alarms are cleared, whether cluster maintenance and management operations are normal, and whether services are normal.

Perform an upgrade.

This operation will restart Manager and the entire cluster, which has adverse impact on cluster maintenance and management and services.

Strictly manage the user who is eligible to assign the cluster management permission to prevent security risks.

▲▲▲

Ensure that no other maintenance or management operations are performed at the same time.

Check whether all alarms are cleared, whether cluster maintenance and management operations are normal, and whether services are normal.

Restore the OMS.

This operation will restart Manager and the entire cluster, which has adverse impact on cluster maintenance and management and services.

▲▲▲

Before the operation, confirm that it is needed. Ensure that no other maintenance or management operations are performed at the same time.

Check whether all alarms are cleared, whether cluster maintenance and management operations are normal, and whether services are normal.

Change an IP address.

This operation will restart Manager and the entire cluster, which has adverse impact on cluster maintenance and management and services.

▲▲▲

Ensure that no other maintenance or management operations are performed at the same time and that the new IP address is correct.

Check whether all alarms are cleared, whether cluster maintenance and management operations are normal, and whether services are normal.

Change log levels.

If the log level is changed to DEBUG, Manager responds slowly.

▲▲

Before the change, confirm that the operation is needed. After debugging, change the log level back to the default value.

None

Replacing a control node

This operation will interrupt services deployed on the node. If the node also serves as a management node, the operation will restart all OMS processes, affecting the cluster management and maintenance.

▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

Check whether uncleared alarms exist, and whether the management and maintenance of the cluster are normal and whether services are normal.

Replacing a management node

This operation will interrupt services deployed on the node. As a result, OMS processes will be restarted, affecting the cluster management and maintenance.

▲▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

Check whether uncleared alarms exist, and whether the management and maintenance of the cluster are normal and whether services are normal.

Selecting Restart upper-layer services during the restart of a lower-layer service

This operation will interrupt the upper-layer service, affecting the management, maintenance, and services of the cluster.

▲▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

Check whether uncleared alarms exist, and whether the management and maintenance of the cluster are normal and whether services are normal.

Modifying the OLDAP port

This operation will restart the LdapServer and Kerberos services and all associated services, affecting service running.

▲▲▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

None

Delete the supergroup group.

Deleting the supergroup group decreases user rights, affecting service access.

▲▲▲▲▲

Before the deletion, confirm which rights the affected users still require. Grant those rights through another group before removing the supergroup rights to which the users are bound, so that services can continue to run.

None

Restart a service.

Services will be interrupted during the restart. If you select and restart the upper-layer service, the upper-layer services that depend on the service will be interrupted.

▲▲▲

Confirm the necessity of restarting the system before the operation.

Check whether alarms that are not cleared exist, and whether the management and maintenance of the cluster are normal and whether services are normal.

Change the default SSH port number.

After the default port (22) is changed, functions such as cluster creation, service/instance addition, host addition, and host reinstallation cannot be used, and the results of cluster health check items (node mutual trust, omm/ommdba user password expiration, and others) are incorrect.

▲▲▲

Change the SSH port to the default value 22 before performing related operations.

None
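
A quick pre-check before such operations is to confirm that the node still listens on the default port. The snippet below parses a stand-in configuration file rather than the live /etc/ssh/sshd_config, so it runs without privileges; the sample file content is an assumption for illustration.

```shell
# Stand-in for /etc/ssh/sshd_config (assumption: a single "Port" directive).
cat > /tmp/sshd_config.sample <<'EOF'
Port 22
PermitRootLogin no
EOF
# Extract the configured SSH port and warn if it is not the default.
port=$(awk '/^Port /{print $2}' /tmp/sshd_config.sample)
if [ "$port" = "22" ]; then
  echo "SSH on default port 22: cluster operations can proceed"
else
  echo "SSH moved to port $port: restore port 22 before cluster operations"
fi
```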

DBService High-Risk Operations

Table 4 DBService High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Changing the DBService password

The services need to be restarted for the password change to take effect. The services are unavailable during the restart.

▲▲▲▲

Confirm the necessity of changing the password, and ensure no other O&M operations are performed when the password is changed.

Check whether there are alarms that are not cleared and whether the cluster management and maintenance are normal.

Restoring DBService Data

After the data is restored, the data generated between the backup point in time and the restoration point in time is lost. After the data is restored, the configuration of the components that depend on DBService may expire and these components need to be restarted.

▲▲▲▲

Confirm the necessity of restoring data, and ensure no other O&M operations are performed when data is restored.

Check whether there are alarms that are not cleared and whether the cluster management and maintenance are normal.

Performing active/standby DBService switchover

During the DBServer switchover, DBService is unavailable.

▲▲

Confirm the necessity of performing active/standby DBService switchover, and ensure no other O&M operations are performed during the switchover.

None

Changing the DBService floating IP address

The DBService needs to be restarted for the change to take effect. The DBService is unavailable during the restart. If the floating IP address has been used, the configuration will fail, and the DBService will fail to be started.

▲▲▲▲

Strictly follow the configuration guide, and make sure that the new floating IP address is valid.

Check whether DBService is properly started.

Flink High-Risk Operations

Table 5 Flink High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Changing the log level

If the log level is changed to DEBUG, task performance is affected.

▲▲

Before the modification, confirm the necessity of the operation and change it back to the default log level in time.

None

Modifying file permissions

Tasks may fail.

▲▲▲

Confirm the necessity of the operation before the modification.

Check whether related service operations are normal.

Flume High-Risk Operations

Table 6 Flume High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modify the Flume instance start parameter GC_OPTS.

This operation may cause service startup failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid.

Check whether services can be started properly.

HBase High-Risk Operations

Table 7 HBase High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modify encryption configuration.

  • hbase.regionserver.wal.encryption
  • hbase.crypto.keyprovider.parameters.uri
  • hbase.crypto.keyprovider.parameters.encryptedtext

This operation may cause service startup failures.

▲▲▲▲

Strictly follow the prompt information when modifying these configuration items; they are interrelated. Ensure that new values are valid.

Check whether services can be started properly.

Change the value of hbase.regionserver.wal.encryption to false or switch encryption algorithm from AES to SMS4.

This operation may cause start failures and data loss.

▲▲▲▲

If HFile and WAL are encrypted with an encryption algorithm and encrypted tables have been created, do not disable encryption or switch the encryption algorithm arbitrarily.

If no encrypted table (ENCRYPTION=>AES/SMS4) has been created, the encryption algorithm can be switched.

None

Modify the HBase instance start parameters GC_OPTS and HBASE_HEAPSIZE.

This operation may cause service startup failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid and that GC_OPTS does not conflict with HBASE_HEAPSIZE.

Check whether services can be started properly.

Use the OfflineMetaRepair tool.

This operation may cause service startup failures.

▲▲▲▲

This command can be used only when HBase is offline and cannot be used in data migration scenarios.

Check whether HBase services can be started properly.

HDFS High-Risk Operations

Table 8 HDFS High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Change the HDFS NameNode data storage directory dfs.namenode.name.dir or the DataNode data storage directory dfs.datanode.data.dir.

This operation may cause service startup failures.

▲▲▲▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid.

Check whether services can be started properly.

Use the -delete parameter when you run the hadoop distcp command.

During DistCP copying, files that do not exist in the source cluster but exist in the destination cluster are deleted from the destination cluster.

▲▲

When using DistCP, determine whether to retain the redundant files in the destination cluster. Exercise caution when using the -delete parameter.

After DistCP copying is complete, check whether the data in the destination cluster is retained or deleted according to the parameter settings.
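
The effect of -delete is that destination files absent from the source are removed. The real command would take the form `hadoop distcp -update -delete <src> <dst>` and is destructive; the loop below reproduces the same semantics on local directories as a safe illustration (paths and file names are examples).

```shell
# Real (destructive) form, not run here:
#   hadoop distcp -update -delete hdfs://src-cluster/path hdfs://dst-cluster/path
# Local illustration of what -delete does to the destination:
mkdir -p /tmp/dc/src /tmp/dc/dst
touch /tmp/dc/src/kept.dat /tmp/dc/dst/kept.dat /tmp/dc/dst/extra.dat
for f in /tmp/dc/dst/*; do
  # Files present only in the destination are deleted by -delete.
  [ -e "/tmp/dc/src/$(basename "$f")" ] || rm "$f"
done
ls /tmp/dc/dst   # extra.dat is gone; kept.dat survives
```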

Modify the HDFS instance start parameters GC_OPTS, HADOOP_HEAPSIZE, and GC_PROFILE.

This operation may cause service startup failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid and that GC_OPTS does not conflict with HADOOP_HEAPSIZE.

Check whether services can be started properly.

Change the default value of dfs.replication from 3 to 1.

This operation will have the following impacts:

1. The storage reliability deteriorates. If the disk becomes faulty, data will be lost.

2. NameNode fails to be restarted, and the HDFS service is unavailable.

▲▲▲▲

When modifying related configuration items, check the parameter description carefully. Ensure that there are more than two replicas for data storage.

Check whether the default replica number is not 1 and whether the HDFS service is normal.
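
The check item above can be confirmed before restarting HDFS. On a node with an HDFS client, `hdfs getconf -confKey dfs.replication` reports the effective value directly; the offline sketch below applies the same guard logic to a sample hdfs-site.xml, whose content here is an assumption for illustration.

```shell
# With an HDFS client: hdfs getconf -confKey dfs.replication   (not run here)
# Offline: read the value from hdfs-site.xml. A sample file stands in for the real one.
cat > /tmp/hdfs-site.sample.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF
repl=$(sed -n 's:.*<value>\([0-9]*\)</value>.*:\1:p' /tmp/hdfs-site.sample.xml)
if [ "$repl" -lt 2 ]; then
  echo "WARNING: dfs.replication=$repl -- a single disk failure loses data"
else
  echo "dfs.replication=$repl: safe to proceed"
fi
```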

Change the RPC channel encryption mode of each module in Hadoop.

This operation causes service faults and exceptions.

▲▲▲▲▲

Strictly follow the configuration guide, and make sure that the modified value is valid.

Check whether HDFS and other services that depend on HDFS can properly start and provide services.

Hive High-Risk Operations

Table 9 Hive High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modify the Hive instance start parameter GC_OPTS.

This operation may cause Hive instance start failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid.

Check whether services can be started properly.

Delete all MetaStore instances.

This operation may cause Hive metadata loss. As a result, Hive cannot provide services.

▲▲▲

Do not perform this operation unless you are sure that the Hive table information can be discarded.

Check whether services can be started properly.

Delete or modify files corresponding to Hive tables over HDFS interfaces or HBase interfaces.

This operation may cause Hive service data loss or tampering.

▲▲

Do not perform this operation unless you are sure that the data can be discarded or that the operation meets service requirements.

Check whether Hive data is complete.

Delete files corresponding to Hive tables, or modify directory access permissions, over HDFS or HBase interfaces.

This operation may make related services unavailable.

▲▲▲

Do not perform this operation.

Check whether related service operations are normal.

Delete or modify hdfs:///apps/templeton/hive-3.1.0.tar.gz over HDFS interfaces.

This operation causes WebHCat service failures.

▲▲

Do not perform this operation.

Check whether related service operations are normal.

Export table data to a local directory with overwrite. For example, export the data of table t1 to /opt/dir:

insert overwrite local directory '/opt/dir' select * from t1;

This operation will delete target directories. Incorrect setting may cause software or OS startup failures.

▲▲▲▲▲

Ensure that the path where the data is written does not contain any files, or do not use the keyword overwrite in the command.

Check whether files in the target path are lost.
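
The workaround's first option, exporting only into a path that contains no files, can be enforced with a small guard before issuing the HiveQL. The export path below is hypothetical; the HiveQL statement is only printed, not executed.

```shell
TARGET=/tmp/hive-export-demo   # hypothetical export path
mkdir -p "$TARGET"
if [ -z "$(ls -A "$TARGET" 2>/dev/null)" ]; then
  # Directory is empty: the overwrite cannot destroy unrelated files.
  echo "safe: INSERT OVERWRITE LOCAL DIRECTORY '$TARGET' SELECT * FROM t1;"
else
  echo "refusing: $TARGET is not empty" >&2
fi
```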

Point different databases, tables, or partitions to the same path, for example, the default warehouse path /user/hive/warehouse.

This may cause data disorder. When one database, table, or partition is deleted, the data of the other objects sharing the path is lost.

▲▲▲▲▲

Do not perform this operation.

Check whether files in the target path are lost.

Kafka High-Risk Operations

Table 10 Kafka High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Delete Topic

This operation may delete existing topics and data.

▲▲▲

Kerberos authentication is used to ensure that authenticated users have operation permissions. Ensure that topic names are correct.

Check whether topics are processed properly.

Delete data directories.

This operation may cause service information loss.

▲▲▲

Do not delete data directories manually.

Check whether data directories are normal.

Modify data directory content (file and folder creation).

This operation may cause faults on the Broker instance of the node.

▲▲▲

Do not create or modify files or folders in the data directories manually.

Check whether data directories are normal.

Modify the disk auto-adaptation function using the disk.adapter.enable parameter.

When the disk usage reaches the threshold, this function adjusts the topic data retention period, and historical data outside the retention period may be deleted.

▲▲▲

If the retention period of some topics must not be adjusted, add these topics to the value of disk.adapter.topic.blacklist.

Observe the data storage period on the Kafka topic monitoring page.

Modify the log.dirs data directory configuration.

Incorrect operation may cause process faults.

▲▲▲

Ensure that the added or modified data directories are empty and that the directory permissions are correct.

Check whether data directories are normal.
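
The workaround, that any directory added to log.dirs must be empty with correct permissions, can be checked up front. The path below is an example; on MRS the broker processes run as user omm, so writability should ultimately be checked as that user.

```shell
NEW_DIR=/tmp/kafka-data-new   # example directory to be added to log.dirs
mkdir -p "$NEW_DIR"
# An added log.dirs entry must exist, be empty, and be writable by the broker user.
if [ -d "$NEW_DIR" ] && [ -z "$(ls -A "$NEW_DIR")" ] && [ -w "$NEW_DIR" ]; then
  echo "$NEW_DIR is ready to be added to log.dirs"
else
  echo "$NEW_DIR is not ready: check contents and permissions" >&2
fi
```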

Start or stop basic components independently.

This operation has adverse impact on the basic functions of some services. As a result, service failures occur.

▲▲▲

Do not start or stop ZooKeeper, Kerberos, and LDAP basic components independently. Select related services when performing this operation.

Check whether services are operating normally.

Restart or stop services.

This operation may interrupt services.

▲▲

Restart or stop services if necessary.

Check whether services are operating normally.

Modify configuration parameters.

This operation requires a service restart for the configuration to take effect.

▲▲

Modify configuration if necessary.

Check whether services are operating normally.

Deleting or modifying metadata

Modifying or deleting Kafka metadata on ZooKeeper may cause the Kafka topic or service unavailability.

▲▲▲

Do not delete or modify Kafka metadata stored on ZooKeeper.

Check whether the Kafka topics or Kafka service is available.

Modifying metadata backup files

After Kafka metadata backup files are modified and used to restore Kafka metadata, Kafka topics or the Kafka service may be unavailable.

▲▲▲

Do not modify metadata backup files.

Check whether the Kafka topics or Kafka service is available.

KrbServer High-Risk Operations

Table 11 KrbServer High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modify the KADMIN_PORT parameter of KrbServer.

After this parameter is modified, if the KrbServer service and its associated services are not restarted in a timely manner, the configuration of KrbClient in the cluster is abnormal and the service running is affected.

▲▲▲▲▲

After this parameter is modified, restart the KrbServer service and all its associated services.

None

Modify the kdc_ports parameter of KrbServer.

After this parameter is modified, if the KrbServer service and its associated services are not restarted in a timely manner, the configuration of KrbClient in the cluster is abnormal and the service running is affected.

▲▲▲▲▲

After this parameter is modified, restart the KrbServer service and all its associated services.

None

Modify the KPASSWD_PORT parameter of KrbServer.

After this parameter is modified, if the KrbServer service and its associated services are not restarted in a timely manner, the configuration of KrbClient in the cluster is abnormal and the service running is affected.

▲▲▲▲▲

After this parameter is modified, restart the KrbServer service and all its associated services.

None

Modify the domain name of the Manager system.

After the domain name is modified, if the KrbServer service and its associated services are not restarted in a timely manner, the configuration of KrbClient in the cluster is abnormal and the service running is affected.

▲▲▲▲▲

After the domain name is modified, restart the KrbServer service and all its associated services.

None

Configuring cross-cluster mutual trust relationships

This operation will restart the KrbServer service and all associated services, affecting the management and maintenance and services of the cluster.

▲▲▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

Check whether alarms that are not cleared exist, and whether the management and maintenance of the cluster are normal and whether services are normal.

LdapServer High-Risk Operations

Table 12 LdapServer High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modify the LDAP_SERVER_PORT parameter of LdapServer.

After this parameter is modified, if the LdapServer service and its associated services are not restarted in a timely manner, the configuration of LdapClient in the cluster is abnormal and the service running is affected.

▲▲▲▲▲

After this parameter is modified, restart the LdapServer service and all its associated services.

None

Restoring LdapServer data

This operation will restart FusionInsight Manager and the entire cluster, affecting the management and maintenance and services of the cluster.

▲▲▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

Check whether alarms that are not cleared exist, and whether the management and maintenance of the cluster are normal and whether services are normal.

Replacing the node where LdapServer resides

This operation will interrupt services deployed on the node. If the node is a management node, the operation will restart all OMS processes, affecting the cluster management and maintenance.

▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

Check whether alarms that are not cleared exist, and whether the management and maintenance of the cluster are normal and whether services are normal.

Changing the password of LdapServer

The LdapServer and Kerberos services need to be restarted during the password change, affecting the management, maintenance, and services of the cluster.

▲▲▲▲

Before performing the operation, ensure that the operation is necessary, and that no other management and maintenance operations are performed at the same time.

None

Restarting the node where LdapServer resides

Restarting the node without stopping the LdapServer service may cause LdapServer data damage.

▲▲▲▲▲

If data is damaged, restore LdapServer using its backup data.

None

Loader High-Risk Operations

Table 13 Loader High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Change the floating IP address of a Loader instance (loader.float.ip).

This operation may cause service startup failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid.

Check whether the Loader UI can be connected properly.

Modify the Loader instance start parameter LOADER_GC_OPTS.

This operation may cause service startup failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid.

Check whether services can be started properly.

Clear table contents when adding data to HBase.

This operation will clear original data in the target table.

▲▲

Ensure that the contents in the target table can be cleared before the operation.

Check whether the original data in the target table has been cleared as expected.

Spark High-Risk Operations

Spark high-risk operations apply to MRS 2.1.0 and earlier versions.

Table 14 Spark High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modifying the configuration item (spark.yarn.queue, spark.driver.extraJavaOptions)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Modifying the configuration item (spark.yarn.cluster.driver.extraJavaOptions)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Modifying the configuration item (spark.eventLog.dir)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Modifying the configuration item (SPARK_DAEMON_JAVA_OPTS)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Deleting all JobHistory instances

The event logs of historical applications are lost.

▲▲

Reserve at least one JobHistory instance.

Check whether historical application information is included in JobHistory.

Deleting or modifying /user/spark/lib/6.5.1/spark-assembly-1.5.1-hadoop3.1.1.zip

JDBCServer fails to be started and service functions are abnormal.

▲▲▲

Delete /user/spark/lib/6.5.1/spark-assembly-1.5.1-hadoop3.1.1.zip, and wait for 10-15 minutes until the .zip package is automatically restored.

Check whether the services are started properly.

Spark2x High-Risk Operations

Table 15 Spark2x High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modifying the configuration item (spark.yarn.queue)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Modifying the configuration item (spark.driver.extraJavaOptions)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Modifying the configuration item (spark.yarn.cluster.driver.extraJavaOptions)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Modifying the configuration item (spark.eventLog.dir)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Modifying the configuration item (SPARK_DAEMON_JAVA_OPTS)

Services fail to be started.

▲▲

When modifying related configuration items, ensure that the new values are valid.

Check whether the services are started properly.

Deleting all JobHistory2x instances

The event logs of historical applications are lost.

▲▲

Reserve at least one JobHistory2x instance.

Check whether historical application information is included in JobHistory2x.

Deleting or modifying /user/spark2x/jars/8.1.0/spark-archive-2x.zip

JDBCServer2x fails to be started and service functions are abnormal.

▲▲▲

Delete /user/spark2x/jars/8.1.0/spark-archive-2x.zip, and wait for 10-15 minutes until the .zip package is automatically restored.

Check whether the services are started properly.

Storm High-Risk Operations

Table 16 Storm High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Modify the following plug-in related configuration items:

storm.scheduler

nimbus.authorizer

storm.thrift.transport

nimbus.blobstore.class

nimbus.topology.validator

storm.principal.tolocal

This operation may cause service startup failures.

▲▲▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that the class names exist and are valid.

Check whether services can be started properly.

Modify the following startup parameters of Storm instances:

GC_OPTS

NIMBUS_GC_OPTS

SUPERVISOR_GC_OPTS

UI_GC_OPTS

LOGVIEWER_GC_OPTS

This operation may cause service startup failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid.

Check whether services can be started properly.

Modify the configuration parameter resource.aware.scheduler.user.pool of the user's resource pool.

Services cannot run properly.

▲▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that resources allocated to each user are appropriate and valid.

Check whether services can be started and run properly.

Changing data directories

If this operation is not properly performed, services may be abnormal and unavailable.

▲▲▲▲

Do not manually change data directories.

Check whether the related data directories are normal.

Restarting services or instances

The service will be interrupted for a short period of time, and ongoing operations will be interrupted.

▲▲▲

Restart services or instances when necessary.

Check whether the service is running properly and whether interrupted operations are restored.

Synchronizing configurations (by restarting the required service)

The service will be restarted, resulting in temporary service interruption. If Supervisor is restarted, ongoing operations will be interrupted for a short period of time.

▲▲▲

Modify configurations when necessary.

Check whether the service is running properly and interrupted operations are restored.

Stopping services or instances

The service will be stopped, and related operations will be interrupted.

▲▲▲

Stop services when necessary.

Check whether the services are properly stopped.

Deleting or modifying metadata

If Nimbus metadata is deleted, services are abnormal and ongoing operations are lost.

▲▲▲▲▲

Do not manually delete Nimbus metadata files.

Check whether Nimbus metadata files are normal.

Modifying file permissions

If permissions on the metadata and log directories are incorrectly modified, service exceptions may occur.

▲▲▲▲

Do not manually modify file permissions.

Check whether the permissions on the data and log directories are correct.

Deleting topologies

Topologies in use will be deleted.

▲▲▲▲

Delete topologies when necessary.

Check whether the topologies are successfully deleted.

Yarn High-Risk Operations

Table 17 Yarn High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Delete or change the data directories yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.

This operation may cause service information loss.

▲▲▲

Do not delete data directories manually.

Check whether data directories are normal.

ZooKeeper High-Risk Operations

Table 18 ZooKeeper High-Risk Operations

Operation

Risk

Risk Level

Workaround

Check Item

Delete or change ZooKeeper data directories.

This operation may cause service information loss.

▲▲▲

Follow the capacity expansion guide to change the ZooKeeper data directories.

Check whether services and associated components are started properly.

Modify the ZooKeeper instance start parameter GC_OPTS.

This operation may cause service startup failures.

▲▲

Strictly follow the prompt information when modifying related configuration items. Ensure that new values are valid.

Check whether services can be started properly.

Modify the znode ACL information in ZooKeeper.

If znode permissions are modified in ZooKeeper, other users may lose access to the znode and some system functions may become abnormal.

▲▲▲▲

During the modification, strictly follow the ZooKeeper Configuration Guide and ensure that other components can use ZooKeeper properly after ACL information modification.

Check whether other components that depend on ZooKeeper can properly start and provide services.
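
A practical safeguard is to record the znode's ACL before changing it and verify that dependent components keep the access they need. The zkCli.sh commands are shown as comments because they require a live ZooKeeper; the runnable part checks a recorded ACL string whose value here is an example.

```shell
# On a live ZooKeeper (not run here):
#   zkCli.sh -server <zk-host:port>
#   getAcl /hbase                    # record the ACL before any change
#   setAcl /hbase <recorded-acl>     # restore it if components break
# Offline check on a recorded ACL string (example value):
acl="world:anyone:r,sasl:hbase:cdrwa"
case "$acl" in
  *world:anyone:r*) echo "read access for all clients is preserved" ;;
  *) echo "WARNING: world read access missing -- dependent components may fail" ;;
esac
```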
