Performing Rolling Restart

After you modify the configuration of a big data component, you must restart the corresponding service for the new configuration to take effect. A normal restart restarts all services or instances concurrently, which may interrupt services. To avoid affecting services during the restart, use a rolling restart, which restarts services or instances in batches. For instances in active/standby mode, the standby instance is restarted first, followed by the active instance. A rolling restart takes longer than a normal restart.
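
The ordering described above can be illustrated with a short Python sketch. It is not the MRS implementation: the `Instance` class, the `ha_state` labels, and the choice to restart active/standby instances before all other instances are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    ha_state: str = "none"  # hypothetical labels: "active", "standby", or "none"


def plan_batches(instances, batch_size=1):
    """Split instances into restart batches, standby instances first."""
    standby = [i for i in instances if i.ha_state == "standby"]
    active = [i for i in instances if i.ha_state == "active"]
    others = [i for i in instances if i.ha_state == "none"]

    # Active/standby instances are restarted one at a time: standby, then active.
    batches = [[i] for i in standby + active]
    # Remaining instances are grouped into batches of at most batch_size.
    for start in range(0, len(others), batch_size):
        batches.append(others[start:start + batch_size])
    return batches


if __name__ == "__main__":
    cluster = [
        Instance("NameNode-1", "active"),
        Instance("NameNode-2", "standby"),
        Instance("DataNode-1"),
        Instance("DataNode-2"),
        Instance("DataNode-3"),
    ]
    for number, batch in enumerate(plan_batches(cluster, batch_size=2), start=1):
        print(number, [i.name for i in batch])
```

Because only one batch is down at a time, the remaining instances keep serving requests. This is also why a rolling restart takes longer than a normal restart.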

Table 1 lists the services and instances in an MRS cluster and indicates whether each supports rolling restart.

**Table 1** Services and instances that support or do not support rolling restart
Service	Instance	Rolling Restart Supported
Alluxio	AlluxioJobMaster	Yes
Alluxio	AlluxioMaster	Yes
ClickHouse	ClickHouseServer	Yes
ClickHouse	ClickHouseBalancer	Yes
CDL	CDLConnector	Yes
CDL	CDLService	Yes
Flink	FlinkResource	No
Flink	FlinkServer	No
Flume	Flume	Yes
Flume	MonitorServer	Yes
Guardian	TokenServer	Yes
HBase	HMaster	Yes
HBase	RegionServer	Yes
HBase	ThriftServer	Yes
HBase	RESTServer	Yes
HetuEngine	HSBroker	Yes
HetuEngine	HSConsole	Yes
HetuEngine	HSFabric	Yes
HetuEngine	QAS	Yes
HDFS	NameNode	Yes
HDFS	Zkfc	Yes
HDFS	JournalNode	Yes
HDFS	HttpFS	Yes
HDFS	DataNode	Yes
Hive	MetaStore	Yes
Hive	WebHCat	Yes
Hive	HiveServer	Yes
Hue	Hue	No
Impala	Impalad	No
Impala	StateStore	No
Impala	Catalog	No
IoTDB	IoTDBServer	Yes
Kafka	Broker	Yes
Kafka	KafkaUI	No
Kudu	KuduTserver	Yes
Kudu	KuduMaster	Yes
Loader	Sqoop	No
MapReduce	JobHistoryServer	Yes
Oozie	oozie	No
Presto	Coordinator	Yes
Presto	Worker	Yes
Ranger	RangerAdmin	Yes
Ranger	UserSync	Yes
Ranger	TagSync	Yes
Spark	JobHistory	Yes
Spark	JDBCServer	Yes
Spark	SparkResource	Yes
Storm	Nimbus	Yes
Storm	UI	Yes
Storm	Supervisor	Yes
Storm	Logviewer	Yes
Tez	TezUI	No
Yarn	ResourceManager	Yes
Yarn	NodeManager	Yes
Zookeeper	Quorumpeer	Yes
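
If restarts are driven from an operations script, Table 1 can be encoded as a lookup so that a rolling restart is only attempted where it is supported. The following Python sketch copies a few rows from the table; the dictionary name, the function name, and the returned mode strings are hypothetical and are not part of any MRS API.

```python
# A few rows copied from Table 1; extend the mapping with the remaining rows as needed.
SUPPORTS_ROLLING_RESTART = {
    ("HDFS", "NameNode"): True,
    ("Kafka", "Broker"): True,
    ("Yarn", "NodeManager"): True,
    ("Kafka", "KafkaUI"): False,
    ("Flink", "FlinkServer"): False,
    ("Hue", "Hue"): False,
}


def choose_restart_mode(service, instance):
    """Return "rolling" or "normal" for a (service, instance) pair from the mapping."""
    try:
        supported = SUPPORTS_ROLLING_RESTART[(service, instance)]
    except KeyError:
        # Refuse to guess for pairs that were not copied from the table.
        raise ValueError(f"{service}/{instance} is not in the lookup table") from None
    return "rolling" if supported else "normal"
```

Raising an error for unknown pairs, rather than defaulting to either mode, avoids silently performing a normal restart on an instance that could have been restarted without interruption.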

Restrictions

Perform a rolling restart during off-peak hours.
- Otherwise, the rolling restart may fail. For example, if Kafka throughput is high (over 100 MB/s) during a Kafka rolling restart, the restart may fail.
- Similarly, if each RegionServer receives more than 10,000 requests per second on the native interface during an HBase rolling restart, the RegionServer restart may fail under the heavy load.
Before restarting HBase, check its current request rate. If each RegionServer receives more than 10,000 requests per second on the native interface, increase the number of handlers to prevent a restart failure.
If a cluster has fewer than six Core nodes, services may be affected for a short period of time.
Where possible, perform a rolling instance restart or rolling service restart, and select Only restart instances whose configurations have expired.
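
The thresholds in these restrictions can be collected into a simple pre-check. The Python sketch below is illustrative only: the function name and its arguments are hypothetical, and the metric values must be obtained from your own monitoring, because this section does not describe an interface for reading them.

```python
KAFKA_MAX_THROUGHPUT_MB_S = 100          # Kafka example in the restrictions
HBASE_MAX_RPS_PER_REGIONSERVER = 10_000  # HBase example in the restrictions
MIN_CORE_NODES = 6                       # fewer Core nodes may briefly affect services


def rolling_restart_warnings(kafka_mb_s=None, hbase_rps_per_rs=None, core_nodes=None):
    """Return a list of warnings; an empty list means no restriction is hit.

    Every argument is optional. Pass None to skip the corresponding check.
    """
    warnings = []
    if kafka_mb_s is not None and kafka_mb_s > KAFKA_MAX_THROUGHPUT_MB_S:
        warnings.append("Kafka throughput is above 100 MB/s; the rolling restart may fail.")
    if hbase_rps_per_rs is not None and hbase_rps_per_rs > HBASE_MAX_RPS_PER_REGIONSERVER:
        warnings.append(
            "HBase load exceeds 10,000 requests/s per RegionServer; "
            "the RegionServer restart may fail."
        )
    if core_nodes is not None and core_nodes < MIN_CORE_NODES:
        warnings.append("Fewer than six Core nodes; services may be affected briefly.")
    return warnings
```

For example, `rolling_restart_warnings(kafka_mb_s=150, core_nodes=3)` returns two warnings, which suggests postponing the restart to off-peak hours.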

Performing a Rolling Service Restart

Choose Clusters > Active Clusters and click a cluster name to go to the cluster details page.
Click Components and select a service for which you want to perform a rolling restart.
On the Service Status tab page, click More and select Rolling-restart Service.

Figure 1 Service status (MRS 1.9.2 is used as an example)
The Rolling-restart Service page is displayed. Select Only restart instances whose configurations have expired and click OK to start the rolling restart of the service.

Figure 2 Performing a rolling service restart
After the rolling restart task is complete, click Finish.

Figure 3 Finishing the rolling service restart

Performing a Rolling Instance Restart

Choose Clusters > Active Clusters and click a cluster name to go to the cluster details page.
Click Components and select a service for which you want to perform a rolling restart.
On the Instance tab page, select the instance to be restarted. Click More and select Rolling-restart Instance.

Figure 4 Performing a rolling instance restart
After you enter the administrator password, the Rolling-restart Instance page is displayed. Select Only restart instances whose configurations have expired and click OK to start the rolling restart of the instance.
After the rolling restart task is complete, click Finish.

Performing a Rolling Cluster Restart

Choose Clusters > Active Clusters and click a cluster name to go to the cluster details page.
In the upper right corner of the page, choose Management Operations > Perform Rolling Cluster Restart.

Figure 5 Performing a rolling cluster restart (MRS 1.9.2 is used as an example)
The Rolling-restart Cluster page is displayed. Select Only restart instances whose configurations have expired and click OK to start the rolling restart of the cluster.
After the rolling restart task is complete, click Finish.

Rolling Restart Parameter Description

Table 2 describes rolling restart parameters.

**Table 2** Rolling restart parameter description
Parameter	Description
Only restart instances whose configurations have expired	Specifies whether to restart only the instances in the cluster whose configurations have been modified.
Enable rack strategy	Specifies whether to enable the concurrent rack rolling restart strategy. This parameter takes effect only for roles that meet the requirements of the strategy: the role supports rack awareness, and its instances belong to two or more racks. NOTE: This parameter is configurable only when a rolling restart is performed on HDFS or YARN in MRS 3.x or later.
Data Node Instances to Be Batch Restarted	Specifies the number of instances restarted in each batch when the batch rolling restart strategy is used. Value range: 1 to 20. Default value: 1. This parameter is valid only for data nodes.
Batch Interval	Specifies the interval between two consecutive batches of instances, in seconds. Value range: 0 to 2147483647. Default value: 0. NOTE: Setting a batch interval can improve the stability of big data component processes during the rolling restart. You are advised to set this parameter to a non-default value, for example, 10.
Decommissioning Timeout Interval	Specifies the timeout interval for decommissioning role instances during a rolling restart.
Batch Fault Tolerance Threshold	Specifies the number of failed batches that the rolling restart task tolerates. Value range: 0 to 2147483647. Default value: 0, which indicates that the task ends as soon as any batch of instances fails to restart.
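
To show how the batch interval and the fault tolerance threshold interact, the following Python sketch simulates a batch loop. It is a simplified model under stated assumptions, not the MRS implementation: `restart_batch` is a caller-supplied placeholder, and the threshold is interpreted as the number of failed batches tolerated before the task ends.

```python
import time

INT32_MAX = 2_147_483_647


def validate_parameters(batch_size=1, batch_interval_s=0, fault_tolerance=0):
    """Raise ValueError if a value is outside the range given in Table 2."""
    if not 1 <= batch_size <= 20:
        raise ValueError("batch_size must be between 1 and 20")
    if not 0 <= batch_interval_s <= INT32_MAX:
        raise ValueError("batch_interval_s must be between 0 and 2147483647")
    if not 0 <= fault_tolerance <= INT32_MAX:
        raise ValueError("fault_tolerance must be between 0 and 2147483647")


def run_batches(batches, restart_batch, batch_interval_s=0, fault_tolerance=0,
                sleep=time.sleep):
    """Restart batches in order and return True if the task completes.

    restart_batch(batch) is a hypothetical callable that returns True on success.
    """
    validate_parameters(batch_interval_s=batch_interval_s,
                        fault_tolerance=fault_tolerance)
    failures = 0
    for index, batch in enumerate(batches):
        if index > 0:
            sleep(batch_interval_s)  # wait between consecutive batches
        if not restart_batch(batch):
            failures += 1
            if failures > fault_tolerance:
                return False  # threshold exceeded: end the task
    return True
```

With the default threshold of 0, the first failed batch ends the task. With a threshold of 1, one failed batch is tolerated and the remaining batches are still restarted. The `sleep` argument can be replaced in tests so that no real waiting occurs.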

Procedure in a Typical Scenario

Choose Clusters > Active Clusters and click a cluster name to go to the cluster details page.
Click Components and select HBase. The HBase service page is displayed.
Click the Service Configuration tab, modify an HBase parameter, and save the configuration as prompted.

In versions earlier than MRS 3.x, do not select Restart the affected services or instances. This option triggers a normal restart, which restarts all services or instances concurrently and may interrupt services.
After saving the configurations, click Finish.
Click the Service Status tab.
On the Service Status tab page, click More and select Rolling-restart Service.

Figure 6 Service status - rolling restart (using MRS 1.9.2 as an example)
After you enter the administrator password, the Rolling-restart Service page is displayed. Select Only restart instances whose configurations have expired and click OK to start the rolling restart.

Figure 7 Configuring the rolling service restart
After the rolling restart task is complete, click Finish.

Figure 8 Finishing the rolling service restart