Updated on 2025-03-17 GMT+08:00

MRS 3.2.0-LTS.1 Patch Description

Basic Information About MRS 3.2.0-LTS.1.6

Table 1 Basic information

Patch Version: MRS 3.2.0-LTS.1.6

Release Date: 2024-02-04

Pre-Installation Operations

If an MRS cluster node is faulty or the network is disconnected, isolate the node first. Otherwise, the patch installation will fail.

New Features

  • CDL supports uppercase letters in table field names.
  • A sharding key can be specified when Flink writes data to NetEase DDB.
  • Flink supports writing upsert-kafka monitoring data to InfluxDB.
  • The message retention time and message backlog time of Flink streaming reads of Hudi can be monitored.
  • Flink supports the ignoreDelete feature.
  • YARN NodeManager supports graceful decommissioning.
  • Kafka supports data encryption.
  • Spark supports subquery fields without aggregate functions. (Set spark.sql.legacy.correlated.scalar.query.enabled to true.)
  • Spark supports view and chart permission control. To enable it, add spark.ranger.plugin.viewaccesscontrol.enable to the custom parameters of JDBCServer and set it to true, add spark.ranger.plugin.viewaccesscontrol.enable=true to the Spark2x/spark/conf/spark-defaults.conf configuration file in the client directory, and then restart the JDBCServer instance. A client-side configuration sketch follows this list.
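
The following is a minimal sketch of the client-side configuration described in the last two items above, assuming the client is installed in /opt/client (adjust the path to your environment). The JDBCServer custom parameter itself is still set on FusionInsight Manager.

  # Assuming the client is installed in /opt/client: enable Ranger view access control for Spark SQL
  echo "spark.ranger.plugin.viewaccesscontrol.enable=true" >> /opt/client/Spark2x/spark/conf/spark-defaults.conf

  # If needed, allow subquery fields without aggregate functions
  echo "spark.sql.legacy.correlated.scalar.query.enabled=true" >> /opt/client/Spark2x/spark/conf/spark-defaults.conf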

Resolved Issues

List of issues resolved in MRS 3.2.0-LTS.1.6:

  • No data is written into the Hudi table when the start time parameter is added to a CDL job.
  • Residual connector threads cannot be killed when CDL is restarted.
  • The error logs of the ClickHouse balancer are not rolled.
  • It takes a long time to import ClickHouse data to the database.
  • The ClickHouse TTL does not take effect.
  • The Hive client cannot be connected due to too many ClickHouse and ZooKeeper connections.
  • OOM is reported when columns are added to the base table of the ClickHouse materialized view.
  • Executor ages jobs based on the job end time.
  • The UDF fails to be updated because the package name of the UDF uploaded on the Flink UI is not changed.
  • The pagination display is abnormal after UDFs are uploaded to Flink.
  • The time displayed for Flink streaming reads of Hudi data is 8 hours earlier than the actual time.
  • An error is reported when FlinkServer is used to create a Kafka connection with SASL_SSL authentication.
  • When Flink interconnects with Guardian and OBS is enabled, Flink jobs occasionally fail to be restarted.
  • Jobs submitted by DataArts Studio cannot be restored from checkpoints.
  • When FlinkServer is used to submit a job, a message is displayed indicating that the job fails to be submitted, but the job on YARN is in RUNNING state.
  • The StackOverFlow error is reported when a Flink SQL job is submitted using DataArts Studio.
  • The Flink security.ssl.encrypt.enabled parameter is fixed at TRUE and cannot be changed, which slows down task startup.
  • When a DataArts Studio job fails and is retried, the job is consumed from the beginning instead of being started from the checkpoint.
  • An error is reported when a normal cluster uses REST APIs to call FlinkServer jobs or actions.
  • A Flink JAR job that writes data to ClickHouse cannot be submitted.
  • The hint parameters of Flink Joins cannot be deleted.
  • The Flink job error information is incorrect, and the number of error lines displayed in the log is inconsistent with the number of SQL error lines of the job.
  • Hudi data fails to be written to a table in Flink streaming, and the Hudi job fails to be submitted and executed.
  • The FlinkServer SQL verification fails.
  • Keytab and principal cannot be configured by -yD or -D on the Flink client.
  • When Flink operator chaining is enabled, an exception in the sink operator is caught by the source operator, but the job TaskManager does not fail.
  • The specific job name is not displayed in MRS real-time alarm notifications.
  • The Flume custom time interceptor does not take effect.
  • Single-node scale-out occasionally fails because NodeManager fails to start during auto scaling.
  • In security mode, the HDFS user cannot be used to deliver balancing tasks.
  • The available resource metrics of the YARN resource pool are abnormal. As a result, auto scaling is triggered abnormally.
  • Disks of the NodeManager node are full. As a result, members in the resource pool are migrated to the default resource pool.
  • An alarm indicating that the number of dead DataNodes exceeds the threshold cannot be automatically cleared.
  • An alarm indicating that the service is unavailable is misreported when the HBase health check script times out.
  • After a column is added using HetuEngine alter, no data is available in the old partition when data is inserted.
  • When HetuEngine or Flink writes data, there is a possibility that HetuEngine reads no data from the RO table after compaction.
  • The HDFS log collection package is missing.
  • HetuEngine fails to execute SQL statements when HSFabric is used for the JDBC connection.
  • An error is reported when a Hive on Tez INSERT OVERWRITE operation on a partitioned table produces an empty result.
  • Hive can now integrate with the DataArts metadata synchronization plug-in package.
  • Hive jobs frequently report errors after the last access time of Hive metadata is configured.
  • The batch deletion parameter does not take effect in the Hive statement alter table test drop partition (partition <'xxxx');.
  • Parameters such as &useSSL=false do not take effect when Hive JDBC is connected.
  • In high-concurrency scenarios, Hive on Spark jobs fail due to JAR package submission timeout.
  • Duplicate bucket IDs exist when CDL writes data to the Hudi table.
  • Hudi compaction runs faster than clean. As a result, data cannot be read.
  • The "xx is not a Parquet file" exception is reported when Spark reads Hudi because abnormal files are not cleared after a compaction task fails.
  • When a Spark job reads an upstream database table, Executor reports an error indicating that files in the OBS .schema directory of the table cannot be found.
  • The Hudi compaction schedule is optimized. A plan is generated based on the last compaction action.
  • By default, Hudi retains 5 GB of compressed files in the archive.
  • By default, Hudi clean operations on OBS do not move data to the recycle bin.
  • Hudi archives more clean and rollback operations to reduce the number of metadata files.
  • Archiving cannot be triggered if clean is not executed for Hudi COW.
  • The table directory is deleted when Hudi uses Spark to generate data for batch and supplementary tables.
  • The namespaces generated for the Hudi table are inconsistent. As a result, the update fails.
  • The call command is added to Hudi to clear invalid metadata files.
  • Instant data is deleted when the Hudi DELETE_EMPTY_INSTANT function is abnormal.
  • Data is lost in Flink append mode.
  • Historical data is not cleared after being stored for 30 days (default retention time).
  • The real-time monitoring data on FusionInsight Manager is empty after the time zone is changed to UTC.
  • The global user policy content under Tenant Resources cannot be displayed on multiple pages on FusionInsight Manager.
  • The YARN on CCE button is added to the FusionInsight Manager page.
  • The Tomcat configuration needs to be manually modified when a new cluster accesses the FusionInsight Manager page through a private line.
  • FusionInsight Manager cannot be interconnected with the O&M page.
  • False alarms are reported in an MRS cluster.
  • The monitoring metrics are abnormal when the timeframe of the queried data spans multiple days.
  • The CPU usage of the kernel space on FusionInsight Manager is incorrect.
  • When a user switches to the distribution chart on the host cable management monitoring page of FusionInsight Manager and then changes the time frame, the default value range of the chart is incorrect.
  • The GaussDB process of the active OMS occupies a large amount of memory.
  • After the custom configuration of the FusionInsight Manager component is added, the added custom configuration of instances is not displayed.
  • The PMS monitoring process keeps restarting.
  • The alarm indicating that the trust relationship between nodes is invalid is falsely reported during node scale-out.
  • Data synchronization between the active and standby FusionInsight Manager nodes is abnormal.
  • Alarms reported in the FusionInsight Manager auto scaling scenario need to be optimized.
  • The node network communication exception alarm in the manager scale-in scenario needs to be optimized.
  • An alarm indicating that the service is unavailable is reported when the controller is restarted.
  • A help document link is added in case that the administrator account of FusionInsight Manager is forgotten.
  • The RangerAdmin instance fails to be started after external Ranger metadata is configured for a cluster.
  • An error is reported when the quit command is executed to exit the Spark client.
  • The driver port conflicts with the thriftserver port occasionally when there are multiple Spark tenants.
  • Resources are not released for Spark JDBC tasks that have been idle for more than 30 minutes.
  • After the SSL of ZooKeeper is enabled and a Spark job is submitted, ZooKeeper fails to be connected and the task execution times out.
  • Spark fails to connect to JDBCServer.
  • When Spark grants only the view permission but not the table permission, Hive can query views but SparkSQL cannot.

Compatibility with Other Patches

The MRS 3.2.0-LTS.1.6 patch package includes all single-issue patches released for MRS 3.2.0-LTS.1.

Impact of Patch Installation

  • If you need to add a service after installing a patch for MRS 3.2.0-LTS.1, uninstall the patch, add the service, and reinstall the patch.
  • After the patch is installed for MRS 3.2.0-LTS.1, do not reinstall hosts or software on the management plane.
  • After the patch is installed for MRS 3.2.0-LTS.1, if the IoTDB component is installed in the cluster, you need to disable the metric reporting function of the component when interconnecting with CES.
  • After the patch is installed for MRS 3.2.0-LTS.1, any client that has already been downloaded and installed must also be upgraded.
  • During MRS 3.2.0-LTS.1 patch installation, OMS automatically restarts, which affects cluster management operations, such as job submission and cluster scaling. Install the patch during off-peak hours.
  • After the patch is installed for MRS 3.2.0-LTS.1, the steps for upgrading the client and upgrading the ZIP package cannot be skipped. Otherwise, the patches of components such as Spark, HDFS, and Flink cannot be used, and jobs submitted on the Spark client will fail to run.
  • After the MRS 3.2.0-LTS.1.6 patch is installed or uninstalled, restart the Flink, YARN, HDFS, MapReduce, Ranger, HetuEngine, Flume, Hive, Kafka, and Spark2x services on FusionInsight Manager to apply the patch. During restart, some services may be unavailable for a short period. To ensure service continuity, restart the components in off-peak hours. Before uninstalling the patch, log in to FusionInsight Manager, and choose System > Third-Party AD to disable AD interconnection.

When upgrading from MRS 3.2.0-LTS.1.4 to MRS 3.2.0-LTS.1.5 by applying the patch, you only need to restart the components patched in MRS 3.2.0-LTS.1.5. However, if you upgrade across multiple patch versions, you need to restart all components patched in the cumulative patches.

  • If a client is manually installed inside or outside the cluster, you need to upgrade or roll back the client.
    1. Log in to the active node of the cluster.

      cd /opt/Bigdata/patches/{Patch version}/download/

      In all operations, replace {Patch version} with the actual patch version used in your environment. For example, if the installed patch is MRS_3.2.0-LTS.1.1, the value of {Patch version} is MRS_3.2.0-LTS.1.1.
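
      For example, assuming the patch described in this document, MRS_3.2.0-LTS.1.6, is installed:

      cd /opt/Bigdata/patches/MRS_3.2.0-LTS.1.6/download/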

    2. Copy the patch installation package to the /opt/ directory on the client node.

      scp patch.tar.gz {IP address of the client node}:/opt/

      Example:

      scp patch.tar.gz 127.0.0.1:/opt/

    3. Log in to the node where the client is deployed.

      Example:

      ssh 127.0.0.1

    4. Run the following commands to create a patch directory and decompress the patch package:

      mkdir /opt/{Patch version}

      tar -zxf /opt/patch.tar.gz -C /opt/{Patch version}
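
      For example, assuming the installed patch is MRS_3.2.0-LTS.1.6:

      mkdir /opt/MRS_3.2.0-LTS.1.6

      tar -zxf /opt/patch.tar.gz -C /opt/MRS_3.2.0-LTS.1.6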

    5. Upgrade or roll back the patch.
      • Upgrade the patch on the client node.

        Log in to the node where the client is deployed.

        cd /opt/{Patch version}/client

        sh upgrade_client.sh upgrade {Client installation directory}

        Example:

        sh upgrade_client.sh upgrade /opt/client/

      • Roll back the patch on the client node (after the patch is uninstalled).

        Log in to the node where the client is deployed.

        cd /opt/{Patch version}/client

        sh upgrade_client.sh rollback {Client installation directory}

        Example:

        sh upgrade_client.sh rollback /opt/client/

  • If the Spark service is installed on MRS 3.2.0-LTS.1, upgrade the ZIP package in HDFS on the active OMS node after the patch is installed.
    1. Log in to the active node of the cluster.

      su - omm

      cd /opt/Bigdata/patches/{Patch version}/client/

      source /opt/Bigdata/client/bigdata_env
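
      For example, assuming the patch described in this document, MRS_3.2.0-LTS.1.6, is installed:

      cd /opt/Bigdata/patches/MRS_3.2.0-LTS.1.6/client/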

    2. For a cluster in security mode, authenticate as a user who has permissions on HDFS.

      kinit {Service user}
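
      For example, assuming a service user named spark_user that has HDFS permissions (a hypothetical name; replace it with an actual user in your cluster):

      # spark_user is a hypothetical service user with HDFS permissions
      kinit spark_user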

    3. Upgrade the package in HDFS.

      sh update_hdfs_file.sh

    4. (Optional) Roll back the upgrade after the patch is uninstalled.

      sh rollback_hdfs_file.sh

    5. Restart the JDBCServer2x instance of Spark on FusionInsight Manager.