Updated on 2024-04-11 GMT+08:00

MRS 3.2.0-LTS.1 Patch Description

Basic information about MRS 3.2.0-LTS.1.6

Table 1 Basic information

Patch Version: MRS 3.2.0-LTS.1.6

Release Date: 2024-02-04

Pre-Installation Operations

If an MRS cluster node is faulty or the network is disconnected, isolate the node first. Otherwise, the patch installation will fail.

New Features

  • CDL supports uppercase letters in table field names.
  • A sharding key can be specified when Flink writes data to NetEase DDB.
  • Flink supports writing upsert-kafka monitoring data to InfluxDB.
  • Flink streaming reads of Hudi support monitoring of the message retention time and message stacking time.
  • Flink supports the ignoreDelete feature.
  • Yarn NodeManager supports graceful decommissioning.
  • Kafka supports data encryption.
  • Spark supports subquery fields without aggregate functions. (Set spark.sql.legacy.correlated.scalar.query.enabled to true.)
  • Spark supports view and chart permission control. (Add spark.ranger.plugin.viewaccesscontrol.enable to the custom parameters of JDBCServer and set it to true. Then add spark.ranger.plugin.viewaccesscontrol.enable=true to the Spark2x/spark/conf/spark-defaults.conf configuration file in the client directory and restart the JDBCServer instance.) A configuration sketch for the Spark parameters is provided after this list.
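
The following is a minimal sketch of one way to set the two Spark parameters above on the client, assuming the client installation directory is /opt/client (replace it with the actual client installation directory):

echo "spark.sql.legacy.correlated.scalar.query.enabled=true" >> /opt/client/Spark2x/spark/conf/spark-defaults.conf

echo "spark.ranger.plugin.viewaccesscontrol.enable=true" >> /opt/client/Spark2x/spark/conf/spark-defaults.conf

After updating the file, restart the JDBCServer instance on FusionInsight Manager for the settings to take effect.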

Resolved Issues

List of resolved issues in MRS 3.2.0-LTS.1.6:

  • No data is written into the Hudi table when the start time parameter is added to a CDL job.
  • Residual connector threads cannot be killed when CDL is restarted.
  • The error logs of the ClickHouse balancer are not rolled back.
  • It takes a long time to import ClickHouse data to the database.
  • The ClickHouse TTL does not take effect.
  • The Hive client cannot be connected due to too many ClickHouse and ZooKeeper connections.
  • OOM is reported when columns are added to the base table of the ClickHouse materialized view.
  • The Executor job aging sequence is optimized so that jobs are aged based on the job end time.
  • The UDF fails to be updated because the package name of the UDF uploaded on Flink UI is not changed.
  • The UI paging display is abnormal after UDFs are uploaded to Flink.
  • The time displayed when Flink streams read Hudi data is 8 hours shorter than the actual time.
  • An error is reported when FlinkServer is used to create a Kafka connection with SASL_SSL authentication.
  • When Flink interconnects with Guardian and OBS is enabled, Flink jobs occasionally fail to be restarted.
  • Jobs submitted by DGC cannot be restored from checkpoints.
  • When FlinkServer is used to submit a job, a message is displayed indicating that the job fails to be submitted, but the job on Yarn is in RUNNING state.
  • The StackOverFlow error is reported when a Flink-SQL job is submitted using DGC.
  • When the Flink security.ssl.encrypt.enabled parameter is set to TRUE, the task startup speed is affected.
  • When a DGC job fails and is retried, the job is consumed from the beginning instead of being started from the checkpoint.
  • An error is reported when a common cluster uses REST APIs to invoke FlinkServer jobs or actions.
  • The job cannot be submitted when a Flink JAR package is used to write data to ClickHouse.
  • The function of deleting hint parameters in the Flink Join state does not take effect.
  • The Flink job error information is incorrect, and the number of error lines displayed in the log is inconsistent with the number of SQL error lines of the job.
  • The Flink streaming read of a Hudi table fails while data is being written to the table, and the Hudi job fails to be submitted and executed.
  • The FlinkServer SQL verification fails.
  • Keytab and principal cannot be configured by -yD or -D on the Flink client.
  • When the Flink operator chain is enabled, a sink operator exception is captured by the source operator, but the TM job does not fail.
  • The specific job name is not displayed in the MRS real-time task alarm notification.
  • The Flume customized time interceptor does not take effect.
  • Single-node capacity expansion fails occasionally because NodeManager fails to start during auto scaling.
  • In security mode, the HDFS user cannot be used to deliver balance tasks.
  • The available resource metrics of the Yarn resource pool are abnormal. As a result, auto scaling is triggered abnormally.
  • When the disk of a NodeManager node is full, members of the resource pool are migrated to the default resource pool.
  • The alarm indicating that the number of dead DataNodes exceeds the threshold cannot be automatically cleared.
  • A false alarm indicating that the service is unavailable is reported when the invocation of the HBase health check script times out.
  • After a column is added using HetuEngine alter, no data is available in the old partition when data is inserted.
  • When HetuEngine or Flink writes data, there is a possibility that HetuEngine reads no data from the RO table after compaction.
  • The HDFS log collection package is missing.
  • HetuEngine fails to execute SQL statements when using HSFabric to connect to JDBC.
  • An error is reported when the result of overwriting a partitioned table with Hive on Tez is empty.
  • Hive integrates the DataArts metadata synchronization plug-in package.
  • Hive jobs frequently report errors after the last access time of Hive metadata is configured.
  • The batch deletion parameter does not take effect in "Hive alter table test drop partition (partition <'xxxx');".
  • Parameters such as &useSSL=false do not take effect when Hive JDBC is connected.
  • In high-concurrency scenarios, Hive on Spark jobs fail due to JAR package submission timeout.
  • Duplicate bucket IDs exist when CDL writes data to the Hudi table.
  • Hudi compaction runs faster than clean. As a result, data cannot be read.
  • The "xx is not a Parquet file" exception is reported when Spark reads Hudi because abnormal files are not cleared after a compaction task fails.
  • When a Spark job reads an upstream database table, Executor reports an error indicating that files in the OBS .schema directory of the table cannot be found.
  • The Hudi compaction schedule is optimized. A plan is generated based on the last compaction action.
  • By default, Hudi retains 5 GB of archived compressed files.
  • By default, Hudi data on OBS is not moved to the recycle bin.
  • Hudi archiving archives more clean and rollback operations to reduce the number of metadata files.
  • Archiving cannot be triggered if clean is not executed for Hudi CoW tables.
  • The table directory is deleted when Hudi uses Spark to generate data for the batch supplement table.
  • The namespaces generated for the Hudi table are inconsistent. As a result, the update fails.
  • The call command is added to Hudi to clear invalid metadata files.
  • Instant data is deleted when the Hudi DELETE_EMPTY_INSTANT function is abnormal.
  • Data is lost in Flink append mode.
  • Historical data is not cleared after being stored for 30 days by default.
  • The real-time monitoring data on FusionInsight Manager is empty after the time zone is changed to UTC.
  • The global user policy content under Tenant Resources on FusionInsight Manager is not displayed on multiple pages.
  • The Yarn on CCE button is added to the Manager page.
  • The Tomcat configuration needs to be manually modified when a new cluster accesses the Manager page through a private line.
  • Manager is interconnected with the O&M page.
  • A false alarm is generated in an MRS cluster.
  • The monitoring metrics are displayed abnormally when the Manager queries data across 00:00.
  • The CPU usage of the kernel space on FusionInsight Manager is incorrect.
  • After a user switches to the distribution chart on the host management monitoring page of FusionInsight Manager, the default value range of the chart is incorrect.
  • The GaussDB process of the active OMS occupies a large amount of memory.
  • After the customized configuration of the Manager component is complete, the customized configuration of the added instance is not displayed.
  • The PMS process keeps restarting.
  • The alarm indicating that the trust relationship between nodes is invalid is falsely reported during node scale-out.
  • Data synchronization between the active and standby Manager nodes is abnormal.
  • Alarms reported in the Manager auto scaling scenario are optimized.
  • The node network communication exception alarm in the Manager scale-in scenario is optimized.
  • An alarm indicating that the service is unavailable is reported when the Controller is restarted.
  • The redirection support description is missing from the MRS Manager administrator account documentation.
  • The RangerAdmin instance fails to be started after external Ranger metadata is configured for the cluster.
  • An error is reported when the quit command is executed to exit the Spark client.
  • The driver port conflicts with the thriftserver port occasionally in Spark multi-tenant mode.
  • Resources are not released for idle Spark JDBC tasks that have been running for more than 30 minutes.
  • After the SSL of ZooKeeper is enabled and a Spark job is submitted, ZooKeeper fails to be connected. As a result, the task execution times out.
  • Spark fails to connect to JDBCServer.
  • When Spark grants only the view permission but not the table permission, Hive can query views but SparkSQL cannot query views.

Compatibility with Other Patches

The MRS 3.2.0-LTS.1.6 patch package contains all patches for fixing single-point issues in MRS 3.2.0-LTS.1.

Impact of Patch Installation

  • During MRS 3.2.0-LTS.1 patch installation, OMS automatically restarts, which affects cluster management operations such as job submission and cluster scaling. Install the patch during off-peak hours.
  • After the MRS 3.2.0-LTS.1.6 patch is installed or uninstalled, restart the Flink, Yarn, HDFS, MapReduce, Ranger, HetuEngine, Flume, Hive, Kafka, and Spark2x services on FusionInsight Manager for the patch to take effect. During restart, some services may be unavailable for a short period. To ensure service continuity, restart the components in off-peak hours. Before uninstalling the patch, log in to FusionInsight Manager, and choose System > Third-Party AD to disable AD interconnection.
  • If a client is manually installed inside or outside the cluster, you need to upgrade or roll back the client.
    1. Log in to the active node of the cluster.

      cd /opt/Bigdata/patches/{Patch version}/download/

      In all operations, replace {Patch version} with the actual patch version used in your project. For example, if the installed patch is MRS_3.2.0-LTS.1.1, the value of {Patch version} is MRS_3.2.0-LTS.1.1.
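
      For example, assuming the installed patch is MRS_3.2.0-LTS.1.6 (the patch described in this document), the command in the previous step becomes:

      cd /opt/Bigdata/patches/MRS_3.2.0-LTS.1.6/download/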

    2. Copy the patch installation package to the /opt/ directory on the client node.

      scp patch.tar.gz {IP address of the client node}:/opt/

      The following shows an example.

      scp patch.tar.gz 127.0.0.1:/opt/

    3. Log in to the node where the client is deployed.

      The following shows an example.

      ssh 127.0.0.1

    4. Run the following commands to create a patch directory and decompress the patch package:

      mkdir /opt/{Patch version}

      tar -zxf /opt/patch.tar.gz -C /opt/{Patch version}
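
      The following shows an example, assuming the installed patch version is MRS_3.2.0-LTS.1.6:

      mkdir /opt/MRS_3.2.0-LTS.1.6

      tar -zxf /opt/patch.tar.gz -C /opt/MRS_3.2.0-LTS.1.6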

    5. Upgrade or roll back the patch.
      • Upgrade the patch on the client node.

        Log in to the node where the client is deployed.

        cd /opt/{Patch version}/client

        sh upgrade_client.sh upgrade {Client installation directory}

        The following shows an example.

        sh upgrade_client.sh upgrade /opt/client/

      • Roll back the patch on the client node (after the patch is uninstalled).

        Log in to the node where the client is deployed.

        cd /opt/{Patch version}/client

        sh upgrade_client.sh rollback {Client installation directory}

        The following shows an example.

        sh upgrade_client.sh rollback /opt/client/

  • If the Spark service is installed on MRS 3.2.0-LTS.1, upgrade the ZIP package in HDFS on the active OMS node after the patch is installed.
    1. Log in to the active node of the cluster.

      su - omm

      cd /opt/Bigdata/patches/{Patch version}/client/

      source /opt/Bigdata/client/bigdata_env

    2. In security mode, authenticate as a cluster user who has permissions on HDFS.

      kinit {Service user}
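
      The following shows an example, assuming a service user named developuser (replace it with an actual user that has HDFS permissions):

      kinit developuser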

    3. Upgrade the package in HDFS.

      sh update_hdfs_file.sh

    4. (Optional) Roll back the upgrade after the patch is uninstalled.

      sh rollback_hdfs_file.sh