
Timeout Reported When Adding the Hive Table Field

Issue

An error message is reported when a field is added to a Hive table.

Symptom

When Hive executes ALTER TABLE table_name ADD COLUMNS(column_name string) CASCADE on a table that contains more than 10,000 partitions, the following error is reported:

Timeout when executing method: alter_table_with_environment_context; 600525ms exceeds 600000ms

Cause Analysis

  1. The MetaStore client connection times out. The default timeout interval for the connection between the MetaStore client and server is 600 seconds. On FusionInsight Manager, increase the value of hive.metastore.client.socket.timeout to 3600s.
  2. Another error is reported:
    Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. java.net.SocketTimeoutException: Read timed out

    This error is caused by the timeout of the JDBC connection between MetaStore and the metadata database. The default timeout value is 60 ms.

  3. Increase the value of socketTimeout in javax.jdo.option.ConnectionURL to 60000 (see the configuration sketch after this list). The initial error is still reported:
    Timeout when executing method: alter_table_with_environment_context; 3600556ms exceeds 3600000ms
  4. Increase the values of parameters such as hive.metastore.batch.retrieve.max, hive.metastore.batch.retrieve.table.partition.max, and dbservice.database.max.connections. The problem persists.
  5. It is suspected that the problem is caused by GaussDB, because adding a field traverses each partition to execute getPartitionColumnStatistics and alterPartition.
  6. Run the gsql -p 20051 -U omm -W dbserverAdmin@123 -d hivemeta command as user omm to log in to the Hive metadata database.
  7. Run select * from pg_locks. No lock wait is found.
  8. Run select * from pg_stat_activity;. A statement that takes a long time to execute is found:
    SELECT 'org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics' AS NUCLEUS_TYPE,
      A0.AVG_COL_LEN, A0."COLUMN_NAME", A0.COLUMN_TYPE, A0.DB_NAME,
      A0.BIG_DECIMAL_HIGH_VALUE, A0.BIG_DECIMAL_LOW_VALUE, A0.DOUBLE_HIGH_VALUE, A0.DOUBLE_LOW_VALUE,
      A0.LAST_ANALYZED, A0.LONG_HIGH_VALUE, A0.LONG_LOW_VALUE, A0.MAX_COL_LEN,
      A0.NUM_DISTINCTS, A0.NUM_FALSES, A0.NUM_NULLS, A0.NUM_TRUES,
      A0.PARTITION_NAME, A0."TABLE_NAME", A0.CS_ID, A0.PARTITION_NAME AS NUCORDER0
    FROM PART_COL_STATS A0
    WHERE A0."TABLE_NAME" = '$1' AND A0.DB_NAME = '$2' AND A0.PARTITION_NAME = '$3'
      AND ((((((A0."COLUMN_NAME" = '$4') OR (A0."COLUMN_NAME" = '$5')) OR (A0."COLUMN_NAME" = '$6'))
      OR (A0."COLUMN_NAME" = '$7')) OR (A0."COLUMN_NAME" = '$8')) OR (A0."COLUMN_NAME" = '$9'))
    ORDER BY NUCORDER0;

  9. Run the gs_guc reload -c log_min_duration_statement=100 -D /srv/BigData/dbdata_service/data/ command to enable SQL recording. It is found that the statement captured in 8 takes about 700 ms to execute, and it is executed more than 10,000 times because the table has more than 10,000 partitions.
  10. Add explain (analyze, verbose, timing, costs, buffers) before the SQL statement to analyze its execution plan (see the example after this list). It is found that the entire table is scanned during execution.

  11. Check the index. It is found that the query does not meet the leftmost match rule of the index, so the index cannot be used.
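The parameter change in 3 appends the socketTimeout option to the JDBC URL stored in javax.jdo.option.ConnectionURL. The following is a minimal sketch of the resulting value, assuming the metadata database is a GaussDB instance reachable over the PostgreSQL protocol on port 20051; the host name is a placeholder:

    javax.jdo.option.ConnectionURL = jdbc:postgresql://<metastore_db_host>:20051/hivemeta?socketTimeout=60000

On FusionInsight Manager, this value and the hive.metastore.client.socket.timeout parameter in 1 are normally changed on the Hive service configuration page rather than by editing hive-site.xml directly.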
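The execution-plan check in 10 can be reproduced with a statement of the following shape. This is a sketch only: the literal values stand in for the bind parameters $1 to $9 of the statement captured in 8, and the select list is abbreviated.

    explain (analyze, verbose, timing, costs, buffers)
    SELECT A0.PARTITION_NAME AS NUCORDER0
    FROM PART_COL_STATS A0
    WHERE A0."TABLE_NAME" = 'test_table'
      AND A0.DB_NAME = 'default'
      AND A0.PARTITION_NAME = 'p1'
      AND ((A0."COLUMN_NAME" = 'col1') OR (A0."COLUMN_NAME" = 'col2'))
    ORDER BY NUCORDER0;

Before the fix, the plan reports a sequential scan of PART_COL_STATS. This is the leftmost match problem referred to in 11: a composite index can be used only when the query filters on a leading (leftmost) prefix of the index columns. Because the leading columns of the existing index evidently did not match the equality predicates above, each of the 10,000+ statements fell back to a full table scan.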

Procedure

  1. Rebuild the index.
    su - omm
    gsql -p 20051 -U omm -W dbserverAdmin@123 -d hivemeta
    DROP INDEX PCS_STATS_IDX;
    CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS(DB_NAME, TABLE_NAME, COLUMN_NAME, PARTITION_NAME);
    CREATE INDEX SDS_N50 ON SDS(CD_ID);
  2. Check the execution plan again (see the verification example below). It is found that the statement now hits the index and is executed within 5 ms (the original execution time was about 700 ms). Add the field to the Hive table again. The field is added successfully.
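To confirm that the rebuilt index takes effect, the EXPLAIN check from 10 in Cause Analysis can be repeated. A sketch, reusing the same placeholder values as the earlier example:

    explain (analyze, verbose, timing, costs, buffers)
    SELECT A0.PARTITION_NAME AS NUCORDER0
    FROM PART_COL_STATS A0
    WHERE A0."TABLE_NAME" = 'test_table'
      AND A0.DB_NAME = 'default'
      AND A0.PARTITION_NAME = 'p1'
      AND ((A0."COLUMN_NAME" = 'col1') OR (A0."COLUMN_NAME" = 'col2'))
    ORDER BY NUCORDER0;

The plan should now show an index scan using PCS_STATS_IDX instead of a sequential scan: the rebuilt index leads with DB_NAME and TABLE_NAME, so the query's equality predicates match a leftmost prefix of (DB_NAME, TABLE_NAME, COLUMN_NAME, PARTITION_NAME).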