Creating Hive Table Partitions to for Faster Queries
Scenario
During the Select query, Hive generally scans the entire table, which is time-consuming. To improve query efficiency, create table partitions based on service requirements and query dimensions.
Procedure
- For versions earlier than MRS 3.x:
Log in to the MRS console. In the left navigation pane, choose Active Clusters and click a cluster name. Choose Nodes > Node. The ECS page is displayed. Click Remote Login to log in to the Hive node.
For MRS 3.x and later:
Log in to the node where the Hive client has been installed as user root.
- Run the following command to go to the client installation directory, for example, /opt/client:
cd /opt/client
- Run the source bigdata_env command to configure environment variables for the client.
- Run the following command on the client for login:
kinit Username
- Run the following command to log in to the client tool:
beeline
- Select the static or dynamic partition.
- Static partition:
Manually enter a partition name, and use the keyword PARTITIONED BY to specify partition column name and data type when creating a table. During application development, use the ALTER TABLE ADD PARTITION statement to add a partition and use the LOAD DATA INTO PARTITION statement to load data to the partition, which supports only static partitions.
- Dynamic partition: Use a query command to insert results to a partition of a table. The partition can be a dynamic partition.
The dynamic partition can be enabled on the client tool by running the following command:
set hive.exec.dynamic.partition=true;
The default mode of the dynamic partition is strict. That is, at least a column must be specified as a static partition, under which dynamic sub-partitions can be created. You can run the following command to enable a completely dynamic partition:
set hive.exec.dynamic.partition.mode=nonstrict;
- The dynamic partition may cause a DML statement to create a large number of partitions and new mapping folders, which deteriorates system performance.
- If there are a large number of files, it takes a long time to run a SQL statement. You can run the set mapreduce.input.fileinputformat.list-status.num-threads = 100; statement before running a SQL statement to shorten the time. The parameter mapreduce.input.fileinputformat.list-status.num-threads can be set only after being added to the Hive whitelist.
- Static partition:
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.