Impala Application Development Suggestions
Deploy Coordinators and Executors Separately, with Two to Five Coordinators for Each Cluster Depending on the Cluster Scale
The Coordinator caches metadata, parses SQL statements, builds execution plans, and responds to client requests, so it mainly uses JVM memory. The Executor reads and writes data and evaluates operators, so it mainly uses off-heap memory. Deploying the two roles separately lets memory be sized for each role, which improves memory utilization. In addition, all SQL execution statistics are recorded on the Coordinators. After the split, you only need to access a few Coordinators to obtain the SQL execution status of the entire cluster, which reduces O&M pressure.
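Configure Dedicated Queues for Core Services and Set MEM_LIMIT and EXEC_TIME_LIMIT_S to Limit Large Queries
Resource queues help prevent one service from taking away resources needed by another service. Within a queue, MEM_LIMIT caps the memory a query can use, and EXEC_TIME_LIMIT_S cancels a query that runs longer than the specified number of seconds. For details, see Enabling and Configuring a Dynamic Resource Pool for Impala.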
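As a rough illustration, the two limits can also be applied to a session with Impala's SET statement. The sketch below only builds the statement strings. The helper name session_limit_statements and the example values are assumptions made for illustration, not recommended settings.

```python
def session_limit_statements(mem_limit: str, exec_time_limit_s: int) -> list[str]:
    """Build SET statements for the MEM_LIMIT and EXEC_TIME_LIMIT_S query options.

    mem_limit is passed through unchanged (for example "8g");
    exec_time_limit_s is a number of seconds.
    """
    if exec_time_limit_s < 0:
        raise ValueError("exec_time_limit_s must be non-negative")
    return [
        f"SET MEM_LIMIT={mem_limit};",
        f"SET EXEC_TIME_LIMIT_S={exec_time_limit_s};",
    ]
```

For example, session_limit_statements("8g", 600) yields two statements that can be run in impala-shell before the queries of a non-core workload.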
Enable OBS Local Cache
Impala can cache data read from OBS on local disks, which improves read speed for data that is accessed repeatedly. For example, to configure a 100 GB local cache on a single disk, set data_cache=/srv/BigData/data1/impala:100GB.
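The data_cache value follows the pattern of a comma-separated directory list, a colon, and a quota that applies to each directory. A minimal sketch of composing that value is shown below. The helper name data_cache_value is hypothetical, and the directory is the one from the example above.

```python
def data_cache_value(directories: list[str], quota_gb: int) -> str:
    """Compose a data_cache value: "dir1,dir2,...:<quota>GB".

    The quota applies to each listed directory, not to the total.
    """
    if not directories:
        raise ValueError("at least one cache directory is required")
    if quota_gb <= 0:
        raise ValueError("quota_gb must be positive")
    return f"{','.join(directories)}:{quota_gb}GB"
```

Calling data_cache_value(["/srv/BigData/data1/impala"], 100) reproduces the single-disk example. Passing several directories spreads the cache across disks.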
Enable HDFS Short-Circuit Read
With HDFS short-circuit read enabled, Impala reads local data directly from the file system instead of going through the DataNode, which improves read speed. For details, see https://impala.apache.org/docs/build/html/topics/impala_config_performance.html.
Run INVALIDATE METADATA <table> After a Table Structure Changes, and Run REFRESH on Changed Tables or Partitions After Data Is Imported into the Database or Lake
If a table is created or modified by a non-Impala engine (such as Hive or Spark), run the INVALIDATE METADATA <table> command on Impala to synchronize the table schema information. The full metadata of the table is then reloaded only when the table is next queried. If only partitions are added or data is inserted, run the REFRESH command instead to update the metadata incrementally.
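The choice between the two commands can be sketched as a small helper that returns the statement to run on Impala. The function name metadata_sync_statement, the schema_changed flag, and the table name sales.orders are assumptions made for illustration.

```python
from typing import Mapping, Optional, Union

PartitionSpec = Mapping[str, Union[str, int]]


def _partition_clause(partition: PartitionSpec) -> str:
    """Render " PARTITION (k1=v1, k2=v2)", quoting string values."""
    parts = []
    for key, value in partition.items():
        literal = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"{key}={literal}")
    return " PARTITION (" + ", ".join(parts) + ")"


def metadata_sync_statement(
    table: str,
    schema_changed: bool,
    partition: Optional[PartitionSpec] = None,
) -> str:
    """Pick INVALIDATE METADATA for schema changes, REFRESH for data changes."""
    if schema_changed:
        return f"INVALIDATE METADATA {table};"
    clause = _partition_clause(partition) if partition else ""
    return f"REFRESH {table}{clause};"
```

For a daily load into one partition, metadata_sync_statement("sales.orders", False, {"dt": "2024-01-01"}) produces a REFRESH limited to that partition, which keeps the metadata update small.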
Run COMPUTE INCREMENTAL STATS <table_name> Periodically to Update Statistics of Common Tables for Faster Queries
Impala estimates the resources consumed by queries based on table statistics. Accurate statistics help Impala generate efficient execution plans and allocate resources properly.
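A periodic job could generate one statistics statement per frequently queried table. The sketch below assumes a hypothetical helper incremental_stats_statements and made-up table names. Scheduling and execution (for example through impala-shell) are left out.

```python
def incremental_stats_statements(tables: list[str]) -> list[str]:
    """Return one COMPUTE INCREMENTAL STATS statement per table, in order."""
    statements = []
    for table in tables:
        name = table.strip()
        if not name:
            raise ValueError("table names must not be empty")
        statements.append(f"COMPUTE INCREMENTAL STATS {name};")
    return statements
```

For example, incremental_stats_statements(["sales.orders", "sales.customers"]) returns two statements that a scheduler can submit during off-peak hours.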
Merge Small Files Periodically to Reduce the Number of Files in a Single Table and Improve the Metadata Loading Speed
The amount of Impala metadata increases as the number of partitions and files grows. Too many partitions or small files consume excessive memory, slow down metadata updates, and reduce query performance because more files must be scanned.
Set the Storage Type to ORC or Parquet When Creating a Table
Columnar storage formats like ORC and Parquet enable faster reads and higher compression ratios, reducing data storage needs.
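The storage format is chosen with the STORED AS clause of CREATE TABLE. A minimal sketch that builds such a statement and rejects other formats is shown below. The helper name create_table_ddl and the example table and columns are assumptions.

```python
def create_table_ddl(
    table: str,
    columns: list[tuple[str, str]],
    file_format: str = "PARQUET",
) -> str:
    """Build a CREATE TABLE statement stored as PARQUET or ORC."""
    fmt = file_format.upper()
    if fmt not in {"PARQUET", "ORC"}:
        raise ValueError("file_format must be PARQUET or ORC")
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(f"{name} {type_}" for name, type_ in columns)
    return f"CREATE TABLE {table} ({column_list}) STORED AS {fmt};"
```

For example, create_table_ddl("sales.orders", [("id", "BIGINT"), ("amount", "DOUBLE")]) returns a Parquet table definition, and passing file_format="orc" switches it to ORC.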