Doris Data Query Rules

This topic describes the rules and suggestions for querying Doris data.

Data Query Rules

[Mandatory] Doris does not support foreign table query because foreign tables are unstable.
[Mandatory] Modify the query to a subquery if the in conditions exceed 2,000.
[Mandatory] Refrain from using the REST API for executing numerous SQL queries. It is intended solely for cluster maintenance.
[Optional] When INSERT INTO SELECT statements exceed 100 million, divide them into smaller batches for execution. To expedite data import during idle cluster resources, adjust concurrency settings.
For example, setting parallel_fragment_exec_instance_num to 8, which is half the CPU cores on a single Backend (BE), is recommended.
[Mandatory] For result sets larger than 50,000, utilize JDBC Catalog or the outfile method for export to prevent overloading Frontend (FE) resources and ensure cluster stability.
- Employ pagination (offset limit) with an ORDER BY clause for interactive queries.
- Consider outfile or export methods when exporting data to third parties.
[Mandatory] Utilize Colocation Join for joining more than two tables with over 300 million records.
[Mandatory] Avoid using SELECT * for querying large tables with hundreds of millions of records and specify the required fields instead.
- Use the SQL Block method to forbid broad queries.
- For high-concurrency point queries, consider enabling row storage (version 2.x).
- Use PreparedStatement for queries.
[Mandatory] Specify bucket conditions when querying tables with over 100 million records.
[Optional] void using OR as a JOIN condition.
[Optional] Narrow down the data range before sorting large datasets (whose records exceeds 500 million) to enhance performance.
For example, instead of FROM table ORDER BY datetime DESC LIMIT 10, use FROM table WHERE datetime='2023-10-20' ORDER BY datetime DESC LIMIT 10.