Doris Table Creation Rules

This topic describes the rules and suggestions for creating Doris tables.

Table Creation Rules

[Mandatory] Ensure each bucket's data size is between 100 MB and 3 GB when defining buckets for table creation. A single partition's maximum data size should not exceed 5000 GB.
[Mandatory] Implement a bucket policy when a table's data records surpass 500 million.
[Mandatory] Limit bucket columns to one or two. Balance data distribution and query throughput to avoid data skew, which can impact overall balance and efficiency. Optimize query performance by tailoring bucket columns to evenly distributed data that is frequently used in query conditions.
[Mandatory] Avoid dynamic partitions for datasets smaller than 20 million records to prevent excessive bucket creation.
Set the replication factor for a new table to a minimum of two (with three as the default). Single backups are not recommended.
[Optional] Limit materialized views to six.
For substantial historical data with small, unbalanced sizes or low query likelihood, do as follows:
- Create dedicated historical partitions (yearly or monthly) to store all relevant data.
- Historical partition creation example: FROM ("2000-01-01") TO ("2022-01-01") INTERVAL 1 YEAR.
[Optional] For data records ranging from 10 million to 200 million, bypass partitioning in favor of direct bucket policy application. Doris defaults to a standard partition in the absence of user-defined partitions.
[Optional] If bucket fields experience over 30% data skew, switch from hash to random bucketing policies. DISTRIBUTED BY RANDOM BUCKETS 10 ...
[Optional] Prioritize the most frequently queried column as the first field when creating a table to leverage the prefix index's quick query feature. For high-cardinality columns outside the bucket, choose those with extensive query lengths. Note that the prefix index is limited to 36 bits; overly long columns may not utilize this feature.
[Optional] Utilize inverted indexes or Bloom filters for fuzzy matches or equivalent/in conditions on large datasets. Bitmap indexes are best suited for orthogonal queries with low-cardinality columns.
[Optional] Manually create buckets based on service requirements. Do not use the AUTO policy.

[Recommendation] For 2.1 and later versions, if a large amount of data is frequently imported to the database, MOR is recommended. If the query performance is preferentially considered and a long time for importing data to the database is acceptable, MOW is recommended. MOW is not recommended for 2.0 and earlier versions.

Parent topic: Doris Usage Specifications

Previous topic: Doris Usage Specifications

Next topic: Doris Data Change Rules