Updated on 2022-12-16 GMT+08:00

Data Distribution in a Distributed System

Background

DWS uses a two-layer data layout mechanism to achieve high-performance querying and import of petabyte-scale (PB-level) data. At the first layer, users specify a data distribution policy (hash distribution or replication distribution) when creating a table. When data is written, the system determines which node stores the data based on that table's distribution policy. At the second layer, each node partitions the data it stores according to partitioning rules.
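The two layers can be illustrated with a small Python model. This is only a sketch under stated assumptions, not how DWS is implemented: the cluster size, the CRC32-based hash, the range partition bounds, and the names node_for_hash, partition_for, and place_rows are all hypothetical and chosen for illustration.

```python
import bisect
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size
# Hypothetical exclusive upper bounds for range partitions on a "year" column.
# Values >= the last bound fall into a final overflow partition.
PARTITION_BOUNDS = [2021, 2022, 2023]


def node_for_hash(dist_key: str, num_nodes: int = NUM_NODES) -> int:
    """Layer 1 (hash distribution): map a distribution key to one node.

    CRC32 is an assumption for this sketch, not the hash DWS uses.
    """
    return zlib.crc32(dist_key.encode("utf-8")) % num_nodes


def partition_for(year: int) -> int:
    """Layer 2: choose the range partition inside a node."""
    return bisect.bisect_right(PARTITION_BOUNDS, year)


def place_rows(rows, policy="hash"):
    """Return {node: {partition: [rows]}} for the given distribution policy."""
    layout = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if policy == "hash":
            # Each row is stored on exactly one node.
            nodes = [node_for_hash(row["id"])]
        elif policy == "replication":
            # Every node keeps a full copy of the table.
            nodes = range(NUM_NODES)
        else:
            raise ValueError(f"unknown policy: {policy}")
        part = partition_for(row["year"])
        for node in nodes:
            layout[node][part].append(row)
    return layout
```

Calling place_rows(rows, "hash") spreads the rows across nodes by key, while place_rows(rows, "replication") stores every row on every node. In both cases each node then files its rows into partitions independently, which mirrors the second layer described above.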