Updated on 2025-05-29 GMT+08:00

What Is Data Skew and How Do I Check Data Skew?

Answer: Data skew means that data is unevenly distributed across the DNs (data nodes). For a hash-distributed table, an inappropriate distribution key can concentrate rows on a few DNs, which causes skew and degrades performance on those DNs. Periodically check or monitor the table to make sure its data is evenly distributed across the DNs. Run the following statement, replacing tablename with the name of your table, to check the number of tuples on each DN:

gaussdb=# SELECT a.count,b.node_name FROM (SELECT count(*) AS count,xc_node_id FROM tablename GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;

If the tuple counts differ greatly between DNs (by several times or tenfold), the data is skewed and you should change the distribution key. ALTER TABLE cannot change a distribution key, so you must rebuild the table to change it.
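
The rebuild could be sketched as follows. This is a minimal illustration, not a prescribed procedure: the table orders, its columns, and the choice of order_id as the new distribution key are all hypothetical, and the sketch assumes the table can be taken out of service while it is rebuilt.

```sql
-- Hypothetical example: "orders" is skewed on its current distribution key.
-- All table and column names here are assumptions for illustration only.

-- 1. Create a new table with the same columns, hash-distributed on a column
--    whose values are spread more evenly (ideally unique or nearly unique).
CREATE TABLE orders_new (
    order_id    INT,
    customer_id INT,
    amount      NUMERIC(10,2)
) DISTRIBUTE BY HASH (order_id);

-- 2. Copy the data; each row is placed on a DN according to the new key.
INSERT INTO orders_new SELECT * FROM orders;

-- 3. Re-run the per-DN count on the new table to confirm the skew is gone.
SELECT a.count, b.node_name
FROM (SELECT count(*) AS count, xc_node_id
      FROM orders_new
      GROUP BY xc_node_id) a, pgxc_node b
WHERE a.xc_node_id = b.node_id
ORDER BY a.count DESC;

-- 4. Once the counts look even, replace the old table with the new one.
DROP TABLE orders;
ALTER TABLE orders_new RENAME TO orders;
```

Indexes, constraints, and privileges defined on the original table are not carried over by this sketch and would need to be re-created on the new table before the swap.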