Updated on 2025-07-22 GMT+08:00

TPC-DS Data Generation

  1. Log in to the ECS and run the following commands to create the directories that store the TPC-DS tool and the generated data:

    mkdir -p /data1/script/tpcds-kit/tpcds1000X
    mkdir -p /data2/script/tpcds-kit/tpcds1000X
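
    Because -scale is expressed in GB, this example writes roughly 1 TB in total across the two disks, so it may be worth checking free space before generating data. The following is a minimal sketch, not part of the original procedure: the avail_gb helper and the REQUIRED_GB threshold are hypothetical names, and the 500 GB default is only an assumed per-disk figure that you should adjust.

```shell
# Print the free space, in whole GB, of the file system that holds a path.
# "df -Pk" forces the POSIX single-line format with 1 KB blocks.
avail_gb() {
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# REQUIRED_GB is an assumed per-disk threshold; adjust it to your scale factor.
REQUIRED_GB=${REQUIRED_GB:-500}

for d in /data1 /data2; do
    [ -d "$d" ] || { echo "$d: not found"; continue; }
    free=$(avail_gb "$d")
    if [ "$free" -lt "$REQUIRED_GB" ]; then
        echo "$d: only ${free} GB free, below ${REQUIRED_GB} GB"
    else
        echo "$d: ${free} GB free"
    fi
done
```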

  2. Obtain the latest TPC-DS data generation tool, dsdgen, from the official TPC website and use SFTP to upload the package to the /data1/script/tpcds-kit directory on the ECS.
  3. Decompress the TPC-DS package and compile the data construction tool dsdgen.

    Replace tpcds_3.2.0.zip with the actual software package name.

    Replace DSGen-software-code-3.2.0rc1 with the actual name of the decompressed folder.

    cd /data1/script/tpcds-kit && unzip tpcds_3.2.0.zip
    cd DSGen-software-code-3.2.0rc1/tools && make
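
    After make finishes, a quick check can confirm that the build produced what the next step needs. The sketch below assumes that dsdgen reads its distribution data from a tpcds.idx file in the tools directory (its default behavior as far as we know). check_dsdgen_build is a hypothetical helper name.

```shell
# Return 0 if the given tools directory holds an executable dsdgen
# and the tpcds.idx distribution file; otherwise print what is missing.
check_dsdgen_build() {
    tools_dir=$1
    [ -x "$tools_dir/dsdgen" ] || { echo "missing or non-executable: $tools_dir/dsdgen"; return 1; }
    [ -f "$tools_dir/tpcds.idx" ] || { echo "missing: $tools_dir/tpcds.idx"; return 1; }
    echo "dsdgen build looks complete in $tools_dir"
}
```

    For example, run check_dsdgen_build /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools, replacing the folder name with the actual one.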

  4. Go to the /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools directory and run the following commands to generate data:

    for c in {1..5};do (./dsdgen -scale 1000 -dir /data1/script/tpcds-kit/tpcds1000X -TERMINATE N -parallel 10 -child ${c} -force Y > /dev/null 2>&1 &);done
    for c in {6..10};do (./dsdgen -scale 1000 -dir /data2/script/tpcds-kit/tpcds1000X -TERMINATE N -parallel 10 -child ${c} -force Y > /dev/null 2>&1 &);done

    Parameter description:

    • -scale specifies the data scale in GB. In this example, the value is 1000, which is about 1 TB of raw data.
    • -dir specifies the directory for storing the generated data files. In this example, the values are /data1/script/tpcds-kit/tpcds1000X (shards 1 to 5) and /data2/script/tpcds-kit/tpcds1000X (shards 6 to 10).
    • -TERMINATE specifies whether each record ends with a field separator. In this example, the value is N, so no trailing separator is written.
    • -parallel specifies the number of shards. In this example, the value is 10.
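    • -child specifies the sequence number of the shard that a process generates, from 1 to the -parallel value. The for loops set it automatically, so you do not need to change it.
    • -force specifies whether existing data files are overwritten. In this example, the value is Y.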
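
    To see how the parameters combine before launching anything, the commands can be printed as a dry run. The following is only a sketch of the two loops above: shard_dir and print_dsdgen_commands are hypothetical helper names, and the rule "first half of the shards to /data1, the rest to /data2" is inferred from this example rather than required by dsdgen.

```shell
# Map a shard number to its output directory: the first half of the shards
# go to /data1, the rest to /data2 (mirrors the two loops in step 4).
shard_dir() {
    child=$1
    parallel=$2
    if [ "$child" -le $((parallel / 2)) ]; then
        echo /data1/script/tpcds-kit/tpcds1000X
    else
        echo /data2/script/tpcds-kit/tpcds1000X
    fi
}

# Print (do not run) the dsdgen command for every shard.
print_dsdgen_commands() {
    scale=$1
    parallel=$2
    c=1
    while [ "$c" -le "$parallel" ]; do
        echo "./dsdgen -scale $scale -dir $(shard_dir "$c" "$parallel") -TERMINATE N -parallel $parallel -child $c -force Y"
        c=$((c + 1))
    done
}

print_dsdgen_commands 1000 10
```

    The output is one command line per shard, which makes it easy to confirm that every -child value from 1 to 10 appears exactly once and points at the intended disk.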

  5. Run the following commands to check the data file generation progress. You can also run the ps ux|grep dsdgen command to check whether the processes for generating data files have exited.

    du -sh /data1/script/tpcds-kit/tpcds1000X/*.dat
    du -sh /data2/script/tpcds-kit/tpcds1000X/*.dat
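
    The two checks can also be combined into one summary. This is a minimal sketch under the assumption that pgrep is available on the ECS; count_dat_files and running_dsdgen are hypothetical helper names. Unlike ps ux|grep dsdgen, pgrep does not match its own process.

```shell
# Count the .dat files directly inside a directory (0 if none exist yet).
count_dat_files() {
    find "$1" -maxdepth 1 -type f -name '*.dat' 2>/dev/null | wc -l | tr -d ' '
}

# Count running processes whose name is exactly "dsdgen".
running_dsdgen() {
    pgrep -x dsdgen 2>/dev/null | wc -l | tr -d ' '
}

for d in /data1/script/tpcds-kit/tpcds1000X /data2/script/tpcds-kit/tpcds1000X; do
    echo "$d: $(count_dat_files "$d") .dat files"
done
echo "dsdgen processes still running: $(running_dsdgen)"
```

    Generation is finished when the process count reaches 0 and the file sizes reported by du stop growing.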