Updated on 2022-07-26 GMT+08:00
TPC-DS Data Construction
- Log in to the ECS and run the following commands to create the directories for storing the TPC-DS tool and the generated data:
mkdir -p /data1/script/tpcds-kit/tpcds1000X
mkdir -p /data2/script/tpcds-kit/tpcds1000X
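Once the directories exist, it may be worth confirming that each data disk has enough free space before any data is generated. The following is a minimal sketch, not part of the official procedure: has_free_gb is a hypothetical helper name, and the 500 GB threshold in the usage comment is an assumption (roughly half of the 1000X data set on each disk).

```shell
# Hypothetical helper: succeed if the filesystem that holds DIR has at
# least REQUIRED_GB gigabytes available.
has_free_gb() {
    dir=$1
    required_gb=$2
    # df -Pk prints POSIX-format output in 1 KB blocks; field 4 of the
    # second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Example usage (assumed threshold of 500 GB per disk):
# for d in /data1/script/tpcds-kit/tpcds1000X /data2/script/tpcds-kit/tpcds1000X; do
#     has_free_gb "$d" 500 || echo "Not enough free space under $d"
# done
```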
- Obtain the latest TPC-DS tool package, which contains the data construction tool dsdgen, from the official TPC website and use SFTP to upload it to the /data1/script/tpcds-kit directory on the ECS.
- Run the following commands to decompress the TPC-DS package and compile it to build the data construction tool dsdgen:
- Replace tpcds_3.2.0.zip with the actual software package name.
- Replace DSGen-software-code-3.2.0rc1 with the actual name of the decompressed folder.
cd /data1/script/tpcds-kit && unzip tpcds_3.2.0.zip
cd DSGen-software-code-3.2.0rc1/tools && make
- Go to the /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools directory and run the following commands to generate data:
- TPC-DS data is large, and so are the individual tables. Therefore, the data is generated in shards.
- The total size of the TPC-DS 1000X data files is about 930 GB. Make sure that the ECS disks have sufficient free space.
- Because the generated data is large, importing it takes a long time if only one GDS is started. You are advised to distribute the generated data evenly across two data disks. In the following example, shards 1 to 5 are stored in /data1/script/tpcds-kit/tpcds1000X, and shards 6 to 10 are stored in /data2/script/tpcds-kit/tpcds1000X.
for c in {1..5};do (./dsdgen -scale 1000 -dir /data1/script/tpcds-kit/tpcds1000X -TERMINATE N -parallel 10 -child ${c} -force Y > /dev/null 2>&1 &);done
for c in {6..10};do (./dsdgen -scale 1000 -dir /data2/script/tpcds-kit/tpcds1000X -TERMINATE N -parallel 10 -child ${c} -force Y > /dev/null 2>&1 &);done
Where,
- -scale specifies the data scale. In this example, the value is 1000.
- -dir specifies the directories where the generated data files are stored. In this example, the directories are /data1/script/tpcds-kit/tpcds1000X and /data2/script/tpcds-kit/tpcds1000X.
- -TERMINATE indicates whether a separator is required at the end of each record. In this example, the value is N.
- -parallel specifies the number of shards. In this example, the value is 10.
- -child specifies the sequence number of the shard to generate. The loop variable supplies this value, so it does not need to be changed.
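To make the relationship between -parallel, -child, and -dir easier to see, the following dry-run sketch prints the ten dsdgen commands without running them. It is only an illustration under stated assumptions: shard_dir and dsdgen_cmd are hypothetical helper names, and the rule "first half of the shards to /data1, the rest to /data2" simply mirrors the example above.

```shell
# Hypothetical helper: print the output directory for shard CHILD out of
# TOTAL, sending the first half of the shards to /data1 and the rest to /data2.
shard_dir() {
    child=$1
    total=$2
    if [ "$child" -le $((total / 2)) ]; then
        echo /data1/script/tpcds-kit/tpcds1000X
    else
        echo /data2/script/tpcds-kit/tpcds1000X
    fi
}

# Hypothetical helper: print (but do not run) the dsdgen command for one shard.
dsdgen_cmd() {
    child=$1
    total=$2
    echo "./dsdgen -scale 1000 -dir $(shard_dir "$child" "$total") -TERMINATE N -parallel $total -child $child -force Y"
}

# Dry run: list the commands for all ten shards.
for c in 1 2 3 4 5 6 7 8 9 10; do
    dsdgen_cmd "$c" 10
done
```

Each printed line uses the same -parallel value (the total number of shards) and a different -child value (the shard being generated), which is why only the directory changes between the two loops in the example.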
- Run the following commands to check the data file generation progress. You can also run the ps ux|grep dsdgen command to check whether the dsdgen processes are still running; generation is complete when they have all exited.
du -sh /data1/script/tpcds-kit/tpcds1000X/*.dat
du -sh /data2/script/tpcds-kit/tpcds1000X/*.dat
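The two checks above can be combined into a small polling loop. The following is a rough sketch rather than a supported tool: count_dat_files, dsdgen_running, and wait_for_dsdgen are hypothetical helper names, the 60-second interval is arbitrary, and the sketch assumes that pgrep is installed on the ECS.

```shell
# Hypothetical helper: print the number of .dat files directly under DIR.
count_dat_files() {
    find "$1" -maxdepth 1 -type f -name '*.dat' | wc -l | tr -d ' '
}

# Hypothetical helper: succeed while at least one process named exactly
# "dsdgen" is running (assumes pgrep is available).
dsdgen_running() {
    pgrep -x dsdgen > /dev/null 2>&1
}

# Hypothetical helper: report the file count and total size of each
# directory once a minute until every dsdgen process has exited.
wait_for_dsdgen() {
    while dsdgen_running; do
        for d in "$@"; do
            echo "$d: $(count_dat_files "$d") files, $(du -sh "$d" | cut -f1)"
        done
        sleep 60
    done
    echo "No dsdgen process is running."
}

# Example usage:
# wait_for_dsdgen /data1/script/tpcds-kit/tpcds1000X /data2/script/tpcds-kit/tpcds1000X
```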