TPC-H Data Construction
- Obtain the TPC-H tools from the official TPC website.
- Log in to the ECS and run the following commands to create the directories for storing the TPC-H tools and the generated data:
mkdir -p /data1/script/tpch-kit/tpch1000X
mkdir -p /data2/script/tpch-kit/tpch1000X
- Use SFTP to upload the obtained TPC-H tools to the /data1/script/tpch-kit directory of the ECS and run the following command to decompress the tools:
cd /data1/script/tpch-kit && unzip tpch_v3.0.0.zip
- Compile the data construction tool dbgen.
Before compiling, modify the makefile.suite and tpcd.h files in the dbgen directory as described below.
- Modifying makefile.suite
# Change the parameters of makefile.suite as follows (line 103 to line 111):
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
# The predefined DATABASE values do not include POSTGRESQL.
# Modify tpcd.h to add a POSTGRESQL section.
DATABASE = POSTGRESQL
MACHINE = LINUX
WORKLOAD = TPCH
- Modifying tpcd.h
// Add the following statements to the tpcd.h file:
#ifdef POSTGRESQL
#define GEN_QUERY_PLAN "EXPLAIN"
#define START_TRAN "BEGIN TRANSACTION"
#define END_TRAN "COMMIT;"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "LIMIT %d\n"
#define SET_DBASE ""
#endif /* POSTGRESQL */
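After modifying both files, run the following command to compile dbgen. The -f option tells make to use makefile.suite, because the kit does not ship a file named Makefile:
cd TPC-H_Tools_v3.0.0/dbgen && make -f makefile.suite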
- Log in to the ECS and run the following commands to generate data for the TPC-H 1000X test. In this example, TPC-H 1000X data is generated on two data disks in parallel.
The total size of the TPC-H 1000X data files is about 1100 GB. Make sure that the ECS has sufficient disk space.
- Go to the /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen directory and run the following command:
for c in {1..5};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
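- Copy the dbgen directory to the second data disk. Create the parent directory first, because cp does not create it:
mkdir -p /data2/script/tpch-kit/TPC-H_Tools_v3.0.0
cp -r /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen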
- Go to the /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen directory and run the following command:
for c in {6..10};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
In these commands:
- -s specifies the data scale. In this example, the value is 1000.
- -C specifies the number of chunks. In this example, the value is 10.
- -S specifies the sequence number of the current chunk. It is set by the loop variable, so you do not need to change it.
- -f forces existing output files to be overwritten.
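The two loops above split the ten chunks between the two data disks. As a minimal sketch, the assignment can be previewed before anything is launched. The helper name print_dbgen_cmds is hypothetical and not part of the TPC-H kit; it only echoes the commands and does not run dbgen.

```shell
# Hypothetical helper: print (do not run) one dbgen command per chunk
# in the range [first, last], using the flags described above.
print_dbgen_cmds() {
    scale=$1; chunks=$2; c=$3; last=$4
    while [ "$c" -le "$last" ]; do
        echo "./dbgen -s ${scale} -C ${chunks} -S ${c} -f"
        c=$((c + 1))
    done
}

# Chunks 1-5 are generated under /data1 and chunks 6-10 under /data2.
print_dbgen_cmds 1000 10 1 5
print_dbgen_cmds 1000 10 6 10
```

Every chunk number from 1 to the value of -C should appear exactly once across all disks; a skipped number leaves a gap in the generated data.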
- Run the following commands to check the data file generation progress. You can also run the ps ux | grep dbgen command to check whether the generation processes have finished.
du -sh /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl*
du -sh /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl*
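The progress check can also be wrapped in a small script. This is a sketch, assuming a Linux host where find supports -maxdepth and pgrep is installed; the helper name count_tbl_files is hypothetical.

```shell
# Hypothetical helper: count the *.tbl* files currently present in a directory.
count_tbl_files() {
    find "$1" -maxdepth 1 -type f -name '*.tbl*' | wc -l | tr -d ' '
}

for dir in /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen \
           /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen; do
    if [ -d "$dir" ]; then
        echo "$dir: $(count_tbl_files "$dir") .tbl files so far"
    fi
done

# pgrep -x matches the exact process name; a non-zero exit status means
# no dbgen process is running any more.
if pgrep -x dbgen > /dev/null 2>&1; then
    echo "dbgen is still running"
else
    echo "no dbgen process found"
fi
```

A file count alone does not show that a file is complete, so wait until no dbgen process remains before moving the data.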
- After generation finishes, move the data files to the target directories.
mv /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl* /data1/script/tpch-kit/tpch1000X
mv /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl* /data2/script/tpch-kit/tpch1000X
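After the move, it can be worth confirming that no chunk is missing. The sketch below assumes that dbgen names chunked output as table.tbl.N (for example, lineitem.tbl.3); the helper name missing_chunks is hypothetical.

```shell
# Hypothetical helper: print each chunk number in [first, last] for which
# <dir>/<table>.tbl.<n> does not exist. Assumes the <table>.tbl.<n> naming.
missing_chunks() {
    dir=$1; table=$2; n=$3; last=$4
    while [ "$n" -le "$last" ]; do
        [ -f "$dir/$table.tbl.$n" ] || echo "$n"
        n=$((n + 1))
    done
}

# No output means that every expected lineitem chunk is present.
missing_chunks /data1/script/tpch-kit/tpch1000X lineitem 1 5
missing_chunks /data2/script/tpch-kit/tpch1000X lineitem 6 10
```

The same call can be repeated for the other chunked tables by changing the table argument.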