TPC-H Data Generation
- Obtain the TPC-H tools from the official TPC website.
- Log in to the ECS and run the following commands to create directories for storing the TPC-H tool and the generated data:
mkdir -p /data1/script/tpch-kit/tpch1000X
mkdir -p /data2/script/tpch-kit/tpch1000X
- Upload the obtained TPC-H tool to the /data1/script/tpch-kit directory on the ECS and run the following command to decompress the tool:
Replace tpch_v3.0.1.zip with the actual software package name.
cd /data1/script/tpch-kit && unzip tpch_v3.0.1.zip
- Compile and generate the data construction tool dbgen.
Before compilation, modify the makefile.suite and tpcd.h files in the dbgen directory.
- Modify the makefile.suite file.
#Change the parameters of makefile.suite as follows (from line 103 to line 111):
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
# POSTGRESQL is not one of the DATABASE values listed above. It selects the PostgreSQL script settings, which must be added to tpcd.h.
DATABASE = POSTGRESQL
MACHINE = LINUX
WORKLOAD = TPCH
- Modify the tpcd.h file.
//Add the following statements to the tpcd.h file:
#ifdef POSTGRESQL
#define GEN_QUERY_PLAN "EXPLAIN"
#define START_TRAN "BEGIN TRANSACTION"
#define END_TRAN "COMMIT;"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "LIMIT %d\n"
#define SET_DBASE ""
#endif /* POSTGRESQL */
- After modifying both files, compile dbgen and copy the compiled tool to the second data disk:
cd TPC-H_Tools_v3.0.1/dbgen
cp makefile.suite makefile
make -f makefile
cp -R /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/ /data2/script/tpch-kit/
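The makefile.suite changes described above can also be applied without an editor. The following is a minimal sketch, assuming the assignment lines in makefile.suite begin with CC, DATABASE, MACHINE, and WORKLOAD at the start of a line. patch_makefile is a hypothetical helper name, not part of the TPC-H kit. Unlike the manual steps, it writes the patched result to a separate file and leaves makefile.suite unchanged.

```shell
#!/bin/sh
# Sketch only: patch_makefile is a hypothetical helper, not part of the TPC-H kit.

# patch_makefile SRC DST: copy SRC to DST while setting the four build
# parameters used in this section. Comment lines start with '#', so they
# never match the anchored patterns and are copied unchanged.
patch_makefile() {
    sed \
        -e 's/^CC[[:space:]]*=.*/CC = gcc/' \
        -e 's/^DATABASE[[:space:]]*=.*/DATABASE = POSTGRESQL/' \
        -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
        -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
        "$1" > "$2"
}
```

For example, running patch_makefile makefile.suite makefile inside the dbgen directory would produce the makefile directly, so the cp makefile.suite makefile step would not be needed. The tpcd.h change still has to be made separately.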
- Log in to the ECS and generate TPC-H 1000X data. In this example, TPC-H 1000X data is generated on two data disks in parallel.
The total size of TPC-H 1000X data files is about 1,100 GB. Make sure that the ECS disk space is sufficient.
- Go to the /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen directory and run the following command:
for c in {1..5};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
- Go to the /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen directory and run the following command:
for c in {6..10};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
Parameter description:
- -s specifies the data scale. In this example, the value is 1000.
- -C specifies the number of chunks. In this example, the value is 10.
- -S specifies the sequence number of the current chunk. You do not need to change the value.
- -f forces dbgen to overwrite existing data files.
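The loops above start dbgen in the background and discard its output, so a failed chunk goes unnoticed. The following is a minimal sketch of an alternative that checks free disk space first, keeps one log file per chunk, and waits for every chunk to finish. SCALE, CHUNKS, DBGEN, require_free_gb, and run_chunks are hypothetical names introduced for this example. The -s, -C, -S, and -f options are used exactly as in the commands above.

```shell
#!/bin/sh
# Sketch only: the variable and function names below are hypothetical.

SCALE="${SCALE:-1000}"     # value passed to -s
CHUNKS="${CHUNKS:-10}"     # value passed to -C
DBGEN="${DBGEN:-./dbgen}"  # path to the compiled dbgen binary

# require_free_gb DIR MIN_GB: succeed only if the file system holding DIR
# reports at least MIN_GB gigabytes available (df -P, 1024-byte blocks).
require_free_gb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# run_chunks FIRST LAST: start chunks FIRST..LAST in the background, wait
# for all of them, and fail if any dbgen process exits with a non-zero status.
run_chunks() {
    c="$1"
    pids=""
    while [ "$c" -le "$2" ]; do
        "$DBGEN" -s "$SCALE" -C "$CHUNKS" -S "$c" -f > "dbgen_chunk_${c}.log" 2>&1 &
        pids="$pids $!"
        c=$((c + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]
}
```

On the first data disk this could be used as require_free_gb /data1 600 && run_chunks 1 5 from the dbgen directory, where 600 is only an illustrative threshold that assumes the data is split roughly evenly across the two disks. Because run_chunks waits in the foreground, it is best started from a session that survives a disconnect.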
- Run the following commands to check the data file generation progress. You can also run the ps ux|grep dbgen command to check whether the processes that generate the data files have exited.
du -sh /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl*
du -sh /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl*
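The progress check can be repeated automatically until generation finishes. The following is a minimal sketch, assuming pgrep is installed on the ECS. tbl_total, wait_for_dbgen, and the PROC_NAME variable are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch only: tbl_total, wait_for_dbgen and PROC_NAME are hypothetical names.

# tbl_total DIR: print the combined size of the *.tbl* files in DIR,
# taken from the "total" line that du -c appends to its output.
tbl_total() {
    du -ch "$1"/*.tbl* 2>/dev/null | tail -n 1 | awk '{ print $1 }'
}

# wait_for_dbgen INTERVAL DIR...: print the size of each DIR every INTERVAL
# seconds until no process named exactly "dbgen" remains.
# PROC_NAME overrides the process name that is checked.
wait_for_dbgen() {
    interval="$1"
    shift
    while pgrep -x "${PROC_NAME:-dbgen}" > /dev/null 2>&1; do
        for dir in "$@"; do
            printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$dir" "$(tbl_total "$dir")"
        done
        sleep "$interval"
    done
    echo "no ${PROC_NAME:-dbgen} process is running"
}
```

For example, wait_for_dbgen 60 followed by the two dbgen directories would print one line per directory every minute and return once all dbgen processes have exited.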
- Move the generated TPC-H 1000X data files to the tpch1000X directories:
mv /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* /data1/script/tpch-kit/tpch1000X
mv /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* /data2/script/tpch-kit/tpch1000X
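Moving the files while dbgen is still writing them would leave incomplete data in the target directory. The following is a minimal sketch of a guarded move, assuming pgrep is available. move_tbl is a hypothetical helper name, not part of the TPC-H kit.

```shell
#!/bin/sh
# Sketch only: move_tbl is a hypothetical helper, not part of the TPC-H kit.

# move_tbl SRC DST: refuse to run while a dbgen process exists, then move
# every *.tbl* file from SRC to DST and print how many files were moved.
move_tbl() {
    if pgrep -x dbgen > /dev/null 2>&1; then
        echo "dbgen is still running; not moving files" >&2
        return 1
    fi
    mkdir -p "$2" || return 1
    count=0
    for f in "$1"/*.tbl*; do
        [ -e "$f" ] || continue   # the pattern matched nothing
        mv "$f" "$2"/ || return 1
        count=$((count + 1))
    done
    echo "$count"
}
```

Calling it once per data disk, with the dbgen directory as SRC and the matching tpch1000X directory as DST, mirrors the two mv commands above. The printed counts can then be compared between the two disks.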