TPC-H Data Construction

Obtain the TPC-H tools from the official website.
Log in to the ECS and run the following commands to create the directories for storing the TPC-H tools and the generated data:

mkdir -p /data1/script/tpch-kit/tpch1000X
mkdir -p /data2/script/tpch-kit/tpch1000X
Use SFTP to upload the obtained TPC-H tools to the /data1/script/tpch-kit directory of the ECS and run the following command to decompress the tools:

cd /data1/script/tpch-kit && unzip tpch_v3.0.0.zip

Compile the data construction tool dbgen. Before compiling, modify the makefile.suite and tpcd.h files in the dbgen directory as described in the following two subsections, then run the make command that follows them.

Modifying makefile.suite

# Change the parameters in makefile.suite as follows (lines 103 to 111):

CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH

# POSTGRESQL is not among the predefined DATABASE values listed above,
# so the matching POSTGRESQL definitions must be added to tpcd.h.
DATABASE = POSTGRESQL
MACHINE = LINUX
WORKLOAD = TPCH
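
The same four settings can also be applied non-interactively. The following is a minimal sketch, assuming each variable starts at the beginning of its line in makefile.suite; patch_makefile is a hypothetical helper name, not part of the TPC-H kit. It keeps a .bak copy of the original file.

```shell
# Sketch: set CC, DATABASE, MACHINE and WORKLOAD in makefile.suite with sed.
# patch_makefile is a hypothetical helper; it edits the file in place
# and leaves the previous version next to it as <file>.bak.
patch_makefile() {
    local file="$1"
    sed -i.bak \
        -e 's/^CC[[:space:]]*=.*/CC      = gcc/' \
        -e 's/^DATABASE[[:space:]]*=.*/DATABASE = POSTGRESQL/' \
        -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
        -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
        "$file"
}
```

For example, patch_makefile makefile.suite run inside the dbgen directory rewrites the four assignments and leaves every other line untouched.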

Modifying tpcd.h

// Add the following definitions to the tpcd.h file:
#ifdef POSTGRESQL
#define GEN_QUERY_PLAN  "EXPLAIN"
#define START_TRAN      "BEGIN TRANSACTION"
#define END_TRAN        "COMMIT;"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "LIMIT %d\n"
#define SET_DBASE       ""
#endif /* POSTGRESQL */
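
If the change needs to be repeatable, the block can be appended from a script. This is a sketch only: add_pg_macros is a hypothetical helper, and appending at the end of tpcd.h (rather than next to the other database blocks) is an assumption that should be checked against your copy of the header. The helper does nothing when the block is already present.

```shell
# Sketch: append the POSTGRESQL definitions to tpcd.h unless they already exist.
# add_pg_macros is a hypothetical helper; placing the block at the end of the
# file is an assumption, not something the TPC-H kit prescribes.
add_pg_macros() {
    local file="$1"
    if grep -q '^#ifdef POSTGRESQL' "$file"; then
        return 0   # already patched, nothing to do
    fi
    # The quoted 'EOF' keeps %d and \n literal instead of expanding them.
    cat >> "$file" <<'EOF'

#ifdef POSTGRESQL
#define GEN_QUERY_PLAN  "EXPLAIN"
#define START_TRAN      "BEGIN TRANSACTION"
#define END_TRAN        "COMMIT;"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "LIMIT %d\n"
#define SET_DBASE       ""
#endif /* POSTGRESQL */
EOF
}
```

Running add_pg_macros tpcd.h a second time leaves the file unchanged, so the step is safe to include in a setup script.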

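After modifying both files, go to the /data1/script/tpch-kit directory and run the following command to compile dbgen. The -f option points make at the makefile.suite file edited above.

cd TPC-H_Tools_v3.0.0/dbgen && make -f makefile.suite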

Log in to the ECS and run the following commands to generate data for the TPC-H 1000X test. In this example, the TPC-H 1000X data is generated on two data disks in parallel.

The total size of the TPC-H 1000X data files is about 1100 GB. Make sure that the ECS disk space is sufficient.
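
The free space can be checked before generation starts. The sketch below uses df in POSIX output mode; has_free_gb is a hypothetical helper, and the per-disk figure you pass to it is your own estimate (for example, roughly half of the total per disk if the ten chunks are of similar size, which is an assumption).

```shell
# has_free_gb DIR REQUIRED_GB
# Succeeds if the filesystem holding DIR has at least REQUIRED_GB GiB available.
# has_free_gb is a hypothetical helper name.
has_free_gb() {
    local dir="$1" required_gb="$2" avail_kb
    # df -Pk prints one line per filesystem in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}
```

A call such as has_free_gb /data1 550 && has_free_gb /data2 550 || echo "not enough space" could guard the generation commands; adjust the thresholds to leave some headroom.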

Go to the /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen directory and run the following command:

for c in {1..5}; do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &); done

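Copy the dbgen directory to the second data disk. Create the parent directory first, because it does not exist yet on /data2. Because generation on /data1 is already running, the copy may include partially written .tbl files for chunks 1 to 5; delete them from the /data2 copy before moving the data in the last step.

mkdir -p /data2/script/tpch-kit/TPC-H_Tools_v3.0.0
cp -r /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen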

Go to the /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen directory and run the following command:

for c in {6..10}; do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &); done

Where:

-s specifies the data scale. In this example, the value is 1000.
-C specifies the number of chunks. In this example, the value is 10.
-S specifies the sequence number of the current chunk. It is set by the loop variable, so you do not need to change it.
-f forces dbgen to overwrite existing output files.
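
To see how -C and -S fit together, it can help to print the commands before launching them: every process receives the same chunk count and a distinct chunk number, and the two loops together should cover 1 to 10 exactly once. The following dry-run sketch only prints the commands; dbgen_commands is a hypothetical helper name.

```shell
# dbgen_commands SCALE CHUNKS FIRST LAST
# Print one dbgen command line per chunk number in FIRST..LAST (dry run only).
# dbgen_commands is a hypothetical helper; it does not start dbgen.
dbgen_commands() {
    local scale="$1" chunks="$2" first="$3" last="$4" c
    c="$first"
    while [ "$c" -le "$last" ]; do
        echo "./dbgen -s ${scale} -C ${chunks} -S ${c} -f"
        c=$(( c + 1 ))
    done
}
```

Calling dbgen_commands 1000 10 1 5 for the first disk and dbgen_commands 1000 10 6 10 for the second makes it easy to confirm that no chunk is skipped or generated twice before starting a job that writes hundreds of gigabytes.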

Run the following commands to check the data file generation progress. You can also run the ps ux | grep dbgen command to check whether the file generation processes have exited.

du -sh /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl*
du -sh /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl*
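
Once the dbgen processes have exited, a quick completeness check can complement du. The sketch below lists expected chunk files that are missing or empty. It assumes that the six large tables are written as <table>.tbl.<chunk> and that nation and region are not split into chunks; verify this naming against the files in your dbgen directory. missing_chunks is a hypothetical helper, and a non-empty file is not proof that the chunk is complete.

```shell
# missing_chunks DIR FIRST LAST
# Print every expected chunk file in DIR that is missing or empty.
# Assumption: chunked tables are named <table>.tbl.<chunk>.
# missing_chunks is a hypothetical helper name.
missing_chunks() {
    local dir="$1" first="$2" last="$3" t c
    for t in customer lineitem orders part partsupp supplier; do
        c="$first"
        while [ "$c" -le "$last" ]; do
            # -s is true only for files that exist and are larger than zero bytes.
            [ -s "${dir}/${t}.tbl.${c}" ] || echo "${t}.tbl.${c}"
            c=$(( c + 1 ))
        done
    done
}
```

For this example, missing_chunks /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen 1 5 should print nothing when all files for the first disk are present.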

After generation is complete, move the data files to the target directories.

mv /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl* /data1/script/tpch-kit/tpch1000X
mv /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl* /data2/script/tpch-kit/tpch1000X