TPC-H Data Generation

Obtain TPC-H tools from the official website.
Log in to the ECS and run the following commands to create directories for storing the TPC-H tool and the generated data:
```
mkdir -p /data1/script/tpch-kit/tpch1000X
mkdir -p /data2/script/tpch-kit/tpch1000X
```
Upload the TPC-H tool package to the /data1/script/tpch-kit directory on the ECS, then run the following command to decompress it (replace tpch_v3.0.1.zip with the actual software package name):
```
cd /data1/script/tpch-kit && unzip tpch_v3.0.1.zip
```

Compile and generate the data construction tool dbgen.

Before compilation, modify the makefile.suite and tpcd.h files in the dbgen directory.

Modify the makefile.suite file.

#Change the parameters of makefile.suite as follows (from line 103 to line 111):
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
# POSTGRESQL is not one of the predefined DATABASE values listed above. Define it in tpcd.h as described in the next step.
DATABASE = POSTGRESQL
MACHINE = LINUX
WORKLOAD = TPCH
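
The same edits can also be applied without opening an editor. The following is a minimal sketch using sed, assuming the file still contains the stock assignment lines that begin with CC, DATABASE, MACHINE, and WORKLOAD. The function name set_makefile_vars is hypothetical and is not part of the TPC-H kit.

```shell
# Hypothetical helper (not part of the TPC-H kit): set the four build
# variables in a makefile.suite-style file, editing the file in place.
# Assumes each variable is assigned on a line that starts with its name.
set_makefile_vars() {
    file=$1
    tmp=$(mktemp) || return 1
    sed -e 's/^CC[[:space:]]*=.*/CC      = gcc/' \
        -e 's/^DATABASE[[:space:]]*=.*/DATABASE = POSTGRESQL/' \
        -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
        -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the content back instead of moving the temp file, so the
    # original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (run inside the dbgen directory):
#   set_makefile_vars makefile.suite
```

Comment lines are left untouched because the patterns only match lines that start with the variable name.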

Modify the tpcd.h file.

//Add the following statements to the tpcd.h file:
#ifdef POSTGRESQL
#define GEN_QUERY_PLAN  "EXPLAIN"
#define START_TRAN      "BEGIN TRANSACTION"
#define END_TRAN        "COMMIT;"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "LIMIT %d\n"
#define SET_DBASE       ""
#endif /* POSTGRESQL */
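
If you prefer to script this step, the block can be appended with a here-document. This is only a sketch: the source does not say where in tpcd.h the statements belong, so appending them at the end of the file is an assumption, and add_postgresql_defines is a hypothetical name.

```shell
# Hypothetical helper: append the POSTGRESQL block to a tpcd.h-style header
# unless it is already there. Appending at the end of the file is an
# assumption; the instructions only say to add the statements to tpcd.h.
add_postgresql_defines() {
    header=$1
    if grep -q '^#ifdef POSTGRESQL' "$header"; then
        return 0   # block already present, nothing to do
    fi
    # The quoted 'EOF' keeps the backslash in "LIMIT %d\n" literal.
    cat >> "$header" <<'EOF'

#ifdef POSTGRESQL
#define GEN_QUERY_PLAN  "EXPLAIN"
#define START_TRAN      "BEGIN TRANSACTION"
#define END_TRAN        "COMMIT;"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "LIMIT %d\n"
#define SET_DBASE       ""
#endif /* POSTGRESQL */
EOF
}

# Usage (run inside the dbgen directory):
#   add_postgresql_defines tpcd.h
```

The guard makes the helper safe to run more than once.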

Compile dbgen, then copy the compiled tool directory to the second data disk:

cd /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen
cp makefile.suite makefile
make -f makefile
cp -R /data1/script/tpch-kit/TPC-H_Tools_v3.0.1 /data2/script/tpch-kit/

Log in to the ECS and generate TPC-H 1000X data. In this example, the data is generated on two data disks in parallel.

The total size of TPC-H 1000X data files is about 1,100 GB. Make sure that the ECS disk space is sufficient.
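
One way to check the available space before starting is sketched below. The helper name has_free_gb is hypothetical, and the 600 GB per-disk threshold in the usage comment is an assumption (roughly half of the total plus some headroom), not a figure from the TPC-H kit.

```shell
# Hypothetical helper: succeed if the file system that holds DIR has at
# least REQUIRED_GB gigabytes available (1 GB = 1024 * 1024 KB here).
has_free_gb() {
    dir=$1
    required_gb=$2
    # df -Pk prints a header plus one line for the file system;
    # column 4 of that line is the available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Usage (threshold is an assumed value, adjust as needed):
#   has_free_gb /data1 600 || echo "not enough space on /data1"
#   has_free_gb /data2 600 || echo "not enough space on /data2"
```
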
1. Go to the /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen directory and run the following command:
```
for c in {1..5};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
```
2. Go to the /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen directory and run the following command:
```
for c in {6..10};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
```
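  Parameter description:
  - -s specifies the scale factor of the data. In this example, the value is 1000.
  - -C specifies the total number of chunks that the data is split into. In this example, the value is 10.
  - -S specifies which chunk the current process generates. The loop sets this value (1 to 5 on the first disk, 6 to 10 on the second), so you do not need to change it.
  - -f forces dbgen to overwrite existing files.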
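
A variant of the loops above that keeps the dbgen output in log files and blocks until every chunk has finished could look like the following sketch. The function name run_chunks, the DBGEN variable, and the dbgen_N.log file names are hypothetical; the dbgen options are the ones used above.

```shell
# Path to the dbgen binary; overridable so the sketch can be dry-run.
DBGEN=${DBGEN:-./dbgen}

# Hypothetical helper: start one dbgen process per chunk in the range
# FIRST..LAST, wait for all of them, and fail if any process failed.
run_chunks() {
    first=$1
    last=$2
    pids=""
    failed=0
    c=$first
    while [ "$c" -le "$last" ]; do
        # Each chunk writes its output to its own log file.
        "$DBGEN" -s 1000 -C 10 -S "$c" -f > "dbgen_${c}.log" 2>&1 &
        pids="$pids $!"
        c=$((c + 1))
    done
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Usage:
#   in the /data1 dbgen directory:  run_chunks 1 5
#   in the /data2 dbgen directory:  run_chunks 6 10
```

Because the function waits for its child processes, run it under nohup or in a terminal multiplexer if the session might disconnect.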

Run the following commands to check the progress of data file generation. You can also run the ps ux | grep dbgen command to check whether the dbgen processes have exited.
```
du -sh /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl*
du -sh /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl*
```
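
For a more compact view than one line per file, the file count and combined size can be summarized per directory. This is a sketch only; tbl_summary is a hypothetical helper, and it relies on the -maxdepth option that GNU and BSD find provide.

```shell
# Hypothetical helper: print how many *.tbl* files a directory holds and
# their combined size in KB.
tbl_summary() {
    dir=$1
    count=$(find "$dir" -maxdepth 1 -type f -name '*.tbl*' | wc -l | tr -d ' ')
    # du -k prints "<size in KB> <path>" per file; awk adds up the sizes.
    size_kb=$(find "$dir" -maxdepth 1 -type f -name '*.tbl*' -exec du -k {} + |
        awk '{s += $1} END {print s + 0}')
    echo "$count files, $size_kb KB"
}

# Usage:
#   for d in /data1 /data2; do
#       tbl_summary "$d/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen"
#   done
```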

After all dbgen processes have exited, move the TPC-H 1000X data files to the directories created earlier:

mv /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* /data1/script/tpch-kit/tpch1000X
mv /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* /data2/script/tpch-kit/tpch1000X
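
The move can be wrapped in a small guard so that it reports an error when no data files are found, for example because it was run in the wrong directory. This is a sketch under that assumption; move_tbl_files is a hypothetical name.

```shell
# Hypothetical helper: move every *.tbl* file from SRC to DST and fail
# with a message if SRC contains no such files.
move_tbl_files() {
    src=$1
    dst=$2
    # Expand the glob into the positional parameters; if nothing matches,
    # $1 is the unexpanded pattern and the -e test below fails.
    set -- "$src"/*.tbl*
    if [ ! -e "$1" ]; then
        echo "no .tbl files in $src" >&2
        return 1
    fi
    mkdir -p "$dst" && mv "$@" "$dst"/
}

# Usage:
#   for d in /data1 /data2; do
#       move_tbl_files "$d/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen" \
#                      "$d/script/tpch-kit/tpch1000X"
#   done
```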
