Updated on 2025-07-22 GMT+08:00

TPC-H Data Generation

  1. Obtain the TPC-H tools from the official TPC website.
  2. Log in to the ECS and run the following commands to create directories for storing the TPC-H tool and the generated data:

    mkdir -p /data1/script/tpch-kit/tpch1000X
    mkdir -p /data2/script/tpch-kit/tpch1000X

  3. Upload the obtained TPC-H tool to the /data1/script/tpch-kit directory on the ECS and run the following command to decompress the tool:

    Replace tpch_v3.0.1.zip with the actual software package name.

    cd /data1/script/tpch-kit && unzip tpch_v3.0.1.zip

  4. Compile the data generation tool dbgen.

    Before compilation, modify the makefile.suite and tpcd.h files in the dbgen directory.

    1. Modify the makefile.suite file.
      #Change the parameters of makefile.suite as follows (from line 103 to line 111):
      CC      = gcc
      # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
      #                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
      # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
      #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
      # Current values for WORKLOAD are:  TPCH
      # POSTGRESQL is not one of the built-in DATABASE values listed above; it is defined in tpcd.h in the next step.
      DATABASE = POSTGRESQL
      MACHINE = LINUX
      WORKLOAD = TPCH
    2. Modify the tpcd.h file.
      //Add the following statements to the tpcd.h file:
      #ifdef POSTGRESQL
      #define GEN_QUERY_PLAN  "EXPLAIN"
      #define START_TRAN      "BEGIN TRANSACTION"
      #define END_TRAN        "COMMIT;"
      #define SET_OUTPUT      ""
      #define SET_ROWCOUNT    "LIMIT %d\n"
      #define SET_DBASE       ""
      #endif /* POSTGRESQL */
    3. Compile dbgen and copy the compiled tool to the second data disk.
      cd /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen
      cp makefile.suite makefile
      make -f makefile
      cp -R /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/ /data2/script/tpch-kit/
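
Before running make, it can help to confirm that both file edits are in place. The function below is a minimal sketch, not part of the TPC-H kit; the name check_dbgen_edits is hypothetical. It only checks that makefile.suite sets DATABASE, MACHINE, and WORKLOAD to the values above and that tpcd.h contains an #ifdef POSTGRESQL block.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check (not part of the TPC-H kit): confirm that the
# makefile.suite and tpcd.h edits are in place. Pass the dbgen directory as $1.
check_dbgen_edits() {
  local dir="${1:-.}" pair var value
  for pair in DATABASE=POSTGRESQL MACHINE=LINUX WORKLOAD=TPCH; do
    var="${pair%%=*}"    # e.g. DATABASE
    value="${pair#*=}"   # e.g. POSTGRESQL
    # Accept "VAR = VALUE" with any spacing, optionally followed by a comment.
    if ! grep -Eq "^${var}[[:space:]]*=[[:space:]]*${value}([[:space:]#]|$)" "$dir/makefile.suite"; then
      echo "makefile.suite: ${var} is not set to ${value}" >&2
      return 1
    fi
  done
  # The POSTGRESQL block added to tpcd.h must be present.
  if ! grep -q '^#ifdef POSTGRESQL' "$dir/tpcd.h"; then
    echo "tpcd.h: no #ifdef POSTGRESQL block found" >&2
    return 1
  fi
  echo "edits OK"
}
```

For example, check_dbgen_edits /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen prints "edits OK" when both files have been modified, and otherwise names the first setting that is missing.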

  5. Log in to the ECS and generate TPC-H 1000X data. In this example, TPC-H 1000X data is generated in parallel on two data disks.

    The total size of TPC-H 1000X data files is about 1,100 GB. Make sure that both data disks have enough free space.
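
A minimal sketch of a pre-flight free-space check follows. The function name check_free_space is hypothetical, and the 550 GB per disk in the usage line is an assumption: it takes the roughly 1,100 GB total and assumes the chunks split about evenly between the two disks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file system holding DIR has at
# least REQUIRED_GB of free space (1 GB = 1024 * 1024 KiB here).
check_free_space() {
  local dir="$1" required_gb="$2" avail_kb
  # -P: POSIX format, one line per file system; -k: sizes in 1024-byte blocks.
  # Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kb" ]; then
    echo "$dir: cannot read free space" >&2
    return 1
  fi
  if [ "$avail_kb" -lt $((required_gb * 1024 * 1024)) ]; then
    echo "$dir: $((avail_kb / 1024 / 1024)) GB free, ${required_gb} GB required" >&2
    return 1
  fi
  echo "$dir: OK"
}
```

Usage, under the even-split assumption: check_free_space /data1 550 && check_free_space /data2 550.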

    1. Go to the /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen directory and run the following command:
      for c in {1..5};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
    2. Go to the /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen directory and run the following command:
      for c in {6..10};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done

      Parameter description:

      • -s specifies the data scale. In this example, the value is 1000.
      • -C specifies the number of chunks. In this example, the value is 10.
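      • -S specifies the sequence number of the chunk to generate. The loop variable c supplies it (1 to 5 on /data1, 6 to 10 on /data2), so you do not need to change it.
      • -f forces dbgen to overwrite existing data files.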
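
The two loops above can also be sketched as a single loop that assigns chunks 1 to 5 to the copy of dbgen on /data1 and chunks 6 to 10 to the copy on /data2. This is an illustration only: the function name launch_dbgen_chunks and the SCALE, CHUNKS, and DRY_RUN variables are hypothetical. As in the original loops, each dbgen process is started from its dbgen directory, so its files land there.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two loops above: chunks 1-5 run from the
# copy of dbgen on /data1 and chunks 6-10 from the copy on /data2.
# Set DRY_RUN=1 to print the commands without starting dbgen.
SCALE=1000   # -s: scale factor
CHUNKS=10    # -C: total number of chunks

launch_dbgen_chunks() {
  local c dir
  for ((c = 1; c <= CHUNKS; c++)); do
    if ((c <= CHUNKS / 2)); then
      dir=/data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen
    else
      dir=/data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "cd $dir && ./dbgen -s $SCALE -C $CHUNKS -S $c -f"
    else
      # Subshell + & detaches each chunk, as in the original loops.
      (cd "$dir" && ./dbgen -s "$SCALE" -C "$CHUNKS" -S "$c" -f > /dev/null 2>&1 &)
    fi
  done
}
```

Running DRY_RUN=1 launch_dbgen_chunks first shows the ten commands that would be started, which makes it easy to confirm the chunk-to-disk split before launching the real jobs.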

  6. Run the following commands to check the data file generation progress. You can also run the ps ux | grep dbgen command to check whether any dbgen processes are still running; data generation is complete when no dbgen process remains.

    du -sh /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl*
    du -sh /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl*
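
Instead of rerunning the commands by hand, the check can be sketched as a polling loop. The function name wait_for_dbgen and its interval argument are hypothetical; the loop prints the combined size of the .tbl files on both disks until no process named dbgen is left.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until no process named "dbgen" is left, printing
# the total size of the generated files on both disks every INTERVAL seconds.
wait_for_dbgen() {
  local interval="${1:-60}"
  # ps -eo comm= lists only command names (no header), so grep -x matches
  # "dbgen" exactly and never matches the grep process itself.
  while ps -eo comm= | grep -qx dbgen; do
    # -c adds a grand total line; tail keeps only that line.
    du -shc /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* \
            /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* 2>/dev/null | tail -n 1
    sleep "$interval"
  done
  echo "dbgen finished"
}
```

For example, wait_for_dbgen 300 reports progress every five minutes and returns once all ten chunks have been written.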

  7. Move the generated TPC-H 1000X data files to the tpch1000X directories created in step 2.

    mv /data1/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* /data1/script/tpch-kit/tpch1000X
    mv /data2/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen/*.tbl* /data2/script/tpch-kit/tpch1000X
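
The two mv commands can be sketched as one loop over both disks that also reports how many files were moved. The function name move_tbl_files and its ROOT prefix argument are hypothetical; the prefix is empty by default and exists only so the logic can be tried on a scratch directory first. find is used instead of a *.tbl* glob so that a disk with no matching files does not make mv fail on a literal pattern, and mv -t is a GNU coreutils option.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the two mv commands above. ROOT ($1) is an
# illustrative path prefix, empty by default, for trying the loop on a
# scratch directory before running it against /data1 and /data2.
move_tbl_files() {
  local root="${1:-}" disk src dst n
  for disk in data1 data2; do
    src="$root/$disk/script/tpch-kit/TPC-H_Tools_v3.0.1/dbgen"
    dst="$root/$disk/script/tpch-kit/tpch1000X"
    mkdir -p "$dst"
    # find does nothing when no file matches, unlike an unmatched glob.
    n=$(find "$src" -maxdepth 1 -type f -name '*.tbl*' | wc -l)
    # mv -t (GNU coreutils) takes the target directory first.
    find "$src" -maxdepth 1 -type f -name '*.tbl*' -exec mv -t "$dst" {} +
    echo "$dst: moved $((n)) file(s)"
  done
}
```

Running move_tbl_files with no argument acts on /data1 and /data2 directly.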