Updated on 2022-07-26 GMT+08:00

TPC-H Data Construction

  1. Obtain the TPC-H tools from the official TPC website.
  2. Log in to the ECS and run the following commands to create directories for the TPC-H tools and the generated data:

    mkdir -p /data1/script/tpch-kit/tpch1000X
    mkdir -p /data2/script/tpch-kit/tpch1000X
    

  3. Use SFTP to upload the obtained TPC-H tools to the /data1/script/tpch-kit directory of the ECS and run the following command to decompress the tools:

    cd /data1/script/tpch-kit && unzip tpch_v3.0.0.zip
    

  4. Compile the data construction tool dbgen.

    Before compilation, modify the makefile.suite and tpcd.h files in the dbgen directory.

    1. Modifying makefile.suite
      #Change the parameters of makefile.suite as follows (line 103 to line 111):
      
      CC      = gcc
      # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
      #                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
      # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
      #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
      # Current values for WORKLOAD are:  TPCH
      
      DATABASE = POSTGRESQL # POSTGRESQL is not one of the built-in DATABASE values listed above, so add a POSTGRESQL section to tpcd.h.
      MACHINE = LINUX
      WORKLOAD = TPCH
      
    2. Modifying tpcd.h
      //Add the following statements to the tpcd.h file:
      #ifdef POSTGRESQL
      #define GEN_QUERY_PLAN  "EXPLAIN"
      #define START_TRAN      "BEGIN TRANSACTION"
      #define END_TRAN        "COMMIT;"
      #define SET_OUTPUT      ""
      #define SET_ROWCOUNT    "LIMIT %d\n"
      #define SET_DBASE       ""
      #endif /* POSTGRESQL */
      
    After making both changes, run the following command to compile dbgen:

    cd TPC-H_Tools_v3.0.0/dbgen && make -f makefile.suite
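A quick post-build check can catch a failed compilation before the long data generation run starts. The following is a minimal sketch, not part of the TPC-H kit: check_dbgen is a hypothetical helper name, and it assumes that dbgen reads its distribution definitions from a dists.dss file in the same directory.

```shell
# Report whether a dbgen build directory looks usable.
# check_dbgen is a hypothetical helper, not part of the TPC-H kit.
check_dbgen() {
    dir=$1
    # The build should have produced an executable named dbgen.
    if [ ! -x "$dir/dbgen" ]; then
        echo "dbgen binary missing or not executable in $dir"
        return 1
    fi
    # Assumption: dbgen expects dists.dss next to the binary.
    if [ ! -f "$dir/dists.dss" ]; then
        echo "dists.dss not found in $dir"
        return 1
    fi
    echo "dbgen looks ready in $dir"
}

# Example call against the directory used in this guide.
check_dbgen /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen || true
```

The function only inspects files, so it is safe to run at any time.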
    

  5. Log in to the ECS and run the following commands to generate data for the TPC-H 1000X test. In this example, TPC-H 1000X data is generated on two data disks in parallel.

    The total size of the TPC-H 1000X data files is about 1100 GB. Make sure that the ECS has sufficient disk space.

    1. Go to the /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen directory and run the following command:
      for c in {1..5};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
      
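    2. Copy the dbgen directory to the second data disk. Create the parent directory first because it does not exist yet.
      mkdir -p /data2/script/tpch-kit/TPC-H_Tools_v3.0.0
      cp -r  /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen  /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen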
      

      Go to the /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen directory and run the following command:

      for c in {6..10};do (./dbgen -s 1000 -C 10 -S ${c} -f > /dev/null 2>&1 &);done
      

      Where:

      • -s specifies the data scale. In this example, the value is 1000.
      • -C specifies the number of chunks. In this example, the value is 10.
      • -S specifies the sequence number of the current chunk. The loop sets this value, so you do not need to change it.
      • -f forces existing data files to be overwritten.
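To make the split between the two disks explicit, the chunk assignment can be previewed before anything is started. This is a minimal dry-run sketch: dbgen_cmds is a hypothetical helper that only prints the commands the two loops above would launch and does not run dbgen.

```shell
# Print (do not run) the dbgen command for each chunk in a range.
# dbgen_cmds is a hypothetical helper, not part of the TPC-H kit.
dbgen_cmds() {
    scale=$1; chunks=$2; first=$3; last=$4
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "./dbgen -s $scale -C $chunks -S $i -f"
        i=$((i + 1))
    done
}

# Chunks 1 to 5 are generated under /data1, chunks 6 to 10 under /data2.
dbgen_cmds 1000 10 1 5
dbgen_cmds 1000 10 6 10
```

Together the two ranges must cover every chunk from 1 to the value of -C exactly once. Otherwise the generated data set is incomplete or contains duplicates.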

  6. Run the following commands to check the data file generation progress. You can also run the ps ux|grep dbgen command to check whether the generation processes have exited.

    du -sh /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl*
    du -sh /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl*
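Besides watching file sizes, it can help to confirm that a file exists for every expected chunk. The sketch below is hedged: missing_chunks is a hypothetical helper, and it assumes that chunked output files are named <table>.tbl.<chunk number>, which is consistent with the *.tbl* pattern used above. A file that exists may still be incomplete while dbgen is running.

```shell
# List chunk numbers for which <dir>/<table>.tbl.<n> does not exist yet.
# missing_chunks is a hypothetical helper; the file naming is an assumption.
missing_chunks() {
    dir=$1; table=$2; first=$3; last=$4
    i=$first
    while [ "$i" -le "$last" ]; do
        [ -e "$dir/$table.tbl.$i" ] || echo "$i"
        i=$((i + 1))
    done
}

# Example: chunks 1 to 5 of the lineitem table on the first data disk.
missing_chunks /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen lineitem 1 5
```

Empty output means that a file is present for every chunk in the range.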
    

  7. Move the generated data files to the specified directories.

    mv /data1/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl* /data1/script/tpch-kit/tpch1000X
    mv /data2/script/tpch-kit/TPC-H_Tools_v3.0.0/dbgen/*.tbl* /data2/script/tpch-kit/tpch1000X