Updated on 2024-03-13 GMT+08:00

Step 1: Creating an Initial Table and Loading Sample Data

Supported Regions

Table 1 Regions and OBS bucket names

Region       OBS Bucket
EU-Dublin    dws-demo-eu-west-101

Create a group of tables without specifying their storage modes, distribution keys, distribution modes, or compression modes. Load sample data into these tables.

  1. (Optional) Create a cluster.

    If a cluster is available, skip this step. For details about how to create a cluster, see Creating a GaussDB(DWS) 2.0 Cluster.

    Connect to the cluster and test the connection. For details, see Methods of Connecting to a Cluster.

    This practice uses an eight-node cluster as an example. You can also use a four-node cluster to perform the test.

  2. Create an SS test table store_sales.

    Before you create this table, delete existing SS tables first (if any) using the DROP TABLE command. For example, to delete the store_sales table, run the following command:

    DROP TABLE store_sales;
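
    If an earlier run of this practice left other tables behind, the same cleanup applies to all 11 tables. The following is a minimal sketch, assuming the table names used in the import script later in this step. IF EXISTS is added here so that a statement does not fail when a table has not been created yet.

    ```sql
    -- Drop any tables left over from a previous run of this practice.
    DROP TABLE IF EXISTS store_sales;
    DROP TABLE IF EXISTS date_dim;
    DROP TABLE IF EXISTS store;
    DROP TABLE IF EXISTS item;
    DROP TABLE IF EXISTS time_dim;
    DROP TABLE IF EXISTS promotion;
    DROP TABLE IF EXISTS customer_demographics;
    DROP TABLE IF EXISTS customer_address;
    DROP TABLE IF EXISTS household_demographics;
    DROP TABLE IF EXISTS customer;
    DROP TABLE IF EXISTS income_band;
    ```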
    

    Do not configure the storage mode, distribution key, distribution mode, or compression mode when you create this table.

    Run the CREATE TABLE command to create the 11 tables in Figure 3. This section only provides the syntax for creating the store_sales table. To create all tables, copy the syntax in Creating an Initial Table.

    CREATE TABLE store_sales
    (
        ss_sold_date_sk           integer                       ,
        ss_sold_time_sk           integer                       ,
        ss_item_sk                integer               not null,
        ss_customer_sk            integer                       ,
        ss_cdemo_sk               integer                       ,
        ss_hdemo_sk               integer                       ,
        ss_addr_sk                integer                       ,
        ss_store_sk               integer                       ,
        ss_promo_sk               integer                       ,
        ss_ticket_number          bigint               not null,
        ss_quantity               integer                       ,
        ss_wholesale_cost         decimal(7,2)                  ,
        ss_list_price             decimal(7,2)                  ,
        ss_sales_price            decimal(7,2)                  ,
        ss_ext_discount_amt       decimal(7,2)                  ,
        ss_ext_sales_price        decimal(7,2)                  ,
        ss_ext_wholesale_cost     decimal(7,2)                  ,
        ss_ext_list_price         decimal(7,2)                  ,
        ss_ext_tax                decimal(7,2)                  ,
        ss_coupon_amt             decimal(7,2)                  ,
        ss_net_paid               decimal(7,2)                  ,
        ss_net_paid_inc_tax       decimal(7,2)                  ,
        ss_net_profit             decimal(7,2)                  
    ) ;
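
    Because no storage or distribution options are specified, the table is created with the cluster defaults. To see which defaults were applied before they are tuned in later steps, the definition can be read back. This is a sketch that assumes the pg_get_tabledef function is available in your cluster version; in gsql, the \d+ store_sales meta-command is an alternative.

    ```sql
    -- Show the full definition of store_sales, including the storage and
    -- distribution clauses that the system filled in by default.
    SELECT * FROM pg_get_tabledef('store_sales');
    ```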
    

  3. Load sample data into these tables.

    An OBS bucket provides sample data used for this practice. The bucket can be read by all authenticated cloud users. Perform the following operations to load the sample data:

    1. Create a foreign table for each table.

      GaussDB(DWS) uses the foreign data wrappers (FDWs) provided by PostgreSQL to import data in parallel. To use FDWs, create FDW tables first (also called foreign tables). This section only provides the syntax for creating the obs_from_store_sales_001 foreign table corresponding to the store_sales table. To create all foreign tables, copy the syntax in Creating a Foreign Table.

      • Note that <obs_bucket_name> in the following statement indicates the OBS bucket name. Only some regions are supported. For details about the supported regions and OBS bucket names, see Table 1. GaussDB(DWS) clusters do not support cross-region access to OBS bucket data.
      • The columns of the foreign table must be the same as those of the corresponding ordinary table. In this example, store_sales and obs_from_store_sales_001 must have the same columns.
      • The foreign table syntax obtains the sample data used for this practice from the OBS bucket. To load other sample data, modify SERVER gsmpp_server OPTIONS as needed. For details, see About Parallel Data Import from OBS.
      • Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
      CREATE FOREIGN TABLE obs_from_store_sales_001
      (
          ss_sold_date_sk           integer                       ,
          ss_sold_time_sk           integer                       ,
          ss_item_sk                integer               not null,
          ss_customer_sk            integer                       ,
          ss_cdemo_sk               integer                       ,
          ss_hdemo_sk               integer                       ,
          ss_addr_sk                integer                       ,
          ss_store_sk               integer                       ,
          ss_promo_sk               integer                       ,
          ss_ticket_number          bigint               not null,
          ss_quantity               integer                       ,
          ss_wholesale_cost         decimal(7,2)                  ,
          ss_list_price             decimal(7,2)                  ,
          ss_sales_price            decimal(7,2)                  ,
          ss_ext_discount_amt       decimal(7,2)                  ,
          ss_ext_sales_price        decimal(7,2)                  ,
          ss_ext_wholesale_cost     decimal(7,2)                  ,
          ss_ext_list_price         decimal(7,2)                  ,
          ss_ext_tax                decimal(7,2)                  ,
          ss_coupon_amt             decimal(7,2)                  ,
          ss_net_paid               decimal(7,2)                  ,
          ss_net_paid_inc_tax       decimal(7,2)                  ,
          ss_net_profit             decimal(7,2)                  
      )
      -- Configure OBS server information and data format details.
      SERVER gsmpp_server
      OPTIONS (
      LOCATION 'obs://<obs_bucket_name>/tpcds/store_sales',
      FORMAT 'text',
      DELIMITER '|',
      ENCODING 'utf8',
      NOESCAPING 'true',
      ACCESS_KEY 'access_key_value_to_be_replaced',
      SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
      REJECT_LIMIT 'unlimited',
      CHUNKSIZE '64'
      )
      -- Record data format errors encountered during import in the following error table.
      WITH err_obs_from_store_sales_001;
      
    2. Set ACCESS_KEY and SECRET_ACCESS_KEY parameters as needed in the foreign table creation statement, and run this statement in a client tool to create a foreign table.

      For the values of ACCESS_KEY and SECRET_ACCESS_KEY, see Creating Access Keys (AK and SK).
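
      After the foreign tables are created, a catalog query such as the following can confirm that all of them exist before the import starts. It is a sketch that assumes the foreign tables follow the obs_from_ naming used in this section and relies on the pg_foreign_table and pg_foreign_server system catalogs inherited from PostgreSQL. If every foreign table was created, 11 rows are expected.

      ```sql
      -- List the foreign tables created for this practice and the server each one uses.
      SELECT c.relname AS foreign_table,
             s.srvname AS server
      FROM pg_foreign_table ft
      JOIN pg_class c ON c.oid = ft.ftrelid
      JOIN pg_foreign_server s ON s.oid = ft.ftserver
      WHERE c.relname LIKE 'obs_from_%'
      ORDER BY c.relname;
      ```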

    3. Import data.
      Create the insert.sql script containing the following statements and execute it:
      \timing on
      \parallel on 4
      INSERT INTO store_sales SELECT * FROM obs_from_store_sales_001;
      INSERT INTO date_dim SELECT * FROM obs_from_date_dim_001;
      INSERT INTO store SELECT * FROM obs_from_store_001;
      INSERT INTO item SELECT * FROM obs_from_item_001;
      INSERT INTO time_dim SELECT * FROM obs_from_time_dim_001;
      INSERT INTO promotion SELECT * FROM obs_from_promotion_001;
      INSERT INTO customer_demographics SELECT * FROM obs_from_customer_demographics_001;
      INSERT INTO customer_address SELECT * FROM obs_from_customer_address_001;
      INSERT INTO household_demographics SELECT * FROM obs_from_household_demographics_001;
      INSERT INTO customer SELECT * FROM obs_from_customer_001;
      INSERT INTO income_band SELECT * FROM obs_from_income_band_001;
      \parallel off
      

      Information similar to the following is displayed:

      SET
      Timing is on.
      SET
      Time: 2.831 ms
      Parallel is on with scale 4.
      Parallel is off.
      INSERT 0 402
      Time: 1820.909 ms
      INSERT 0 73049
      Time: 2715.275 ms
      INSERT 0 86400
      Time: 2377.056 ms
      INSERT 0 1000
      Time: 4037.155 ms
      INSERT 0 204000
      Time: 7124.190 ms
      INSERT 0 7200
      Time: 2227.776 ms
      INSERT 0 1920800
      Time: 8672.647 ms
      INSERT 0 20
      Time: 2273.501 ms
      INSERT 0 1000000
      Time: 11430.991 ms
      INSERT 0 1981703
      Time: 20270.750 ms
      INSERT 0 287997024
      Time: 341395.680 ms
      total time: 341584  ms
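
      Because the foreign table sets REJECT_LIMIT to 'unlimited', rows with data format errors do not stop the import. They are recorded in the error table named in the WITH clause instead. A quick check after loading, sketched here for the store_sales foreign table only (the error tables of the other foreign tables are named in their own creation statements), is to confirm that the error table is empty.

      ```sql
      -- A count of 0 means that no rows were rejected during the import.
      SELECT COUNT(*) AS rejected_rows FROM err_obs_from_store_sales_001;

      -- If the count is not 0, look at a few rejected records to find the cause.
      SELECT * FROM err_obs_from_store_sales_001 LIMIT 10;
      ```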
      
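    4. Calculate the total time spent loading data into the 11 tables. Because the statements run in parallel, use the total time value reported at the end of the output (341584 ms in the preceding example). Record the result as the loading time in the benchmark table in step 1 of the next section.
    5. Run the following commands to verify that each table is loaded correctly and contains the expected number of rows: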
      SELECT COUNT(*) FROM store_sales;
      SELECT COUNT(*) FROM date_dim;
      SELECT COUNT(*) FROM store;
      SELECT COUNT(*) FROM item;
      SELECT COUNT(*) FROM time_dim;
      SELECT COUNT(*) FROM promotion;
      SELECT COUNT(*) FROM customer_demographics;
      SELECT COUNT(*) FROM customer_address;
      SELECT COUNT(*) FROM household_demographics;
      SELECT COUNT(*) FROM customer;
      SELECT COUNT(*) FROM income_band;
      

      The number of rows in each SS table is as follows:

      Table Name                Number of Rows
      Store_Sales               287997024
      Date_Dim                  73049
      Store                     402
      Item                      204000
      Time_Dim                  86400
      Promotion                 1000
      Customer_Demographics     1920800
      Customer_Address          1000000
      Household_Demographics    7200
      Customer                  1981703
      Income_Band               20
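
      The 11 counts can also be compared with the expected values in a single statement instead of by eye. The following is a sketch rather than part of the original procedure: the expected values are the row counts listed above, and the matches column is a name chosen here for illustration. It assumes that your cluster version accepts a VALUES list in the FROM clause.

      ```sql
      -- Compare the actual row count of each table with the expected row count.
      SELECT a.table_name,
             a.actual_rows,
             e.expected_rows,
             a.actual_rows = e.expected_rows AS matches
      FROM (
          SELECT CAST('store_sales' AS text) AS table_name, COUNT(*) AS actual_rows FROM store_sales
          UNION ALL SELECT 'date_dim', COUNT(*) FROM date_dim
          UNION ALL SELECT 'store', COUNT(*) FROM store
          UNION ALL SELECT 'item', COUNT(*) FROM item
          UNION ALL SELECT 'time_dim', COUNT(*) FROM time_dim
          UNION ALL SELECT 'promotion', COUNT(*) FROM promotion
          UNION ALL SELECT 'customer_demographics', COUNT(*) FROM customer_demographics
          UNION ALL SELECT 'customer_address', COUNT(*) FROM customer_address
          UNION ALL SELECT 'household_demographics', COUNT(*) FROM household_demographics
          UNION ALL SELECT 'customer', COUNT(*) FROM customer
          UNION ALL SELECT 'income_band', COUNT(*) FROM income_band
      ) a
      JOIN (
          VALUES (CAST('store_sales' AS text), CAST(287997024 AS bigint)),
                 ('date_dim', 73049),
                 ('store', 402),
                 ('item', 204000),
                 ('time_dim', 86400),
                 ('promotion', 1000),
                 ('customer_demographics', 1920800),
                 ('customer_address', 1000000),
                 ('household_demographics', 7200),
                 ('customer', 1981703),
                 ('income_band', 20)
      ) AS e(table_name, expected_rows) ON e.table_name = a.table_name
      ORDER BY a.table_name;
      ```

      Any row where matches is false points to a table that needs to be reloaded or checked against its error table.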

  4. Run the ANALYZE command to update statistics.

    ANALYZE;
    

    If ANALYZE is returned, the execution is successful.

    ANALYZE
    

    The ANALYZE statement collects statistics about table content in databases, which will be stored in the PG_STATISTIC system catalog. Then, the query optimizer uses the statistics to work out the most efficient execution plan.

    After batch insertions or deletions, you are advised to run the ANALYZE statement on the affected table or on the entire database to update statistics.
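
    When only one table has changed, statistics can be refreshed for that table alone, which is usually faster than analyzing the whole database. The following is a minimal sketch using store_sales as the example. The second query reads the row estimate that ANALYZE stores in the pg_class catalog; reltuples is an estimate and may not match COUNT(*) exactly.

    ```sql
    -- Refresh statistics for a single table instead of the whole database.
    ANALYZE store_sales;

    -- Check the row estimate that the optimizer will use for this table.
    SELECT relname, reltuples
    FROM pg_class
    WHERE relname = 'store_sales';
    ```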