Using GaussDB(DWS) to Analyze the Supply Chain Requirements of a Company
This practice describes how to load the sample data set from OBS to a data warehouse cluster and perform data queries. This example comprises multi-table analysis and theme analysis in the data analysis scenario.
In this example, a standard TPC-H-1x data set of 1 GB size has been generated on GaussDB(DWS), and has been uploaded to the tpch folder of an OBS bucket. All Huawei Cloud accounts have been granted the read-only permission to access the OBS bucket. Users can easily import the data set using their accounts.
General Procedure
This practice takes about 60 minutes. The process is as follows:
Supported Regions
Table 1 describes the regions where OBS data has been uploaded.
Region |
OBS Bucket |
---|---|
CN North-Beijing1 |
dws-demo-cn-north-1 |
CN North-Beijing2 |
dws-demo-cn-north-2 |
CN North-Beijing4 |
dws-demo-cn-north-4 |
CN North-Ulanqab1 |
dws-demo-cn-north-9 |
CN East-Shanghai1 |
dws-demo-cn-east-3 |
CN East-Shanghai2 |
dws-demo-cn-east-2 |
CN South-Guangzhou |
dws-demo-cn-south-1 |
CN South-Guangzhou-InvitationOnly |
dws-demo-cn-south-4 |
CN-Hong Kong |
dws-demo-ap-southeast-1 |
AP-Singapore |
dws-demo-ap-southeast-3 |
AP-Bangkok |
dws-demo-ap-southeast-2 |
LA-Santiago |
dws-demo-la-south-2 |
AF-Johannesburg |
dws-demo-af-south-1 |
LA-Mexico City1 |
dws-demo-na-mexico-1 |
LA-Mexico City2 |
dws-demo-la-north-2 |
RU-Moscow2 |
dws-demo-ru-northwest-2 |
LA-Sao Paulo1 |
dws-demo-sa-brazil-1 |
Scenario Description
Understand the basic functions of GaussDB(DWS) and how to import data. Analyze the order data of a company and its suppliers as follows:
- Analyze the revenue brought by suppliers in a region to the company. The statistics can be used to determine whether a local allocation center needs to be established in a given region.
- Analyze the relationship between parts and suppliers to obtain the number of suppliers for parts based on the specified contribution conditions. The information can be used to determine whether suppliers are sufficient for large order quantities when the task is urgent.
- Analyze the revenue loss of small orders. You can query the average annual revenue loss if there are no small orders. Filter out small orders that are lower than 20% of the average supply volume, and calculate the total amount of those small orders to figure out the average annual revenue loss.
Making Preparations
- You have registered a GaussDB(DWS) account and checked the account status before using GaussDB(DWS). The account cannot be in arrears or frozen.
- You have obtained the AK and SK of the account.
- A cluster has been created and connected using Data Studio. For details, see Using GaussDB(DWS) to Query Vehicle Routes at Traffic Checkpoints in Seconds.
Step 1: Importing Sample Data
After connecting to the cluster using the SQL client tool, perform the following operations in the SQL client tool to import the TPC-H sample data and perform data queries.
- Create a database table.
The TPC-H sample data consists of eight database tables whose associations are shown in Figure 1.
Execute the following statements to create tables in the gaussdb database.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
CREATE SCHEMA tpch; SET current_schema = tpch; DROP TABLE if exists region; CREATE TABLE REGION ( R_REGIONKEY INT NOT NULL , R_NAME CHAR(25) NOT NULL , R_COMMENT VARCHAR(152) ) with (orientation = column, COMPRESSION=MIDDLE) distribute by replication; DROP TABLE if exists nation; CREATE TABLE NATION ( N_NATIONKEY INT NOT NULL, N_NAME CHAR(25) NOT NULL, N_REGIONKEY INT NOT NULL, N_COMMENT VARCHAR(152) ) with (orientation = column,COMPRESSION=MIDDLE) distribute by replication; DROP TABLE if exists supplier; CREATE TABLE SUPPLIER ( S_SUPPKEY BIGINT NOT NULL, S_NAME CHAR(25) NOT NULL, S_ADDRESS VARCHAR(40) NOT NULL, S_NATIONKEY INT NOT NULL, S_PHONE CHAR(15) NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT VARCHAR(101) NOT NULL ) with (orientation = column,COMPRESSION=MIDDLE) distribute by hash(S_SUPPKEY); DROP TABLE if exists customer; CREATE TABLE CUSTOMER ( C_CUSTKEY BIGINT NOT NULL, C_NAME VARCHAR(25) NOT NULL, C_ADDRESS VARCHAR(40) NOT NULL, C_NATIONKEY INT NOT NULL, C_PHONE CHAR(15) NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT CHAR(10) NOT NULL, C_COMMENT VARCHAR(117) NOT NULL ) with (orientation = column,COMPRESSION=MIDDLE) distribute by hash(C_CUSTKEY); DROP TABLE if exists part; CREATE TABLE PART ( P_PARTKEY BIGINT NOT NULL, P_NAME VARCHAR(55) NOT NULL, P_MFGR CHAR(25) NOT NULL, P_BRAND CHAR(10) NOT NULL, P_TYPE VARCHAR(25) NOT NULL, P_SIZE BIGINT NOT NULL, P_CONTAINER CHAR(10) NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT VARCHAR(23) NOT NULL ) with (orientation = column,COMPRESSION=MIDDLE) distribute by hash(P_PARTKEY); DROP TABLE if exists partsupp; CREATE TABLE PARTSUPP ( PS_PARTKEY BIGINT NOT NULL, PS_SUPPKEY BIGINT NOT NULL, PS_AVAILQTY BIGINT NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL ) with (orientation = column,COMPRESSION=MIDDLE) distribute by hash(PS_PARTKEY); DROP TABLE if exists orders; CREATE TABLE ORDERS ( O_ORDERKEY BIGINT NOT NULL, O_CUSTKEY BIGINT NOT NULL, O_ORDERSTATUS CHAR(1) NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE DATE NOT NULL , O_ORDERPRIORITY CHAR(15) NOT NULL, O_CLERK CHAR(15) NOT NULL , O_SHIPPRIORITY BIGINT NOT NULL, O_COMMENT VARCHAR(79) NOT NULL ) with (orientation = column,COMPRESSION=MIDDLE) distribute by hash(O_ORDERKEY); DROP TABLE if exists lineitem; CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL, L_PARTKEY BIGINT NOT NULL, L_SUPPKEY BIGINT NOT NULL, L_LINENUMBER BIGINT NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG CHAR(1) NOT NULL, L_LINESTATUS CHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL , L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL ) with (orientation = column,COMPRESSION=MIDDLE) distribute by hash(L_ORDERKEY);
- Create a foreign table, which is used to identify and associate the source data on OBS.
- <obs_bucket_name> indicates the OBS bucket name. Only some regions are supported. For details about the supported regions and OBS bucket names, see Supported Regions. GaussDB(DWS) clusters do not support cross-region access to OBS bucket data.
- , and replace <Access_Key_Id> and <Secret_Access_Key> with the value obtained in Making Preparations.
- Hardcoded or plaintext AK/SK is risky. For security, encrypt your AK/SK and store them in the configuration file or environment variables.
- If the message "ERROR: schema "xxx" does not exist Position" is displayed when you create a foreign table, the schema does not exist. Perform the previous step to create a schema.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
CREATE SCHEMA tpchobs; SET current_schema='tpchobs'; DROP FOREIGN table if exists region; CREATE FOREIGN TABLE REGION ( like tpch.region ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/region.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' ); DROP FOREIGN table if exists nation; CREATE FOREIGN TABLE NATION ( like tpch.nation ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/nation.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' ); DROP FOREIGN table if exists supplier; CREATE FOREIGN TABLE SUPPLIER ( like tpch.supplier ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/supplier.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' ); DROP FOREIGN table if exists customer; CREATE FOREIGN TABLE CUSTOMER ( like tpch.customer ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/customer.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' ); DROP FOREIGN table if exists part; CREATE FOREIGN TABLE PART ( like tpch.part ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/part.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' ); DROP FOREIGN table if exists partsupp; CREATE FOREIGN TABLE PARTSUPP ( like tpch.partsupp ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/partsupp.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' ); DROP FOREIGN table if exists orders; CREATE FOREIGN TABLE ORDERS ( like tpch.orders ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/orders.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' ); DROP FOREIGN table if exists lineitem; CREATE FOREIGN TABLE LINEITEM ( like tpch.lineitem ) SERVER gsmpp_server OPTIONS ( encoding 'utf8', location 'obs://<obs_bucket_name>/tpch/lineitem.tbl', format 'text', delimiter '|', access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>', chunksize '64', IGNORE_EXTRA_DATA 'on' );
- Copy and execute the following statements to import the foreign table data to the corresponding database table.
Run the insert command to import the data in the OBS foreign table to the GaussDB(DWS) database table. The database kernel concurrently imports the OBS data at a high speed to GaussDB(DWS).
1 2 3 4 5 6 7 8
INSERT INTO tpch.lineitem SELECT * FROM tpchobs.lineitem; INSERT INTO tpch.part SELECT * FROM tpchobs.part; INSERT INTO tpch.partsupp SELECT * FROM tpchobs.partsupp; INSERT INTO tpch.customer SELECT * FROM tpchobs.customer; INSERT INTO tpch.supplier SELECT * FROM tpchobs.supplier; INSERT INTO tpch.nation SELECT * FROM tpchobs.nation; INSERT INTO tpch.region SELECT * FROM tpchobs.region; INSERT INTO tpch.orders SELECT * FROM tpchobs.orders;
It takes 10 minutes to import data.
Step 2: Performing Multi-Table Analysis and Theme Analysis
The following uses standard TPC-H query as an example to demonstrate how to perform basic data query on GaussDB(DWS).
Before querying data, run the Analyze command to generate statistics related to the database table. The statistics data is stored in system table PG_STATISTIC and is useful when you run the planner, which provides you with an efficient query execution plan.
The following are querying examples:
- Querying revenue of a supplier in a region (TPCH-Q5)
By executing the TPCH-Q5 query statement, you can query the revenue statistics of a spare parts supplier in a region. The revenue is calculated based on sum( l_extendedprice * (1 - l_discount)). The statistics can be used to determine whether a local allocation center needs to be established in a given region.
Copy and execute the following TPCH-Q5 statement for query. This statement features multi-table join query with GROUP BY, ORDER BY, and AGGREGATE.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
SET current_schema='tpch'; SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= '1994-01-01'::date and o_orderdate < '1994-01-01'::date + interval '1 year' group by n_name order by revenue desc;
- Querying relationships between spare parts and suppliers (TPCH-Q16)
By executing the TPCH-Q16 query statement, you can obtain the number of suppliers that can supply spare parts with the specified contribution conditions. This information can be used to determine whether there are sufficient suppliers when the order quantity is large and the task is urgent.
Copy and execute the following TPCH-Q16 statement for query. The statement features multi-table connection operations with group by, sort by, aggregate, deduplicate, and NOT IN subquery.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
SET current_schema='tpch'; SELECT p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt FROM partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size limit 100;
- Querying revenue loss of small orders (TPCH-Q17)
You can query the average annual revenue loss if there are no small orders. Filter out small orders that are lower than the 20% of the average supply volume, and calculate the total amount of those small orders to figure out the average annual revenue loss.
Copy and execute the following TPCH-Q17 statement for query. The statement features multi-table connection operations with aggregate and aggregate subquery.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
SET current_schema='tpch'; SELECT sum(l_extendedprice) / 7.0 as avg_yearly FROM lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot