Using GaussDB(DWS) to Analyze the Operational Status of a Retail Department Store
Background
In this practice, the daily business data of each retail store is loaded from OBS into the corresponding tables in the data warehouse cluster, where KPIs are summarized and queried. These KPIs include store turnover, customer flow, monthly sales ranking, monthly customer flow conversion rate, monthly price-rent ratio, and sales per unit area. This example demonstrates multidimensional query and analysis with GaussDB(DWS) in a retail scenario.
The sample data has been uploaded to the retail-data folder in an OBS bucket, and all HUAWEI CLOUD accounts have been granted the read-only permission to access the OBS bucket.
General Procedure
This practice takes about 60 minutes. The general process is: make preparations, import the sample data from the retail department stores, and analyze the operational status.
Supported Regions
Table 1 describes the regions where OBS data has been uploaded.
Region | OBS Bucket
---|---
CN North-Beijing1 | dws-demo-cn-north-1
CN North-Beijing2 | dws-demo-cn-north-2
CN North-Beijing4 | dws-demo-cn-north-4
CN North-Ulanqab1 | dws-demo-cn-north-9
CN East-Shanghai1 | dws-demo-cn-east-3
CN East-Shanghai2 | dws-demo-cn-east-2
CN South-Guangzhou | dws-demo-cn-south-1
CN South-Guangzhou-InvitationOnly | dws-demo-cn-south-4
CN-Hong Kong | dws-demo-ap-southeast-1
AP-Singapore | dws-demo-ap-southeast-3
AP-Bangkok | dws-demo-ap-southeast-2
LA-Santiago | dws-demo-la-south-2
AF-Johannesburg | dws-demo-af-south-1
LA-Mexico City1 | dws-demo-na-mexico-1
LA-Mexico City2 | dws-demo-la-north-2
RU-Moscow2 | dws-demo-ru-northwest-2
LA-Sao Paulo1 | dws-demo-sa-brazil-1
Preparations
- You have registered a GaussDB(DWS) account, and the account is not in arrears or frozen.
- You have obtained the AK and SK of the account.
- A cluster has been created and connected using Data Studio. For details, see Step 1: Creating a Cluster and Step 2: Using Data Studio to Connect to a Cluster.
Step 1: Importing Sample Data from the Retail Department Store
After connecting to the cluster with the SQL client tool, perform the following operations to import the sample data from the retail department stores and run queries.
- Execute the following statement to create the retail database:
```sql
CREATE DATABASE retail encoding 'utf8' template template0;
```
- Perform the following steps to switch to the new database:
- In the Object Browser window of the Data Studio client, right-click the database connection and choose Refresh from the shortcut menu. Then, the new database is displayed.
- Right-click the name of the new database retail and choose Connect to DB from the shortcut menu.
- Right-click the name of the new database retail and choose Open Terminal from the shortcut menu. The SQL command window for connecting to the specified database is displayed. Perform the following steps in the window.
- Create the database tables.
The sample data consists of 10 database tables whose associations are shown in Figure 1.
Copy and execute the following statements to create the database tables of retail department store information.
```sql
CREATE SCHEMA retail_data;
SET current_schema='retail_data';

DROP TABLE IF EXISTS STORE;
CREATE TABLE STORE (
    ID INT,
    STORECODE VARCHAR(10),
    STORENAME VARCHAR(100),
    FIRMID INT,
    FLOOR INT,
    BRANDID INT,
    RENTAMOUNT NUMERIC(18,2),
    RENTAREA NUMERIC(18,2)
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS POS;
CREATE TABLE POS (
    ID INT,
    POSCODE VARCHAR(20),
    STATUS INT,
    MODIFICATIONDATE DATE
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS BRAND;
CREATE TABLE BRAND (
    ID INT,
    BRANDCODE VARCHAR(10),
    BRANDNAME VARCHAR(100),
    SECTORID INT
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS SECTOR;
CREATE TABLE SECTOR (
    ID INT,
    SECTORCODE VARCHAR(10),
    SECTORNAME VARCHAR(20),
    CATEGORYID INT
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS CATEGORY;
CREATE TABLE CATEGORY (
    ID INT,
    CODE VARCHAR(10),
    NAME VARCHAR(20)
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS FIRM;
CREATE TABLE FIRM (
    ID INT,
    CODE VARCHAR(4),
    NAME VARCHAR(40),
    CITYID INT,
    CITYNAME VARCHAR(10),
    CITYCODE VARCHAR(20)
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS DATE;
CREATE TABLE DATE (
    ID INT,
    DATEKEY DATE,
    YEAR INT,
    MONTH INT,
    DAY INT,
    WEEK INT,
    WEEKDAY INT
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS PAYTYPE;
CREATE TABLE PAYTYPE (
    ID INT,
    CODE VARCHAR(10),
    TYPE VARCHAR(10),
    SIGNDATE DATE
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;

DROP TABLE IF EXISTS SALES;
CREATE TABLE SALES (
    ID INT,
    POSID INT,
    STOREID INT,
    DATEKEY INT,
    PAYTYPE INT,
    TOTALAMOUNT NUMERIC(18,2),
    DISCOUNTAMOUNT NUMERIC(18,2),
    ITEMCOUNT INT,
    PAIDAMOUNT NUMERIC(18,2)
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY HASH(ID);

DROP TABLE IF EXISTS FLOW;
CREATE TABLE FLOW (
    ID INT,
    STOREID INT,
    DATEKEY INT,
    INFLOWVALUE INT
) WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY HASH(ID);
```
- Create the foreign tables, which are used to identify and associate the source data on OBS.
- <obs_bucket_name> indicates the OBS bucket name. Only some regions are supported. For details about the supported regions and OBS bucket names, see Supported Regions. GaussDB(DWS) clusters do not support cross-region access to OBS bucket data.
- Replace <Access_Key_Id> and <Secret_Access_Key> with the AK and SK obtained in Preparations.
- Hardcoded or plaintext AK/SK pairs are risky. For security, encrypt your AK and SK and store them in a configuration file or environment variables.
- If the message "ERROR: schema "xxx" does not exist Position" is displayed when you create a foreign table, the schema does not exist. Perform the previous step to create a schema.
```sql
CREATE SCHEMA retail_obs_data;
SET current_schema='retail_obs_data';

DROP FOREIGN TABLE IF EXISTS SALES_OBS;
CREATE FOREIGN TABLE SALES_OBS ( LIKE retail_data.sales ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/sales', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS FLOW_OBS;
CREATE FOREIGN TABLE FLOW_OBS ( LIKE retail_data.flow ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/flow', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS BRAND_OBS;
CREATE FOREIGN TABLE BRAND_OBS ( LIKE retail_data.brand ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/brand', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS CATEGORY_OBS;
CREATE FOREIGN TABLE CATEGORY_OBS ( LIKE retail_data.category ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/category', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS DATE_OBS;
CREATE FOREIGN TABLE DATE_OBS ( LIKE retail_data.date ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/date', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS FIRM_OBS;
CREATE FOREIGN TABLE FIRM_OBS ( LIKE retail_data.firm ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/firm', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS PAYTYPE_OBS;
CREATE FOREIGN TABLE PAYTYPE_OBS ( LIKE retail_data.paytype ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/paytype', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS POS_OBS;
CREATE FOREIGN TABLE POS_OBS ( LIKE retail_data.pos ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/pos', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS SECTOR_OBS;
CREATE FOREIGN TABLE SECTOR_OBS ( LIKE retail_data.sector ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/sector', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);

DROP FOREIGN TABLE IF EXISTS STORE_OBS;
CREATE FOREIGN TABLE STORE_OBS ( LIKE retail_data.store ) SERVER gsmpp_server OPTIONS (
    encoding 'utf8', location 'obs://<obs_bucket_name>/retail-data/store', format 'csv', delimiter ',',
    access_key '<Access_Key_Id>', secret_access_key '<Secret_Access_Key>',
    chunksize '64', IGNORE_EXTRA_DATA 'on', header 'on'
);
```
- Copy and execute the following statements to import the data from the foreign tables into the cluster tables:
```sql
INSERT INTO retail_data.store    SELECT * FROM retail_obs_data.STORE_OBS;
INSERT INTO retail_data.sector   SELECT * FROM retail_obs_data.SECTOR_OBS;
INSERT INTO retail_data.paytype  SELECT * FROM retail_obs_data.PAYTYPE_OBS;
INSERT INTO retail_data.firm     SELECT * FROM retail_obs_data.FIRM_OBS;
INSERT INTO retail_data.flow     SELECT * FROM retail_obs_data.FLOW_OBS;
INSERT INTO retail_data.category SELECT * FROM retail_obs_data.CATEGORY_OBS;
INSERT INTO retail_data.date     SELECT * FROM retail_obs_data.DATE_OBS;
INSERT INTO retail_data.pos      SELECT * FROM retail_obs_data.POS_OBS;
INSERT INTO retail_data.brand    SELECT * FROM retail_obs_data.BRAND_OBS;
INSERT INTO retail_data.sales    SELECT * FROM retail_obs_data.SALES_OBS;
```
It takes some time to import data.
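After the import finishes, you can optionally verify that the cluster tables contain data. This check is not part of the original procedure; it simply counts rows in a few of the loaded tables (any table in retail_data can be checked the same way):

```sql
-- Optional sanity check (not in the original procedure): confirm the import
-- populated the cluster tables by counting rows in a few of them.
SELECT 'sales' AS table_name, count(*) AS row_count FROM retail_data.sales
UNION ALL
SELECT 'flow',  count(*) FROM retail_data.flow
UNION ALL
SELECT 'store', count(*) FROM retail_data.store;
```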
- Copy and execute the following statement to create the v_sales_flow_details view:
```sql
SET current_schema='retail_data';
CREATE VIEW v_sales_flow_details AS
SELECT
    FIRM.ID FIRMID, FIRM.NAME FIRNAME, FIRM.CITYCODE,
    CATEGORY.ID CATEGORYID, CATEGORY.NAME CATEGORYNAME,
    SECTOR.ID SECTORID, SECTOR.SECTORNAME,
    BRAND.ID BRANDID, BRAND.BRANDNAME,
    STORE.ID STOREID, STORE.STORENAME, STORE.RENTAMOUNT, STORE.RENTAREA,
    DATE.DATEKEY,
    SALES.TOTALAMOUNT, DISCOUNTAMOUNT, ITEMCOUNT, PAIDAMOUNT,
    INFLOWVALUE
FROM SALES
INNER JOIN STORE    ON SALES.STOREID = STORE.ID
INNER JOIN FIRM     ON STORE.FIRMID = FIRM.ID
INNER JOIN BRAND    ON STORE.BRANDID = BRAND.ID
INNER JOIN SECTOR   ON BRAND.SECTORID = SECTOR.ID
INNER JOIN CATEGORY ON SECTOR.CATEGORYID = CATEGORY.ID
INNER JOIN DATE     ON SALES.DATEKEY = DATE.ID
INNER JOIN FLOW     ON FLOW.DATEKEY = DATE.ID AND FLOW.STOREID = STORE.ID;
```
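This completes the data import. Before moving on to the analysis step, you may want to preview a few rows of the new view to confirm that the joins return data; this spot check is an optional addition, not part of the original procedure:

```sql
-- Optional spot check (not in the original procedure): preview the joined
-- data exposed by the v_sales_flow_details view.
SELECT * FROM retail_data.v_sales_flow_details LIMIT 10;
```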
Step 2: Performing Operations Status Analysis
The following uses standard queries on the department store retail data as examples to demonstrate basic data queries in GaussDB(DWS).
Before querying data, run the ANALYZE command to collect statistics on the database tables. The statistics are stored in the system catalog PG_STATISTIC and help the query planner generate efficient execution plans.
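The exact command is not shown in the original procedure; a minimal example, run after the import, is:

```sql
-- Collect optimizer statistics for the imported tables.
-- ANALYZE with no arguments covers every table in the current database;
-- a single table can also be analyzed, for example: ANALYZE retail_data.sales;
ANALYZE;
```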
The following are querying examples:
- Querying the monthly sales revenue of each store
Copy and execute the following statements to query the total revenue of each store in a certain month:
```sql
SET current_schema='retail_data';
SELECT
    DATE_TRUNC('month', datekey) AT TIME ZONE 'UTC' AS __timestamp,
    SUM(paidamount) AS sum__paidamount
FROM v_sales_flow_details
GROUP BY DATE_TRUNC('month', datekey) AT TIME ZONE 'UTC'
ORDER BY SUM(paidamount) DESC;
```
- Querying the sales revenue and price-rent ratio of each store
Copy and execute the following statement to query the sales revenue and price-rent ratio of each store:
```sql
SET current_schema='retail_data';
SELECT
    firname AS firname,
    storename AS storename,
    SUM(paidamount) AS sum__paidamount,
    AVG(RENTAMOUNT)/SUM(PAIDAMOUNT) AS rentamount_sales_rate
FROM v_sales_flow_details
GROUP BY firname, storename
ORDER BY SUM(paidamount) DESC;
```
- Analyzing the sales revenue of each city
Copy and execute the following statement to analyze the sales revenue of each city:
```sql
SET current_schema='retail_data';
SELECT
    citycode AS citycode,
    SUM(paidamount) AS sum__paidamount
FROM v_sales_flow_details
GROUP BY citycode
ORDER BY SUM(paidamount) DESC;
```
- Analyzing and comparing the price-rent ratio and customer flow conversion rate of each store
```sql
SET current_schema='retail_data';
SELECT
    brandname AS brandname,
    firname AS firname,
    SUM(PAIDAMOUNT)/AVG(RENTAREA) AS sales_rentarea_rate,
    SUM(ITEMCOUNT)/SUM(INFLOWVALUE) AS poscount_flow_rate,
    AVG(RENTAMOUNT)/SUM(PAIDAMOUNT) AS rentamount_sales_rate
FROM v_sales_flow_details
GROUP BY brandname, firname
ORDER BY sales_rentarea_rate DESC;
```
- Analyzing brands in the retail industry
```sql
SET current_schema='retail_data';
SELECT
    categoryname AS categoryname,
    brandname AS brandname,
    SUM(paidamount) AS sum__paidamount
FROM v_sales_flow_details
GROUP BY categoryname, brandname
ORDER BY sum__paidamount DESC;
```
- Querying daily sales information of each brand
```sql
SET current_schema='retail_data';
SELECT
    brandname AS brandname,
    DATE_TRUNC('day', datekey) AT TIME ZONE 'UTC' AS __timestamp,
    SUM(paidamount) AS sum__paidamount
FROM v_sales_flow_details
WHERE datekey >= '2016-01-01 00:00:00'
  AND datekey <= '2016-01-30 00:00:00'
GROUP BY brandname, DATE_TRUNC('day', datekey) AT TIME ZONE 'UTC'
ORDER BY sum__paidamount ASC
LIMIT 50000;
```