Help Center/ GaussDB(DWS)/ Best Practices/ Sample Data Analysis/ Operations Status Analysis of a Retail Department Store
Updated on 2024-03-13 GMT+08:00

Operations Status Analysis of a Retail Department Store

Background

In this practice, the daily business data of each retail store is loaded from OBS to the corresponding table in the data warehouse cluster for summarizing and querying KPIs. This data includes store turnover, customer flow, monthly sales ranking, monthly customer flow conversion rate, monthly price-rent ratio, and sales per unit area. This example demonstrates the multidimensional query and analysis of GaussDB(DWS) in the retail scenario.

The sample data has been uploaded to the retail-data folder in an OBS bucket, and all HUAWEI CLOUD accounts have been granted the read-only permission to access the OBS bucket.

General Procedure

This practice takes about 60 minutes. The process is as follows:

  1. Preparations
  2. Step 1: Importing Sample Data from the Retail Department Store
  3. Step 2: Performing Operations Status Analysis

Supported Regions

Table 1 Regions and OBS bucket names

Region

OBS Bucket

EU-Dublin

dws-demo-eu-west-101

Preparations

Step 1: Importing Sample Data from the Retail Department Store

After connecting to the cluster using the SQL client tool, perform the following operations in the SQL client tool to import the sample data from retail department stores and perform queries.

  1. Execute the following statement to create the retail database:

    1
    CREATE DATABASE retail encoding 'utf8' template template0; 
    

  2. Perform the following steps to switch to the new database:

    1. In the Object Browser window of the Data Studio client, right-click the database connection and choose Refresh from the shortcut menu. Then, the new database is displayed.
    2. Right-click the name of the new database retail and choose Connect to DB from the shortcut menu.
    3. Right-click the name of the new database retail and choose Open Terminal from the shortcut menu. The SQL command window for connecting to the specified database is displayed. Perform the following steps in the window.

  3. Create a database table.

    The sample data consists of 10 database tables whose associations are shown in Figure 1.

    Figure 1 Sample data tables of retail department stores
    Copy and execute the following statements to switch to create a database table of retail department store information.
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10
     11
     12
     13
     14
     15
     16
     17
     18
     19
     20
     21
     22
     23
     24
     25
     26
     27
     28
     29
     30
     31
     32
     33
     34
     35
     36
     37
     38
     39
     40
     41
     42
     43
     44
     45
     46
     47
     48
     49
     50
     51
     52
     53
     54
     55
     56
     57
     58
     59
     60
     61
     62
     63
     64
     65
     66
     67
     68
     69
     70
     71
     72
     73
     74
     75
     76
     77
     78
     79
     80
     81
     82
     83
     84
     85
     86
     87
     88
     89
     90
     91
     92
     93
     94
     95
     96
     97
     98
     99
    100
    101
    102
    103
    104
    105
    CREATE SCHEMA retail_data;
    SET current_schema='retail_data';
    
    DROP TABLE IF EXISTS STORE;
    CREATE TABLE STORE (
            ID INT, 
            STORECODE VARCHAR(10), 
            STORENAME VARCHAR(100), 
            FIRMID INT, 
            FLOOR INT, 
            BRANDID INT, 
            RENTAMOUNT NUMERIC(18,2), 
            RENTAREA NUMERIC(18,2)
    ) 
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS POS;
    CREATE TABLE POS(
            ID INT, 
            POSCODE VARCHAR(20), 
            STATUS INT, 
            MODIFICATIONDATE DATE
    )
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS BRAND;
    CREATE TABLE BRAND (
            ID INT, 
            BRANDCODE VARCHAR(10), 
            BRANDNAME VARCHAR(100), 
            SECTORID INT
    )
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS SECTOR;
    CREATE TABLE SECTOR(
            ID INT, 
            SECTORCODE VARCHAR(10), 
            SECTORNAME VARCHAR(20), 
            CATEGORYID INT
    )
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS CATEGORY;
    CREATE TABLE CATEGORY(
            ID INT, 
            CODE VARCHAR(10), 
            NAME VARCHAR(20)
    )
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS FIRM;
    CREATE TABLE FIRM(
            ID INT, 
            CODE VARCHAR(4), 
            NAME VARCHAR(40), 
            CITYID INT, 
            CITYNAME VARCHAR(10),
            CITYCODE VARCHAR(20)
    )
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS DATE;
    CREATE TABLE DATE(
            ID INT, 
            DATEKEY DATE, 
            YEAR INT, 
            MONTH INT, 
            DAY INT, 
            WEEK INT, 
            WEEKDAY INT
    )
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS PAYTYPE;
    CREATE TABLE PAYTYPE(
            ID INT, 
            CODE VARCHAR(10), 
            TYPE VARCHAR(10), 
            SIGNDATE DATE
    )
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY REPLICATION;
    
    DROP TABLE IF EXISTS SALES;
    CREATE TABLE SALES(
             ID INT, 
             POSID INT, 
             STOREID INT, 
             DATEKEY INT, 
             PAYTYPE INT, 
             TOTALAMOUNT NUMERIC(18,2),
             DISCOUNTAMOUNT NUMERIC(18,2), 
             ITEMCOUNT INT, 
             PAIDAMOUNT NUMERIC(18,2)
    ) 
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY HASH(ID);
    
    DROP TABLE IF EXISTS FLOW;
    CREATE TABLE FLOW (
             ID INT, 
             STOREID INT, 
             DATEKEY INT, 
             INFLOWVALUE INT
    ) 
    WITH (ORIENTATION = COLUMN, COMPRESSION=MIDDLE) DISTRIBUTE BY HASH(ID);
    

  4. Create a foreign table, which is used to identify and associate the source data on OBS.

    • <obs_bucket_name> indicates the OBS bucket name. Only some regions are supported. For details about the supported regions and OBS bucket names, see Supported Regions. GaussDB(DWS) clusters do not support cross-region access to OBS bucket data.
    • , and replace <Access_Key_Id> and <Secret_Access_Key> with the value obtained in Preparations.
    • // Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
    • If the message "ERROR: schema "xxx" does not exist Position" is displayed when you create a foreign table, the schema does not exist. Perform the previous step to create a schema.
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10
     11
     12
     13
     14
     15
     16
     17
     18
     19
     20
     21
     22
     23
     24
     25
     26
     27
     28
     29
     30
     31
     32
     33
     34
     35
     36
     37
     38
     39
     40
     41
     42
     43
     44
     45
     46
     47
     48
     49
     50
     51
     52
     53
     54
     55
     56
     57
     58
     59
     60
     61
     62
     63
     64
     65
     66
     67
     68
     69
     70
     71
     72
     73
     74
     75
     76
     77
     78
     79
     80
     81
     82
     83
     84
     85
     86
     87
     88
     89
     90
     91
     92
     93
     94
     95
     96
     97
     98
     99
    100
    101
    102
    103
    104
    105
    106
    107
    108
    109
    110
    111
    112
    113
    114
    115
    116
    117
    118
    119
    120
    121
    122
    123
    124
    125
    126
    127
    128
    129
    130
    131
    132
    133
    134
    135
    136
    137
    138
    139
    140
    141
    142
    143
    144
    145
    146
    147
    148
    149
    150
    151
    152
    153
    154
    155
    156
    157
    158
    159
    160
    161
    162
    163
    164
    165
    166
    167
    168
    169
    170
    171
    172
    173
    174
    175
    176
    177
    178
    179
    180
    181
    182
    183
    184
    185
    CREATE SCHEMA retail_obs_data;
    SET current_schema='retail_obs_data';
    DROP FOREIGN table if exists SALES_OBS;
    CREATE FOREIGN TABLE SALES_OBS
    (
            like retail_data.SALES
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/sales',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    DROP FOREIGN table if exists FLOW_OBS;
    CREATE FOREIGN TABLE FLOW_OBS
    (
            like retail_data.flow
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/flow',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    DROP FOREIGN table if exists BRAND_OBS;
    CREATE FOREIGN TABLE BRAND_OBS
    (
            like retail_data.brand
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/brand',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    
    DROP FOREIGN table if exists CATEGORY_OBS;
    CREATE FOREIGN TABLE CATEGORY_OBS
    (
           like retail_data.category
    )
    SERVER gsmpp_server 
    OPTIONS (
           encoding 'utf8',
           location 'obs://<obs_bucket_name>/retail-data/category',
           format 'csv',
           delimiter ',',
           access_key '<Access_Key_Id>',
           secret_access_key '<Secret_Access_Key>',
           chunksize '64',
           IGNORE_EXTRA_DATA 'on',
           header 'on'
    );
    
    DROP FOREIGN table if exists DATE_OBS;
    CREATE FOREIGN TABLE DATE_OBS
    (
            like retail_data.date
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/date',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    DROP FOREIGN table if exists FIRM_OBS;
    CREATE FOREIGN TABLE FIRM_OBS
    (
            like retail_data.firm
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/firm',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    
    DROP FOREIGN table if exists PAYTYPE_OBS;
    CREATE FOREIGN TABLE PAYTYPE_OBS
    (
            like retail_data.paytype
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/paytype',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    
    DROP FOREIGN table if exists POS_OBS;
    CREATE FOREIGN TABLE POS_OBS
    (
            like retail_data.pos
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/pos',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    DROP FOREIGN table if exists SECTOR_OBS;
    CREATE FOREIGN TABLE SECTOR_OBS
    (
            like retail_data.sector
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/retail-data/sector',
            format 'csv',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on',
            header 'on'
    );
    
    
    DROP FOREIGN table if exists STORE_OBS;
    CREATE FOREIGN TABLE STORE_OBS
    (
             like retail_data.store
    )
    SERVER gsmpp_server 
    OPTIONS (
             encoding 'utf8',
             location 'obs://<obs_bucket_name>/retail-data/store',
             format 'csv',
             delimiter ',',
             access_key '<Access_Key_Id>',
             secret_access_key '<Secret_Access_Key>',
             chunksize '64',
             IGNORE_EXTRA_DATA 'on',
             header 'on'
    );
    

  5. Copy and execute the following statements to import the foreign table data to the cluster:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    INSERT INTO retail_data.store SELECT * FROM retail_obs_data.STORE_OBS;
    INSERT INTO retail_data.sector SELECT * FROM retail_obs_data.SECTOR_OBS;
    INSERT INTO retail_data.paytype SELECT * FROM retail_obs_data.PAYTYPE_OBS;
    INSERT INTO retail_data.firm SELECT * FROM retail_obs_data.FIRM_OBS;
    INSERT INTO retail_data.flow SELECT * FROM retail_obs_data.FLOW_OBS;
    INSERT INTO retail_data.category SELECT * FROM retail_obs_data.CATEGORY_OBS;
    INSERT INTO retail_data.date SELECT * FROM retail_obs_data.DATE_OBS;
    INSERT INTO retail_data.pos SELECT * FROM retail_obs_data.POS_OBS;
    INSERT INTO retail_data.brand SELECT * FROM retail_obs_data.BRAND_OBS;
    INSERT INTO retail_data.sales SELECT * FROM retail_obs_data.SALES_OBS;
    

    It takes some time to import data.

  6. Copy and execute the following statement to create the v_sales_flow_details view:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    SET current_schema='retail_data';
    CREATE VIEW v_sales_flow_details AS 
    SELECT 
    FIRM.ID FIRMID, FIRM.NAME FIRNAME, FIRM. CITYCODE,
    CATEGORY.ID CATEGORYID, CATEGORY.NAME CATEGORYNAME, 
    SECTOR.ID SECTORID, SECTOR.SECTORNAME,
    BRAND.ID BRANDID, BRAND.BRANDNAME,
    STORE.ID STOREID, STORE.STORENAME, STORE.RENTAMOUNT, STORE.RENTAREA,
    DATE.DATEKEY, SALES.TOTALAMOUNT, DISCOUNTAMOUNT, ITEMCOUNT, PAIDAMOUNT, INFLOWVALUE
    FROM SALES
    INNER JOIN STORE ON SALES.STOREID = STORE.ID
    INNER JOIN FIRM ON STORE.FIRMID = FIRM.ID
    INNER JOIN BRAND ON STORE.BRANDID = BRAND.ID
    INNER JOIN SECTOR ON BRAND.SECTORID = SECTOR.ID
    INNER JOIN CATEGORY ON SECTOR.CATEGORYID = CATEGORY.ID
    INNER JOIN DATE ON SALES.DATEKEY = DATE.ID
    INNER JOIN FLOW ON FLOW.DATEKEY = DATE.ID AND FLOW.STOREID = STORE.ID;
    

Step 2: Performing Operations Status Analysis

The following uses standard query of retail information from department stores as an example to demonstrate how to perform basic data query on GaussDB(DWS).

Before querying data, run the Analyze command to generate statistics related to the database table. The statistics data is stored in system table PG_STATISTIC and is useful when you run the planner, which provides you with an efficient query execution plan.

The following are querying examples:

  • Querying the monthly sales revenue of each store

    Copy and execute the following statements to query the total revenue of each store in a certain month:

    1
    2
    3
    4
    5
    6
    7
    8
    SET current_schema='retail_data';
    SELECT DATE_TRUNC('month',datekey) 
    AT TIME ZONE 'UTC' AS __timestamp,
    SUM(paidamount)
    AS sum__paidamount
    FROM v_sales_flow_details
    GROUP BY DATE_TRUNC('month',datekey) AT TIME ZONE 'UTC'
    ORDER BY SUM(paidamount) DESC;
    
  • Querying the sales revenue and price-rent ratio of each store

    Copy and execute the following statement to query the sales revenue and price-rent ratio of each store:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    SET current_schema='retail_data';
    SELECT firname AS firname,
    storename AS storename,
    SUM(paidamount)
    AS sum__paidamount,
    AVG(RENTAMOUNT)/SUM(PAIDAMOUNT)
    AS rentamount_sales_rate
    FROM v_sales_flow_details
    GROUP BY firname, storename
    ORDER BY SUM(paidamount) DESC;
    
  • Analyzing the sales revenue of each city

    Copy and execute the following statement to analyze and query the sales revenue of all provinces:

    1
    2
    3
    4
    5
    6
    7
    SET current_schema='retail_data';
    SELECT citycode AS citycode,
    SUM(paidamount)
    AS sum__paidamount
    FROM v_sales_flow_details
    GROUP BY citycode
    ORDER BY SUM(paidamount) DESC;
    
  • Analyzing and comparing the price-rent ratio and customer flow conversion rate of each store
    1
    2
    3
    4
    5
    6
    7
    8
    9
    SET current_schema='retail_data';
    SELECT brandname AS brandname,
    firname AS firname,
    SUM(PAIDAMOUNT)/AVG(RENTAREA) AS sales_rentarea_rate,
    SUM(ITEMCOUNT)/SUM(INFLOWVALUE) AS poscount_flow_rate,
    AVG(RENTAMOUNT)/SUM(PAIDAMOUNT) AS rentamount_sales_rate
    FROM v_sales_flow_details
    GROUP BY brandname,  firname
    ORDER BY sales_rentarea_rate DESC;
    
  • Analyzing brands in the retail industry
    1
    2
    3
    4
    5
    6
    7
    8
    SET current_schema='retail_data';
    SELECT categoryname AS categoryname,
    brandname AS brandname,
    SUM(paidamount) AS sum__paidamount
    FROM v_sales_flow_details
    GROUP BY categoryname,
    brandname
    ORDER BY sum__paidamount DESC;
    
  • Querying daily sales information of each brand
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    SET current_schema='retail_data';
    SELECT brandname AS brandname,
    DATE_TRUNC('day', datekey) AT TIME ZONE 'UTC' AS __timestamp,
    SUM(paidamount) AS sum__paidamount
    FROM v_sales_flow_details
    WHERE datekey >= '2016-01-01 00:00:00'
    AND datekey <= '2016-01-30 00:00:00'
    GROUP BY brandname,
    DATE_TRUNC('day', datekey) AT TIME ZONE 'UTC'
    ORDER BY sum__paidamount ASC
    LIMIT 50000;