Halaman ini belum tersedia dalam bahasa lokal Anda. Kami berusaha keras untuk menambahkan lebih banyak versi bahasa. Terima kasih atas dukungan Anda.

Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
SoftWare Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive
Help Center/ GaussDB(DWS)/ Best Practices/ Data Analytics/ Using GaussDB(DWS) to Analyze the Supply Chain Requirements of a Company

Using GaussDB(DWS) to Analyze the Supply Chain Requirements of a Company

Updated on 2024-11-08 GMT+08:00

This practice describes how to load the sample data set from OBS to a data warehouse cluster and perform data queries. This example comprises multi-table analysis and theme analysis in the data analysis scenario.

NOTE:

In this example, a standard TPC-H-1x data set of 1 GB size has been generated on GaussDB(DWS), and has been uploaded to the tpch folder of an OBS bucket. All Huawei Cloud accounts have been granted the read-only permission to access the OBS bucket. Users can easily import the data set using their accounts.

General Procedure

This practice takes about 60 minutes. The process is as follows:

  1. Making Preparations
  2. Step 1: Importing Sample Data
  3. Step 2: Performing Multi-Table Analysis and Theme Analysis

Supported Regions

Table 1 describes the regions where OBS data has been uploaded.

Table 1 Regions and OBS bucket names

Region

OBS Bucket

CN North-Beijing1

dws-demo-cn-north-1

CN North-Beijing2

dws-demo-cn-north-2

CN North-Beijing4

dws-demo-cn-north-4

CN North-Ulanqab1

dws-demo-cn-north-9

CN East-Shanghai1

dws-demo-cn-east-3

CN East-Shanghai2

dws-demo-cn-east-2

CN South-Guangzhou

dws-demo-cn-south-1

CN South-Guangzhou-InvitationOnly

dws-demo-cn-south-4

CN-Hong Kong

dws-demo-ap-southeast-1

AP-Singapore

dws-demo-ap-southeast-3

AP-Bangkok

dws-demo-ap-southeast-2

LA-Santiago

dws-demo-la-south-2

AF-Johannesburg

dws-demo-af-south-1

LA-Mexico City1

dws-demo-na-mexico-1

LA-Mexico City2

dws-demo-la-north-2

RU-Moscow2

dws-demo-ru-northwest-2

LA-Sao Paulo1

dws-demo-sa-brazil-1

Scenario Description

Understand the basic functions of GaussDB(DWS) and how to import data. Analyze the order data of a company and its suppliers as follows:

  1. Analyze the revenue brought by suppliers in a region to the company. The statistics can be used to determine whether a local allocation center needs to be established in a given region.
  2. Analyze the relationship between parts and suppliers to obtain the number of suppliers for parts based on the specified contribution conditions. The information can be used to determine whether suppliers are sufficient for large order quantities when the task is urgent.
  3. Analyze the revenue loss of small orders. You can query the average annual revenue loss if there are no small orders. Filter out small orders that are lower than 20% of the average supply volume, and calculate the total amount of those small orders to figure out the average annual revenue loss.

Making Preparations

Step 1: Importing Sample Data

After connecting to the cluster using the SQL client tool, perform the following operations in the SQL client tool to import the TPC-H sample data and perform data queries.

  1. Create a database table.

    The TPC-H sample data consists of eight database tables whose associations are shown in Figure 1.

    Figure 1 TPC-H data tables
    Execute the following statements to create tables in the gaussdb database.
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10
     11
     12
     13
     14
     15
     16
     17
     18
     19
     20
     21
     22
     23
     24
     25
     26
     27
     28
     29
     30
     31
     32
     33
     34
     35
     36
     37
     38
     39
     40
     41
     42
     43
     44
     45
     46
     47
     48
     49
     50
     51
     52
     53
     54
     55
     56
     57
     58
     59
     60
     61
     62
     63
     64
     65
     66
     67
     68
     69
     70
     71
     72
     73
     74
     75
     76
     77
     78
     79
     80
     81
     82
     83
     84
     85
     86
     87
     88
     89
     90
     91
     92
     93
     94
     95
     96
     97
     98
     99
    100
    101
    102
    103
    104
    105
    106
    107
    108
    109
    110
    111
    112
    113
    114
    115
    116
    117
    118
    119
    CREATE SCHEMA tpch; 
    SET current_schema = tpch;
    
    DROP TABLE if exists region;
    CREATE TABLE REGION
    (
            R_REGIONKEY  INT NOT NULL , 
            R_NAME       CHAR(25) NOT NULL ,
            R_COMMENT    VARCHAR(152)
    )
    with (orientation = column, COMPRESSION=MIDDLE)
    distribute by replication;
    
    DROP TABLE if exists nation;
    CREATE TABLE NATION
    (
            N_NATIONKEY  INT NOT NULL, 
            N_NAME       CHAR(25) NOT NULL,
            N_REGIONKEY  INT NOT NULL,
            N_COMMENT    VARCHAR(152)
    )
    with (orientation = column,COMPRESSION=MIDDLE)
    distribute by replication;
    
    DROP TABLE if exists supplier;
    CREATE TABLE SUPPLIER
    (
            S_SUPPKEY     BIGINT NOT NULL,
            S_NAME        CHAR(25) NOT NULL,
            S_ADDRESS     VARCHAR(40) NOT NULL,
            S_NATIONKEY   INT NOT NULL,
            S_PHONE       CHAR(15) NOT NULL,
            S_ACCTBAL     DECIMAL(15,2) NOT NULL,
            S_COMMENT     VARCHAR(101) NOT NULL
    )
    with (orientation = column,COMPRESSION=MIDDLE)
    distribute by hash(S_SUPPKEY);
    
    DROP TABLE if exists customer;
    CREATE TABLE CUSTOMER
    (
            C_CUSTKEY     BIGINT NOT NULL,
            C_NAME        VARCHAR(25) NOT NULL,
            C_ADDRESS     VARCHAR(40) NOT NULL, 
            C_NATIONKEY   INT NOT NULL, 
            C_PHONE       CHAR(15) NOT NULL, 
            C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
            C_MKTSEGMENT  CHAR(10) NOT NULL, 
            C_COMMENT     VARCHAR(117) NOT NULL
    )
    with (orientation = column,COMPRESSION=MIDDLE)
    distribute by hash(C_CUSTKEY);
    
    DROP TABLE if exists part;
    CREATE TABLE PART
    (
            P_PARTKEY     BIGINT NOT NULL, 
            P_NAME        VARCHAR(55) NOT NULL, 
            P_MFGR        CHAR(25) NOT NULL, 
            P_BRAND       CHAR(10) NOT NULL, 
            P_TYPE        VARCHAR(25) NOT NULL,
            P_SIZE        BIGINT NOT NULL,
            P_CONTAINER   CHAR(10) NOT NULL,
            P_RETAILPRICE DECIMAL(15,2) NOT NULL,
            P_COMMENT     VARCHAR(23) NOT NULL
    )
    with (orientation = column,COMPRESSION=MIDDLE)
    distribute by hash(P_PARTKEY);
    
    DROP TABLE if exists partsupp;
    CREATE TABLE PARTSUPP
    (
            PS_PARTKEY     BIGINT NOT NULL,
            PS_SUPPKEY     BIGINT NOT NULL, 
            PS_AVAILQTY    BIGINT NOT NULL,
            PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL, 
            PS_COMMENT     VARCHAR(199) NOT NULL
    )
    with (orientation = column,COMPRESSION=MIDDLE)
    distribute by hash(PS_PARTKEY);
    
    DROP TABLE if exists orders;
    CREATE TABLE ORDERS
    (
            O_ORDERKEY       BIGINT NOT NULL,
            O_CUSTKEY        BIGINT NOT NULL, 
            O_ORDERSTATUS    CHAR(1) NOT NULL, 
            O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
            O_ORDERDATE      DATE NOT NULL , 
            O_ORDERPRIORITY  CHAR(15) NOT NULL, 
            O_CLERK          CHAR(15) NOT NULL , 
            O_SHIPPRIORITY   BIGINT NOT NULL,
            O_COMMENT        VARCHAR(79) NOT NULL
    )
    with (orientation = column,COMPRESSION=MIDDLE)
    distribute by hash(O_ORDERKEY);
    
    DROP TABLE if exists lineitem;
    CREATE TABLE LINEITEM
    (
            L_ORDERKEY    BIGINT NOT NULL,
            L_PARTKEY     BIGINT NOT NULL, 
            L_SUPPKEY     BIGINT NOT NULL,
            L_LINENUMBER  BIGINT NOT NULL,
            L_QUANTITY    DECIMAL(15,2) NOT NULL, 
            L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
            L_DISCOUNT    DECIMAL(15,2) NOT NULL,
            L_TAX         DECIMAL(15,2) NOT NULL, 
            L_RETURNFLAG  CHAR(1) NOT NULL,
            L_LINESTATUS  CHAR(1) NOT NULL,
            L_SHIPDATE    DATE NOT NULL, 
            L_COMMITDATE  DATE NOT NULL ,
            L_RECEIPTDATE DATE NOT NULL, 
            L_SHIPINSTRUCT CHAR(25) NOT NULL, 
            L_SHIPMODE     CHAR(10) NOT NULL, 
            L_COMMENT      VARCHAR(44) NOT NULL
    )
    with (orientation = column,COMPRESSION=MIDDLE)
    distribute by hash(L_ORDERKEY);
    

  2. Create a foreign table, which is used to identify and associate the source data on OBS.

    NOTICE:
    • <obs_bucket_name> indicates the OBS bucket name. Only some regions are supported. For details about the supported regions and OBS bucket names, see Supported Regions. GaussDB(DWS) clusters do not support cross-region access to OBS bucket data.
    • , and replace <Access_Key_Id> and <Secret_Access_Key> with the value obtained in Making Preparations.
    • Hardcoded or plaintext AK/SK is risky. For security, encrypt your AK/SK and store them in the configuration file or environment variables.
    • If the message "ERROR: schema "xxx" does not exist Position" is displayed when you create a foreign table, the schema does not exist. Perform the previous step to create a schema.
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10
     11
     12
     13
     14
     15
     16
     17
     18
     19
     20
     21
     22
     23
     24
     25
     26
     27
     28
     29
     30
     31
     32
     33
     34
     35
     36
     37
     38
     39
     40
     41
     42
     43
     44
     45
     46
     47
     48
     49
     50
     51
     52
     53
     54
     55
     56
     57
     58
     59
     60
     61
     62
     63
     64
     65
     66
     67
     68
     69
     70
     71
     72
     73
     74
     75
     76
     77
     78
     79
     80
     81
     82
     83
     84
     85
     86
     87
     88
     89
     90
     91
     92
     93
     94
     95
     96
     97
     98
     99
    100
    101
    102
    103
    104
    105
    106
    107
    108
    109
    110
    111
    112
    113
    114
    115
    116
    117
    118
    119
    120
    121
    122
    123
    124
    125
    126
    127
    128
    129
    130
    131
    132
    133
    134
    CREATE SCHEMA tpchobs;
    SET current_schema='tpchobs';
    DROP FOREIGN table if exists region;
    CREATE FOREIGN TABLE REGION
    (
            like tpch.region
    )                    
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/tpch/region.tbl',
            format 'text',
            delimiter '|',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
     
    DROP FOREIGN table if exists nation;
    CREATE FOREIGN TABLE NATION
    (
            like tpch.nation
    )
    SERVER gsmpp_server 
    OPTIONS (
             encoding 'utf8',
             location 'obs://<obs_bucket_name>/tpch/nation.tbl',
             format 'text',
             delimiter '|',
             access_key '<Access_Key_Id>',
             secret_access_key '<Secret_Access_Key>',
             chunksize '64',
             IGNORE_EXTRA_DATA 'on'
    );
     
    DROP FOREIGN table if exists supplier;
    CREATE FOREIGN TABLE SUPPLIER
    (
            like tpch.supplier
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/tpch/supplier.tbl',
            format 'text',
            delimiter '|',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
     
    DROP FOREIGN table if exists customer;
    CREATE FOREIGN TABLE CUSTOMER
    (
            like tpch.customer
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/tpch/customer.tbl',
            format 'text',
            delimiter '|',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
    DROP FOREIGN table if exists part;
    CREATE FOREIGN TABLE PART
    (
            like tpch.part
    
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/tpch/part.tbl',
            format 'text',
            delimiter '|',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
    DROP FOREIGN table if exists partsupp;
    CREATE FOREIGN TABLE PARTSUPP
    (
            like tpch.partsupp
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/tpch/partsupp.tbl',
            format 'text',
            delimiter '|',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
    DROP FOREIGN table if exists orders;
    CREATE FOREIGN TABLE ORDERS
    (
            like tpch.orders
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/tpch/orders.tbl',
            format 'text',
            delimiter '|',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
    DROP FOREIGN table if exists lineitem;
    CREATE FOREIGN TABLE LINEITEM
    (
            like tpch.lineitem
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/tpch/lineitem.tbl',
            format 'text',
            delimiter '|',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
    

  3. Copy and execute the following statements to import the foreign table data to the corresponding database table.

    Run the insert command to import the data in the OBS foreign table to the GaussDB(DWS) database table. The database kernel concurrently imports the OBS data at a high speed to GaussDB(DWS).

    1
    2
    3
    4
    5
    6
    7
    8
    INSERT INTO tpch.lineitem SELECT * FROM tpchobs.lineitem;
    INSERT INTO tpch.part SELECT * FROM tpchobs.part;
    INSERT INTO tpch.partsupp SELECT * FROM tpchobs.partsupp;
    INSERT INTO tpch.customer SELECT * FROM tpchobs.customer;
    INSERT INTO tpch.supplier SELECT * FROM tpchobs.supplier;
    INSERT INTO tpch.nation SELECT * FROM tpchobs.nation;
    INSERT INTO tpch.region SELECT * FROM tpchobs.region;
    INSERT INTO tpch.orders SELECT * FROM tpchobs.orders;
    

    It takes 10 minutes to import data.

Step 2: Performing Multi-Table Analysis and Theme Analysis

The following uses standard TPC-H query as an example to demonstrate how to perform basic data query on GaussDB(DWS).

Before querying data, run the Analyze command to generate statistics related to the database table. The statistics data is stored in system table PG_STATISTIC and is useful when you run the planner, which provides you with an efficient query execution plan.

The following are querying examples:

  • Querying revenue of a supplier in a region (TPCH-Q5)

    By executing the TPCH-Q5 query statement, you can query the revenue statistics of a spare parts supplier in a region. The revenue is calculated based on sum( l_extendedprice * (1 - l_discount)). The statistics can be used to determine whether a local allocation center needs to be established in a given region.

    Copy and execute the following TPCH-Q5 statement for query. This statement features multi-table join query with GROUP BY, ORDER BY, and AGGREGATE.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    SET current_schema='tpch';
    SELECT
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
    FROM
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
    where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= '1994-01-01'::date
    and o_orderdate < '1994-01-01'::date + interval '1 year'
    group by
    n_name
    order by
    revenue desc;
    
  • Querying relationships between spare parts and suppliers (TPCH-Q16)

    By executing the TPCH-Q16 query statement, you can obtain the number of suppliers that can supply spare parts with the specified contribution conditions. This information can be used to determine whether there are sufficient suppliers when the order quantity is large and the task is urgent.

    Copy and execute the following TPCH-Q16 statement for query. The statement features multi-table connection operations with group by, sort by, aggregate, deduplicate, and NOT IN subquery.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    SET current_schema='tpch';
    SELECT
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
    FROM
    partsupp,
    part
    where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#45'
    and p_type not like 'MEDIUM POLISHED%'
    and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
    and ps_suppkey not in (
            select
            s_suppkey
            from
            supplier
            where
            s_comment like '%Customer%Complaints%'
    )
    group by
    p_brand,
    p_type,
    p_size
    order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size
    limit 100;
    
  • Querying revenue loss of small orders (TPCH-Q17)

    You can query the average annual revenue loss if there are no small orders. Filter out small orders that are lower than the 20% of the average supply volume, and calculate the total amount of those small orders to figure out the average annual revenue loss.

    Copy and execute the following TPCH-Q17 statement for query. The statement features multi-table connection operations with aggregate and aggregate subquery.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    SET current_schema='tpch';
    SELECT
    sum(l_extendedprice) / 7.0 as avg_yearly
    FROM
    lineitem,
    part
    where
    p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container = 'MED BOX'
    and l_quantity < (
            select 0.2 * avg(l_quantity)
            from lineitem
            where l_partkey = p_partkey
    );
    

Kami menggunakan cookie untuk meningkatkan kualitas situs kami dan pengalaman Anda. Dengan melanjutkan penelusuran di situs kami berarti Anda menerima kebijakan cookie kami. Cari tahu selengkapnya

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback