Updated on 2023-12-28 GMT+08:00

Checkpoint Vehicle Analysis

This practice shows you how to analyze passing vehicles at checkpoints. In this practice, 890 million data records from checkpoints are loaded to a single database table on GaussDB(DWS) for accurate and fuzzy query, demonstrating the ability of GaussDB(DWS) to perform high-performance query for historical data.

The sample data has been uploaded to the traffic-data folder in an OBS bucket, and all Huawei Cloud accounts have been granted the read-only permission for accessing the OBS bucket.

Supported Regions

Table 1 describes the regions where OBS data has been uploaded.

Table 1 Regions and OBS bucket names

Region

OBS Bucket

CN North-Beijing1

dws-demo-cn-north-1

CN North-Beijing2

dws-demo-cn-north-2

CN North-Beijing4

dws-demo-cn-north-4

CN North-Ulanqab1

dws-demo-cn-north-9

CN East-Shanghai1

dws-demo-cn-east-3

CN East-Shanghai2

dws-demo-cn-east-2

CN South-Guangzhou

dws-demo-cn-south-1

CN South-Guangzhou-InvitationOnly

dws-demo-cn-south-4

CN-Hong Kong

dws-demo-ap-southeast-1

AP-Singapore

dws-demo-ap-southeast-3

AP-Bangkok

dws-demo-ap-southeast-2

LA-Santiago

dws-demo-la-south-2

AF-Johannesburg

dws-demo-af-south-1

LA-Mexico City1

dws-demo-na-mexico-1

LA-Mexico City2

dws-demo-la-north-2

RU-Moscow2

dws-demo-ru-northwest-2

LA-Sao Paulo1

dws-demo-sa-brazil-1

Making Preparations

  • Register a GaussDB(DWS) account and check the account status before using GaussDB(DWS). The account cannot be in arrears or frozen.
  • You have obtained the AK and SK of the account.

Step 1: Creating a Cluster

  1. Log in to the management console.
  2. Click Service List and choose Analytics > GaussDB(DWS).
  3. In the navigation pane on the left, choose Clusters. On the displayed page, click Create Cluster in the upper right corner.
  4. Configure the parameters according to Table 2.

    Table 2 Basic configurations

    Parameter

    Configuration

    Region

    Select CN North-Beijing4 or CN-Hong KongEU-Dublin.

    NOTE:

    CN-Hong Kong is used as an example. You can select other regions as required. Ensure that all operations are performed in the same region.

    AZ

    AZ2

    Resource

    Standard Warehouse

    Compute Resource

    ECS

    Storage type

    Cloud SSD

    CPU Architecture

    X86

    Node Flavor

    dws2.m6.4xlarge.8 (16 vCPUs | 128 GB | 2000 GB SSD)

    NOTE:

    If this flavor is sold out, select other AZs or flavors.

    Hot Storage

    100 GB/node

    Nodes

    3

  5. Verify that the information is correct and click Next: Configure Network. Configure the network by referring to Table 3.

    Table 3 Configuring the network

    Parameter

    Configuration

    VPC

    vpc-default

    Subnet

    subnet-default(192.168.0.0/24)

    Security Group

    Automatic creation

    EIP

    Buy now

    Bandwidth

    1Mbit/s

    ELB

    Do not use

  6. Verify that the information is correct and click Next: Configure Advanced Settings. Configure the network by referring to Table 4.

    Table 4 Configuring advanced settings

    Parameter

    Configuration

    Cluster Name

    dws-demo

    Cluster Version

    Use the recommended version, for example, 8.1.3.311.

    Administrator Account

    dbadmin

    Administrator Password

    -

    Confirm Password

    -

    Database Port

    8000

    Enterprise Project

    default

    Advanced Settings

    Default

  7. Click Next: Confirm, confirm the configuration, and click Next.
  8. Wait about 6 minutes. After the cluster is created, click next to the cluster name. On the displayed cluster information page, record the value of Public Network Address, for example, dws-demov.dws.huaweicloud.com.

Step 2: Using Data Studio to Connect to a Cluster

  1. Ensure that JDK 1.8.0 or later has been installed on the client host. Choose PC > Properties > Advanced System Settings > Environment Variables and set JAVA_HOME (for example, C:\Program Files\Java\jdk1.8.0_191). Add ;%JAVA_HOME%\bin to the variable path.
  2. On the Connections page of the GaussDB(DWS) console, download the Data Studio GUI client.
  3. Decompress the downloaded Data Studio software package, go to the decompressed directory, and double-click Data Studio.exe to start the client.
  4. On the Data Studio main menu, choose File > New Connection. In the dialog box that is displayed, configure the connection based on Table 5.

    Table 5 Data Studio software configuration

    Parameter

    Configuration

    Database Type

    GaussDB(DWS)

    Connection Name

    dws-demo

    Host

    dws-demov.dws.huaweicloud.com

    The value of this parameter must be the same as the value of Public Network Address queried in Step 1: Creating a Cluster.

    Host Port

    8000

    Database Name

    gaussdb

    User Name

    dbadmin

    Password

    -

    Enable SSL

    Disable

  5. Click OK.

Step 3: Importing Sample Data

After connecting to the cluster using the SQL client tool, perform the following operations on the SQL client tool to import the sample data from traffic checkpoints and perform data queries.

  1. Execute the following statement to create the traffic database:

    1
    create database traffic encoding 'utf8' template template0; 
    

  2. Perform the following steps to switch to the new database:

    1. In the Object Browser window of the Data Studio client, right-click the database connection and choose Refresh from the shortcut menu. Then, the new database is displayed.
    2. Right-click the name of the new database traffic and choose Connect to DB from the shortcut menu.
    3. Right-click the name of the new database traffic and choose Open Terminal from the shortcut menu. The SQL command window for connecting to the specified database is displayed. Perform the following steps in the window.

  3. Execute the following statements to create a database table for storing vehicle information from traffic checkpoints:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    create schema traffic_data;
    set current_schema= traffic_data;
    drop table if exists GCJL;
    CREATE TABLE GCJL
    (
            kkbh   VARCHAR(20), 
            hphm   VARCHAR(20),
            gcsj   DATE ,
            cplx   VARCHAR(8),
            cllx   VARCHAR(8),
            csys   VARCHAR(8)
    )
    with (orientation = column, COMPRESSION=MIDDLE)
    distribute by hash(hphm);
    

  4. Create a foreign table, which is used to identify and associate the source data on OBS.

    • <obs_bucket_name> indicates the OBS bucket name. Only some regions are supported. For details about the supported regions and OBS bucket names, see Supported Regions. GaussDB(DWS) clusters do not support cross-region access to OBS bucket data.
    • In this practice, the CN-Hong Kong region is used as an example. Enter dws-demo-ap-southeast-1, and replace <Access_Key_Id> and <Secret_Access_Key> with the value obtained in Making Preparations.
    • If the message "ERROR: schema "xxx" does not exist Position" is displayed when you create a foreign table, the schema does not exist. Perform the previous step to create a schema.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    create schema tpchobs;
    set current_schema = 'tpchobs';
    drop FOREIGN table if exists GCJL_OBS;
    CREATE FOREIGN TABLE GCJL_OBS
    (
            like traffic_data.GCJL
    )
    SERVER gsmpp_server 
    OPTIONS (
            encoding 'utf8',
            location 'obs://<obs_bucket_name>/traffic-data/gcxx',
            format 'text',
            delimiter ',',
            access_key '<Access_Key_Id>',
            secret_access_key '<Secret_Access_Key>',
            chunksize '64',
            IGNORE_EXTRA_DATA 'on'
    );
    

  5. Execute the following statement to import data from the foreign table to the database table:

    1
    insert into traffic_data.GCJL select * from tpchobs.GCJL_OBS;
    

    It takes some time to import data.

Step 4: Performing Vehicle Analysis

  1. Performing ANALYZE

    This statement collects statistics related to ordinary tables in databases. The statistics are saved to the system catalog PG_STATISTIC. When you run the planner, the statistics help you develop an efficient query execution plan.

    Execute the following statement to generate the table statistics:

    1
    Analyze;
    
  1. Querying the data volume of the data table

    Execute the following statement to query the number of loaded data records:

    1
    2
    set current_schema= traffic_data;
    Select count(*) from traffic_data.gcjl;
    
  1. Accurate vehicle query

    Run the following statements to query the driving route of a vehicle by the license plate number and time segment. GaussDB(DWS) responds to the request in seconds.

    1
    2
    3
    4
    5
    6
    set current_schema= traffic_data;
    select hphm, kkbh, gcsj
    from traffic_data.gcjl
    where hphm =  'YD38641'
    and gcsj between '2016-01-06' and '2016-01-07'
    order by gcsj desc;
    
  1. Fuzzy vehicle query

    Run the following statements to query the driving route of a vehicle by the license plate number and time segment. GaussDB(DWS) responds to the request in seconds.

    1
    2
    3
    4
    5
    6
    7
    set current_schema= traffic_data;
    select hphm, kkbh, gcsj 
    from traffic_data.gcjl
    where hphm like  'YA23F%'
    and kkbh in('508', '1125', '2120') 
    and gcsj between '2016-01-01' and '2016-01-07'  
    order by hphm,gcsj desc;