Help Center/ Data Lake Insight/ Best Practices/ Data Analysis/ Analyzing E-commerce BI Reports
Updated on 2024-05-29 GMT+08:00

Analyzing E-commerce BI Reports

Application Scenarios

As a self-operated e-commerce company in China, the X mall has developed hundreds of millions of loyal users and accumulated massive amounts of authentic data while maintaining high-speed development. How to use the BI tool to find business opportunities from historical data is a key issue in the precision marketing of big data applications. It is also the core technology required for intelligent upgrade of all e-commerce platforms.

This case uses HUAWEI CLOUD DLI, GaussDB(DWS), and Yonghong BI to analyze data features of users and offerings based on the real user, product, and comment data (anonymized) of the mall, providing high-quality information for marketing decision-making, advertising recommendation, credit rating, brand monitoring, and user behavior prediction.

Process

To use DLI to analyze e-commerce data, perform the following steps:

Step 1: Uploading Data. Upload the data to OBS for data analysis using DLI.

Step 2: Analyzing Data. Use DLI to query the data for analysis.

Data Types

To protect user privacy and data security, all sampled data is anonymized.

  • User data
    Table 1 User data

    Field

    Data Type

    Description

    Value

    user_id

    int

    User ID

    Anonymized

    age

    int

    Age group

    -1 indicates that the user age is unknown.

    gender

    int

    Gender

    • 0: Male
    • 1: Female
    • 2: Confidential

    rank

    Int

    User level

    Sequenced list of user level. The higher the user level, the larger the number.

    register_time

    string

    User registration date

    Unit: day

  • Product data
    Table 2 Product data

    Field

    Data Type

    Description

    Value

    product_id

    int

    Product No.

    Anonymized

    a1

    int

    Attribute 1

    Enumerated value. The value -1 indicates unknown.

    a2

    int

    Attribute 2

    Enumerated value. The value -1 indicates unknown.

    a3

    int

    Attribute 3

    Enumerated value. The value -1 indicates unknown.

    category

    int

    Category ID

    Anonymized

    brand

    int

    Brand ID

    Anonymized

  • Comment data
    Table 3 Comment data

    Field

    Data Type

    Description

    Value

    deadline

    string

    End time

    Unit: day

    product_id

    int

    Product No.

    Anonymized

    comment_num

    int

    Segments of accumulated comment count

    • 0: No comment
    • 1: One comment
    • 2: 2 to 10 comments
    • 3: 11-50 comments
    • 4: More than 50 comments

    has_bad_comment

    int

    Whether there is negative feedback.

    0: No; 1: Yes.

    bad_comment_rate

    float

    Dissatisfaction rate

    Proportion of the negative feedback.

  • Action data
    Table 4 Action data

    Field

    Data Type

    Description

    Value

    user_id

    int

    User ID

    Anonymized

    product_id

    int

    Product No.

    Anonymized

    time

    string

    Time of action

    -

    model_id

    string

    Module ID

    Anonymized

    type

    string

    • Browse (refers to the offering details page)
    • Add to cart
    • Remove from cart
    • Place an order
    • Follow
    • Click

    -

Step 1: Uploading Data

Upload the data to OBS for data analysis using DLI.
  1. Download OBS Browser+. For details about the download address, see Object Storage Service Tool Guide.
  2. Install OBS Browser+. For details about the installation procedure, see Object Storage Service Tool Guide.
  3. Log in to OBS Browser+. OBS Browser+ supports two login modes: AK login (using access keys) or authorization code login. For details about the login procedure, see Object Storage Service Tool Guide.
  4. Upload data using the OBS Browser+.

    On the OBS Browser+ page, click Create Bucket. Select a region and enter a bucket name (for example, DLI-demo). After the bucket is created, return to the bucket list and click DLI-demo. OBS Browser+ supports upload by dragging. You can drag one or more files or folders from a local path to the object list of a bucket or a parallel file system on OBS Browser+. You can even drag a file or folder directly to a specified folder on OBS Browser+.

    Obtain the test data by downloading the Best_Practice_04.zip file, decompressing it, and uploading the Data folder to the root directory of the OBS bucket. The test data directory is as follows:

    • data/JData_User: Data in the user table
    • data/JData_Product:Data in the product table
    • data/JData_Product/JData_Comment: Data in the comment table
    • data/JData_Action: Data the action table

Step 2: Analyzing Data

  1. Creating a Database and a Table
    1. On the top menu bar of the portal page, choose Products > Analytics > Data Lake Insight (DLI).
    2. Create a demo database. On the DLI console, choose Job Management >SQL Jobs. Click the created job on the displayed page to go to the SQL Editor page.
    3. In the left pane of the SQL Editor, select the Databases tab and click to create the demo database. For details, see Figure 1.
      Figure 1 Creating a database

      The default database is a built-in database. You cannot create a database named default.

    4. Choose the demo database, and enter the following SQL statement in the editing box:
      create table user(
        user_id int,
        age int,
        gender int,
        rank int,
        register_time string
      ) USING csv OPTIONS (path "obs://DLI-demo/data/JData_User")

      The file path in the preceding SQL statement is the actual OBS path for storing data.

    5. Click Execute to create the user information table user.
    6. Create the product, comment, and action tables in the same way.
      • Product data
        create table product(
          product_id int,
          a1 int,
          a2 int,
          a3 int,
          category int,
          brand int
        ) USING csv OPTIONS (path "obs://DLI-demo/data/JData_Product")
      • Comment table
        create table comment(
          deadline string,
          product_id int,
          comment_num int,
          has_bad_comment int,
          bad_comment_rate float
        ) USING csv OPTIONS (path "obs://DLI-demo/data/JData_Comment")
      • Action table
        create table action(
          user_id int,
          product_id int,
          time string,
          model_id string,
          type string
        ) USING csv OPTIONS (path "obs://DLI-demo/data/JData_Action");
  2. Querying Data

    You can save common query statements as templates on the Template Management page for later use. For details, see SQL Template Management in Data Lake Insight User Guide.

    • Top 10 products with the most likes
      1. Run the following SQL statement to analyze the top 10 products with the most likes.
        SELECT
          product.brand as brand,
          COUNT(product.brand) as like_count
        from
          action
          JOIN product ON (action.product_id = product.product_id)
        WHERE
          action.type = 'like'
        group by
          brand
        ORDER BY like_count desc
        limit
          10
      2. Click Execute. The execution results are displayed, as shown in Figure 2.
        Figure 2 Querying results
      3. Click to view the result in a chart.
        Figure 3 Graphical results
    • Top 10 worst-rated products
      1. Run the following SQL statement to analyze the top 10 worst-rated products:
        SELECT
          DISTINCT product_id,
          comment_num,
          bad_comment_rate 
        from 
          comment 
        where 
          comment_num > 3 
        order by 
          bad_comment_rate desc 
        limit
          10
      2. Click Execute. The execution results are displayed, as shown in Figure 4.
        Figure 4 Querying results
      3. Click to view the result in a chart.
        Figure 5 Graphical result

    You can also analyze data for age distribution, gender ratio, offering evaluation, purchase number, and browsing statistics of users.