
Step 1: Prepare Data

Updated on 2024-12-12 GMT+08:00

Preparations Before Using DataArts Studio

If you are new to DataArts Studio, first register a Huawei account, buy a DataArts Studio instance, create workspaces, and complete the other preparations described in Buying and Configuring a DataArts Studio Instance. You can then go to a created workspace and start using DataArts Studio.

Preparing Data Sources

This practice analyzes the characteristics of the users and products of an e-commerce store. (The data comes from BI reports.)

To facilitate demonstration, this practice provides some data used to simulate the original data. To integrate the source data into the cloud, you need to store the sample data in CSV files and upload them to an OBS bucket.

  1. Create one CSV file (UTF-8 without BOM) for each data table, name it after the corresponding table, copy the sample data into it, and save the file.

    To generate a CSV file in Windows, you can perform the following steps (a scripted alternative for creating and uploading the files is sketched after this list):
    1. Use a text editor (for example, Notepad) to create a .txt document and copy the sample data into it. Then check the total number of rows and verify that each row of data is separated correctly. (If the sample data is copied from a PDF document, long rows may wrap across multiple lines. In this case, manually rejoin such rows so that each record occupies a single line.)
    2. Choose File > Save as. In the displayed dialog box, set Save as type to All files (*.*), enter the file name with the .csv suffix, and select the UTF-8 (without BOM) encoding to save the file in CSV format.

  2. Upload the CSV files to OBS.

    1. Log in to the management console and choose Storage > Object Storage Service to access the OBS console.
    2. Click Create Bucket and set parameters as prompted to create an OBS bucket named fast-demo.
      NOTE:

      To ensure network connectivity, select the same region for the OBS bucket as that of the DataArts Studio instance. If an enterprise project is required, select the same enterprise project as that of the DataArts Studio instance.

      For details about how to create a bucket on the OBS console, see Creating a Bucket in Object Storage Service Console Operation Guide.

    3. In the fast-demo OBS bucket, create folders user_data, product_data, comment_data, and action_data, and upload files user_data.csv, product_data.csv, comment_data.csv, and action_data.csv to the corresponding folders.
      NOTE:

      When associating a CSV file with DLI to create an OBS foreign table, you can specify only a file path, not a file name. Therefore, place the CSV files in separate paths and ensure that each path contains only the required CSV file.

      For details about how to upload a file on the OBS console, see Uploading a File in Object Storage Service Console Operation Guide.
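
If you prefer to script steps 1 and 2, the following Python sketches show one way to do so. They are illustrative only: the file names and bucket layout match this practice, but the sample rows are abbreviated, and the region, endpoint, and credential placeholders are assumptions you must replace with your own values.

    # create_csv.py: writes a sample table as CSV in UTF-8 without BOM.
    import csv

    # Header and first rows of user_data.csv, copied from the sample data
    # below; the other three files follow the same pattern.
    header = ["user_id", "age", "gender", "rank", "register_time"]
    rows = [
        ["100001", "20", "0", "1", "2021/1/1"],
        ["100002", "22", "1", "2", "2021/1/2"],
        # ... remaining sample rows ...
    ]

    # encoding="utf-8" writes no byte order mark (unlike "utf-8-sig"),
    # matching the required "UTF-8 without BOM" format; newline=""
    # prevents blank lines between rows on Windows.
    with open("user_data.csv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

The upload can be scripted with the OBS Python SDK (esdk-obs-python). A minimal sketch, assuming the SDK is installed and the bucket does not yet exist:

    from obs import ObsClient  # pip install esdk-obs-python

    # The endpoint and region are placeholders; use the region of your
    # DataArts Studio instance so that the bucket is network-reachable.
    client = ObsClient(
        access_key_id="<YOUR_AK>",
        secret_access_key="<YOUR_SK>",
        server="https://obs.<region>.myhuaweicloud.com",
    )
    try:
        # Create the bucket once; this call fails if the name is taken.
        client.createBucket("fast-demo", location="<region>")
        # One folder (object prefix) per table, so that each OBS path
        # contains only the CSV file required by its foreign table.
        for name in ("user_data", "product_data", "comment_data", "action_data"):
            resp = client.putFile("fast-demo", f"{name}/{name}.csv", f"{name}.csv")
            print(name, "uploaded" if resp.status < 300 else f"failed: {resp.errorMessage}")
    finally:
        client.close()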

This practice involves the following sample data: user data (user_data.csv), product data (product_data.csv), comment data (comment_data.csv), and action data (action_data.csv). Descriptions of the data are as follows (a quick sanity-check sketch follows the listings):
  • user_data.csv:
    user_id,age,gender,rank,register_time
    100001,20,0,1,2021/1/1
    100002,22,1,2,2021/1/2
    100003,21,0,3,2021/1/3
    100004,24,2,5,2021/1/4
    100005,50,2,9,2021/1/5
    100006,20,1,3,2021/1/6
    100007,18,1,1,2021/1/7
    100008,20,1,6,2021/1/8
    100009,60,0,4,2021/1/9
    100010,20,1,1,2021/1/10
    100011,35,0,5,2021/1/11
    100012,20,1,1,2021/1/12
    100013,7,0,1,2021/1/13
    100014,64,0,8,2021/1/14
    100015,20,1,1,2021/1/15
    100016,33,1,7,2021/1/16
    100017,20,0,1,2021/1/17
    100018,15,1,1,2021/1/18
    100019,20,1,9,2021/1/19
    100020,33,0,1,2021/1/20
    100021,20,0,1,2021/1/21
    100022,22,1,5,2021/1/22
    100023,20,1,1,2021/1/23
    100024,20,0,1,2021/1/24
    100025,34,0,7,2021/1/25
    100026,34,1,1,2021/1/26
    100027,20,1,8,2021/1/27
    100028,20,0,1,2021/1/28
    100029,56,0,5,2021/1/29
    100030,20,1,1,2021/1/30
    100031,22,1,8,2021/1/31
    100032,20,0,1,2021/2/1
    100033,32,1,0,2021/2/2
    100034,20,1,1,2021/2/3
    100035,45,0,6,2021/2/4
    100036,20,0,1,2021/2/5
    100037,67,1,4,2021/2/6
    100038,78,0,6,2021/2/7
    100039,11,1,8,2021/2/8
    100040,8,0,0,2021/2/9

    The following table describes the data.

    Table 1 User data description

    | Field | Type | Description | Value |
    | --- | --- | --- | --- |
    | user_id | int | User ID | Anonymized |
    | age | int | Age group | -1 indicates that the user age is unknown. |
    | gender | int | Gender | 0: male; 1: female; 2: confidential |
    | rank | int | User level | A larger value indicates a higher user level. |
    | register_time | string | User registration date | Unit: day |

  • product_data.csv:
    product_id,a1,a2,a3,category,brand
    200001,1,1,1,300001,400001
    200002,2,2,2,300002,400001
    200003,3,3,3,300003,400001
    200004,1,2,3,300004,400001
    200005,3,2,1,300005,400002
    200006,1,1,1,300006,400002
    200007,2,2,2,300007,400002
    200008,3,3,3,300008,400002
    200009,1,2,3,300009,400003
    200010,3,2,1,300010,400003
    200011,1,1,1,300001,400003
    200012,2,2,2,300002,400003
    200013,3,3,3,300003,400004
    200014,1,2,3,300004,400004
    200015,3,2,1,300005,400004
    200016,1,1,1,300006,400004
    200017,2,2,2,300007,400005
    200018,3,3,3,300008,400005
    200019,1,2,3,300009,400005
    200020,3,2,1,300010,400005
    200021,1,1,1,300001,400006
    200022,2,2,2,300002,400006
    200023,3,3,3,300003,400006
    200024,1,2,3,300004,400006
    200025,3,2,1,300005,400007
    200026,1,1,1,300006,400007
    200027,2,2,2,300007,400007
    200028,3,3,3,300008,400007
    200029,1,2,3,300009,400008
    200030,3,2,1,300010,400008
    200031,1,1,1,300001,400008
    200032,2,2,2,300002,400008
    200033,3,3,3,300003,400009
    200034,1,2,3,300004,400009
    200035,3,2,1,300005,400009
    200036,1,1,1,300006,400009
    200037,2,2,2,300007,400010
    200038,3,3,3,300008,400010
    200039,1,2,3,300009,400010
    200040,3,2,1,300010,400010

    The following table describes the data.

    Table 2 Product data description

    | Field | Type | Description | Value |
    | --- | --- | --- | --- |
    | product_id | int | Product No. | Anonymized |
    | a1 | int | Attribute 1 | Enumerated value. The value -1 indicates unknown. |
    | a2 | int | Attribute 2 | Enumerated value. The value -1 indicates unknown. |
    | a3 | int | Attribute 3 | Enumerated value. The value -1 indicates unknown. |
    | category | int | Category ID | Anonymized |
    | brand | int | Brand ID | Anonymized |

  • comment_data.csv:
    deadline,product_id,comment_num,has_bad_comment,bad_comment_rate
    2021/3/1,200001,4,0,0
    2021/3/1,200002,1,0,0
    2021/3/1,200003,2,2,0.1
    2021/3/1,200004,3,3,0.05
    2021/3/1,200005,1,0,0
    2021/3/1,200006,2,0,0
    2021/3/1,200007,3,2,0.01
    2021/3/1,200008,4,1,0.001
    2021/3/1,200009,4,0,0
    2021/3/1,200010,1,0,0
    2021/3/1,200011,2,2,0.2
    2021/3/1,200012,3,3,0.04
    2021/3/1,200013,1,0,0
    2021/3/1,200014,2,2,0.2
    2021/3/1,200015,3,2,0.05
    2021/3/1,200016,4,1,0.003
    2021/3/1,200017,4,0,0
    2021/3/1,200018,1,0,0
    2021/3/1,200019,2,2,0.3
    2021/3/1,200020,3,3,0.03
    2021/3/1,200021,1,0,0
    2021/3/1,200022,2,5,1
    2021/3/1,200023,3,2,0.07
    2021/3/1,200024,4,1,0.006
    2021/3/1,200025,4,0,0
    2021/3/1,200026,1,0,0
    2021/3/1,200027,2,2,0.4
    2021/3/1,200028,3,3,0.03
    2021/3/1,200029,1,0,0
    2021/3/1,200030,2,5,1
    2021/3/1,200031,3,2,0.02
    2021/3/1,200032,4,1,0.003
    2021/3/1,200033,4,0,0
    2021/3/1,200034,1,0,0
    2021/3/1,200035,2,2,0.5
    2021/3/1,200036,3,3,0.06
    2021/3/1,200037,1,0,0
    2021/3/1,200038,2,1,0.01
    2021/3/1,200039,3,2,0.01
    2021/3/1,200040,4,1,0.009

    The following table describes the data.

    Table 3 Comment data description

    | Field | Type | Description | Value |
    | --- | --- | --- | --- |
    | deadline | string | Deadline | Unit: day |
    | product_id | int | Product No. | Anonymized |
    | comment_num | int | Segment of the accumulated comment count | 0: no comments; 1: one comment; 2: 2 to 10 comments; 3: 11 to 50 comments; 4: more than 50 comments |
    | has_bad_comment | int | Whether there are negative comments | 0: no; 1: yes |
    | bad_comment_rate | float | Dissatisfaction rate | Proportion of negative comments |

  • action_data.csv:
    user_id,product_id,time,model_id,type
    100001,200001,2021/1/1,1,view
    100001,200001,2021/1/1,1,add
    100001,200001,2021/1/1,1,delete
    100001,200002,2021/1/2,1,view
    100001,200002,2021/1/2,1,add
    100001,200002,2021/1/2,1,buy
    100001,200002,2021/1/2,1,like
    100002,200003,2021/1/1,1,view
    100002,200003,2021/1/1,1,add
    100002,200003,2021/1/1,1,delete
    100002,200004,2021/1/2,1,view
    100002,200004,2021/1/2,1,add
    100002,200004,2021/1/2,1,buy
    100002,200004,2021/1/2,1,like
    100003,200001,2021/1/1,1,view
    100003,200001,2021/1/1,1,add
    100003,200001,2021/1/1,1,delete
    100004,200002,2021/1/2,1,view
    100005,200002,2021/1/2,1,add
    100006,200002,2021/1/2,1,buy
    100007,200002,2021/1/2,1,like
    100001,200003,2021/1/1,1,view
    100002,200003,2021/1/1,1,add
    100003,200003,2021/1/1,1,delete
    100004,200004,2021/1/2,1,view
    100005,200004,2021/1/2,1,add
    100006,200004,2021/1/2,1,buy
    100007,200004,2021/1/2,1,like
    100001,200005,2021/1/3,1,view
    100001,200005,2021/1/3,1,add
    100001,200005,2021/1/3,1,delete
    100001,200006,2021/1/3,1,view
    100001,200006,2021/1/4,1,add
    100001,200006,2021/1/4,1,buy
    100001,200006,2021/1/4,1,like
    100010,200005,2021/1/3,1,view
    100010,200005,2021/1/3,1,add
    100010,200005,2021/1/3,1,delete
    100010,200006,2021/1/3,1,view
    100010,200006,2021/1/4,1,add
    100010,200006,2021/1/4,1,buy
    100010,200006,2021/1/4,1,like
    100001,200007,2021/1/2,1,buy
    100001,200007,2021/1/2,1,like
    100002,200007,2021/1/1,1,view
    100002,200007,2021/1/1,1,add
    100002,200007,2021/1/1,1,delete
    100002,200007,2021/1/2,1,view
    100002,200007,2021/1/2,1,add
    100002,200008,2021/1/2,1,like
    100002,200008,2021/1/2,1,like
    100003,200008,2021/1/1,1,view
    100003,200008,2021/1/1,1,add
    100003,200008,2021/1/1,1,delete
    100004,200008,2021/1/2,1,view
    100005,200009,2021/1/2,1,like
    100006,200009,2021/1/2,1,buy
    100007,200010,2021/1/2,1,like
    100001,200010,2021/1/1,1,view
    100002,200010,2021/1/1,1,add
    100003,200010,2021/1/1,1,delete
    100004,200010,2021/1/2,1,view
    100005,200010,2021/1/2,1,like
    100006,200010,2021/1/2,1,buy
    100007,200010,2021/1/2,1,like
    100001,200010,2021/1/3,1,view
    100001,200010,2021/1/3,1,add
    100001,200010,2021/1/3,1,delete
    100001,200011,2021/1/3,1,view
    100001,200011,2021/1/4,1,like
    100001,200011,2021/1/4,1,buy
    100001,200011,2021/1/4,1,like
    100010,200012,2021/1/3,1,view
    100011,200012,2021/1/3,1,like
    100011,200012,2021/1/3,1,delete
    100011,200013,2021/1/3,1,view
    100011,200013,2021/1/4,1,like
    100011,200014,2021/1/4,1,buy
    100011,200014,2021/1/4,1,like
    100007,200022,2021/1/2,1,like
    100001,200022,2021/1/1,1,view
    100002,200023,2021/1/1,1,add
    100003,200023,2021/1/1,1,delete
    100004,200023,2021/1/2,1,like
    100005,200024,2021/1/2,1,add
    100006,200024,2021/1/2,1,buy
    100007,200025,2021/1/2,1,like
    100001,200025,2021/1/3,1,view
    100001,200026,2021/1/3,1,like
    100001,200026,2021/1/3,1,delete
    100001,200027,2021/1/3,1,view
    100001,200027,2021/1/4,1,like
    100001,200027,2021/1/4,1,buy
    100001,200028,2021/1/4,1,like
    100010,200029,2021/1/3,1,view
    100011,200030,2021/1/3,1,like
    100011,200031,2021/1/3,1,delete
    100011,200032,2021/1/3,1,view
    100011,200033,2021/1/4,1,like
    100011,200034,2021/1/4,1,buy
    100011,200035,2021/1/4,1,like

    The following table describes the data.

    Table 4 Action data description

    | Field | Type | Description | Value |
    | --- | --- | --- | --- |
    | user_id | int | User ID | Anonymized |
    | product_id | int | Product No. | Anonymized |
    | time | string | Time of action | - |
    | model_id | string | Module ID | Anonymized |
    | type | string | Action type | view: browsing the product details page; add: adding a product to the shopping cart; delete: removing a product from the shopping cart; buy: placing an order; like: adding a product to the favorite list |
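
Before uploading, you can verify the files as step 1 suggests: check the total row count and that every row splits into the same number of columns as the header. The sketch below is a hypothetical helper that does this for all four files; the final cross-check (that every action row references a user and product present in the other files) is an extra assumption about how the sample data fits together.

    import csv

    files = ("user_data", "product_data", "comment_data", "action_data")
    tables = {}
    for name in files:
        with open(f"{name}.csv", newline="", encoding="utf-8") as f:
            header, *rows = list(csv.reader(f))
        # Flag any line whose column count differs from the header's
        # (line numbers count the header as line 1, as a text editor does).
        bad = [i for i, r in enumerate(rows, start=2) if len(r) != len(header)]
        print(f"{name}: {len(rows)} data rows; malformed lines: {bad or 'none'}")
        tables[name] = rows

    # Cross-check: every action row should reference a known user and product.
    user_ids = {r[0] for r in tables["user_data"]}
    product_ids = {r[0] for r in tables["product_data"]}
    orphans = [r for r in tables["action_data"]
               if r[0] not in user_ids or r[1] not in product_ids]
    print("orphan action rows:", len(orphans))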

Preparing a Data Lake

This practice uses DLI as the data foundation. To ensure network connectivity between DataArts Studio and DLI, select the same region and enterprise project for the DLI queue as those of the DataArts Studio instance when creating the queue.

NOTE:
  • The Spark component of the default DLI queue is not the latest version, so an error may be reported indicating that a table creation statement cannot be executed. In this case, you are advised to create a new queue to run your tasks. To enable table creation statements in the default queue, contact the customer service or technical support of the DLI service.
  • The default DLI queue default is intended for trial use only and may be occupied by multiple users at a time, so the required resources may be unavailable. If an operation takes a long time or fails, try again during off-peak hours or run the job in a self-built queue.

After enabling DLI, you need to create a DLI data connection in Management Center, create a database in the DataArts Factory module, and run SQL statements to create OBS foreign tables. The procedure is as follows:

  1. Log in to the DataArts Studio console by following the instructions in Accessing the DataArts Studio Instance Console.
  2. On the DataArts Studio console, locate a workspace and click Management Center.
  3. On the displayed Manage Data Connections page, click Create Data Connection.

    Figure 1 Creating a data connection

  4. Create a DLI data connection. Select DLI for Data Connection Type and set Name to dli.

    Click Test to test the connection. If the test is successful, click OK.

    Figure 2 Creating a data connection

  5. Go to the DataArts Factory page.

    Figure 3 DataArts Factory page

  6. Right-click the DLI data connection and create a database named BI for storing the data tables. For details about how to create a database, see Figure 4.

    Figure 4 Creating a database

  7. Create a DLI SQL script, which you will use to create the data tables by entering DLI SQL statements in the editor.

    Figure 5 Creating a script

  8. In the SQL editor, enter the following SQL statements and click Execute to create the data tables. Among them, user, product, comment, and action are OBS foreign tables that hold the raw data; their contents come from the CSV files in the specified OBS paths. top_like_product and top_bad_comment_product are DLI tables that will store analysis results.

    -- OBS foreign tables: each table reads the CSV file under its OBS path.
    -- Note: if your CSV files keep the header row, you may need to add
    -- header "true" to OPTIONS so that the header is not read as data.
    create table user(
      user_id int,
      age int,
      gender int,
      rank int,
      register_time string
    ) USING csv OPTIONS (path "obs://fast-demo/user_data");
    create table product(
      product_id int,
      a1 int,
      a2 int,
      a3 int,
      category int,
      brand int
    ) USING csv OPTIONS (path "obs://fast-demo/product_data");
    create table comment(
      deadline string,
      product_id int,
      comment_num int,
      has_bad_comment int,
      bad_comment_rate float
    ) USING csv OPTIONS (path "obs://fast-demo/comment_data");
    create table action(
      user_id int,
      product_id int,
      time string,
      model_id string,
      type string
    ) USING csv OPTIONS (path "obs://fast-demo/action_data");
    -- DLI tables that will hold the analysis results
    create table top_like_product(brand int, like_count int);
    create table top_bad_comment_product(product_id int, comment_num int, bad_comment_rate float);
    Figure 6 Creating data tables

    The key parameters are as follows:
    • Data Connection: DLI data connection created in Step 4
    • Database: database created in Step 6
    • Resource Queue: The default resource queue default can be used.
      NOTE:
      • The Spark component of the default DLI queue is not the latest version, so an error may be reported indicating that a table creation statement cannot be executed. In this case, you are advised to create a new queue to run your tasks. To enable table creation statements in the default queue, contact the customer service or technical support of the DLI service.
      • The default DLI queue default is intended for trial use only and may be occupied by multiple users at a time, so the required resources may be unavailable. If an operation takes a long time or fails, try again during off-peak hours or run the job in a self-built queue.

  9. After the script is executed successfully, run the following statement to check whether the data tables have been created:

    SHOW TABLES;
    NOTE:

    After confirming that the data tables are created, you can close the script as it is no longer needed.
