Updated on 2024-11-12 GMT+08:00

Operating Environment and Data Preparation

Preparing the Environment

  • If you are new to DataArts Studio, register a Huawei account, buy a DataArts Studio instance, create workspaces, and make other preparations. For details, see Buying and Configuring a DataArts Studio Instance. Then you can go to the created workspace and start using DataArts Studio.

  • On the MRS console, create an MRS cluster that contains the Hive component. The cluster is used to generate metadata from the vertex and edge data sets. When creating the MRS cluster, ensure that its network settings (region, VPC, subnet, and security group) are the same as those of the CDM cluster in the DataArts Studio instance so that the two clusters can communicate over the internal network. Otherwise, you must manually enable communication between the MRS cluster and the CDM cluster. In addition, ensure that the two clusters belong to the same enterprise project.

    A security group is automatically created when an MRS cluster is created. You are advised to create an MRS security cluster first and then buy a DataArts Studio instance, selecting the same VPC and subnet as the MRS cluster and the automatically created security group (named in the format mrs_<Cluster name>_<Random characters>). This ensures that the DataArts Studio instance can communicate with the MRS cluster by default.

    If you already have a DataArts Studio instance before creating the MRS cluster, choose Access Control > Security Groups on the VPC console and add a rule that allows inbound traffic to the security group created for the MRS cluster (named in the format mrs_<Cluster name>_<Random characters>). For details, see Configuring Security Group Rules.

  • Create a MySQL DB instance on the RDS console to simulate the data source. When creating the MySQL DB instance, ensure that its network parameter settings (including the region, VPC, subnet, and security group) are consistent with those of the CDM cluster in the DataArts Studio instance so that the MySQL DB instance can communicate with the CDM cluster through the internal network. Otherwise, you need to manually enable the communication between the MySQL DB instance and the CDM cluster. In addition, ensure that the MySQL DB instance and the CDM cluster use the same enterprise project.
  • Prepare an OBS bucket to store the generated metadata. The OBS bucket must be in the same region as the CDM cluster in the DataArts Studio instance, and the enterprise project of the OBS bucket must be the same as that of the CDM cluster.
  • Create a graph on the GES console, into which graph data is imported for visualized graph analysis. The graph must be in the same region as the CDM cluster in the DataArts Studio instance, and its enterprise project must be the same as that of the CDM cluster.
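The consistency rules above can be expressed as a small check. The following Python sketch is illustrative only: the NetworkSettings record, the mismatches() helper, and the example values are hypothetical and do not correspond to any Huawei Cloud API, because in practice you compare these settings on the consoles. It encodes the rule that the MRS cluster and the RDS MySQL DB instance must match the CDM cluster on region, VPC, subnet, security group, and enterprise project, while the OBS bucket and the GES graph only need the same region and enterprise project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkSettings:
    """Hypothetical record of the settings compared against the CDM cluster."""
    region: str
    vpc: str = ""
    subnet: str = ""
    security_group: str = ""
    enterprise_project: str = ""


# MRS cluster and RDS MySQL DB instance: all network settings plus enterprise project.
FULL_MATCH = ("region", "vpc", "subnet", "security_group", "enterprise_project")
# OBS bucket and GES graph: region and enterprise project only.
REGION_AND_PROJECT = ("region", "enterprise_project")


def mismatches(cdm: NetworkSettings, other: NetworkSettings, keys: tuple[str, ...]) -> list[str]:
    """Return the names of the settings in `keys` that differ from the CDM cluster."""
    return [key for key in keys if getattr(cdm, key) != getattr(other, key)]


# Example values are made up for illustration.
cdm = NetworkSettings("region-a", "vpc-demo", "subnet-demo", "sg-demo", "default")
mrs = NetworkSettings("region-a", "vpc-demo", "subnet-demo", "sg-other", "default")
obs = NetworkSettings("region-a", enterprise_project="default")

print(mismatches(cdm, mrs, FULL_MATCH))          # ['security_group']
print(mismatches(cdm, obs, REGION_AND_PROJECT))  # []
```

In this example the MRS cluster differs only in its security group, which corresponds to the case where you must enable communication with the CDM cluster manually.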

Preparing Data Sources

In this practice, the raw data includes the user table vertex_user, movie table vertex_movie, friend relationship table edge_friends, and movie rating table edge_rate. Figure 1 shows the relationships between them.

Figure 1 Graph data description

To facilitate demonstration, this practice provides sample data that simulates the raw data. To integrate the source data into the cloud, store the sample data in CSV files and upload them to an OBS bucket.

  1. Create one CSV file (UTF-8 without BOM) per data table, name each file after its table (for example, vertex_user.csv), copy the corresponding sample data into it, and save the file.

    To generate a CSV file in Windows, you can perform the following steps:
    1. Use a text editor (for example, Notepad) to create a .txt document and copy the sample data into it. Then check the total number of rows and check that the fields in each row are correctly separated by commas. (If the sample data is copied from a PDF document, long rows may be wrapped onto multiple lines. In this case, you must manually join each wrapped row back into a single line.)
    2. Choose File > Save as. In the displayed dialog box, set Save as type to All files (*.*), enter a file name with the .csv extension in File name, and select UTF-8 encoding (without BOM) to save the file in CSV format.

  2. Upload the CSV file to OBS.

    1. Log in to the management console and choose Storage > Object Storage Service to access the OBS console.
    2. Click Create Bucket and set parameters as prompted to create an OBS bucket named fast-demo.

      To ensure network connectivity, select the same region for the OBS bucket as for the DataArts Studio instance. If an enterprise project is required, select the same enterprise project as that of the DataArts Studio instance.

      For details about how to create a bucket on the OBS console, see Creating a Bucket in Object Storage Service Console Operation Guide.

    3. Upload the CSV files to the OBS bucket fast-demo.

      For details about how to upload a file on the OBS console, see Uploading a File in Object Storage Service Console Operation Guide.
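If you prefer to script the file creation instead of using Notepad, the following Python sketch follows the same steps: it writes each table's rows to a file named after the table, using UTF-8 without a BOM, and rejects rows whose field count does not match the table, which catches rows wrapped when copying from a PDF. The function name write_sample_csv and the EXPECTED_FIELDS mapping are illustrative assumptions; the field counts are taken from the MySQL table definitions in Creating Data Tables. You still upload the resulting files to OBS as described in the steps above.

```python
import csv
from pathlib import Path

# Field counts per table, taken from the MySQL table definitions in this practice.
EXPECTED_FIELDS = {
    "vertex_user": 5,   # user, gender, age, occupation, zip-code
    "vertex_movie": 3,  # movie, year, genres
    "edge_friends": 2,  # user1, user2
    "edge_rate": 4,     # user, movie, score, datatime
}


def write_sample_csv(out_dir: Path, table: str, pasted_text: str) -> Path:
    """Write pasted sample rows to <table>.csv (UTF-8, no BOM) after checking field counts."""
    # Drop indentation and blank lines left over from copying.
    rows = [line.strip() for line in pasted_text.splitlines() if line.strip()]
    expected = EXPECTED_FIELDS[table]
    for number, fields in enumerate(csv.reader(rows), start=1):
        if len(fields) != expected:
            # A wrapped or merged row (for example, from a PDF copy) shows up here.
            raise ValueError(f"{table} row {number}: expected {expected} fields, got {len(fields)}")
    path = out_dir / f"{table}.csv"
    # The "utf-8" codec never writes a BOM; "utf-8-sig" would.
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return path
```

For example, write_sample_csv(Path("."), "edge_friends", text) writes edge_friends.csv to the current directory, where text holds the pasted sample rows for that table.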

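This practice involves four sample data tables: user table vertex_user, movie table vertex_movie, friend relationship table edge_friends, and movie rating table edge_rate. The files have no header row, and the columns follow the order of the MySQL table definitions in Creating Data Tables. The details are as follows: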
  • User table vertex_user.csv (columns: user, gender, age, occupation, zip-code):
    Vivian,F,25-34,artist,98133
    Mercedes,F,Under 18,K-12 student,10562
    Katherine,F,35-44,lawyer,79101
    Stuart,M,25-34,programmer,30316
    Jacob,M,25-34,artist,55408
    Editha,F,56+,homemaker,46911
    Cassandra,F,56+,artist,55113
    Sarah,F,18-24,other or not specified,55105
    Hayden,M,56+,academic/educator,30030
    Jeffery,M,25-34,self-employed,45242
    Bonnie,F,50-55,technician/engineer,19716
    Serena,F,35-44,programmer,44106
    Sidney,M,18-24,writer,85296
    Leander,M,50-55,doctor/health care,98237
    Fred,M,35-44,other or not specified,30906
    Roger,M,45-49,technician/engineer,73069
    Ella,F,25-34,other or not specified,94402
    Ray,M,18-24,college/grad student,90241
    Eric,M,18-24,college/grad student,40205
    Frances,F,56+,retired,1234
    Allison,F,18-24,sales/marketing,49505
    Willy,M,25-34,technician/engineer,38104
    Lance,M,18-24,college/grad student,6459
    June,F,25-34,other or not specified,13326
    Marshal,M,50-55,scientist,7746
    Max,M,35-44,executive/managerial,91107
    Hardy,M,35-44,academic/educator,22181
    Jordan,M,25-34,artist,8817
    Reed,M,18-24,college/grad student,89146
    Glendon,M,35-44,self-employed,46214
    Kevin,M,56+,retired,2356
    Evan,M,45-49,programmer,53718
    Clark,M,56+,academic/educator,85718
    Johnny,M,56+,retired,52003
    Caleb,M,50-55,retired,41076
    Janet,F,35-44,homemaker,61270
    Sue,F,50-55,self-employed,13207
    Margaret,F,45-49,academic/educator,1609
    Luke,M,35-44,executive/managerial,44306
    William,M,45-49,programmer,37914
    Lena,F,35-44,other or not specified,42420
    Solomon,M,45-49,scientist,64081-8102
    Cary,M,35-44,executive/managerial,55124
    Colin,M,25-34,executive/managerial,44115
    Kenny,M,25-34,college/grad student,74074
    Gavin,M,25-34,programmer,24060
    Donald,M,35-44,programmer,95864
    Wayne,M,18-24,scientist,94606
    Frank,M,18-24,college/grad student,2906
    Alexander,M,18-24,college/grad student,61801
    Isaiah,M,25-34,other or not specified,33142
    Josephine,F,25-34,college/grad student,78728
    Joshua,M,35-44,executive/managerial,54016
    August,M,35-44,customer service,64801
    Jessie,F,18-24,clerical/admin,60640
    Yvette,F,35-44,artist,94109
    Albert,M,25-34,other or not specified,40515
    Eugene,M,35-44,other or not specified,40504
    Rachel,F,35-44,doctor/health care,33314
    Constance,F,50-55,executive/managerial,10022
    Larry,M,45-49,technician/engineer,2067
    Mike,M,25-34,other or not specified,30606
    Hank,M,50-55,programmer,44286
    Daniel,M,45-49,technician/engineer,37923
    Wesley,M,25-34,executive/managerial,35244
    Gina,F,35-44,sales/marketing,60202
    Teresa,F,45-49,academic/educator,43202
    Terry,M,35-44,writer,80222
    Leo,M,50-55,academic/educator,93105
    Bruce,M,50-55,academic/educator,19087-3622
    Terence,M,25-34,writer,14450
    Alice,F,25-34,academic/educator,79928
    Benjamin,M,25-34,technician/engineer,48092
    Sharon,F,18-24,college/grad student,55406
    Ryan,M,18-24,college/grad student,26241
    Mason,M,25-34,technician/engineer,92584
    Gloria,F,56+,retired,60506
    Tom,M,25-34,writer,10010
    Melissa,F,35-44,doctor/health care,23507
    David,M,25-34,clerical/admin,19147
    Alex,M,18-24,college/grad student,10013
    Florence,F,35-44,academic/educator,23508
    Darwin,M,45-49,customer service,98502
    Michael,M,18-24,other or not specified,31211
    Brown,M,25-34,executive/managerial,90210
    Jimmy,M,25-34,writer,94122
    Jay,M,18-24,programmer,43650
    Gladys,F,18-24,programmer,5055
    Denny,M,45-49,tradesman/craftsman,2557
    Jack,M,50-55,other or not specified,94025
    Edison,M,45-49,executive/managerial,85287-2702
    Neil,M,35-44,scientist,48187
    Jennifer,F,35-44,writer,75093
    Caspar,M,25-34,other or not specified,3766
    Mickey,M,18-24,programmer,97205
    Arthur,M,25-34,executive/managerial,2139
    Christine,F,25-34,academic/educator,32303
    Adeline,F,Under 18,other or not specified,1036
    Cody,M,18-24,college/grad student,78705
    Hillary,F,35-44,executive/managerial,21117
  • Movie table vertex_movie.csv (columns: movie, year, genres):
    American Beauty,1999,Comedy;Drama
    Airplane!,1980,Comedy
    Rushmore,1998,Comedy
    Predator,1987,Action;Sci-Fi;Thriller
    There's Something About Mary,1998,Comedy
    The Shawshank Redemption,1994,Drama
    Election,1999,Comedy
    Clueless,1995,Comedy;Romance
    The Crying Game,1992,Drama;Romance;War
    Back to the Future,1985,Comedy;Sci-Fi
    The Talented Mr. Ripley,1999,Drama;Mystery;Thriller
    Life Is Beautiful (La vita è bella),1997,Comedy;Drama
    2001: A Space Odyssey,1968,Drama;Mystery;Sci-Fi;Thriller
    Jaws,1975,Action;Horror
    Jerry Maguire,1996,Drama;Romance
    The Hunt for Red October,1990,Action;Thriller
    Close Encounters of the Third Kind,1977,Drama;Sci-Fi
    Star Wars: Episode IV - A New Hope,1977,Action;Adventure;Fantasy;Sci-Fi
    Rocky,1976,Action;Drama
    The Usual Suspects,1995,Crime;Thriller
    A Clockwork Orange,1971,Sci-Fi
    Psycho,1960,Horror;Thriller
    The Godfather: Part II,1974,Action;Crime;Drama
    Annie Hall,1977,Comedy;Romance
    Terminator 2: Judgment Day,1991,Action;Sci-Fi;Thriller
    Pleasantville,1998,Comedy
    Chinatown,1974,Film-Noir;Mystery;Thriller
    Independence Day (ID4),1996,Action;Sci-Fi;War
    Star Wars: Episode V - The Empire Strikes Back,1980,Action;Adventure;Drama;Sci-Fi;War
    Face/Off,1997,Action;Sci-Fi;Thriller
    Total Recall,1990,Action;Adventure;Sci-Fi;Thriller
    Blade Runner,1982,Film-Noir;Sci-Fi
    The Terminator,1984,Action;Sci-Fi;Thriller
    Robocop,1987,Action;Crime;Sci-Fi
    The Rock,1996,Action;Adventure;Thriller
    Superman,1978,Action;Adventure;Sci-Fi
    The Full Monty,1997,Comedy
    Raising Arizona,1987,Comedy
    Lethal Weapon,1987,Action;Comedy;Crime;Drama
    Platoon,1986,Drama;War
    The Fifth Element,1997,Action;Sci-Fi
    The Patriot,2000,Action;Drama;War
    Clerks,1994,Comedy
    Being John Malkovich,1999,Comedy
    The Mask,1994,Comedy;Crime;Fantasy
    Grosse Pointe Blank,1997,Comedy;Crime
  • Friend relationship table edge_friends.csv (columns: user1, user2):
    Gloria,David
    Brown,Mason
    Terence,Kenny
    Clark,Brown
    Mickey,Janet
    Mickey,Margaret
    Hayden,Constance
    Frank,Janet
    Lena,Darwin
    Leo,Jimmy
    Mercedes,Gavin
    Hillary,Bruce
    Leo,Neil
    Terence,August
    Sue,Wayne
    Max,Denny
    Max,Josephine
    Hillary,Michael
    Constance,Janet
    Florence,Donald
    Alice,Jacob
    Roger,Sidney
    Margaret,Frances
    Roger,Fred
    Fred,Donald
    Margaret,Gavin
    Fred,Gavin
    Rachel,Janet
    Alexander,Clark
    Darwin,Cassandra
    Jordan,Vivian
    Terry,Larry
    Hardy,Kevin
    Terry,Rachel
    Mercedes,Marshal
    Marshal,Sharon
    Jeffery,Tom
    Terence,Max
    Katherine,Stuart
    Luke,Cassandra
    Michael,Arthur
    Luke,Editha
    Neil,Mason
    Darwin,Jessie
    Marshal,Alex
    Hardy,Margaret
    Alexander,Eric
    Mercedes,Caspar
    Brown,Clark
    Roger,Kevin
    Benjamin,Max
    Jessie,Adeline
    Michael,Luke
    Jimmy,Gloria
    Isaiah,Frances
    June,Darwin
    Editha,Vivian
    Caspar,Cassandra
    Bruce,Denny
    Caspar,Jacob
    Isaiah,Ella
    Mason,Ryan
    Mercedes,Eugene
    Roger,Josephine
    Wayne,Alice
    Hayden,Denny
    Alexander,Colin
    Larry,August
    Jimmy,Brown
    Jacob,William
    Hardy,Gladys
    Jessie,Caspar
    Mason,Terence
    June,Jennifer
    Hardy,Arthur
    Alexander,Solomon
    Larry,Wayne
    Larry,Gavin
    Ella,Ray
    Ella,Eric
    Alice,Janet
    Larry,Willy
    Isaiah,Solomon
    Benjamin,Leander
    Isaiah,Sue
    Caspar,Jordan
    Ella,Jordan
    Vivian,Eric
    Max,Jay
    Ryan,Hank
    Ella,Colin
    Luke,Alexander
    Luke,Joshua
    Wayne,Caspar
    Wayne,Denny
    Editha,Marshal
    Ryan,Jessie
    Michael,Cassandra
    Solomon,Hillary
    Jordan,Josephine
  • Movie rating table edge_rate.csv (columns: user, movie, score, datatime):
    Vivian,Lethal Weapon,5,2000/12/27 23:44
    Mercedes,Raising Arizona,4,2000/12/27 23:51
    Katherine,The Rock,3,2000/12/27 20:12
    Stuart,The Mask,2,2000/12/27 20:00
    Jacob,Face/Off,4,2000/12/27 20:12
    Editha,There's Something About Mary,5,2000/12/27 20:06
    Cassandra,Superman,4,2000/12/27 20:11
    Sarah,American Beauty,4,2000/12/27 20:13
    Hayden,Lethal Weapon,3,2000/12/27 20:09
    Jeffery,2001: A Space Odyssey,4,2000/12/23 1:48
    Bonnie,A Clockwork Orange,3,2000/12/22 23:23
    Serena,Lethal Weapon,4,2000/12/22 23:24
    Sidney,Raising Arizona,4,2000/12/22 23:24
    Leander,Clerks,5,2000/12/12 16:58
    Fred,Superman,5,2000/12/18 1:17
    Roger,A Clockwork Orange,5,2000/12/13 23:54
    Ella,Robocop,5,2000/12/13 23:44
    Ray,The Talented Mr. Ripley,3,2000/12/14 0:24
    Eric,Psycho,5,2002/1/3 20:29
    Frances,The Godfather: Part II,2,2000/12/10 18:45
    Allison,Independence Day (ID4),3,2000/12/13 23:58
    Willy,Clerks,4,2002/1/3 20:46
    Lance,There's Something About Mary,5,2000/12/13 23:43
    June,Superman,4,2002/1/3 20:41
    Marshal,Being John Malkovich,5,2000/12/10 18:40
    Max,Predator,4,2000/12/10 18:32
    Hardy,Total Recall,3,2000/12/10 18:39
    Jordan,American Beauty,4,2000/12/13 23:57
    Reed,Lethal Weapon,1,2000/12/10 18:37
    Glendon,Airplane!,4,2000/12/13 23:46
    Kevin,Raising Arizona,4,2000/12/13 23:51
    Evan,Jerry Maguire,1,2000/12/13 23:58
    Clark,The Hunt for Red October,5,2000/12/13 23:46
    Johnny,2001: A Space Odyssey,3,2000/12/14 0:16
    Caleb,Clerks,4,2000/12/9 16:45
    Janet,Lethal Weapon,2,2000/12/9 16:16
    Sue,Close Encounters of the Third Kind,4,2000/12/9 16:14
    Margaret,Star Wars: Episode IV - A New Hope,2,2000/12/9 16:04
    Luke,Clueless,2,2000/12/8 19:02
    William,The Terminator,2,2000/12/8 19:03
    Lena,Robocop,5,2000/12/8 18:59
    Solomon,Lethal Weapon,5,2000/12/8 18:59
    Cary,Airplane!,5,2000/12/8 19:00
    Colin,The Usual Suspects,4,2000/12/5 20:59
    Kenny,Clueless,5,2000/12/5 20:52
    Gavin,A Clockwork Orange,4,2000/12/5 20:52
    Donald,The Talented Mr. Ripley,3,2000/12/5 20:52
    Wayne,Back to the Future,3,2000/12/5 20:56
    Frank,Being John Malkovich,4,2000/12/5 20:53
    Alexander,Predator,5,2000/12/5 20:52
    Isaiah,Jaws,4,2000/12/5 20:48
    Josephine,Chinatown,3,2000/12/5 20:55
    Joshua,The Mask,4,2000/12/5 20:54
    August,Platoon,4,2000/12/5 20:53
    Jessie,Election,4,2000/12/5 20:52
    Yvette,Rocky,5,2000/12/5 20:52
    Albert,The Fifth Element,4,2000/12/5 20:55
    Eugene,Clueless,4,2000/12/5 17:59
    Rachel,Lethal Weapon,5,2000/12/5 17:58
    Constance,Raising Arizona,4,2000/12/5 17:59
    Larry,The Usual Suspects,4,2000/12/5 15:07
    Mike,The Crying Game,5,2000/12/5 15:21
    Hank,Independence Day (ID4),4,2000/12/5 15:21
    Daniel,There's Something About Mary,4,2000/12/5 15:10
    Wesley,Lethal Weapon,5,2000/12/2 19:51
    Gina,The Godfather: Part II,3,2000/12/2 19:55
    Teresa,Total Recall,4,2000/12/2 19:44
    Terry,2001: A Space Odyssey,4,2000/12/2 19:53
    Leo,A Clockwork Orange,5,2000/11/28 23:22
    Bruce,The Full Monty,2,2000/11/28 23:12
    Terence,Predator,5,2000/11/28 23:07
    Alice,Jaws,5,2000/11/28 23:20
    Benjamin,Psycho,3,2000/11/28 23:08
    Sharon,Total Recall,5,2000/11/28 23:13
    Ryan,Election,5,2000/11/28 23:18
    Mason,The Fifth Element,2,2000/11/28 23:26
    Gloria,The Usual Suspects,5,2000/11/28 12:57
    Tom,Clueless,3,2000/11/28 13:09
    Melissa,A Clockwork Orange,3,2000/12/8 15:10
    David,The Talented Mr. Ripley,5,2000/12/25 13:24
    Alex,Independence Day (ID4),4,2000/11/28 13:14
    Florence,Star Wars: Episode V - The Empire Strikes Back,2,2000/12/8 15:23
    Darwin,The Full Monty,2,2000/11/28 13:16
    Michael,Being John Malkovich,4,2000/12/25 14:44
    Brown,Predator,5,2000/11/28 13:01
    Jimmy,Lethal Weapon,4,2000/12/8 15:07
    Jay,Jaws,4,2000/11/28 13:07
    Gladys,Psycho,4,2000/11/28 13:08
    Denny,The Godfather: Part II,3,2000/12/25 13:25
    Jack,Annie Hall,4,2000/12/8 15:05
    Edison,The Mask,3,2000/11/28 13:11
    Neil,Face/Off,4,2000/12/8 15:22
    Jennifer,There's Something About Mary,3,2000/12/25 6:17
    Caspar,Superman,3,2000/12/8 15:09
    Mickey,Total Recall,1,2000/11/28 13:14
    Arthur,American Beauty,3,2000/12/8 15:18
    Christine,Platoon,3,2000/12/2 13:21
    Adeline,Raising Arizona,4,2000/12/8 15:15
    Cody,Blade Runner,1,2000/12/8 15:22
    Hillary,Election,3,2000/11/28 12:57
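Because the edges reference vertices by their first column (user name or movie title), a missing or misspelled name in an edge file produces an edge whose endpoint has no matching vertex. The following Python sketch is an optional, illustrative check before uploading, not part of the DataArts Studio procedure. The function names are hypothetical, and it assumes the four CSV files created earlier are in one local directory. It reports every edge_friends endpoint that is not in vertex_user.csv and every edge_rate user or movie that is not in vertex_user.csv or vertex_movie.csv.

```python
import csv
from pathlib import Path


def vertex_ids(path: Path) -> set[str]:
    """Collect the first field of every row, which serves as the vertex ID."""
    with path.open(newline="", encoding="utf-8") as f:
        return {row[0] for row in csv.reader(f) if row}


def dangling(edge_path: Path, src_ids: set[str], dst_ids: set[str]) -> list[tuple[int, str]]:
    """Return (row number, value) for each edge endpoint that is not a known vertex ID."""
    problems = []
    with edge_path.open(newline="", encoding="utf-8") as f:
        for number, row in enumerate(csv.reader(f), start=1):
            if not row:
                continue  # skip blank lines
            if len(row) < 2:
                problems.append((number, ",".join(row)))  # malformed edge row
                continue
            if row[0] not in src_ids:
                problems.append((number, row[0]))
            if row[1] not in dst_ids:
                problems.append((number, row[1]))
    return problems


def check_sample(data_dir: Path) -> dict[str, list[tuple[int, str]]]:
    """Check both edge files against the two vertex files in data_dir."""
    users = vertex_ids(data_dir / "vertex_user.csv")
    movies = vertex_ids(data_dir / "vertex_movie.csv")
    return {
        # edge_friends: user1 -> user2, both endpoints must be users.
        "edge_friends": dangling(data_dir / "edge_friends.csv", users, users),
        # edge_rate: user -> movie.
        "edge_rate": dangling(data_dir / "edge_rate.csv", users, movies),
    }
```

Calling check_sample(Path(".")) in the directory that holds the files returns an empty list for each edge table when every endpoint has a matching vertex.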

Creating a Data Connection in Management Center

In this practice, you need to synchronize data from the MySQL database to MRS Hive, standardize the data based on the GES graph import requirements, and generate metadata using MRS Hive.

Therefore, you need to create an MRS Hive connection in Management Center. The procedure is as follows:

  1. Log in to the DataArts Studio console by following the instructions in Accessing the DataArts Studio Instance Console.
  2. On the DataArts Studio console, locate a workspace and click Management Center.
  3. On the displayed Manage Data Connections page, click Create Data Connection.

    Figure 2 Creating a data connection

  4. In the dialog box displayed, set data connection parameters and click OK.

    The following describes the parameters for creating an MRS Hive connection, as shown in Figure 3.

    • Data Connection Type: MRS Hive is selected by default.
    • Name: Enter mrs_hive_link.
    • Tag: Enter a new tag name or select an existing tag from the drop-down list box. This parameter is optional.
    • Applicable Modules: Retain the default settings.
    • Connection Type: Select Proxy connection.
    • Manual: Select Cluster Name Mode. IP and Port are automatically set.
    • MRS Cluster Name: Select an existing MRS cluster.
    • KMS Key: Select a KMS key and use it to encrypt sensitive data. If no KMS key is available, click Access KMS to go to the KMS console and create one.
    • Agent: Select a DataArts Migration cluster as the connection agent. The DataArts Migration cluster and MRS cluster must be in the same region, AZ, VPC, and subnet, and the security group rule must allow communication between the two clusters. In this example, select the DataArts Migration cluster that is automatically created during DataArts Studio instance creation.

      To connect to an MRS 2.x cluster, select the DataArts Migration cluster of the 2.x version as the agent.

    • Username: Enter the Kerberos authentication user. By MRS policy, admin is the default management user and cannot be used as the authentication user of a cluster with Kerberos authentication enabled. Therefore, to create a connection for an MRS cluster that uses Kerberos authentication, perform the following operations:
      1. Log in to MRS Manager as user admin.
      2. Choose System > Permission > Security Policy > Password Policy. Click Add Password Policy and add a policy under which the password never expires.
        • Set Password Policy Name to neverexp.
        • Set Password Validity Period (Days) to 0, indicating that the password never expires.
        • Set Password Expiration Notification (Days) to 0.
        • Retain the default values for other parameters.
      3. Choose System > Permission > User. On the page displayed, click Create to add a dedicated user as the Kerberos authentication user and set the password policy to neverexp. Select the user group superGroup for the user, and assign all roles to the user.
        • For clusters of MRS 3.1.0 or later, the user must have at least the permissions of the Manager_viewer role to create data connections in Management Center. To perform database, table, and data operations on components, the user must also have the user group permissions of those components.
        • For clusters earlier than MRS 3.1.0, the user must have permissions of the Manager_administrator or System_administrator role to create data connections in Management Center.
        • A user with only the Manager_tenant or Manager_auditor permission cannot create connections.
      4. Log in to Manager as the new user and change the initial password. Otherwise, the connection fails to be created.
      5. Synchronize IAM users.
        1. Log in to the MRS console.
        2. Choose Clusters > Active Clusters, select a running cluster, and click its name to go to its details page.
        3. In the Basic Information area of the Dashboard page, click Synchronize on the right side of IAM User Sync to synchronize IAM users.
          • When the policy of the user group to which the IAM user belongs changes from MRS ReadOnlyAccess to MRS CommonOperations, MRS FullAccess, or MRS Administrator, wait 5 minutes after the synchronization is complete for the new policy to take effect, because the SSSD (System Security Services Daemon) cache on the cluster nodes needs time to update. Then submit a job; otherwise, the job may fail to be submitted.
          • When the policy of the user group to which the IAM user belongs changes from MRS CommonOperations, MRS FullAccess, or MRS Administrator to MRS ReadOnlyAccess, wait 5 minutes after the synchronization is complete for the new policy to take effect, because the SSSD cache on the cluster nodes needs time to update.
    • Password: Enter the password of the Kerberos authentication user.
    Figure 3 Creating an MRS Hive data connection
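The role requirements for the Kerberos authentication user depend on the MRS version, which can be summarized as a small rule. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that for MRS 3.1.0 or later the Manager_administrator and System_administrator roles also satisfy "at least Manager_viewer". It does not model the separate component permissions needed for database, table, and data operations.

```python
import re
from typing import Iterable

# Roles named in the text as sufficient for creating a data connection.
# ASSUMPTION: for MRS 3.1.0 or later, Manager_administrator and System_administrator
# are treated as satisfying "at least Manager_viewer".
ROLES_3_1_0_PLUS = {"Manager_viewer", "Manager_administrator", "System_administrator"}
ROLES_BEFORE_3_1_0 = {"Manager_administrator", "System_administrator"}


def version_key(version: str) -> tuple[int, int, int]:
    """Turn a version string such as '3.1.0' or '3.1.2-LTS.3' into (major, minor, patch)."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad '3.1' to (3, 1, 0)
    return (numbers[0], numbers[1], numbers[2])


def can_create_connection(mrs_version: str, roles: Iterable[str]) -> bool:
    """Return True if the user's roles meet the connection-creation requirement."""
    required = ROLES_3_1_0_PLUS if version_key(mrs_version) >= (3, 1, 0) else ROLES_BEFORE_3_1_0
    # Manager_tenant or Manager_auditor alone is never in `required`, so it is rejected.
    return bool(set(roles) & required)
```

For example, a user with only Manager_viewer passes on MRS 3.1.0 but not on an MRS 2.x cluster, which matches the version split described in the steps above.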

Creating Data Tables

To facilitate demonstration, you need to import the sample data in CSV format to the MySQL database using DataArts Migration. Then, the MySQL database functions as the data source. You need to create raw data tables in the MySQL database before importing data.

In a production workflow, the source data in the MySQL database is imported to OBS as the vertex and edge data sets, so you do not need to create tables in advance. However, before importing source data from the MySQL database to MRS Hive, you must create standard data tables in the MRS Hive database.

Therefore, in this practice, you need to create raw data tables in the MySQL database and standard data tables in the MRS Hive database. This section describes how to create tables using SQL statements.

  1. Create raw data tables in the MySQL database. Run the following SQL statements in the MySQL database to create the four raw data tables based on the raw data structure described in Preparing Data Sources:

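    -- Friend relationship edges (edge_friends.csv): user1 is a friend of user2.
    DROP TABLE IF EXISTS `edge_friends`;
    CREATE TABLE `edge_friends` (
        `user1` varchar(32) DEFAULT NULL,
        `user2` varchar(32) DEFAULT NULL
    );
    
    -- Movie rating edges (edge_rate.csv): a user rates a movie at a given time.
    DROP TABLE IF EXISTS `edge_rate`;
    CREATE TABLE `edge_rate` (
        `user` varchar(32) DEFAULT NULL,
        `movie` varchar(64) DEFAULT NULL,
        `score` int unsigned DEFAULT NULL,  -- display width such as int(11) is deprecated as of MySQL 8.0.17
        `datatime` varchar(32) DEFAULT NULL
    );
    
    -- Movie vertices (vertex_movie.csv): title, release year, and semicolon-separated genres.
    DROP TABLE IF EXISTS `vertex_movie`;
    CREATE TABLE `vertex_movie` (
        `movie` varchar(64) DEFAULT NULL,
        `year` varchar(32) DEFAULT NULL,
        `genres` varchar(64) DEFAULT NULL
    );
    
    -- User vertices (vertex_user.csv); `zip-code` must stay in backticks because of the hyphen.
    DROP TABLE IF EXISTS `vertex_user`;
    CREATE TABLE `vertex_user` (
        `user` varchar(32) DEFAULT NULL,
        `gender` varchar(32) DEFAULT NULL,
        `age` varchar(32) DEFAULT NULL,
        `occupation` varchar(32) DEFAULT NULL,
        `zip-code` varchar(32) DEFAULT NULL
    );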

  2. Create standard data tables in the MRS Hive database.

    Standardize the raw data structure based on the GES graph import requirements: insert a label column as the second column of the vertex tables vertex_user and vertex_movie, and as the third column of the edge tables edge_rate and edge_friends.

    The vertex and edge data sets must comply with the GES graph data format requirements, which are summarized as follows. For details, see Graph Data Formats.
    • The vertex data set contains one row per vertex, in the following format, where id uniquely identifies the vertex:
      id,label,property 1,property 2,property 3,...
    • The edge data set contains one row per edge, in the following format, where id 1 and id 2 are the IDs of the two endpoints of the edge. GES graph specifications are defined by the number of edges, for example, one million edges.
      id 1,id 2,label,property 1,property 2,...

    On the DataArts Factory console, select the MRS Hive connection created in Creating a Data Connection in Management Center, select a database (test_ges in this example), and run the following SQL statements to create the standard data tables in the MRS Hive database.

    Figure 4 Creating a standard data table in the MRS Hive database

    -- Edge table: user1,user2,label (label is the third column).
    DROP TABLE IF EXISTS test_ges.`edge_friends`;
    CREATE TABLE test_ges.`edge_friends` (
        `user1` STRING COMMENT '',
        `user2` STRING COMMENT '',
        `label` STRING COMMENT ''
    );
    
    -- Edge table: user,movie,label,score,datatime.
    DROP TABLE IF EXISTS test_ges.`edge_rate`;
    CREATE TABLE test_ges.`edge_rate` (
        `user` STRING COMMENT '',
        `movie` STRING COMMENT '',
        `label` STRING COMMENT '',
        `score` INT COMMENT '',
        `datatime` STRING COMMENT ''
    );
    
    -- Vertex table: movie,label,year,genres (label is the second column).
    DROP TABLE IF EXISTS test_ges.`vertex_movie`;
    CREATE TABLE test_ges.`vertex_movie` (
        `movie` STRING COMMENT '',
        `label` STRING COMMENT '',
        `year` STRING COMMENT '',
        `genres` STRING COMMENT ''
    );
    
    -- Vertex table: user,label,gender,age,occupation,zip-code.
    DROP TABLE IF EXISTS test_ges.`vertex_user`;
    CREATE TABLE test_ges.`vertex_user` (
        `user` STRING COMMENT '',
        `label` STRING COMMENT '',
        `gender` STRING COMMENT '',
        `age` STRING COMMENT '',
        `occupation` STRING COMMENT '',
        `zip-code` STRING COMMENT ''
    );
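The standardization only inserts a label column; the other columns keep their raw order. The following Python sketch shows the resulting row layout for each table. It is illustrative: the label values (user, movie, friends, rate) and the function names are hypothetical, since this practice does not specify which labels to use, and in this practice the data reaches the MRS Hive tables through DataArts Studio rather than through a local script.

```python
import csv
import io

# Hypothetical label values; the practice does not specify which labels to use.
LABELS = {
    "vertex_user": "user",
    "vertex_movie": "movie",
    "edge_friends": "friends",
    "edge_rate": "rate",
}


def to_ges_row(table: str, raw: list[str]) -> list[str]:
    """Insert the table's label at the position GES expects."""
    label = LABELS[table]
    if table.startswith("vertex_"):
        # Vertex: id,label,property 1,... so the label becomes the second column.
        return [raw[0], label, *raw[1:]]
    # Edge: id 1,id 2,label,property 1,... so the label becomes the third column.
    return [raw[0], raw[1], label, *raw[2:]]


def to_ges_csv(table: str, raw_rows: list[list[str]]) -> str:
    """Render standardized rows as CSV text with Unix line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for raw in raw_rows:
        writer.writerow(to_ges_row(table, raw))
    return buf.getvalue()
```

For example, the raw edge_rate row Vivian,Lethal Weapon,5,2000/12/27 23:44 becomes Vivian,Lethal Weapon,rate,5,2000/12/27 23:44, which matches the column order of the edge_rate table in the MRS Hive database. Semicolon-separated genres such as Comedy;Drama contain no commas, so they are written unquoted.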