Updated on 2024-12-03 GMT+08:00

Preparations

To ensure a smooth migration, you need to make the following preparations.

Preparing a Huawei Account

Before using MgC, prepare a HUAWEI ID or an IAM user that can access MgC and obtain an AK/SK pair for the account or IAM user. For details, see Making Preparations.

Obtaining an AK/SK Pair for Your Alibaba Cloud Account

Obtain an AK/SK pair for your Alibaba Cloud account. For more information, see Viewing the AccessKey Pairs of a RAM User.

Ensure that the AK/SK pair has the following permissions:

  • AliyunReadOnlyAccess: read-only permission for Alibaba Cloud resources, including OSS
  • AliyunMaxComputeReadOnlyAccess: read-only permission for MaxCompute

For details about how to obtain these permissions, see Granting Permissions to RAM Users.

(Optional) If there are partitioned tables to be migrated, grant the Information Schema permission to the source account. For details, see Authorization for RAM Users.

Creating a Migration Project

On the MgC console, create a migration project. For details, see Managing Migration Projects.

Configuring an Agency

To ensure that DLI functions can be used properly, you need to configure an agency with DLI and OBS permissions.

  1. Sign in to the Huawei Cloud management console.
  2. Hover the mouse over the username in the upper right corner and choose Identity and Access Management from the drop-down list.
  3. In the navigation pane, choose Agencies.
  4. Click Create Agency.

  5. On the Create Agency page, set the following parameters:

    • Agency Name: Set a name, for example, dli_obs_agency_access.
    • Agency Type: Select Cloud service.
    • Cloud Service: Select Data Lake Insight (DLI) from the drop-down list.
    • Validity Period: Set a period as needed.
    • Description: This parameter is optional.

  6. Click Next. The Select Policy/Role tab is displayed.
  7. Click Create Policy in the upper right corner. Create two policies, one for OBS and one for DLI, by referring to Step 8 and Step 9. If existing policies already contain the required permissions, you can use them and skip this step and steps 8 and 9.
  8. Configure the OBS policy.

    1. Policy Name: Set a name, for example, dli-obs-agency.
    2. Policy View: Select JSON.
    3. Copy and paste the following content to the Policy Content box.

      Replace bucketName with the name of the bucket where the JAR packages are stored.

      {
          "Version": "1.1",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "obs:bucket:GetBucketPolicy",
                      "obs:bucket:GetLifecycleConfiguration",
                      "obs:bucket:GetBucketLocation",
                      "obs:bucket:ListBucketMultipartUploads",
                      "obs:bucket:GetBucketLogging",
                      "obs:object:GetObjectVersion",
                      "obs:bucket:GetBucketStorage",
                      "obs:bucket:GetBucketVersioning",
                      "obs:object:GetObject",
                      "obs:object:GetObjectVersionAcl",
                      "obs:object:DeleteObject",
                      "obs:object:ListMultipartUploadParts",
                      "obs:bucket:HeadBucket",
                      "obs:bucket:GetBucketAcl",
                      "obs:bucket:GetBucketStoragePolicy",
                      "obs:object:AbortMultipartUpload",
                      "obs:object:DeleteObjectVersion",
                      "obs:object:GetObjectAcl",
                      "obs:bucket:ListBucketVersions",
                      "obs:bucket:ListBucket",
                      "obs:object:PutObject"
                  ],
                  "Resource": [
                      "OBS:*:*:bucket:bucketName",
                      "OBS:*:*:object:*"
                  ]
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "obs:bucket:ListAllMyBuckets"
                  ]
              }
          ]
      }

  9. Configure the DLI policy.

    1. Policy Name: Set a name, for example, dli-agency.
    2. Policy View: Select JSON.
    3. Copy and paste the following content to the Policy Content box.
      {
          "Version": "1.1",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "dli:table:showPartitions",
                      "dli:table:alterTableAddPartition",
                      "dli:table:alterTableAddColumns",
                      "dli:table:alterTableRenamePartition",
                      "dli:table:delete",
                      "dli:column:select",
                      "dli:database:dropFunction",
                      "dli:table:insertOverwriteTable",
                      "dli:table:describeTable",
                      "dli:database:explain",
                      "dli:table:insertIntoTable",
                      "dli:database:createDatabase",
                      "dli:table:alterView",
                      "dli:table:showCreateTable",
                      "dli:table:alterTableRename",
                      "dli:table:compaction",
                      "dli:database:displayAllDatabases",
                      "dli:database:dropDatabase",
                      "dli:table:truncateTable",
                      "dli:table:select",
                      "dli:table:alterTableDropColumns",
                      "dli:table:alterTableSetProperties",
                      "dli:database:displayAllTables",
                      "dli:database:createFunction",
                      "dli:table:alterTableChangeColumn",
                      "dli:database:describeFunction",
                      "dli:table:showSegments",
                      "dli:database:createView",
                      "dli:database:createTable",
                      "dli:table:showTableProperties",
                      "dli:database:showFunctions",
                      "dli:database:displayDatabase",
                      "dli:table:alterTableRecoverPartition",
                      "dli:table:dropTable",
                      "dli:table:update",
                      "dli:table:alterTableDropPartition"
                  ]
              }
          ]
      }

  10. Click Next.
  11. Select the created custom policies with OBS and DLI permissions and click Next. Set Scope to All resources.
  12. Click OK. It takes 15 to 30 minutes for the authorization to take effect.
  13. Update the agency permissions by referring to Updating DLI Agency Permissions.
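The OBS policy in step 8 must be valid JSON, with bucketName replaced, before it is pasted into the Policy Content box. The following Python sketch builds that policy for a given bucket and runs a few basic structural checks. The function names and the bucket name my-jar-bucket are hypothetical, and the checks are only a convenience; the IAM console remains the authority on whether a policy is accepted.

```python
import json

# Actions copied from the OBS policy in step 8.
OBS_ACTIONS = [
    "obs:bucket:GetBucketPolicy",
    "obs:bucket:GetLifecycleConfiguration",
    "obs:bucket:GetBucketLocation",
    "obs:bucket:ListBucketMultipartUploads",
    "obs:bucket:GetBucketLogging",
    "obs:object:GetObjectVersion",
    "obs:bucket:GetBucketStorage",
    "obs:bucket:GetBucketVersioning",
    "obs:object:GetObject",
    "obs:object:GetObjectVersionAcl",
    "obs:object:DeleteObject",
    "obs:object:ListMultipartUploadParts",
    "obs:bucket:HeadBucket",
    "obs:bucket:GetBucketAcl",
    "obs:bucket:GetBucketStoragePolicy",
    "obs:object:AbortMultipartUpload",
    "obs:object:DeleteObjectVersion",
    "obs:object:GetObjectAcl",
    "obs:bucket:ListBucketVersions",
    "obs:bucket:ListBucket",
    "obs:object:PutObject",
]


def build_obs_policy(bucket_name: str) -> dict:
    """Return the step 8 policy with the bucketName placeholder replaced."""
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    return {
        "Version": "1.1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(OBS_ACTIONS),
                "Resource": [f"OBS:*:*:bucket:{bucket_name}", "OBS:*:*:object:*"],
            },
            {"Effect": "Allow", "Action": ["obs:bucket:ListAllMyBuckets"]},
        ],
    }


def check_policy(policy: dict) -> list:
    """Return a list of basic structural problems (empty if none were found)."""
    problems = []
    if policy.get("Version") != "1.1":
        problems.append("Version should be '1.1'")
    statements = policy.get("Statement") or []
    if not statements:
        problems.append("Statement is empty")
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") not in ("Allow", "Deny"):
            problems.append(f"statement {i}: Effect must be Allow or Deny")
        actions = stmt.get("Action") or []
        if not actions:
            problems.append(f"statement {i}: Action is empty")
        if len(actions) != len(set(actions)):
            problems.append(f"statement {i}: duplicate actions")
        for resource in stmt.get("Resource", []):
            if resource.endswith(":bucketName"):
                problems.append(f"statement {i}: bucketName placeholder not replaced")
    return problems


if __name__ == "__main__":
    policy = build_obs_policy("my-jar-bucket")  # hypothetical bucket name
    issues = check_policy(policy)
    if issues:
        raise SystemExit("\n".join(issues))
    # Paste this output into the Policy Content box.
    print(json.dumps(policy, indent=4))
```

check_policy does not depend on OBS, so you can also load the step 9 DLI policy with json.loads and pass it in.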

Creating a VPC

Before purchasing an ECS, you need to create a VPC and subnet for it. For details, see Creating a VPC and Subnet.

The CIDR block of the ECS's VPC must not overlap with the CIDR block used by the DLI elastic resource pool you prepare later. When you create a DLI queue, the preset CIDR block is 172.16.0.0/18.
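One way to check for this overlap before creating the VPC is Python's standard ipaddress module. The following sketch compares a candidate VPC CIDR block with 172.16.0.0/18, the preset DLI queue CIDR block mentioned above; if you plan to use a different CIDR block for the elastic resource pool, pass it as the second argument. The function name is hypothetical.

```python
import ipaddress

# Preset DLI queue CIDR block, as stated above.
DLI_DEFAULT_CIDR = "172.16.0.0/18"


def overlaps_dli(vpc_cidr: str, dli_cidr: str = DLI_DEFAULT_CIDR) -> bool:
    """Return True if the two CIDR blocks share any IP addresses."""
    return ipaddress.ip_network(vpc_cidr).overlaps(ipaddress.ip_network(dli_cidr))


if __name__ == "__main__":
    for cidr in ("192.168.0.0/16", "172.16.0.0/12", "172.16.64.0/24"):
        print(cidr, "conflicts" if overlaps_dli(cidr) else "OK")
```

172.16.0.0/18 spans 172.16.0.0 to 172.16.63.255, so 172.16.64.0/24 does not conflict while the wider 172.16.0.0/12 does.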

Purchasing an ECS

  • Purchase a Linux ECS. The ECS and the DLI resources you prepare later must be in the same region. For details, see Purchasing an ECS. Select the created VPC and subnet for the ECS. The ECS must:
    • Be able to access the Internet and the domain names of MgC, IoTDA, and other cloud services. For details about the domain names to be accessed, see Domain Names.
    • Allow outbound traffic on port 8883 in its security group rules.
    • Run CentOS 8.x.
    • Have at least 8 vCPUs and 16 GB of memory.
  • Configure an EIP for the ECS, so that the ECS can access the Internet. For details, see Assigning an EIP and Binding an EIP to an ECS.
    • Billing Mode: Select Pay-per-use.
    • Bandwidth: 5 Mbit/s is recommended.
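Before installing Edge, you may want to confirm the CPU, memory, and OS requirements on the ECS itself. The following Python sketch reads /proc/meminfo and /etc/os-release and compares them with the requirements listed above. The 10% memory tolerance is an assumption: MemTotal excludes memory reserved by the kernel, so an ECS sold as 16 GB reports somewhat less than 16 GiB. Network requirements (Internet access, domain names, port 8883) are not covered, and the function names are hypothetical.

```python
import os
import re

MIN_VCPUS = 8
MIN_MEM_GIB = 16
# Assumption: accept up to 10% less than 16 GiB, because MemTotal excludes
# memory reserved by the kernel and firmware.
MEM_TOLERANCE = 0.9


def parse_mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (in kB) from the contents of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from /etc/os-release, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value.strip('"')
    return fields


def check_ecs(vcpus: int, mem_total_kib: int, os_release_text: str) -> list:
    """Return unmet requirements (empty list if all checks pass)."""
    problems = []
    if vcpus < MIN_VCPUS:
        problems.append(f"{vcpus} vCPUs found, at least {MIN_VCPUS} required")
    if mem_total_kib < MIN_MEM_GIB * 1024 * 1024 * MEM_TOLERANCE:
        problems.append(f"{mem_total_kib} kB memory found, about {MIN_MEM_GIB} GB required")
    release = parse_os_release(os_release_text)
    major = release.get("VERSION_ID", "").split(".")[0]
    if release.get("ID") != "centos" or major != "8":
        problems.append("OS is not CentOS 8.x")
    return problems


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            mem_kib = parse_mem_total_kib(f.read())
        with open("/etc/os-release") as f:
            release_text = f.read()
    except (OSError, ValueError) as exc:
        print(f"Cannot read host information: {exc}")
    else:
        issues = check_ecs(os.cpu_count() or 0, mem_kib, release_text)
        print("\n".join(issues) if issues else "CPU, memory, and OS requirements are met")
```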

Installing Edge and Connecting Edge to MgC

  • On the purchased ECS, install Edge, which will be used for data verification. For details, see Installing Edge for Linux.
  • Register an Edge account. In the address box of a browser, enter the NIC IP address of the Linux ECS and port 27080, for example, https://x.x.x.x:27080. After the first login, the registration page is displayed. Enter a username and password, confirm the password, and click Privacy Statement. Read the statement carefully, select I have read and agree to the Privacy Statement, and click OK to complete the registration.
  • Connect Edge to MgC.

Edge does not support automatic restart. Do not restart Edge during task execution, or tasks will fail.

Adding Credentials

On the Edge console, add the AK/SK pairs of your Alibaba Cloud account and Huawei Cloud account. For details, see Adding Resource Credentials.

  • Enter the AK/SK pair of your source Alibaba Cloud account. This key pair will be used to access your MaxCompute resources.

  • Enter the AK/SK pair of your Huawei Cloud account. This key pair will be used to access your DLI resources.

Creating an OBS Bucket and Uploading JAR Packages

Create a bucket on Huawei Cloud OBS and upload the JAR packages that data migration depends on to the bucket. For details about how to create an OBS bucket, see Creating a Bucket. For details about how to upload files, see Uploading an Object.

The JAR packages that data migration depends on are migration-dli-spark-1.0.0.jar, fastjson-1.2.54.jar, and datasource.jar. The following describes what each package is used for and how to obtain it.

  • migration-dli-spark-1.0.0.jar
    • This package is used to create Spark sessions and submit SQL statements.
    • You can obtain the package from the /opt/cloud/Edge/tools/plugins/collectors/bigdata-migration/dliSpark directory on the ECS where Edge is installed.
  • fastjson-1.2.54.jar
    • This package provides fast JSON conversion.
    • You can obtain the package from the /opt/cloud/Edge/tools/plugins/collectors/bigdata-migration/deltaSpark directory on the ECS where Edge is installed.
  • datasource.jar
    • This package contains the configuration and connection logic of data sources and allows Edge to connect to different databases or data storage systems.
    • You need to obtain and compile datasource.jar as required.
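If you prefer to collect the packages in one place before uploading them, the following Python sketch copies the two packages shipped with Edge from the directories listed above, plus a datasource.jar that you have compiled, into a local staging directory. The script name, the staging directory, and the datasource.jar path are hypothetical; upload the staged files to the OBS bucket as described in Uploading an Object.

```python
import shutil
import sys
from pathlib import Path

# Edge installation directory from the list above.
EDGE_PLUGIN_DIR = Path("/opt/cloud/Edge/tools/plugins/collectors/bigdata-migration")

# JAR package name -> subdirectory of EDGE_PLUGIN_DIR that contains it.
EDGE_JARS = {
    "migration-dli-spark-1.0.0.jar": "dliSpark",
    "fastjson-1.2.54.jar": "deltaSpark",
}


def stage_jars(staging_dir, datasource_jar, plugin_dir=EDGE_PLUGIN_DIR) -> list:
    """Copy the three JAR packages into staging_dir and return the copied paths."""
    sources = [Path(plugin_dir) / subdir / name for name, subdir in EDGE_JARS.items()]
    sources.append(Path(datasource_jar))
    missing = [str(path) for path in sources if not path.is_file()]
    if missing:
        raise FileNotFoundError("missing JAR packages: " + ", ".join(missing))
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sources:
        target = staging / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Usage: python stage_jars.py STAGING_DIR PATH_TO_DATASOURCE_JAR
    if len(sys.argv) != 3:
        print("usage: stage_jars.py STAGING_DIR PATH_TO_DATASOURCE_JAR")
    else:
        for path in stage_jars(sys.argv[1], sys.argv[2]):
            print(path)
```

The script checks for all three files before copying anything, so a missing package is reported without leaving a partially filled staging directory.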

Purchasing an Elastic Resource Pool and Adding a Queue on DLI

An elastic resource pool provides compute resources (CPU and memory) required for running DLI jobs. For details about how to buy a resource pool, see Buying an Elastic Resource Pool.

You can add a general-purpose queue and an SQL queue to the pool for running jobs. For more information, see Creating Queues in an Elastic Resource Pool.

Creating an Enhanced Datasource Connection on DLI

  1. Sign in to the DLI console. In the navigation pane on the left, choose Resources > Resource Pool.
  2. Click the expand icon next to the name of the elastic resource pool. In the expanded information, obtain the CIDR block of the elastic resource pool.
  3. In the security group of the ECS with Edge installed, add an inbound rule to allow access from the CIDR block of the elastic resource pool.

    1. Sign in to the ECS console.
    2. In the ECS list, click the name of the Linux ECS with Edge installed.
    3. Click the Security Groups tab and click Manage Rule.

    4. Under Inbound Rules, click Add Rule.

    5. Configure parameters for the inbound rule.
      • Priority: Set it to 1.
      • Action: Select Allow.
      • Type: Select IPv4.
      • Protocol & Port: Select Protocols/All.
      • Source: Select IP Address and enter the CIDR block of the elastic resource pool.
    6. Click OK.

  4. Switch to the DLI console. In the navigation pane on the left, click Datasource Connections.
  5. Under Enhanced, click Create.

  6. Configure enhanced datasource connection information based on Table 1.

    Table 1 Parameters required for creating an enhanced connection

    • Connection Name: Enter a name.
    • Resource Pool: Select the purchased elastic resource pool.
    • VPC and Subnet: Select the created VPC and subnet.
    • Route Table: Retain the preset value.
    • Host Information: Add the information about the source MaxCompute hosts in the following format, with a space between the IP address and the host name and each entry on its own line:

      EndpointIP Endpoint
      TunnelEndpointIP TunnelEndpoint

      For example:

      118.178.xxx.xx service.cn-hangzhou.maxcompute.aliyun.com.vipgds.alibabadns.com
      47.97.xxx.xx dt.cn-hangzhou.maxcompute.aliyun.com

      To obtain the values of EndpointIP and TunnelEndpointIP, use any server with a public IP address to ping the endpoint and the Tunnel endpoint of the region where the source MaxCompute cluster is located. The corresponding IP addresses are displayed in the command output.

      For details about MaxCompute endpoints and Tunnel endpoints, see Endpoints in different regions.

  7. Click OK. After the creation process is finished, the Connection Status of the newly created connection will be Active, indicating that the connection has been successfully created.
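As an alternative to reading ping output, the following Python sketch resolves the endpoint host names with the standard socket module and prints them in the Host Information format. resolve_like_ping returns the canonical name and the first IPv4 address, which is roughly what ping displays; treat this as an assumption and compare the result with ping before using it. Run it on a server with a public IP address and pass bare host names, without a scheme or path. The script and function names are hypothetical.

```python
import ipaddress
import socket
import sys


def resolve_like_ping(host: str) -> tuple:
    """Return (canonical name, first IPv4 address), similar to what ping prints."""
    canonical, _aliases, addresses = socket.gethostbyname_ex(host)
    return canonical, addresses[0]


def format_host_information(entries: list) -> str:
    """Format (hostname, ip) pairs as 'IP hostname' lines, one entry per line."""
    lines = []
    for hostname, ip in entries:
        ipaddress.IPv4Address(ip)  # raises ValueError for a malformed address
        if not hostname or " " in hostname:
            raise ValueError(f"invalid host name: {hostname!r}")
        lines.append(f"{ip} {hostname}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python host_info.py ENDPOINT_HOST TUNNEL_ENDPOINT_HOST
    hosts = sys.argv[1:]
    if not hosts:
        print("usage: host_info.py ENDPOINT_HOST TUNNEL_ENDPOINT_HOST")
    else:
        print(format_host_information([resolve_like_ping(h) for h in hosts]))
```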

Adding and Configuring Routes

  • Add routes.

    Add two routes for the DLI enhanced datasource connection. For details, see Adding a Route. The route IP addresses must be the same as the host IP addresses configured during datasource connection creation.

  • Configure routes.

    1. Sign in to the VPC console.
    2. In the navigation pane on the left, choose Virtual Private Cloud > Route Tables.
    3. In the route table list, click the route table used for creating the datasource connection (that is, the route table of the VPC where the ECS resides).
    4. Click Add Route.
    5. Set the following parameters. Add two routes, one for each host IP address configured during datasource connection creation.
      • Destination Type: Select IP address.
      • Destination: Enter the host IP address configured during datasource connection creation.
      • Next Hop Type: Select Server.
      • Next Hop: Select the purchased ECS.
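Because the route destinations must match the host IP addresses of the datasource connection exactly, a small cross-check can catch typos. The following Python sketch takes the Host Information text and the destinations entered in the route table and reports any IP address that appears on only one side. The function names and the sample addresses (taken from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges) are hypothetical.

```python
import ipaddress


def host_ips(host_information: str) -> set:
    """Return the IP addresses from 'IP hostname' lines."""
    ips = set()
    for line in host_information.splitlines():
        line = line.strip()
        if line:
            ip, _, _hostname = line.partition(" ")
            ips.add(str(ipaddress.IPv4Address(ip)))
    return ips


def normalize_destination(destination: str) -> str:
    """Turn '1.2.3.4' or '1.2.3.4/32' into '1.2.3.4'; keep wider CIDR blocks as-is."""
    network = ipaddress.IPv4Network(destination)
    return str(network.network_address) if network.prefixlen == 32 else str(network)


def route_mismatches(host_information: str, destinations: list) -> tuple:
    """Return (host IPs without a route, route destinations without a host entry)."""
    hosts = host_ips(host_information)
    routes = {normalize_destination(d) for d in destinations}
    return hosts - routes, routes - hosts


if __name__ == "__main__":
    host_info = "192.0.2.10 service.example.com\n198.51.100.7 dt.example.com"
    missing, extra = route_mismatches(host_info, ["192.0.2.10/32"])
    print("IPs without a route:", sorted(missing))  # ['198.51.100.7']
    print("Routes without a host entry:", sorted(extra))  # []
```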

Configuring SNAT Rules

Configure SNAT rules for the ECS by following the instructions below. Restarting the ECS will clear the rules, so you will need to reconfigure them after a restart.

  1. Log in to the purchased ECS.
  2. Run the following commands in sequence:

    sysctl net.ipv4.ip_forward=1

    This command is used to enable IP forwarding in Linux.

    iptables -t nat -A POSTROUTING -o eth0 -s {CIDR block where the DLI elastic resource pool resides} -j SNAT --to {Private IP address of the ECS}

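    This command adds an iptables rule to the nat table that rewrites the source address of packets from the DLI elastic resource pool CIDR block, sent out through eth0, to the private IP address of the ECS (source network address translation).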
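Because the rules are lost when the ECS restarts, it can help to keep the commands in a small script. The following Python sketch validates the two values you substitute and prints the commands from the steps above, ready to copy. It assumes the NIC is eth0, as in the original command, and it does not execute anything, so run the printed commands yourself as root. The script and function names are hypothetical.

```python
import ipaddress
import shlex
import sys


def snat_commands(pool_cidr: str, ecs_private_ip: str, interface: str = "eth0") -> list:
    """Build the IP forwarding and SNAT commands as argument lists.

    IPv4Network is strict, so a CIDR block with host bits set
    (for example 172.16.0.1/18) raises ValueError, which helps catch typos.
    """
    cidr = ipaddress.IPv4Network(pool_cidr)
    private_ip = ipaddress.IPv4Address(ecs_private_ip)
    return [
        ["sysctl", "net.ipv4.ip_forward=1"],
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", interface,
         "-s", str(cidr), "-j", "SNAT", "--to", str(private_ip)],
    ]


if __name__ == "__main__":
    # Usage: python snat_commands.py POOL_CIDR ECS_PRIVATE_IP
    if len(sys.argv) != 3:
        print("usage: snat_commands.py POOL_CIDR ECS_PRIVATE_IP")
    else:
        for command in snat_commands(sys.argv[1], sys.argv[2]):
            print(shlex.join(command))
```

Building the commands as argument lists and joining them with shlex.join keeps the output safe to paste into a shell, even if an unexpected value slips through.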

Testing the Connectivity Between the DLI Queue and the Data Source

  1. Sign in to the DLI console. In the navigation pane on the left, choose Resources > Queue Management.
  2. Locate the DLI queue added to the elastic resource pool and choose More > Test Address Connectivity in the Operation column.
  3. Test the four MaxCompute addresses one at a time: the endpoint, the endpoint IP address, the Tunnel endpoint, and the Tunnel endpoint IP address. Port 443 is recommended.

  4. Click Test.

    If the address is reachable, a message indicating this is displayed.

    If the address is unreachable, a message indicating this is also displayed. In this case, check whether the network configurations, including the VPC peering and the datasource connection, have been activated, and then test again.
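The DLI console test runs from the queue. When a test fails, it can also help to confirm from the ECS itself that the MaxCompute addresses are reachable on port 443, since traffic from DLI leaves through the ECS after SNAT. The following Python sketch attempts a TCP connection to each address passed on the command line. It checks only TCP reachability from the machine it runs on, not the DLI-side configuration, and the script name is hypothetical.

```python
import socket
import sys


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


if __name__ == "__main__":
    # Usage: python check_tcp.py ENDPOINT ENDPOINT_IP TUNNEL_ENDPOINT TUNNEL_ENDPOINT_IP
    # Pass bare host names or IP addresses, without http:// or a path.
    for address in sys.argv[1:]:
        status = "reachable" if tcp_reachable(address) else "unreachable"
        print(f"{address}:443 {status}")
```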

Adding to the DLI Spark 3.3 Whitelist and the JAR Program Whitelist for Metadata Access

Contact DLI technical support to add your account to the whitelists for the DLI Spark 3.3 feature and for JAR program access to DLI metadata.

(Optional) Enabling DLI Spark Lifecycle Whitelist

If the metadata to be migrated has a lifecycle (that is, the DDL contains the LIFECYCLE field), contact DLI technical support to enable the Spark lifecycle feature whitelist.

(Optional) Enabling the CIDR Block 100 Whitelist

If you use Direct Connect to migrate data, ask VPC technical support to add you to the whitelist for the 100.100.x.x CIDR block.

Submit a service ticket to the VPC service and provide the following information:

  • Huawei Cloud account and project ID of the region where your DLI resources reside. For details about how to obtain them, see API Credentials.
  • DLI tenant name and tenant project ID: Contact DLI technical support to obtain them.