Creating a Connection to a Source Component
To verify the consistency of data stored in big data components, you need to establish connections between MgC and those components.
The supported source big data components include:
- Doris
- HBase
- ClickHouse
- Hive Metastore
- Delta Lake (with metadata)
- Delta Lake (without metadata)
- Hudi (with metadata)
- Hudi (without metadata)
Procedure
- Sign in to the MgC console. In the navigation pane, under Project, select a big data migration project from the drop-down list.
- In the navigation pane on the left, choose Migrate > Big Data Verification.
- If this is your first time performing a big data verification with MgC, you need to select an MgC Agent to enable this feature. Click Select MgC Agent. In the displayed dialog box, select the MgC Agent you connected to MgC during preparations from the drop-down list.
CAUTION:
Ensure that the selected MgC Agent remains Online and Enabled until your verification is complete.
- In the Features area, click Preparations.
- Choose Connection Management and click Create Connection.
Figure 1 Creating a connection
- Select a big data component and click Next.
- Set parameters based on the big data component you selected.
- Parameters for creating a connection to Doris
- Parameters for creating a connection to HBase
- Parameters for creating a connection to ClickHouse
- Parameters for creating a connection to Hive Metastore
- Parameters for creating a connection to Delta Lake (with metadata)
- Parameters for creating a connection to Delta Lake (without metadata)
- Parameters for creating a connection to Hudi (with metadata)
- Parameters for creating a connection to Hudi (without metadata)
Table 1 Parameters for creating a connection to Doris
- Connection To: Select Source.
- Connection Name: The default name is Doris- followed by four random characters (letters and digits). You can also customize a name.
- MgC Agent: Select the MgC Agent installed in the source environment.
- Doris Credential: Select the source Doris credential added to the MgC Agent. For details about how to add credentials, see "Big data - Doris" in Adding Resource Credentials.
- Database IP Address: Enter the IP address for accessing the source Doris cluster.
- Database Port: Enter the port used for accessing the source Doris cluster. The default value is 3306.
- Database Name: Enter the name of the source Doris database.
- Collect Resource Usage Information: Optional. If this option is enabled, usage metrics for your big data resources are collected while tasks created using this connection run. The collected information is used to generate reports on the MgC console and for performance optimization.
  NOTICE: Before using this function, ensure that the Huawei Cloud account you added to the MgC Agent has the read-only permission for MRS and DLI.
  - If the selected credential is the one you currently use to access MgC, you can select This is my MgC credential, and the projects in the region you choose will be listed.
    - Under Region, select the region where the data to be verified is located.
    - Under Project, select the project where the data to be verified is stored.
    - Under Cluster ID, enter the ID of the cluster where the data to be verified is located.
  - If the selected Doris credential is not the one you currently use to access MgC:
    - Under Region ID, enter the ID of the region where the data to be verified is located. For example, if the region is CN South-Guangzhou, enter cn-south-1.
    - Under Project ID, enter the project ID corresponding to the region.
    - Under Cluster ID, enter the ID of the cluster where the data to be verified is located.
  NOTE:
  - To view the region ID and project ID, choose My Credentials > API Credentials.
  - For details about how to obtain the cluster ID, see Obtaining an MRS Cluster ID.
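Before creating the connection, it can save time to confirm that the Doris database address and port are reachable from the MgC Agent host. A minimal sketch; the address shown in the comment is a hypothetical placeholder, not a value from this document:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_reachable("192.0.2.10", 3306)  # hypothetical source Doris address
```

A True result only confirms TCP reachability; credentials and permissions are still verified by the connection test in the console.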
Table 2 Parameters for creating a connection to HBase
- Connection To: Select Source.
- Connection Name: The default name is HBase- followed by four random characters (letters and digits). You can also customize a name.
- MgC Agent: Select the MgC Agent installed in the source environment.
- HBase Credential: Select the source HBase credential added to the MgC Agent. For details about how to add credentials, see "Big data - HBase" in Adding Resource Credentials.
- Secured Cluster: Specify whether the source cluster is secured.
- ZooKeeper IP Address: Enter the IP address for connecting to the source ZooKeeper node. You can enter the public or private IP address of the ZooKeeper node.
- ZooKeeper Port: Enter the port for connecting to the source ZooKeeper node.
- HBase Version: Select the source HBase version.
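Since the HBase connection goes through ZooKeeper, a quick health probe of the ZooKeeper node can rule out network issues before the connection test. A sketch using ZooKeeper's standard "ruok" four-letter command; the address in the comment is hypothetical:

```python
import socket

def zk_is_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command; a healthy server replies 'imok'.
    Note: on recent ZooKeeper versions, 'ruok' must be allowed via the
    4lw.commands.whitelist setting, so a False result is not conclusive."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ruok")
            return s.recv(16).strip() == b"imok"
    except OSError:
        return False

# e.g. zk_is_ok("192.0.2.20", 2181)  # hypothetical ZooKeeper node and port
```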
Table 3 Parameters for creating a connection to ClickHouse
- Connection To: Select Source.
- Connection Name: The default name is ClickHouse- followed by four random characters (letters and digits). You can also customize a name.
- MgC Agent: Select the MgC Agent installed in the source environment.
- ClickHouse Credential (Optional): Select the source ClickHouse credential added to the MgC Agent. For details about how to add credentials, see "Big data - ClickHouse" in Adding Resource Credentials.
- Secured Cluster: Specify whether the source cluster is secured.
- ClickHouse Server IP Address: Enter the IP address for accessing the source ClickHouse server. Generally, this is the IP address of the server hosting ClickHouse.
- HTTP Port: If the source ClickHouse cluster is unsecured, enter the HTTP port for communicating with the ClickHouse server. To obtain the value, log in to FusionInsight Manager of the source cluster, choose Cluster > Services > ClickHouse > Configurations > All Configurations, and search for the http_port parameter.
- HTTP SSL/TLS Port: If the source ClickHouse cluster is secured, enter the HTTPS port for communicating with the ClickHouse server. To obtain the value, log in to FusionInsight Manager, choose Cluster > Services > ClickHouse > Configurations > All Configurations, and search for the https_port parameter.
- Collect Usage Metrics: Optional. If this option is enabled, usage metrics for your big data resources are collected while tasks created using this connection run. The collected information is used to generate reports on the MgC console and for performance optimization.
  NOTICE: Before using this function, ensure that the Huawei Cloud account you added to the MgC Agent has the read-only permission for MRS and DLI.
  - If the selected credential is the one you currently use to access MgC, you can select This is my MgC credential, and the projects in the region you choose will be listed.
    - Under Region, select the region where the data to be verified is located.
    - Under Project, select the project where the data to be verified is stored.
    - Under Cluster ID, enter the ID of the cluster where the data to be verified is located.
  - If the selected credential is not the one you currently use to access MgC:
    - Under Region ID, enter the ID of the region where the data to be verified is located. For example, if the region is CN South-Guangzhou, enter cn-south-1.
    - Under Project ID, enter the project ID corresponding to the region.
    - Under Cluster ID, enter the ID of the cluster where the data to be verified is located.
  NOTE:
  - To view the region ID and project ID, choose My Credentials > API Credentials.
  - For details about how to obtain the cluster ID, see Obtaining an MRS Cluster ID.
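Once you have found http_port (or https_port) in FusionInsight Manager, you can confirm it is the right one: ClickHouse exposes a built-in /ping endpoint on its HTTP interface that returns "Ok." when the server is up. A sketch; the host and port in the comment are hypothetical:

```python
from urllib.error import URLError
from urllib.request import urlopen

def clickhouse_ping(host: str, port: int, use_tls: bool = False,
                    timeout: float = 3.0) -> bool:
    """Call ClickHouse's built-in /ping endpoint; a healthy server returns 'Ok.'.
    Pass use_tls=True with the HTTPS port of a secured cluster."""
    scheme = "https" if use_tls else "http"
    try:
        with urlopen(f"{scheme}://{host}:{port}/ping", timeout=timeout) as resp:
            return resp.read().strip() == b"Ok."
    except (URLError, OSError):
        return False

# e.g. clickhouse_ping("192.0.2.30", 8123)  # hypothetical host and http_port
```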
Table 4 Parameters for creating a connection to Hive Metastore
- Connection To: Select Source.
- Connection Name: The default name is Hive-Metastore- followed by four random characters (letters and digits). You can also customize a name.
- Secure Connection: Specify whether to enable secure connection.
  - If Hive Metastore is deployed in an unsecured cluster, do not enable secure connection.
  - If Hive Metastore is deployed in a secured cluster, enable secure connection and provide access credentials. For details about how to obtain and add credentials to MgC, see "Big data - Hive Metastore" in Adding Resource Credentials.
- Hive Version: Select the source Hive version.
  CAUTION: If the source Hive version is 2.1.1, select 1.x.
- Hive Metastore IP Address: Enter the IP address for connecting to the Hive Metastore node.
- Hive Metastore Thrift Port: Enter the port for connecting to the Hive Metastore Thrift service. The default port is 9083.
- Connect to Metadata Database: During an incremental data verification, querying more than 30,000 partitions through Hive Metastore may cause out-of-memory (OOM) errors, because all partition information is loaded into memory. Connecting to the MySQL metadata database directly prevents this issue.
  - If you disable this option, the system queries the information of Hive tables and partitions through Hive Metastore.
  - If you enable this option, the system queries the information of Hive tables and partitions through the MySQL database. You need to set the following parameters:
    - Metadata Database Type: Only MySQL is supported.
    - MySQL Credential: Select the credential for accessing the MySQL database. You need to add the credential to the MgC Agent and synchronize it to MgC. For details, see Adding Resource Credentials.
    - MySQL Node IP Address: Enter the IP address of the MySQL database server.
    - MySQL Port: Enter the port of the MySQL database service.
    - Database Name: Enter the name of the database that stores the Hive table metadata.
  NOTE: Ensure that the entered MySQL credential, node IP address, service port, and database name match the MySQL database used by Hive. Otherwise, data verification will fail.
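The 30,000-partition guidance above can be turned into a simple pre-check when deciding whether to enable Connect to Metadata Database. The SQL string assumes the standard Hive metastore schema in MySQL, where partitions are stored in the PARTITIONS table; run it against the metastore database (or ask your DBA) to get the count:

```python
# Counts partitions in the standard Hive metastore schema (MySQL).
# Run this against the metastore database configured for Hive.
PARTITION_COUNT_SQL = "SELECT COUNT(*) FROM PARTITIONS"

def should_connect_metadata_db(partition_count: int,
                               threshold: int = 30_000) -> bool:
    """Per the guidance above: Hive Metastore loads all partition information
    into memory during incremental verification, so beyond ~30,000 partitions
    it is safer to enable Connect to Metadata Database and query MySQL."""
    return partition_count > threshold

# should_connect_metadata_db(45_000) -> True: enable the option
# should_connect_metadata_db(5_000)  -> False: Hive Metastore alone is fine
```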
Table 5 Parameters for creating a connection to Delta Lake (with metadata)
- Connection To: Select Source.
- Connection Name: The default name is Delta-Lake-with-metadata- followed by four random characters (letters and digits). You can also customize a name.
- Executor Credential: Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
- Executor IP Address: Enter the IP address for connecting to the executor.
- Executor Port: Enter the port for connecting to the executor.
- Spark Client Directory: Enter the installation directory of the Spark client.
- Environment Variable Address: Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
- SQL File Location: Enter a directory for storing the SQL files generated for consistency verification. You must have read and write permissions for the directory.
  NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.
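Conceptually, the executor sources the environment variable file and then runs the generated SQL files through the Spark client, which is why both paths are required. A sketch of how such a command could be composed; this is an illustration of the mechanism, not MgC's actual implementation, and the SQL file path is hypothetical:

```python
import shlex

def build_verification_cmd(env_file: str, sql_file: str) -> str:
    """Compose a shell command that sources the big data client environment
    file and then executes a generated verification SQL file with spark-sql."""
    return f"source {shlex.quote(env_file)} && spark-sql -f {shlex.quote(sql_file)}"

# Using the example environment file path from the table above:
# build_verification_cmd("/opt/bigdata/client/bigdata_env", "/tmp/mgc/check_001.sql")
```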
Table 6 Parameters for creating a connection to Delta Lake (without metadata)
- Connection To: Select Source.
- Connection Name: The default name is Delta-Lake-without-metadata- followed by four random characters (letters and digits). You can also customize a name.
- Executor Credential: Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
- Executor IP Address: Enter the IP address for connecting to the executor.
- Executor Port: Enter the port for connecting to the executor.
- Spark Client Directory: Enter the installation directory of the Spark client.
- Environment Variable Address: Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
- SQL File Location: Enter a directory for storing the SQL files generated for consistency verification. You must have read and write permissions for the directory.
  NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.
- Collect Usage Metrics: Optional. If this option is enabled, usage metrics for your big data resources are collected while tasks created using this connection run. The collected information is used to generate reports on the MgC console and for performance optimization.
  NOTICE: Before using this function, ensure that the Huawei Cloud account you added to the MgC Agent has the read-only permission for MRS and DLI.
  - If the selected credential is the one you currently use to access MgC, you can select This is my MgC credential, and the projects in the region you choose will be listed.
    - Under Region, select the region where the data to be verified is located.
    - Under Project, select the project where the data to be verified is stored.
    - Under Cluster ID, enter the ID of the cluster where the data to be verified is located.
  - If the selected credential is not the one you currently use to access MgC:
    - Under Region ID, enter the ID of the region where the data to be verified is located. For example, if the region is CN South-Guangzhou, enter cn-south-1.
    - Under Project ID, enter the project ID corresponding to the region.
    - Under Cluster ID, enter the ID of the cluster where the data to be verified is located.
  NOTE:
  - To view the region ID and project ID, choose My Credentials > API Credentials.
  - For details about how to obtain the cluster ID, see Obtaining an MRS Cluster ID.
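The NOTICE above asks you to manually clear the folders generated under the SQL file location after migration. A cautious sketch of that cleanup; the directory path in the comment is hypothetical, and you should review the contents before deleting:

```python
import pathlib
import shutil

def clear_generated_folders(sql_file_location: str) -> list:
    """Remove only the subdirectories under the configured SQL file location,
    returning the removed paths. Regular files in the directory are kept."""
    removed = []
    for child in sorted(pathlib.Path(sql_file_location).iterdir()):
        if child.is_dir():
            shutil.rmtree(child)
            removed.append(str(child))
    return removed

# e.g. clear_generated_folders("/data/mgc-sql-files")  # hypothetical configured path
```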
Table 7 Parameters for creating a connection to Hudi (with metadata)
- Connection To: Select Source.
- Connection Name: The default name is Hudi-with-metadata- followed by four random characters (letters and digits). You can also customize a name.
- Executor Credential: Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
- Executor IP Address: Enter the IP address for connecting to the executor.
- Executor Port: Enter the port for connecting to the executor.
- Spark Client Directory: Enter the installation directory of the Spark client.
- Environment Variable Address: Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
- SQL File Location: Enter a directory for storing the SQL files generated for consistency verification. You must have read and write permissions for the directory.
  NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.
Table 8 Parameters for creating a connection to Hudi (without metadata)
- Connection To: Select Source.
- Connection Name: The default name is Hudi-without-metadata- followed by four random characters (letters and digits). You can also customize a name.
- Executor Credential: Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
- Executor IP Address: Enter the IP address for connecting to the executor.
- Executor Port: Enter the port for connecting to the executor.
- Spark Client Directory: Enter the installation directory of the Spark client.
- Environment Variable Address: Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
- SQL File Location: Enter a directory for storing the SQL files generated for consistency verification. You must have read and write permissions for the directory.
  NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.
- Click Test. MgC verifies whether it can connect to the component using the information you provided. If the test is successful, the connection can be set up.
- After the connection test is successful, click Confirm. The connection is created.
- On the Connection Management page, view the created connection and its basic information. In the Operation column, click Modify to modify the connection settings.