Creating a Connection to a Source Component

Before MgC can verify the consistency of data stored in big data components, you must create connections between MgC and those components.

The supported source big data components include:

Doris
HBase
ClickHouse
Hive Metastore
Delta Lake (with metadata)
Delta Lake (without metadata)
Hudi (with metadata)
Hudi (without metadata)

Procedure

Sign in to the MgC console.
In the navigation pane on the left, choose Migrate > Big Data Verification. In the upper left corner of the page, select the migration project you created.
In the Features area, click Connection Management.
Click Create Connection in the upper right corner of the page.
Select a big data component and click Next.

Set parameters based on the big data component you selected.

Parameters for creating a connection to Doris
Parameters for creating a connection to HBase
Parameters for creating a connection to ClickHouse
Parameters for creating a connection to Hive Metastore
Parameters for creating a connection to Delta Lake (with metadata)
Parameters for creating a connection to Delta Lake (without metadata)
Parameters for creating a connection to Hudi (with metadata)
Parameters for creating a connection to Hudi (without metadata)

**Table 1** Parameters for creating a connection to Doris
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is Doris- followed by four random characters (letters and digits). You can also specify a custom name.
Doris Credential	Select the source Doris credential added to the Edge device. For details about how to add credentials, see "Big data - Doris" in Adding Resource Credentials.
Database IP Address	Enter the IP address for accessing the source Doris cluster.
Database Port	Enter the port used for accessing the source Doris cluster. The default value is 3306.
Database Name	Enter the name of the source Doris database.
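
Before you fill in the IP address and port, it can help to confirm that the port accepts TCP connections from the machine you are working on. The following is a minimal sketch, not part of MgC: the function name can_reach is hypothetical, and a successful TCP handshake only shows that the port is open, not that the credential or database name is valid.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for a quick pre-check; it does not log in to
    the database or validate any credential.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, can_reach("192.0.2.10", 3306) would check a placeholder address against the default port listed in the table. Run the check from a host on the same network path as the Edge device, because reachability from your own workstation may differ.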

**Table 2** Parameters for creating a connection to HBase
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is HBase- followed by four random characters (letters and digits). You can also specify a custom name.
HBase Credential	Select the source HBase credential added to the Edge device. For details about how to add credentials, see "Big data - HBase" in Adding Resource Credentials.
Secured Cluster	Choose whether the cluster is secured.
ZooKeeper IP Address	Enter the IP address for connecting to the source ZooKeeper node. You can enter the public or private IP address of the ZooKeeper node.
ZooKeeper Port	Enter the port for connecting to the source ZooKeeper node.
HBase Version	Select the source HBase version.
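
Because the ZooKeeper IP Address field accepts either a public or a private address, you may want to check which kind you are about to enter. The sketch below uses Python's standard ipaddress module. The function name describe_address is hypothetical, and note that is_private also returns True for loopback and some other reserved ranges, so treat the result as a hint only.

```python
import ipaddress


def describe_address(value: str) -> str:
    """Classify an IPv4 or IPv6 address string as 'private' or 'public'.

    Raises ValueError if the string is not a valid IP address.
    """
    # ip_address parses the string and rejects anything that is not an IP.
    addr = ipaddress.ip_address(value.strip())
    # is_private is True for RFC 1918 ranges and for other
    # non-global ranges such as loopback.
    return "private" if addr.is_private else "public"
```

A private address is reachable only if the Edge device sits in a network that can route to the ZooKeeper node.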

**Table 3** Parameters for creating a connection to ClickHouse
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is ClickHouse- followed by four random characters (letters and digits). You can also specify a custom name.
ClickHouse Credential (Optional)	Select the source ClickHouse credential added to the Edge device. For details about how to add credentials, see "Big data - ClickHouse" in Adding Resource Credentials.
Secured Cluster	Choose whether the cluster is secured.
ClickHouse Server IP Address	Enter the IP address for accessing the source ClickHouse server. This is usually the IP address of the server that hosts ClickHouse.
HTTP Port	If the source ClickHouse cluster is unsecured, enter the HTTP port for communicating with the ClickHouse server. The default value is 8123.
HTTP SSL/TLS Port	If the source ClickHouse cluster is secured, enter the HTTPS port for communicating with the ClickHouse server.
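
The table above pairs the port field with the cluster type: the HTTP port applies to an unsecured cluster, and the HTTP SSL/TLS port applies to a secured one. The following sketch illustrates that choice by building a URL for the /ping endpoint of the ClickHouse HTTP interface. The function name and parameters are hypothetical, and this is not how MgC itself runs its connection test.

```python
from typing import Optional


def clickhouse_ping_url(host: str, secured: bool,
                        http_port: int = 8123,
                        https_port: Optional[int] = None) -> str:
    """Build the /ping URL that matches the cluster type.

    http_port defaults to 8123, the default given in the table.
    https_port has no default here because the table does not state one.
    """
    if secured:
        if https_port is None:
            raise ValueError("a secured cluster needs the HTTP SSL/TLS port")
        return f"https://{host}:{https_port}/ping"
    return f"http://{host}:{http_port}/ping"
```

You could open the resulting URL with a browser or an HTTP client from a host that can reach the ClickHouse server to confirm that you picked the right port.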

**Table 4** Parameters for creating a connection to Hive Metastore
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is Hive-Metastore- followed by four random characters (letters and digits). You can also specify a custom name.
Secure Connection	Choose whether to enable secure connection. If Hive Metastore is deployed in an unsecured cluster, do not enable secure connection. If Hive Metastore is deployed in a secured cluster, enable secure connection and provide access credentials. For details about how to obtain and add credentials to MgC, see "Big data - Hive Metastore" in Adding Resource Credentials.
Hive Version	Select the source Hive version. CAUTION: If the source Hive version is 2.1.1, select 1.x.
Hive Metastore IP Address	Enter the IP address for connecting to the Hive Metastore node.
Hive Metastore Thrift Port	Enter the port for connecting to the Hive Metastore Thrift service. The default port is 9083.
Connect to Metadata Database	Choose whether to query Hive table and partition information through the MySQL metadata database instead of Hive Metastore. During incremental data verification, querying more than 30,000 partitions through Hive Metastore may cause an out-of-memory (OOM) error because all partition information is loaded into memory. Connecting to the MySQL metadata database effectively prevents this issue. If you disable this option, the system queries Hive table and partition information through Hive Metastore. If you enable this option, the system queries that information through the MySQL database, and you need to set the following parameters: Metadata Database Type: Only MySQL is supported. MySQL Credential: Select the credential for accessing the MySQL database. The credential must be added to Edge and synchronized to MgC. For details, see Adding Resource Credentials. MySQL Node IP Address: Enter the IP address of the MySQL database server. MySQL Port: Enter the port of the MySQL database service. Database Name: Enter the name of the database that stores the Hive table metadata. NOTE: Ensure that the MySQL credential, node IP address, service port, and database name match the MySQL database used by Hive. Otherwise, data verification will fail.
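
The decision described for Connect to Metadata Database can be sketched as a small planning helper. Everything in the block is hypothetical and for illustration only: MetadataDbSettings mirrors the four values you enter in the console and is not an MgC API, and the 30,000 threshold is simply the figure quoted in the table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Partition count above which the table warns about an OOM risk.
PARTITION_THRESHOLD = 30_000


@dataclass
class MetadataDbSettings:
    """Hypothetical container mirroring the console fields."""
    credential: Optional[str] = None
    node_ip: Optional[str] = None
    port: Optional[int] = None
    database_name: Optional[str] = None


def should_connect_metadata_db(partition_count: int) -> bool:
    """True when the partition count exceeds the quoted threshold."""
    return partition_count > PARTITION_THRESHOLD


def missing_fields(settings: MetadataDbSettings) -> List[str]:
    """Return the names of fields that are still empty, in field order."""
    return [name for name, value in vars(settings).items()
            if value in (None, "")]
```

If should_connect_metadata_db returns True for your largest table, enabling the option is likely worthwhile, and missing_fields lists what you still need to collect before filling in the form.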

**Table 5** Parameters for creating a connection to Delta Lake (with metadata)
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is Delta-Lake-with-metadata- followed by four random characters (letters and digits). You can also specify a custom name.
Executor Credential	Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
Executor IP Address	Enter the IP address for connecting to the executor.
Executor Port	Enter the port for connecting to the executor.
Spark Client Directory	Enter the installation directory of the Spark client.
Environment Variable Address	Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
SQL File Location	Enter a directory for storing the SQL files generated for consistency verification. You must have the read and write permissions for the directory. NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.
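
The two path parameters above carry requirements that are easy to check on the executor before you enter them: the environment variable file must be given as an absolute path, and the SQL file location must be readable and writable. The following is a minimal sketch with a hypothetical function name. It assumes you run it on the executor as the login user of the executor credential, because os.access reports permissions for the current user only.

```python
import os
from typing import List


def check_executor_paths(env_file: str, sql_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means both paths look usable."""
    problems = []
    # The table asks for an absolute path to the environment variable file.
    if not os.path.isabs(env_file):
        problems.append("environment variable file path is not absolute")
    elif not os.path.isfile(env_file):
        problems.append("environment variable file does not exist")
    # The SQL file location must be a directory with read and write permissions.
    if not os.path.isdir(sql_dir):
        problems.append("SQL file location is not an existing directory")
    elif not os.access(sql_dir, os.R_OK | os.W_OK):
        problems.append("SQL file location is not readable and writable")
    return problems
```

The same check applies to the Delta Lake (without metadata) and Hudi connections, which use the same two parameters.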

**Table 6** Parameters for creating a connection to Delta Lake (without metadata)
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is Delta-Lake-without-metadata- followed by four random characters (letters and digits). You can also specify a custom name.
Executor Credential	Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
Executor IP Address	Enter the IP address for connecting to the executor.
Executor Port	Enter the port for connecting to the executor.
Spark Client Directory	Enter the installation directory of the Spark client.
Environment Variable Address	Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
SQL File Location	Enter a directory for storing the SQL files generated for consistency verification. You must have the read and write permissions for the directory. NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.

**Table 7** Parameters for creating a connection to Hudi (with metadata)
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is Hudi-with-metadata- followed by four random characters (letters and digits). You can also specify a custom name.
Executor Credential	Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
Executor IP Address	Enter the IP address for connecting to the executor.
Executor Port	Enter the port for connecting to the executor.
Spark Client Directory	Enter the installation directory of the Spark client.
Environment Variable Address	Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
SQL File Location	Enter a directory for storing the SQL files generated for consistency verification. You must have the read and write permissions for the directory. NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.

**Table 8** Parameters for creating a connection to Hudi (without metadata)
Parameter	Configuration
Connection To	Select Source.
Connection Name	The default name is Hudi-without-metadata- followed by four random characters (letters and digits). You can also specify a custom name.
Executor Credential	Select the login credential of the executor. For details about how to add credentials, see "Big data - Executor" in Adding Resource Credentials.
Executor IP Address	Enter the IP address for connecting to the executor.
Executor Port	Enter the port for connecting to the executor.
Spark Client Directory	Enter the installation directory of the Spark client.
Environment Variable Address	Enter the absolute path of the environment variable file (configuration file), for example, /opt/bigdata/client/bigdata_env.
SQL File Location	Enter a directory for storing the SQL files generated for consistency verification. You must have the read and write permissions for the directory. NOTICE: After the migration is complete, you need to manually clear the folders generated at this location to release storage space.
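
The NOTICE for SQL File Location in Tables 5 to 8 asks you to clear the generated folders manually after the migration. Before deleting anything, you may want to see what is there and how much space it takes. The sketch below only lists folders and does not delete them. The function name is hypothetical, and it assumes that each immediate subfolder of the SQL file location was generated by verification, which this document does not confirm; review the list before removing anything.

```python
import os
from typing import List, Tuple


def list_generated_folders(sql_dir: str) -> List[Tuple[str, int]]:
    """Return (folder path, total size in bytes) for each immediate subfolder.

    Read-only: nothing is deleted. Files directly under sql_dir are ignored.
    """
    results = []
    with os.scandir(sql_dir) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            # Walk the subfolder and add up the sizes of regular files.
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    path = os.path.join(root, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
            results.append((entry.path, total))
    return sorted(results)
```

Once you have confirmed that a listed folder is no longer needed, you can remove it with your usual tooling on the executor.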

Click Test. MgC checks whether it can connect to the component using the information you provided. If the test succeeds, the connection can be created.
After the connection test succeeds, click Confirm. The connection is created.
On the Connection Management page, view the created connection and its basic information. In the Operation column, click Modify to modify the connection settings.