Updated on 2024-05-07 GMT+08:00

Connecting to an MRS HDFS Data Source

Overview

ROMA Connect can use the MRS HDFS data source for data integration tasks. Before using the MRS HDFS data source, you need to connect it to ROMA Connect.

If two data integration tasks use MRS data sources (MRS Hive, MRS HDFS, or MRS HBase) of different versions and Kerberos authentication is enabled for those data sources, do not run the two tasks at the same time. If they run concurrently, the tasks fail.

Prerequisites

  • Each data source must belong to an integration application. Before connecting a data source, ensure that an integration application is available. If none exists, create one first.
  • Kerberos authentication has been enabled for the MRS cluster where the MRS HDFS data source is located, and the execution permission has been configured for the machine-machine user. For details, see Preparing a Development User.

Procedure

  1. Log in to the ROMA Connect console. On the Instances page, click View Console of an instance.
  2. In the navigation pane on the left, choose Data Sources. In the upper right corner of the page, click Access Data Source.
  3. On the Default tab page, select MRS HDFS and click Next.
  4. Configure the data source connection information.
    Table 1 Data source connection information

    Parameter
      Description

    Name
      Enter a data source name. A consistent naming convention makes the data source easier to find later.

    Encoding Format
      The default value is utf-8.

    Integration Application
      Select the integration application to which the data source belongs.

    Description
      Enter a brief description of the data source.

    HDFS URL
      Enter the name of the MRS HDFS file system to access.

      • If the root directory is used, set this parameter to hdfs:///. This operation requires administrator permissions.
      • If the default directory is used, set this parameter to hdfs:///hacluster. This operation requires administrator permissions.
      • If a planned directory is used, set this parameter to the planned directory.
      • If a user database directory is used, for example, /user/hdfs/testdb, the user must have permission on the directory.

    Machine-machine Username
      Enter the machine-machine username for connecting to MRS HDFS.

    Configuration File
      Click Upload to upload the .zip package of MRS HDFS configuration files. For details about how to obtain the files, see "Obtaining MRS HDFS configuration files".

    Obtaining MRS HDFS configuration files

    1. Obtain the krb5.conf and user.keytab files.

      Download the user authentication file from MRS Manager by following the procedure described in Downloading a User Authentication File, and decompress the file to obtain the krb5.conf and user.keytab files.

    2. Obtain the core-site.xml, hdfs-site.xml, and hosts files.

      Download the client configuration file from the MRS console by following the procedure described in Updating a Client Configuration File. After the file is decompressed:

      • Obtain the hosts file from xxx_Services_ClientConfig_ConfigFiles.
      • Obtain the core-site.xml and hdfs-site.xml files from xxx_Services_ClientConfig_ConfigFiles > HDFS > config.

        Check whether the value of dfs.client.failover.proxy.provider.hacluster in the hdfs-site.xml file is org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider. If not, change it to this value.

    3. Obtain the MRS HDFS configuration files.

      Save the obtained files to a new directory and compress them into a .zip package. All files must be stored in the root directory of the .zip package.

      • The file name can contain a maximum of 255 characters, and only letters and digits are allowed.
      • The file size cannot exceed 2 MB.
  5. Click Check Connectivity to check the connectivity between ROMA Connect and the data source.
    • If the test result is Data source connected successfully, go to the next step.
    • If the test result is Failed to connect to the data source, check the data source status and connection parameters, then click Recheck. Repeat until the connection succeeds.
  6. Click Create.
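
The packaging rules in "Obtaining MRS HDFS configuration files" can be checked locally before the package is uploaded. The following Python script is a minimal sketch, not part of ROMA Connect. The function names read_hadoop_property and build_config_zip are hypothetical. The sketch also assumes that the letters-and-digits naming rule applies to the .zip file name without its extension, and that the package holds exactly the five files named in the procedure.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Files named in "Obtaining MRS HDFS configuration files".
REQUIRED_FILES = ("krb5.conf", "user.keytab", "core-site.xml", "hdfs-site.xml", "hosts")
PROVIDER_KEY = "dfs.client.failover.proxy.provider.hacluster"
PROVIDER_VALUE = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
MAX_ZIP_BYTES = 2 * 1024 * 1024  # 2 MB size limit from the procedure


def read_hadoop_property(xml_path, name):
    """Return the value of a <property> in a Hadoop *-site.xml file, or None."""
    root = ET.parse(str(xml_path)).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def build_config_zip(src_dir, zip_path):
    """Validate the files in src_dir and write them to the root of zip_path."""
    src_dir, zip_path = Path(src_dir), Path(zip_path)

    # Assumption: the naming rule covers the file name without ".zip".
    if not re.fullmatch(r"[A-Za-z0-9]{1,255}", zip_path.stem):
        raise ValueError("package name must be 1-255 letters or digits")

    missing = [f for f in REQUIRED_FILES if not (src_dir / f).is_file()]
    if missing:
        raise FileNotFoundError("missing files: " + ", ".join(missing))

    # The procedure requires this failover proxy provider in hdfs-site.xml.
    value = read_hadoop_property(src_dir / "hdfs-site.xml", PROVIDER_KEY)
    if value != PROVIDER_VALUE:
        raise ValueError(f"{PROVIDER_KEY} is {value!r}, expected {PROVIDER_VALUE}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in REQUIRED_FILES:
            # arcname without a directory part keeps each file at the package root.
            zf.write(src_dir / f, arcname=f)

    if zip_path.stat().st_size > MAX_ZIP_BYTES:
        zip_path.unlink()
        raise ValueError("package exceeds the 2 MB limit")
    return zip_path
```

For example, build_config_zip("mrs_conf", "mrshdfsconfig.zip") reads the five files from a local mrs_conf directory (a placeholder name) and writes a package that can then be selected in the Configuration File field. The script only reports a wrong provider value; it does not modify hdfs-site.xml.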