
Configuring the HDFS Mapping Mode to Connect to the OBS File System

Updated at: Aug 17, 2021 GMT+08:00

By mapping HDFS addresses to OBS addresses, you can migrate data from HDFS to OBS and continue to access it without changing the data addresses in your service logic.

For example, if you migrate all data from HDFS to OBS, the HDFS address mapping function lets you access the data stored in OBS without modifying your service code. Alternatively, you can migrate only part of the data from HDFS to OBS and use the HDFS address mapping function to access data stored in both OBS and HDFS.
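
For example, a mount entry like the following in the client's core-site.xml file (this is the same mapping used throughout this guide; change the namespace, source directory, and OBS path as needed) redirects accesses under an HDFS directory to an OBS directory:

    <property>
        <name>fs.hdfs.mounttable.hacluster.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>

With this mapping, an access to /var/test or hdfs://hacluster/var/test is resolved to the corresponding OBS path obs://obs-test/job/yy/test.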

This function does not support access using the REST API of WebHdfsFileSystem (a FileSystem for HDFS over the web).

Do not modify the configuration file on the HDFS server. Otherwise, the HDFS service may fail to be restarted.

Operating HDFS by Running a Hadoop Client Command (/path or hdfs://namespace/path)

  1. Log in to all Master nodes and perform the following operations:
  2. Run the following command to edit the core-site.xml file used by HDFS:

    vim /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
        <name>fs.hdfs.mounttable.hacluster.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.impl</name>
        <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.
    • obs endpoint indicates the endpoint of OBS. Obtain the endpoint from Regions and Endpoints.

  4. Use hdfs://namespace/ to access data.

    hadoop fs -mkdir -p hdfs://hacluster/var/test

    hadoop fs -put abc.txt hdfs://hacluster/var/test/

    hadoop fs -ls hdfs://hacluster/var/test

  5. Use the Hadoop command line (without namespace) to access data.

    hadoop fs -mkdir -p /var/test

    hadoop fs -put abc.txt /var/test/

    hadoop fs -ls /var/test

Operating HDFS by Running the Hadoop Client Command (hdfs://namenodeIp:port/path)

  1. Log in to all Master nodes and perform the following operations:
  2. Run the following command to edit the core-site.xml file used by HDFS:

    vim /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
        <name>fs.hdfs.mounttable.hacluster.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.mounttable.namenodeIp1:port.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.mounttable.namenodeIp2:port.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.impl</name>
        <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.
    • obs endpoint indicates the endpoint of OBS. Obtain the endpoint from Regions and Endpoints.
    • namenodeIp indicates the IP address of an HDFS NameNode instance, and port indicates its RPC port (9820 by default). The IP addresses of all NameNode instances must be configured in the core-site.xml file (see the example below).
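
For example, assuming the two NameNode instances use the hypothetical addresses 192.168.0.11 and 192.168.0.12 with the default RPC port 9820, the per-NameNode mount entries would look like this:

    <property>
        <name>fs.hdfs.mounttable.192.168.0.11:9820.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.mounttable.192.168.0.12:9820.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>

Data can then be accessed through either NameNode address, for example:

    hadoop fs -ls hdfs://192.168.0.11:9820/var/test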

Executing a MapReduce Job by Running a Hadoop Client Command

  1. Log in to all Master nodes and perform the following operations:
  2. Run the following command to edit the core-site.xml file used by HDFS:

    vim /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
        <name>fs.hdfs.mounttable.hacluster.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.impl</name>
        <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.
    • obs endpoint indicates the endpoint of OBS. Obtain the endpoint from Regions and Endpoints.

  4. Run the following command to edit the core-site.xml file used by Yarn:

    vim /opt/client/Yarn/config/core-site.xml

  5. Add the following information to the core-site.xml file:

    <property>
        <name>fs.hdfs.mounttable.hacluster.link./var</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.impl</name>
        <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.
    • obs endpoint indicates the endpoint of OBS. Obtain the endpoint from Regions and Endpoints.
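
With both core-site.xml files updated, a MapReduce job can read and write the mapped directory as if it were an ordinary HDFS path. A minimal sketch using the Hadoop examples JAR (the JAR path and file name vary with the cluster version; adjust them as needed):

    hadoop fs -mkdir -p /var/input
    hadoop fs -put abc.txt /var/input/
    hadoop jar /opt/client/HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /var/input /var/output
    hadoop fs -ls /var/output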

Accessing HDFS Data Using Hive Beeline JDBC

Prerequisites:

  • The default file system of the cluster is HDFS. That is, the value of fs.defaultFS starts with hdfs://.
  • The default agency has been configured for the cluster. Perform the following steps to bind the agency.
    1. On the Dashboard tab page of the cluster details page, click Manage Agency on the right side of Agency and bind MRS_ECS_DEFAULT_AGENCY.
    2. On the Nodes tab page, click each node name. On the ECS details page that is displayed, ensure that MRS_ECS_DEFAULT_AGENCY has been bound to all nodes.

Detailed configuration:

  1. Log in to the service configuration page.

    • For versions earlier than MRS 3.x, log in to the cluster details page and choose Components > Hive > Service Configuration.

      If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click Click to synchronize on the right side of IAM User Sync to synchronize IAM users.)

    • For MRS 3.x or later, log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager (MRS 3.x or Later). Choose Cluster > Services > Hive > Configurations.

  2. In the configuration type drop-down box, switch Basic Configurations to All Configurations.
  3. Choose Hive > Customization, and add the following configurations to core.site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
      • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
      • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.
      • The directory that follows .link in the mapping key must not be any of the following directories, or their subdirectories, because they are used during Hive startup.
        • /tmp/hive-scratch
        • /tmp/hive
        • /apps
        • /datasets
        • /mrs
    Figure 1 Modifying customized Hive configurations on MRS Manager
    Figure 2 Modifying customized Hive configurations on FusionInsight Manager

  4. Click Save Configuration and select Restart the affected services or instances to restart the Hive service.
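
After the restart, data under the mapped directory can be used from a Beeline session like any HDFS path. A minimal sketch, run inside beeline after sourcing the Hive client environment (the table name, columns, and subdirectory are placeholders):

    CREATE TABLE obs_mapped_table (id INT, name STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION '/yy/obs_mapped_table';
    SELECT * FROM obs_mapped_table;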

Accessing HDFS Data Using presto_cli.sh jdbc

Prerequisites:

  • The default file system of the cluster is HDFS. That is, the value of fs.defaultFS starts with hdfs://.
  • The default agency has been configured for the cluster. Perform the following steps to bind the agency.
    1. On the Dashboard tab page of the cluster details page, click Manage Agency on the right side of Agency and bind MRS_ECS_DEFAULT_AGENCY.
    2. On the Nodes tab page, click each node name. On the ECS details page that is displayed, ensure that MRS_ECS_DEFAULT_AGENCY has been bound to all nodes.
  • The cluster version is MRS 1.9.2.

Detailed configuration:

  1. On the cluster details page, click the Components tab.

    If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click Click to synchronize on the right side of IAM User Sync to synchronize IAM users.)

  2. Choose Presto > Service Configuration.
  3. In the configuration type drop-down box, switch Basic Configurations to All Configurations.
  4. Choose Presto > Hive, and add the following configurations to core.site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
    • Add the following configuration for the user who accesses data from the VM:
      • If user root accesses data: fs.hdfs.mounttable.hacluster.link./tmp/presto-root = obs://obs-test/job/presto_root/
      • If user omm accesses data: fs.hdfs.mounttable.hacluster.link./tmp/presto-omm = obs://obs-test/job/presto_omm/
      • If another user accesses data, substitute that username: fs.hdfs.mounttable.hacluster.link./tmp/presto-{Username} = obs://obs-test/job/presto_{Username}/
      Figure 3 Presto configuration
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.

  5. Click Save Configuration and select Restart the affected services or instances to restart the Presto service.
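
After the restart, Hive tables whose locations fall under the mapped directories can be queried through Presto as usual. A minimal sketch (the catalog, schema, and table names are placeholders, and the exact presto_cli.sh options depend on the cluster setup):

    ./presto_cli.sh --catalog hive --schema default
    SELECT * FROM obs_mapped_table LIMIT 10;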

Submitting a Job Using Flink Installed on MRS

  1. Log in to the service configuration page.

    • For versions earlier than MRS 3.x, log in to the cluster details page and choose Components > Yarn > Service Configuration.

      If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click Click to synchronize on the right side of IAM User Sync to synchronize IAM users.)

    • For MRS 3.x or later, log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager (MRS 3.x or Later). Choose Cluster > Services > Yarn > Configurations.

  2. In the configuration type drop-down box, switch Basic Configurations to All Configurations.
  3. Choose Yarn > Customization, and add the following configurations to yarn.core-site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
    Figure 4 Modifying Yarn configurations on MRS Manager
    Figure 5 Modifying Yarn configurations on FusionInsight Manager
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.
    • When Flink starts, it reads the configuration files in the HADOOP_HOME directory. If the OBS mapping has been configured there, a class-not-found error is reported when Flink submits a job. In this case, run the following commands to copy the wrapper package from the HADOOP_HOME directory to the Flink lib directory and grant the user who submits the job read permission on it. (The JAR file name varies with the cluster version; the wildcard in the commands covers this.)

      cp $HADOOP_HOME/share/hadoop/hdfs/lib/*-wrapper-file-system-*.jar $FLINK_HOME/lib/

      chmod 755 $FLINK_HOME/lib/*-wrapper-file-system-*.jar

  4. Click Save Configuration and select Restart the affected services or instances to restart the Yarn service.
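
After the restart, a Flink job submitted to Yarn can read from and write to the mapped directory. A minimal sketch using the bundled batch WordCount example (the input and output paths are placeholders, and the example JAR location and run options vary with the Flink version):

    flink run -m yarn-cluster $FLINK_HOME/examples/batch/WordCount.jar \
        --input hdfs://hacluster/yy/input/abc.txt \
        --output hdfs://hacluster/yy/output/result.txt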

Submitting a Spark Job or Executing Spark SQL Statements

  1. Log in to all Master nodes and perform the following operations:
  2. Run the following command to edit the core-site.xml file used by Spark:

    vim /opt/client/Spark/spark/conf/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
        <name>fs.hdfs.mounttable.hacluster.link./yy</name>
        <value>obs://obs-test/job/yy</value>
    </property>
    <property>
        <name>fs.hdfs.impl</name>
        <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.hdfs.impl</name>
        <value>com.huawei.hadoop.MRSHDFSWrapper</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.
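
With the client configuration updated, Spark jobs and Spark SQL can address the mapped directory directly. A minimal sketch using spark-submit and the bundled JavaWordCount example, assuming the Spark client environment has been sourced (the input path and JAR name are placeholders; adjust them to the cluster version):

    spark-submit --master yarn \
        --class org.apache.spark.examples.JavaWordCount \
        $SPARK_HOME/examples/jars/spark-examples_*.jar \
        hdfs://hacluster/yy/input/abc.txt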

Accessing HDFS Data Using Spark Beeline JDBC

Prerequisites:

  • The default file system of the cluster is HDFS. That is, the value of fs.defaultFS starts with hdfs://.
  • The default agency has been configured for the cluster. Perform the following steps to bind the agency.
    1. On the Dashboard tab page of the cluster details page, click Manage Agency on the right side of Agency and bind MRS_ECS_DEFAULT_AGENCY.
    2. On the Nodes tab page, click each node name. On the ECS details page that is displayed, ensure that MRS_ECS_DEFAULT_AGENCY has been bound to all nodes.

Detailed configuration:

  1. Log in to the service configuration page.

    • For versions earlier than MRS 3.x, log in to the cluster details page and choose Components > Spark > Service Configuration.

      If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click Click to synchronize on the right side of IAM User Sync to synchronize IAM users.)

    • For MRS 3.x or later, log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager (MRS 3.x or Later). Choose Cluster > Services > Spark2x > Configurations.

  2. Switch Basic Configurations to All Configurations and set Role to JDBCServer.
  3. Choose JDBCServer > Customization, and add the following configurations to spark.core-site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
    • fs.AbstractFileSystem.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapper
    Figure 6 Modifying Spark configurations on MRS Manager
    Figure 7 Modifying Spark configurations on FusionInsight Manager
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed from the default, replace hacluster with the actual namespace.
    • obs://obs-test/job/yy indicates the directory of the data to be accessed in the OBS file system. Change it based on site requirements.

  4. Click Save Configuration and select Restart the affected services or instances to restart the Spark service.
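
After the restart, the mapped directory can be used from a Spark Beeline (JDBCServer) session like any HDFS location. A minimal sketch of the SQL side, run in a Beeline session connected to JDBCServer (the table name, columns, and subdirectory are placeholders):

    CREATE TABLE obs_mapped_table (id INT, name STRING)
        USING PARQUET
        LOCATION '/yy/obs_mapped_table';
    SELECT * FROM obs_mapped_table;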
