
Accessing OBS by Mapping HDFS Addresses

Updated at: Apr 28, 2020 GMT+08:00

By mapping HDFS addresses to OBS addresses, you can migrate data from HDFS to OBS and access data without changing the data addresses in the service logic.

For example, if you migrate data from HDFS to OBS, you can use the HDFS address mapping function to access the data stored in OBS without modifying your service code. Alternatively, you can migrate only part of the data from HDFS to OBS and use the HDFS address mapping function to access data stored in both OBS and HDFS.

This function does not support access through the REST API of WebHdfsFileSystem (a FileSystem for HDFS over the web).
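For example, suppose that the mapping configured in the following sections maps the HDFS directory /var to obs://obs-test/job/yy (both paths are examples). A client command on the HDFS path is then served from OBS transparently:

    # /var is mapped to obs://obs-test/job/yy, so this lists obs://obs-test/job/yy/test.
    hadoop fs -ls /var/test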

Operating HDFS by Running a Hadoop Client Command (/path or hdfs://namespace/path)

  1. Log in to a Master node.
  2. Run the following command to edit the core-site.xml file used by HDFS:

    vim /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
    <name>fs.hdfs.mounttable.hacluster.link./var</name>
    <value>obs://obs-test/job/yy</value>
    </property>
    <property>
    <name>fs.hdfs.impl</name>
    <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
    <name>fs.obs.endpoint</name>
    <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.
    • obs endpoint indicates the OBS endpoint. Obtain it from Regions and Endpoints.

  4. Use the hdfs://namespace/path format to access data. For example:

    hadoop fs -mkdir -p hdfs://hacluster/var/test

    hadoop fs -put abc.txt hdfs://hacluster/var/test/

    hadoop fs -ls hdfs://hacluster/var/test

  5. Use the Hadoop command line with paths that omit the namespace to access data. For example:

    hadoop fs -mkdir -p /var/test

    hadoop fs -put abc.txt /var/test/

    hadoop fs -ls /var/test
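If the OBS endpoint and access credentials (for example, an agency) are already configured for the client, you can optionally verify the mapping by listing the OBS directory directly. The bucket and path below are the examples used in step 3:

    hadoop fs -ls obs://obs-test/job/yy/test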

Operating HDFS by Running the Hadoop Client Command (hdfs://namenodeIp:port/path)

  1. Log in to a Master node.
  2. Run the following command to edit the core-site.xml file used by HDFS:

    vim /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
    <name>fs.hdfs.mounttable.hacluster.link./var</name>
    <value>obs://obs-test/job/yy</value>
    </property>
    <property>
    <name>fs.hdfs.mounttable.namenodeIp1:port.link./var</name>
    <value>obs://obs-test/job/yy</value>
    </property>
    <property>
    <name>fs.hdfs.mounttable.namenodeIp2:port.link./var</name>
    <value>obs://obs-test/job/yy</value>
    </property>
    <property>
    <name>fs.hdfs.impl</name>
    <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
    <name>fs.obs.endpoint</name>
    <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.
    • obs endpoint indicates the OBS endpoint. Obtain it from Regions and Endpoints.
    • namenodeIp indicates the IP address of an HDFS NameNode instance, and port indicates the RPC port of that NameNode (9820 by default). A mapping must be configured in the core-site.xml file for the IP address of every NameNode instance.
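After the configuration takes effect, data addressed through a NameNode IP address and RPC port is mapped in the same way. A minimal sketch, assuming a NameNode at 192.168.0.10 and the default RPC port 9820 (replace both with the values of your cluster):

    hadoop fs -mkdir -p hdfs://192.168.0.10:9820/var/test

    hadoop fs -put abc.txt hdfs://192.168.0.10:9820/var/test/

    hadoop fs -ls hdfs://192.168.0.10:9820/var/test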

Executing a MapReduce Job by Running a Hadoop Client Command

  1. Log in to a Master node.
  2. Run the following command to edit the core-site.xml file used by HDFS:

    vim /opt/client/HDFS/hadoop/etc/hadoop/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
    <name>fs.hdfs.mounttable.hacluster.link./var</name>
    <value>obs://obs-test/job/yy</value>
    </property>
    <property>
    <name>fs.hdfs.impl</name>
    <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
    <name>fs.obs.endpoint</name>
    <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.
    • obs endpoint indicates the OBS endpoint. Obtain it from Regions and Endpoints.

  4. Run the following command to edit the core-site.xml file used by Yarn:

    vim /opt/client/Yarn/config/core-site.xml

  5. Add the following information to the core-site.xml file:

    <property>
    <name>fs.hdfs.mounttable.hacluster.link./var</name>
    <value>obs://obs-test/job/yy</value>
    </property>
    <property>
    <name>fs.hdfs.impl</name>
    <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    <property>
    <name>fs.obs.endpoint</name>
    <value>obs endpoint</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.
    • obs endpoint indicates the OBS endpoint. Obtain it from Regions and Endpoints.
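After both core-site.xml files are updated, a MapReduce job whose input or output path falls under the mapped directory reads from and writes to OBS transparently. A hedged example using the Hadoop examples JAR (the JAR path and name depend on your client version):

    hadoop jar /opt/client/HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /var/test /var/output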

Accessing HDFS Data Using Hive Beeline JDBC

Prerequisites:

  • The default file system of the cluster is HDFS, that is, the value of fs.defaultFS starts with hdfs://.
  • The default agency has been configured for the cluster. Perform the following steps to bind the agency:
    1. On the Dashboard tab page of the cluster details page, click the button on the right side of Agency and bind MRS_ECS_DEFAULT_AGENCY.
    2. On the Nodes tab page, click each node name. On the ECS details page that is displayed, verify that MRS_ECS_DEFAULT_AGENCY is bound to every node.

Detailed configuration:

  1. On the cluster details page, click the Components tab.

    • If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click the synchronization button on the right side of IAM User Sync to synchronize IAM users.)
    • For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services.

  2. Choose Hive > Service Configuration.
  3. Set Type to All.
  4. Choose Hive > Customization, and add the following configurations to core.site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
      • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
      • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.
      • The directory specified after .link in the mapping key cannot be any of the following directories, which are used during Hive startup, or any of their subdirectories:
        • /tmp/hive-scratch
        • /tmp/hive
        • /apps
        • /datasets
        • /mrs
        • /user
    Figure 1 Custom Hive configuration

  5. Click Save Configuration and select Restart the affected services or instances to restart the Hive service.
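After the restart, Beeline sessions resolve the mapped directory to OBS. A minimal sketch, assuming the /yy mapping above; the table name is an example and the Beeline connection parameters depend on your cluster:

    beeline

    CREATE TABLE test_obs (id INT, name STRING) LOCATION '/yy/test_obs';
    INSERT INTO test_obs VALUES (1, 'a');
    SELECT * FROM test_obs;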

Accessing HDFS Data Using presto_cli.sh JDBC

Prerequisites:

  • The default file system of the cluster is HDFS, that is, the value of fs.defaultFS starts with hdfs://.
  • The default agency has been configured for the cluster. Perform the following steps to bind the agency:
    1. On the Dashboard tab page of the cluster details page, click the button on the right side of Agency and bind MRS_ECS_DEFAULT_AGENCY.
    2. On the Nodes tab page, click each node name. On the ECS details page that is displayed, verify that MRS_ECS_DEFAULT_AGENCY is bound to every node.

Detailed configuration:

  1. On the cluster details page, click the Components tab.

    • If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click the synchronization button on the right side of IAM User Sync to synchronize IAM users.)
    • For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services.

  2. Choose Presto > Service Configuration.
  3. Set Type to All.
  4. Choose Presto > Hive, and add the following configurations to core.site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
    • Add the following configuration based on the user who accesses the data:
      • If user root accesses data: fs.hdfs.mounttable.hacluster.link./tmp/presto-root = obs://obs-test/job/presto_root/
      • If user omm accesses data: fs.hdfs.mounttable.hacluster.link./tmp/presto-omm = obs://obs-test/job/presto_omm/
      • If another user accesses data, replace the username accordingly: fs.hdfs.mounttable.hacluster.link./tmp/presto-{Username} = obs://obs-test/job/presto_{Username}/
      Figure 2 Presto configuration
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.

  5. Click Save Configuration and select Restart the affected services or instances to restart the Presto service.
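After the restart, Presto queries on Hive tables whose locations fall under the mapped directory read OBS data. A hedged example; the CLI options and the table name are assumptions that depend on your environment:

    presto_cli.sh --catalog hive --schema default

    select * from test_obs;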

Submitting a Job Using the Flink Component Installed on MRS

  1. On the cluster details page, click the Components tab.

    • If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click the synchronization button on the right side of IAM User Sync to synchronize IAM users.)
    • For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services.

  2. Choose Yarn > Service Configuration.
  3. Set Type to All.
  4. Choose Yarn > Customization, and add the following configurations to yarn.core-site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
    Figure 3 Yarn configuration
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.

  5. Click Save Configuration and select Restart the affected services or instances to restart the Yarn service.
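After Yarn is restarted, a Flink job submitted to Yarn accesses the mapped directory as OBS. A minimal sketch using the bundled WordCount example; the client environment script and JAR path are assumptions and depend on your installation:

    source /opt/client/bigdata_env

    flink run -m yarn-cluster /opt/client/Flink/flink/examples/batch/WordCount.jar --input hdfs://hacluster/yy/input --output hdfs://hacluster/yy/output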

Submitting a Spark Job or Executing Spark SQL Statements

  1. Log in to a Master node.
  2. Run the following command to edit the core-site.xml file used by Spark:

    vim /opt/client/Spark/spark/conf/core-site.xml

  3. Add the following information to the core-site.xml file:

    <property>
    <name>fs.hdfs.mounttable.hacluster.link./yy</name>
    <value>obs://obs-test/job/yy</value>
    </property>
    <property>
    <name>fs.hdfs.impl</name>
    <value>com.huawei.hadoop.MRSHDFSWrapperFileSystem</value>
    </property>
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.
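After the client configuration is updated, Spark jobs and Spark SQL statements that use paths under the mapped directory access OBS transparently. A hedged example using the Spark examples JAR (the JAR path is an assumption):

    spark-submit --master yarn --class org.apache.spark.examples.JavaWordCount /opt/client/Spark/spark/examples/jars/spark-examples_*.jar /yy/input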

Accessing HDFS Data Using Spark Beeline JDBC

Prerequisites:

  • The default file system of the cluster is HDFS, that is, the value of fs.defaultFS starts with hdfs://.
  • The default agency has been configured for the cluster. Perform the following steps to bind the agency:
    1. On the Dashboard tab page of the cluster details page, click the button on the right side of Agency and bind MRS_ECS_DEFAULT_AGENCY.
    2. On the Nodes tab page, click each node name. On the ECS details page that is displayed, verify that MRS_ECS_DEFAULT_AGENCY is bound to every node.

Detailed configuration:

  1. On the cluster details page, click the Components tab.

    • If the Components tab is not displayed on the cluster details page, complete IAM user synchronization first. (On the Dashboard tab page of the cluster details page, click the synchronization button on the right side of IAM User Sync to synchronize IAM users.)
    • For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services.

  2. Choose Spark > Service Configuration.
  3. Set Type to All and Role to JDBCServer.
  4. Choose JDBCServer > Customization, and add the following configurations to spark.core-site.customized.configs:

    • fs.hdfs.impl = com.huawei.hadoop.MRSHDFSWrapperFileSystem
    • fs.hdfs.mounttable.hacluster.link./yy = obs://obs-test/job/yy
    Figure 4 Spark configuration
    • hacluster is the namespace in the value of fs.defaultFS in the core-site.xml file. If that namespace has been changed, replace hacluster with the new namespace.
    • obs://obs-test/job/yy indicates the directory in the OBS bucket that stores the data to be accessed. Change it based on site requirements.

  5. Click Save Configuration and select Restart the affected services or instances to restart the Spark service.
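After the restart, SQL statements executed through the Spark JDBCServer resolve the mapped directory to OBS. A minimal sketch; the Beeline launcher path, connection parameters, and table name are examples:

    /opt/client/Spark/spark/bin/beeline

    CREATE TABLE test_obs_spark (id INT, name STRING) USING parquet LOCATION '/yy/test_obs_spark';
    SELECT * FROM test_obs_spark;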
