
Example Application Development for Interconnecting HDFS with OBS

Interconnection Principles

  • When HDFS creates a FileSystem object, it looks up the implementation class that matches the URI scheme. That is, the implementation classes provided by different storage backends are registered in the HDFS configuration file, and HDFS instantiates the class configured under fs.AbstractFileSystem.%s.impl, where %s is the URI scheme. The following is an example:
    /**
     * Create a file system instance for the specified uri using the conf. The
     * conf is used to find the class name that implements the file system. The
     * conf is also passed to the file system for its configuration.
     *
     * @param uri URI of the file system
     * @param conf Configuration for the file system
     *
     * @return Returns the file system for the given URI
     *
     * @throws UnsupportedFileSystemException file system for <code>uri</code> is not found
     */
    public static AbstractFileSystem createFileSystem(URI uri, Configuration conf)
        throws UnsupportedFileSystemException {
      final String fsImplConf = String.format("fs.AbstractFileSystem.%s.impl", uri.getScheme());
    
      Class<?> clazz = conf.getClass(fsImplConf, null);
      if (clazz == null) {
        throw new UnsupportedFileSystemException(String.format(
            "%s=null: %s: %s",
            fsImplConf, NO_ABSTRACT_FS_ERROR, uri.getScheme()));
      }
      return (AbstractFileSystem) newInstance(clazz, uri, conf);
    }
  • In the core-default.xml file of HDFS, implementation classes are already registered for URI schemes such as adl, hdfs, file, and har.
    <property>
      <name>fs.AbstractFileSystem.adl.impl</name>
      <value>org.apache.hadoop.fs.adl.Adl</value>
    </property>
    <property>
      <name>fs.AbstractFileSystem.hdfs.impl</name>
      <value>org.apache.hadoop.fs.Hdfs</value>
      <description>The FileSystem for hdfs: uris.</description>
    </property>
    <property>
      <name>fs.AbstractFileSystem.file.impl</name>
      <value>org.apache.hadoop.fs.local.LocalFs</value>
      <description>The AbstractFileSystem for file: uris.</description>
    </property>
    
    <property>
      <name>fs.AbstractFileSystem.har.impl</name>
      <value>org.apache.hadoop.fs.HarFs</value>
      <description>The AbstractFileSystem for har: uris.</description>
    </property>
  • To connect to OBS, the OBS implementation class has been added to the default configuration file of MRS (a usage sketch follows this list).
    <property>
      <name>fs.AbstractFileSystem.obs.impl</name>
      <value>org.apache.hadoop.fs.obs.OBS</value>
    </property>
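
With this mapping in place, the scheme lookup can be exercised directly from client code. The following minimal sketch (the bucket name demo-bucket is a placeholder) builds a FileContext for an obs:// URI; Hadoop resolves the scheme to org.apache.hadoop.fs.obs.OBS through the createFileSystem() method shown above:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;

    public class ObsSchemeLookupDemo {
      public static void main(String[] args) throws Exception {
        // Loads core-site.xml and hdfs-site.xml from the classpath, including
        // the fs.AbstractFileSystem.obs.impl mapping shown above.
        Configuration conf = new Configuration();

        // "demo-bucket" is a placeholder bucket name. FileContext resolves the
        // obs scheme to org.apache.hadoop.fs.obs.OBS via createFileSystem().
        FileContext fc = FileContext.getFileContext(URI.create("obs://demo-bucket/"), conf);

        // List the bucket root to confirm that the OBS implementation is used.
        for (FileStatus status : fc.util().listStatus(new Path("/"))) {
          System.out.println(status.getPath());
        }
      }
    }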

Obtaining the Configuration File of a Cluster

  1. Download and decompress the client by referring to Installing an MRS Cluster Client.
  2. Obtain core-site.xml and hdfs-site.xml from the downloaded HDFS client configuration directory (Download path/HDFS/hadoop/etc/hadoop), and core-site.xml from the YARN client configuration directory (Download path/Yarn/config).

    These files are used to replace the configuration files used in the original code.

  3. Add the following OBS access information to the core-site.xml files of both the HDFS and YARN clients (the same keys can also be set in code; see the sketch after this list):
    <property>
      <name>fs.obs.endpoint</name>
      <value>obs endpoint</value>
    </property>
    <property>
      <name>fs.obs.access.key</name>
      <value>xxx</value>
      <description>huaweicloud access key</description>
    </property>
    <property>
      <name>fs.obs.secret.key</name>
      <value>xxx</value>
      <description>huaweicloud secret key</description>
    </property>
    • Configuration files containing authentication credentials pose security risks. Delete such files after configuration, or store them securely.
  4. Change the value of fs.defaultFS in the core-site.xml file on the HDFS client.
    For example, the value is hdfs://hacluster before the change.
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hacluster</value>
    </property>

    Change the value to obs://Bucket name.

    <property>
      <name>fs.defaultFS</name>
      <value>obs://Bucket name</value>
    </property>
  5. To reduce the volume of logs printed by the OBS file system, adjust the log level of the OBS client. If a large number of logs are printed, read and write performance may be affected.

    cd ${client_home}/HDFS/hadoop/etc/hadoop

    vi log4j.properties

    Add the following OBS log level configuration to the log4j.properties file:

    log4j.logger.org.apache.hadoop.fs.obs=WARN
    log4j.logger.com.obs=WARN
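
As noted in step 3, the OBS access information can also be set programmatically instead of being written to core-site.xml. The following minimal sketch uses placeholder values for the endpoint, credentials, and bucket name; hard-coding credentials carries the same security risk as leaving them in configuration files:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class ObsAccessSketch {
      public static void main(String[] args) throws Exception {
        // Picks up the replaced core-site.xml and hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();

        // Same keys as step 3; all values below are placeholders.
        conf.set("fs.obs.endpoint", "obs endpoint");
        conf.set("fs.obs.access.key", "xxx");
        conf.set("fs.obs.secret.key", "xxx");

        // Same effect as step 4: relative paths now resolve to the OBS bucket.
        conf.set("fs.defaultFS", "obs://demo-bucket");

        // Verify access by creating a directory in the bucket.
        FileContext fc = FileContext.getFileContext(conf);
        fc.mkdir(new Path("/user/example"), FsPermission.getDirDefault(), true);
      }
    }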

Adding Dependency Packages to Service Programs

Obtain the JAR files hadoop-huaweicloud-xxx-hw-xx.jar and mrs-obs-provider-xxx.jar from the MRS HDFS client installation package, place them in the classpath directory of your program, and set the permissions and owner of the JAR files appropriately, as in the example below.
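
For example, assuming the program's classpath directory is /opt/program/lib (the source path JAR path, the versions, and the owner and mode below are all placeholders; adapt them to your deployment):

cp JAR path/hadoop-huaweicloud-xxx-hw-xx.jar JAR path/mrs-obs-provider-xxx.jar /opt/program/lib/

chmod 640 /opt/program/lib/hadoop-huaweicloud-xxx-hw-xx.jar /opt/program/lib/mrs-obs-provider-xxx.jar

chown omm:wheel /opt/program/lib/hadoop-huaweicloud-xxx-hw-xx.jar /opt/program/lib/mrs-obs-provider-xxx.jar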