Updated on 2025-08-11 GMT+08:00

Accessing OBS Using YARN Through Guardian

After interconnecting Guardian with OBS as described in Disabling Ranger OBS Path Authentication for Guardian or Enabling Ranger OBS Path Authentication for Guardian, you can run YARN jobs on the cluster client to access OBS.

Prerequisites

If Guardian was connected to OBS as described in Enabling Ranger OBS Path Authentication for Guardian, ensure that you have read and write permissions on the OBS path in Ranger. For details about how to grant the permissions, see Configuring Ranger Permissions.

Interconnecting YARN with OBS

  1. Log in to the node where the YARN client is installed as the client installation user.
  2. Run the following command to switch to the client installation directory:

    cd Client installation directory

  3. Run the following command to configure environment variables:

    source bigdata_env

  4. If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user. The user must have read and write permissions on the OBS directory. Skip this step if Kerberos authentication is disabled for the cluster.

    kinit User performing HDFS operations

  5. Explicitly specify the OBS file system to be accessed in the command line, as in the following operations.

    • List the contents of a path in the OBS file system.
      hdfs dfs -ls obs://OBS parallel file system name/path
    • Create a directory in the OBS file system.
      hdfs dfs -mkdir obs://OBS parallel file system name/hadoop1
    • Run a YARN job that accesses OBS.
      yarn jar Client installation directory/HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi -Dmapreduce.job.hdfs-servers=NAMESERVICE -fs obs://OBS parallel file system name 1 1

      NAMESERVICE indicates the NameService in HDFS. The default value is hdfs://hacluster. If there are multiple NameServices, separate them with commas (,).

      Example:

      yarn jar /opt/hadoopclient/HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi -Dmapreduce.job.hdfs-servers=hdfs://hacluster -fs obs://bucketname 1 1
    • Run the following command to write data to OBS:
      yarn jar Client installation directory/HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 100 obs://OBS parallel file system name/hadoop1/teragen1
    • Run the following command to copy data from OBS to HDFS:
      hadoop distcp obs://OBS parallel file system name/hadoop1/teragen1 /tmp
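
The placeholders in the commands above can be filled in from a few shell variables. The following is a minimal sketch, not part of the product tooling: the function names join_nameservices and pi_command are hypothetical, /opt/hadoopclient and bucketname are taken from the example above, and hdfs://ns2 is an invented second NameService used only to show the comma-separated form. The script only prints the resulting command and does not run it.

```shell
#!/bin/sh
# Sketch: compose the "pi" example command from variables.
# CLIENT_DIR and OBS_FS are placeholder values (assumptions); replace them
# with your client installation directory and OBS parallel file system name.
CLIENT_DIR="/opt/hadoopclient"
OBS_FS="bucketname"
EXAMPLES_JAR="$CLIENT_DIR/HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar"

# Join all arguments with commas, the separator expected by
# mapreduce.job.hdfs-servers when several NameServices are listed.
join_nameservices() {
  ( IFS=,; echo "$*" )
}

# Print (do not execute) the pi example command for the given NameServices.
pi_command() {
  servers="$(join_nameservices "$@")"
  echo "yarn jar $EXAMPLES_JAR pi -Dmapreduce.job.hdfs-servers=$servers -fs obs://$OBS_FS 1 1"
}

# Single NameService (the default hdfs://hacluster).
pi_command hdfs://hacluster

# Two NameServices; hdfs://ns2 is a hypothetical name for illustration.
pi_command hdfs://hacluster hdfs://ns2
```

Printing the command first makes it easy to check the NameService list and the OBS file system name before submitting the job. To run a printed command, paste it into a shell where bigdata_env has already been sourced.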

Changing the Log Level of OBS Client

If the OBS client prints a large number of logs when accessing the OBS file system, read and write performance may be affected. You can adjust the log level of the OBS client as follows:
  1. Go to the hadoop directory.
    cd Client installation directory/HDFS/hadoop/etc/hadoop
  2. Edit the file log4j.properties.
    vi log4j.properties

    Add the following OBS log level configuration to the file and save it.

    log4j.logger.org.apache.hadoop.fs.obs=WARN
    log4j.logger.com.obs=WARN
  3. Run the following command to view the last lines of the file:
    tail -4 log4j.properties

    If the command output contains the two OBS log level lines, as shown in Figure 1, the log level is successfully changed.

    Figure 1 Adding an OBS log level
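
If you prefer to script the change instead of editing the file in vi, the two settings above can be appended only when they are missing. The following is a minimal sketch: the helper name append_once is hypothetical, and the demonstration writes to a temporary file rather than to the real log4j.properties.

```shell
#!/bin/sh
# Sketch: append a line to a properties file only if the exact line
# is not already present, so repeated runs do not create duplicates.
append_once() {
  file="$1"
  line="$2"
  # -x matches whole lines, -F treats the pattern as a fixed string.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a temporary file. For real use, set PROPS to
# log4j.properties under the client's HDFS/hadoop/etc/hadoop directory.
PROPS="$(mktemp)"

append_once "$PROPS" "log4j.logger.org.apache.hadoop.fs.obs=WARN"
append_once "$PROPS" "log4j.logger.com.obs=WARN"
append_once "$PROPS" "log4j.logger.com.obs=WARN"   # repeated call adds nothing

# Show the OBS-related lines, similar to checking with tail.
grep -F "obs" "$PROPS"
```

The sketch assumes the target file ends with a newline. If it does not, the first appended setting would be joined to the last existing line, so check the file with tail after running it.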

Configuring Ranger Permissions

  1. Log in to FusionInsight Manager and choose System > Permission > User Group. On the displayed page, click Create User Group to create a user group without any roles, for example, obs_hadoop1.

    For details about how to log in to MRS Manager, see Accessing MRS Manager.

  2. Return to FusionInsight Manager and choose System > Permission > User. On the displayed page, click Create User to create a user that is associated with the obs_hadoop1 user group and the default role, for example, hadoopuser1.
  3. Log in to the Ranger management page as the rangeradmin user.
  4. On the home page, click the component plug-in name OBS in the EXTERNAL AUTHORIZATION area.
  5. Click Add New Policy and add the Read and Write permissions on the desired OBS paths to the user group created in Step 1.

    The following figure shows the configurations needed for adding the Read and Write permissions on obs://OBS parallel file system name/hadoop1 to user group obs_hadoop1.

    Figure 2 Granting the new user group permissions for reading and writing OBS paths

    Before configuring permission policies for OBS paths on Ranger, ensure that the AccessLabel function has been enabled for OBS. If the function is not enabled, manually enable it. For details, contact OBS O&M personnel.