Updated on 2025-08-11 GMT+08:00

Accessing OBS Using MapReduce Through Guardian

After interconnecting Guardian with OBS by following Disabling Ranger OBS Path Authentication for Guardian or Enabling Ranger OBS Path Authentication for Guardian, add the custom configuration described in this section for the MapReduce component.

Interconnecting MapReduce with OBS

  1. Log in to the MRS management console. In the navigation pane on the left, choose Clusters > Active Clusters. On the displayed page, click the name of the cluster you created to access its details page.
  2. Choose Components > MapReduce. On the displayed page, click Configurations and then All Configurations. In the navigation pane on the left, choose MapReduce > Customization. In the custom configuration items, add the configuration item mapreduce.jobhistory.always-scan-user-dir to the core-site.xml file.

    This parameter specifies whether JobHistory always includes log files in the user directory when scanning job logs. The value can be true or false.

    • true: The JobHistory service always scans the user directories, even if they have already been scanned. This ensures that all related log files are collected and processed each time a job is executed.
    • false (default value): The JobHistory service does not rescan user directories that have already been scanned. This reduces unnecessary disk I/O operations.

    Set this parameter to true.

    Figure 1 Adding a custom parameter

  3. Click Save Configuration.
  4. Click the Service Status tab and choose More > Restart Service to restart the MapReduce service.
  5. Verify the interconnection by submitting a job that accesses OBS. If the job is submitted and runs successfully, MapReduce is connected to OBS. For example, create a Hive table and set its Location to an OBS path.

    1. Log in to the node where the Hive client is installed and run the following commands. The authenticated user must have the read and write permissions to create Hive tables and access OBS paths.

      Go to the client installation directory.

      cd Client installation directory

      Load the environment variables.

      source bigdata_env

      Authenticate the user. Skip this step for clusters with Kerberos authentication disabled.

      kinit Service user
    2. Log in to the Hive client.
      beeline
    3. Create a Hive table and insert data into the table.
      1. Create a table.
        create table test(name string) location "obs://OBS parallel file system name/user/hive/warehouse/Database name/Table name";
      2. Check the Location of the table.
        desc formatted test;

        The output shows that the Location of the table is the OBS path.

        Figure 2 Checking the table location
      3. Insert data.
        insert into table test values("A");
      4. Query table data.
        select * from test;
    4. Log in to FusionInsight Manager, choose Cluster > Services > Yarn, and click the hyperlink next to ResourceManager WebUI to go to the YARN web UI. On the All Applications page, click the application ID of the target Hive task to view the job running information. If Application Type is MAPREDUCE, the Hive job that accessed the OBS path ran as a MapReduce job.
      Figure 3 Viewing job running information
    5. Click Logs in the Logs column at the bottom of the page to view job logs.
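
The verification in step 5 can also be scripted on the client node. The following is a minimal sketch, not part of the product. The helper names build_verify_sql and app_type_from_report, their argument order, and the sample values are hypothetical. The second helper assumes that the report printed by yarn application -status contains a line of the form "Application-Type : MAPREDUCE".

```shell
#!/usr/bin/env bash
# Sketch only: helper names and argument order are hypothetical.

# Print the HiveQL statements from step 3 for a given OBS parallel file
# system name, database name, and table name.
build_verify_sql() {
  local pfs="$1" db="$2" table="$3"
  printf 'create table %s(name string) location "obs://%s/user/hive/warehouse/%s/%s";\n' \
    "$table" "$pfs" "$db" "$table"
  printf 'insert into table %s values("A");\n' "$table"
  printf 'select * from %s;\n' "$table"
}

# Read the text of an application report on stdin and print the application
# type. Assumes a report line of the form "Application-Type : MAPREDUCE".
app_type_from_report() {
  sed -n 's/^[[:space:]]*Application-Type[[:space:]]*:[[:space:]]*//p' | head -n 1
}
```

After the environment variables are loaded and the user is authenticated, the statements could be written to a file with build_verify_sql and run with beeline -f, assuming the client connects with its default settings as in step 2. The application type could then be checked with yarn application -status Application ID | app_type_from_report, which should print MAPREDUCE if the report format matches the assumption above.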