
Accessing OBS Using obs

Updated at: Apr 28, 2020 GMT+08:00

In MRS 1.9.2 or later, MRS can access OBS through the obs:// protocol. Hadoop, Hive, Spark, HBase, Presto, and Flink are currently supported. The exception is MRS 1.9.2, in which HBase cannot use obs:// to interconnect with OBS.

MRS provides two methods for accessing OBS using the obs:// protocol:

  • Configure the AK/SK in an MRS cluster. The AK/SK are then stored in plaintext in the configuration file, so exercise caution with this method. The rest of this section describes it.
  • Bind an agency of the ECS type to an MRS cluster to access OBS, preventing the AK/SK from being exposed in the configuration file. For details, see Accessing OBS Using an ECS Agency.

Using Hadoop to Access OBS

  • Add the following content to the core-site.xml file in the HDFS directory ($client_home/HDFS/hadoop/etc/hadoop) on the MRS client:
    <property>
        <name>fs.obs.access.key</name>
        <value>ak</value>
    </property>
    <property>
        <name>fs.obs.secret.key</name>
        <value>sk</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>obs endpoint</value>
    </property>

    AK and SK will be displayed as plaintext in the configuration file. Exercise caution when setting AK and SK in the file.

    After the configuration is added, you can directly access data on OBS without manually adding the AK/SK and endpoint. For example, run the following command to view the file list of folder test_obs_orc in bucket obs-test:

    hadoop fs -ls "obs://obs-test/test_obs_orc"

  • Add AK/SK and endpoint to the command line to access data on OBS.

    hadoop fs -Dfs.obs.endpoint=xxx -Dfs.obs.access.key=xx -Dfs.obs.secret.key=xx -ls "obs://obs-test/test_obs_orc"
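
Typing the AK/SK inline can leave them in the shell history. As a minimal sketch, assuming the values are kept in environment variables (the names OBS_ENDPOINT, OBS_ACCESS_KEY, and OBS_SECRET_KEY are hypothetical and not defined by MRS), a small Python helper can assemble the same command as an argument list:

```python
import os


def hadoop_fs_ls_args(obs_path, env=os.environ):
    """Build the argument list for `hadoop fs -ls` with OBS settings as -D options."""
    # The three variable names below are assumptions made for this sketch.
    endpoint = env["OBS_ENDPOINT"]
    access_key = env["OBS_ACCESS_KEY"]
    secret_key = env["OBS_SECRET_KEY"]
    return [
        "hadoop", "fs",
        f"-Dfs.obs.endpoint={endpoint}",
        f"-Dfs.obs.access.key={access_key}",
        f"-Dfs.obs.secret.key={secret_key}",
        "-ls", obs_path,
    ]
```

On a node where the MRS client is installed, the list could be passed to subprocess.run(hadoop_fs_ls_args("obs://obs-test/test_obs_orc"), check=True). The keys are still visible in the process arguments while the command runs, so the same caution applies.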

Using Hive to Access OBS

  1. Go to the cluster details page and choose Components > Hive > Service Configuration.

    For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services > Hive > Service Configuration.

  2. Set Type to All.
  3. Search for fs.obs.access.key and fs.obs.secret.key and set them to the AK and SK of OBS respectively.

    Figure 1 Configuring the AK/SK of OBS

  4. Click Save Configuration and select Restart the affected services or instances to restart the Hive service.
  5. Access the OBS directory in beeline. For example, run the following command to create a Hive table whose data is stored in the test_obs directory of bucket test-bucket:

    create table test_obs(a int, b string) row format delimited fields terminated by "," stored as textfile location "obs://test-bucket/test_obs";
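
A stray space or a wrong scheme in the location string is easy to miss. The following is a minimal sketch of a pre-check that splits an obs:// path into bucket and object key before it is pasted into a statement. The function name split_obs_uri is hypothetical, and the check only looks at the shape of the string. It does not contact OBS.

```python
from urllib.parse import urlsplit


def split_obs_uri(uri):
    """Return (bucket, key) for an obs://bucket/key string, or raise ValueError."""
    # Reject whitespace first, because it usually indicates a copy-and-paste error.
    if any(ch.isspace() for ch in uri):
        raise ValueError(f"whitespace in OBS path: {uri!r}")
    parts = urlsplit(uri)
    if parts.scheme != "obs" or not parts.netloc:
        raise ValueError(f"not an obs://bucket/... path: {uri!r}")
    return parts.netloc, parts.path.lstrip("/")
```

For example, split_obs_uri("obs://test-bucket/test_obs") returns the bucket test-bucket and the key test_obs.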

Using Spark to Access OBS

SparkSQL depends on Hive. Therefore, to configure OBS for Spark, you also need to modify the OBS configuration described in Using Hive to Access OBS.

  • spark-beeline and spark-sql

    You can add the following OBS attributes to the shell to access OBS:

    set fs.obs.endpoint=xxx
    set fs.obs.access.key=xxx
    set fs.obs.secret.key=xxx
  • spark-beeline
    spark-beeline can also access OBS through service parameters configured in MRS Manager. The procedure is as follows:
    1. Go to the cluster details page and choose Components > Spark > Service Configuration.

      For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services > Spark > Service Configuration.

    2. Set Type to All.
    3. Choose JDBCServer > OBS, and set values for fs.obs.access.key and fs.obs.secret.key.
    4. Click Save Configuration and select Restart the affected services or instances to restart the Spark service.
    5. Access OBS in spark-beeline. For example, access the obs://obs-demo-input/table/ directory.

      create table test(id int) location 'obs://obs-demo-input/table/';

  • spark-sql and spark-submit

    spark-sql can also access OBS through settings in the core-site.xml configuration file.

    The configuration file is modified in the same way whether you submit the task with spark-sql or with spark-submit.

    Add the following content to core-site.xml in the Spark configuration folder ($client_home/Spark/spark/conf) on the MRS client:

    <property>
        <name>fs.obs.access.key</name>
        <value>ak</value>
    </property>
    <property>
        <name>fs.obs.secret.key</name>
        <value>sk</value>
    </property>
    <property>
        <name>fs.obs.endpoint</name>
        <value>obs endpoint</value>
    </property>
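
Hand-editing these entries invites typos, such as stray spaces inside the name element. As a minimal sketch (the helper names obs_properties and render are hypothetical), the three property elements can be generated with Python's standard xml.etree.ElementTree module, which also escapes characters such as & in the values. The output still has to be pasted inside the configuration element of core-site.xml by hand.

```python
import xml.etree.ElementTree as ET


def obs_properties(ak, sk, endpoint):
    """Return one <property> element per fs.obs.* setting."""
    settings = {
        "fs.obs.access.key": ak,
        "fs.obs.secret.key": sk,
        "fs.obs.endpoint": endpoint,
    }
    elements = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        elements.append(prop)
    return elements


def render(elements):
    """Serialize the elements to text, one <property> per line."""
    return "\n".join(ET.tostring(e, encoding="unicode") for e in elements)
```

Calling print(render(obs_properties(ak, sk, endpoint))) with your own values prints the entries. As with the manual method, the AK and SK end up in the file in plaintext.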

Using HBase to Access OBS

In MRS 1.9.2, HBase cannot use obs:// to interconnect with OBS.

  1. Go to the cluster details page and choose Management Operation > Stop All Components in the upper right corner.

    For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services > HBase, and click Stop Service.

  2. Log in to a Master node. For details, see Logging In to an ECS.
  3. Run the following command to initialize environment variables:

    source /opt/client/bigdata_env

  4. If the Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If the Kerberos authentication is disabled for the current cluster, skip this step.

    kinit MRS cluster user

    For example, kinit hbaseuser.

  5. Run the following command to delete HDFS/ZooKeeper data from the original HBase:

    hbase clean --cleanAll

  6. Go to the cluster details page and choose Components > HBase > Service Configuration.

    For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services > HBase > Service Configuration.

  7. Set Type to All.
  8. Set the following parameters for accessing OBS:

    • fs.obs.access.key
    • fs.obs.secret.key
    • Set hbase.wal.provider to filesystem.
    • Set hbase.rootdir to the data storage directory, for example, obs://bucket_name/hbase.

  9. Click Save Configuration and select Restart the affected services or instances to restart the HBase service.
  10. After the restart, HBase stores HFile data on OBS by default, while WAL files are still stored on HDFS.
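
Because step 5 deletes the existing HBase data, it can help to double-check the values from step 8 before saving them. The following is a minimal sketch, assuming the four parameters have been copied by hand into a Python dict. The function name check_hbase_obs_settings is hypothetical, and nothing here reads from the cluster.

```python
def check_hbase_obs_settings(settings):
    """Return a list of problems found in a dict of HBase parameters."""
    problems = []
    # Both keys must be present and non-empty.
    for key in ("fs.obs.access.key", "fs.obs.secret.key"):
        if not settings.get(key):
            problems.append(f"{key} is not set")
    if settings.get("hbase.wal.provider") != "filesystem":
        problems.append("hbase.wal.provider should be filesystem")
    rootdir = settings.get("hbase.rootdir", "")
    if not rootdir.startswith("obs://") or any(ch.isspace() for ch in rootdir):
        problems.append("hbase.rootdir should be an obs:// path without spaces")
    return problems
```

An empty list means that the four values have the shape described in step 8. It does not confirm that the bucket exists or that the keys are valid.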

Using Presto to Access OBS

MRS 2.0.3 or later supports this function.

  1. Go to the cluster details page and choose Components > Presto > Service Configuration.

    For MRS 1.8.10 or earlier, log in to MRS Manager. For details, see Accessing MRS Manager. Then, choose Services > Presto > Service Configuration.

  2. Set Type to All.
  3. Search for and configure the following parameters:

    • Set fs.obs.access.key to ak.
    • Set fs.obs.secret.key to sk.

  4. Click Save Configuration and select Restart the affected services or instances to restart the Presto service.
  5. Choose Components > Hive > Service Configuration.
  6. Set Type to All.
  7. Search for and configure the following parameters:

    • Set fs.obs.access.key to ak.
    • Set fs.obs.secret.key to sk.

  8. Click Save Configuration and select Restart the affected services or instances to restart the Hive service.
  9. On the Presto client, run the following statement to create a schema and set location to an OBS path:

    CREATE SCHEMA hive.demo WITH (location = 'obs://obs-demo/presto-demo/');

  10. Create a table in the schema. The table data is stored in the OBS bucket. The following is an example.

    CREATE TABLE hive.demo.demo_table WITH (format = 'ORC') AS SELECT * FROM tpch.sf1.customer;

Using Flink to Access OBS

Add the following configuration to the Flink configuration file on the MRS client (Client installation path/Flink/flink/conf/flink-conf.yaml):
fs.obs.access.key: ak
fs.obs.secret.key: sk
fs.obs.endpoint: obs endpoint

AK and SK will be displayed as plaintext in the configuration file. Exercise caution when setting AK and SK in the file.

After the configuration is added, you can directly access data on OBS without manually adding the AK/SK and endpoint.
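
Each entry in flink-conf.yaml is written as a key, a colon, a space, and the value. As a minimal sketch (the function name flink_obs_lines is hypothetical), the three lines can be generated so that the separator is consistent and surrounding whitespace is removed from the values:

```python
def flink_obs_lines(ak, sk, endpoint):
    """Return the fs.obs.* entries for flink-conf.yaml, one 'key: value' per line."""
    settings = [
        ("fs.obs.access.key", ak),
        ("fs.obs.secret.key", sk),
        ("fs.obs.endpoint", endpoint),
    ]
    # strip() removes accidental leading or trailing whitespace from each value.
    return [f"{key}: {value.strip()}" for key, value in settings]
```

The lines returned by flink_obs_lines(ak, sk, endpoint) can then be appended to the configuration file.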
