Updated on 2024-09-23 GMT+08:00

Configuration Process

With MRS, you can store data in OBS and dedicate MRS clusters solely to computing tasks, isolating storage and compute resources. This approach offers flexible, on-demand scaling at a lower cost, making it well-suited for big data processing.

  • In big data storage-compute decoupling scenarios, use an OBS parallel file system. For details, see Parallel File System. Using a regular object bucket can significantly degrade cluster performance.
  • If a cluster has been connected to OBS (for storage-compute decoupling or hot and cold data separation), you must manually delete the service data on OBS after deleting a component or the MRS cluster.

Perform the following steps to use the storage-compute decoupling function:

  1. Configure a cluster with decoupled storage and compute.
    Select one of the following configurations (using an agency is recommended):
  2. Use the cluster.

    After the required permissions for accessing OBS are obtained, components in the MRS cluster can access the corresponding files through the client.

    For details about how to configure components to access OBS, see the following content:

Configuring Storage-Compute Decoupling

  1. Create an MRS cluster.

    The MRS cluster must contain basic components such as Guardian, Ranger, and Hadoop.

    Currently, only MRS 3.3.0-LTS or later supports interconnection with OBS using Guardian.

  2. Create an OBS agency.

    Create an agency with OBS access permissions, which allows Guardian to connect to OBS.

  3. Enable Guardian to connect to OBS and set parameters.

    Modify Guardian configuration parameters and configure IAM agency authentication information.

  4. Configure the policy for clearing component data in the recycle bin directory.

    In storage-compute decoupling scenarios, the data anti-deletion function is enabled by default for components that connect to OBS. When you delete data, the deleted object is moved to the corresponding recycle bin directory instead of being removed. To keep the recycle bin from using up storage space, configure a lifecycle policy for the corresponding directory in the OBS file system.

  5. Interconnect components with OBS.

    Components in the MRS cluster can directly access the corresponding path after being granted required permissions for accessing OBS buckets. You can directly access resources in the OBS file system via the component client using an absolute path.
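
The lifecycle policy mentioned in step 4 can be sketched as a small rule that expires objects under the recycle bin prefix after a fixed number of days. The example below only builds the rule as XML and does not call OBS. It assumes the S3-style lifecycle layout (LifecycleConfiguration, Rule, Prefix, Status, Expiration, Days); the prefix user/hive/.Trash/, the 30-day retention period, and the rule ID are hypothetical placeholders, so check the actual recycle bin directory of each component before applying anything similar.

```python
import xml.etree.ElementTree as ET


def build_recycle_bin_lifecycle(prefix: str, expire_days: int,
                                rule_id: str = "clear-recycle-bin") -> str:
    """Return lifecycle XML that expires objects under `prefix`.

    Assumption: the element names follow the S3-style lifecycle layout.
    """
    if expire_days < 1:
        raise ValueError("expire_days must be a positive number of days")

    root = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = rule_id
    # Only objects whose names start with this prefix are affected.
    ET.SubElement(rule, "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled"
    expiration = ET.SubElement(rule, "Expiration")
    ET.SubElement(expiration, "Days").text = str(expire_days)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical recycle bin prefix and retention period.
    print(build_recycle_bin_lifecycle("user/hive/.Trash/", 30))
```

Scoping the rule to a prefix keeps the expiration limited to the recycle bin directory, so service data elsewhere in the file system is not affected.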

Granting OBS Permissions

Enabling storage-compute decoupling and Ranger authentication for an MRS cluster that connects to OBS using Guardian allows Ranger administrators to grant cluster users permissions to read and write OBS directories and files.

In addition, the Guardian permission model, storage-compute decoupling, and Hive cascading authorization together allow permissions on service tables to be granted through Ranger. You only need to authorize a service table once on Ranger. The system then automatically associates the fine-grained permissions on the OBS directory that stores the table data, so you do not need to know the storage path of the table or perform secondary authorization.
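
The idea of cascading authorization can be illustrated with a toy model: a single table-level grant is translated into permissions on the directory that stores the table. This is a conceptual sketch only and does not reflect how Ranger or Guardian are implemented. The class, the privilege mapping, the table name, and the obs:// path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCascadingAuthorizer:
    # Hypothetical catalog: table name -> OBS storage directory.
    table_locations: dict
    # Derived grants: (user, directory) -> set of storage permissions.
    path_grants: dict = field(default_factory=dict)

    # Hypothetical mapping from table privileges to storage permissions.
    PRIVILEGE_TO_STORAGE = {"select": {"read"}, "insert": {"read", "write"}}

    def grant_table(self, user: str, table: str, privilege: str) -> None:
        """Grant a table privilege and derive the matching path permission."""
        path = self.table_locations[table]
        storage = self.PRIVILEGE_TO_STORAGE[privilege]
        self.path_grants.setdefault((user, path), set()).update(storage)

    def can_access(self, user: str, path: str, permission: str) -> bool:
        """Check a path against derived grants; a directory covers its children."""
        for (grantee, granted), perms in self.path_grants.items():
            if grantee != user or permission not in perms:
                continue
            if path == granted or path.startswith(granted.rstrip("/") + "/"):
                return True
        return False


if __name__ == "__main__":
    auth = ToyCascadingAuthorizer(
        {"sales.orders": "obs://demo-fs/warehouse/sales.db/orders"}
    )
    # One table-level grant; no separate grant on the storage path.
    auth.grant_table("alice", "sales.orders", "select")
    print(auth.can_access(
        "alice", "obs://demo-fs/warehouse/sales.db/orders/part-0000", "read"))
```

In this model the person granting access never handles the storage path: it is looked up from the table, which mirrors the "authorize once" behavior described above.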

  • On Ranger, only users in Ranger custom user groups can be granted OBS permissions. Each user group name must contain 1 to 52 characters and can include only letters, numbers, underscores (_), and number signs (#). Otherwise, the policy fails to be added.
  • For clusters with Kerberos authentication enabled, permissions need to be granted based on Ranger. For clusters with Kerberos authentication disabled, OBS permissions are granted by default, and no additional configuration is required.
  • If Kerberos authentication is not enabled for the current cluster, the user who accesses OBS must belong to the supergroup group.
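
The user group naming rule above can be checked before a policy is submitted. The following is a minimal sketch, assuming that "letters" means ASCII letters A to Z and a to z; the function name is hypothetical and is not part of Ranger.

```python
import re

# 1 to 52 characters: ASCII letters, digits, underscores (_), number signs (#).
_GROUP_NAME_PATTERN = re.compile(r"[A-Za-z0-9_#]{1,52}")


def is_valid_group_name(name: str) -> bool:
    """Return True if `name` satisfies the user group naming rule."""
    return _GROUP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("obs_readers#1", "obs-readers", "", "a" * 53):
        print(repr(candidate), is_valid_group_name(candidate))
```

Using fullmatch rather than search ensures the whole name is checked, so a name that merely contains a valid substring is still rejected.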