Configuring a Storage-Compute Decoupled Cluster (Agency)

Updated on 2024-03-18 GMT+08:00

MRS allows you to store data in OBS and use an MRS cluster for data computing only. In this way, storage and compute are separated. You can create an IAM agency, which enables ECS to automatically obtain the temporary AK/SK to access OBS. This prevents the AK/SK from being exposed in the configuration file.

By binding an agency, ECSs or BMSs can manage some of your resources. Determine whether to configure an agency based on the actual service scenario.

MRS provides the following configuration modes for accessing OBS. You can select one of them. The agency mode is recommended.

  • Bind an agency of the ECS type to an MRS cluster to access OBS, preventing the AK/SK from being exposed in the configuration file. For details, see the following part in this section.
  • Configure the AK/SK in an MRS cluster. The AK/SK will be exposed in the configuration file in plaintext. Exercise caution when performing this operation. For details, see Configuring a Storage-Compute Decoupled Cluster (AK/SK).
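
The agency mode works because the bound agency allows the cluster ECSs to obtain short-lived credentials from the ECS metadata service instead of reading long-term keys from a configuration file. As a minimal illustration (not part of the MRS procedure; the metadata path below is the standard ECS endpoint for temporary security keys and is quoted here as an assumption), you can view the temporary AK/SK and security token from a cluster node after the agency is bound:

  curl http://169.254.169.254/openstack/latest/securitykey

Because these credentials are issued and rotated automatically, no AK/SK ever needs to be written to the cluster configuration.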

This function is available for the Hadoop, Hive, Spark, Presto, and Flink components in supported cluster versions.

(Optional) Step 1: Create an ECS Agency with OBS Access Permissions

NOTE:
  • MRS presets MRS_ECS_DEFAULT_AGENCY in the agency list of IAM so that you can select this agency when creating a cluster. This agency has the OBS OperateAccess permission and, in the region where the cluster is located, the CES FullAccess (available only to users who have enabled fine-grained policies), CES Administrator, and KMS Administrator permissions. Do not modify MRS_ECS_DEFAULT_AGENCY in IAM.
  • If you want to use the preset agency, skip the step for creating an agency. If you want to use a custom agency, perform the following steps to create an agency. (To create or modify an agency, you must have the Security Administrator permission.)
  1. Log in to the management console.
  2. Choose Service List > Management & Governance > Identity and Access Management.
  3. Choose Agencies. On the displayed page, click Create Agency.
  4. Enter an agency name, for example, mrs_ecs_obs.
  5. Set Agency Type to Cloud service and select ECS BMS to authorize ECS or BMS to invoke OBS.
  6. Set Validity Period to Unlimited and click Next.
  7. On the displayed page, search for OBS OperateAccess and select it.
  8. Click Next. On the displayed page, select the desired scope for permissions you selected. By default, All resources is selected. Click Show More and select Global resources.
  9. In the dialog box that is displayed, click OK to start authorization. After the message "Authorization successful." is displayed, click Finish. The agency is successfully created.

Step 2: Create a Cluster with Storage and Compute Separated

You can configure an agency when creating a cluster or bind an agency to an existing cluster to separate storage and compute. This section uses a cluster with Kerberos authentication enabled as an example.

Configuring an agency when creating a cluster:

  1. Log in to the MRS management console.
  2. Click Create Cluster. The page for creating a cluster is displayed.
  3. Click the Custom Config tab.
  4. On the Custom Config tab page, set software parameters.
    • Region: Select a region as required.
    • Cluster Name: You can use the default name. However, you are advised to include a project name abbreviation or a date so that the cluster is easy to remember and distinguish.
    • Cluster Version: Select a cluster version.
    • Cluster Type: Select Analysis cluster or Hybrid cluster and select all components.
    • Metadata: Select Local.
  5. Click Next and set hardware parameters.
    • AZ: Use the default value.
    • VPC: Use the default value.
    • Subnet: Use the default value.
    • Security Group: Use the default value.
    • EIP: Use the default value.
    • Enterprise Project: Use the default value.
    • Cluster Node: Select the number of cluster nodes and node specifications based on site requirements.
  6. Click Next and set related parameters.
    • Kerberos Authentication: This function is enabled by default. You can enable or disable it.
    • Username: The default username is admin, which is used to log in to MRS Manager.
    • Password: Set a password for user admin.
    • Confirm Password: Enter the password of user admin again.
    • Login Mode: Select a method for logging in to ECSs. In this example, select Password.
    • Username: The default username is root, which is used to remotely log in to ECSs.
    • Password: Set a password for user root.
    • Confirm Password: Enter the password of user root again.
  7. In this example, configure an agency and leave other parameters blank. For details about how to configure other parameters, see (Optional) Advanced Configuration.

    Agency: Select the agency created in (Optional) Step 1: Create an ECS Agency with OBS Access Permissions or MRS_ECS_DEFAULT_AGENCY preset in IAM.

  8. To enable secure communications, select Enable. For details, see Communication Security Authorization.
  9. Confirm the configuration, submit the cluster creation request, and wait until the cluster is created.

    If Kerberos authentication is enabled for a cluster, check whether Kerberos authentication is required. If yes, click Continue. If no, click Back to disable Kerberos authentication and then create a cluster.

Configuring an agency for an existing cluster:

  1. Log in to the MRS management console. In the left navigation pane, choose Clusters > Active Clusters.
  2. Click the name of the cluster to enter its details page.
  3. On the Dashboard page, click Synchronize on the right of IAM User Sync to synchronize IAM users.
  4. On the Dashboard tab page, click Manage Agency on the right of Agency, select an agency, and click OK to bind it. Alternatively, click Create Agency to go to the IAM console, create an agency, and then select it.

Step 3: Create an OBS File System for Storing Data

NOTE:

In the big data decoupled storage-compute scenario, an OBS parallel file system must be used with the cluster. Using common object buckets significantly degrades cluster performance.

  1. Log in to OBS Console.
  2. Choose Parallel File System > Create Parallel File System.
  3. Enter the file system name, for example, mrs-word001.

    Set other parameters as required.

  4. Click Create Now.
  5. In the parallel file system list on the OBS console, click the file system name to go to the details page.
  6. In the navigation pane, choose Files and create the program and input folders.
    • program: Upload the program package to this folder.
    • input: Upload the input data to this folder.
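
If you prefer the command line, the same folders can also be created and populated from a cluster node once the agency is bound and the client environment has been sourced (see Step 4). This is only a sketch; the JAR and data file names below are placeholders for your own program package and input data:

  hadoop fs -mkdir -p obs://mrs-word001/program obs://mrs-word001/input
  hadoop fs -put ./wordcount.jar obs://mrs-word001/program/
  hadoop fs -put ./input_data.txt obs://mrs-word001/input/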

Step 4: Access the OBS File System

  1. Log in to a Master node as user root. For details, see Logging In to an ECS.
  2. Run the following command to set the environment variables:

    For versions earlier than MRS 3.x, run the source /opt/client/bigdata_env command.

    For MRS 3.x or later, run the source /opt/Bigdata/client/bigdata_env command.

  3. Verify that Hadoop can access OBS.
    1. View the list of files in the file system mrs-word001.

      hadoop fs -ls obs://mrs-word001/

    2. Check whether the file list is returned. If it is returned, OBS access is successful.
      Figure 1 Returned file list
  4. Verify that Hive can access OBS.
    1. If Kerberos authentication has been enabled for the cluster, run the following command to authenticate the current user. The user must have the permission to create Hive tables. For details about how to configure a role with this permission, see Creating a Role. For details about how to create a user and bind the role to the user, see Creating a User. If Kerberos authentication is disabled for the cluster, skip this step.

      kinit MRS cluster user

      Example: kinit hiveuser

    2. Run the client command of the Hive component.

      beeline

    3. Access the OBS directory in the beeline. For example, run the following command to create a Hive table and specify that data is stored in the test_obs directory of the file system mrs-word001:

      create table test_obs(a int, b string) row format delimited fields terminated by "," stored as textfile location "obs://mrs-word001/test_obs";

    4. Run the following command to query all tables. If table test_obs is displayed in the command output, OBS access is successful.

      show tables;

      Figure 2 Returned table name
    5. Press Ctrl+C to exit the Hive beeline.
  5. Verify that Spark can access OBS.
    1. Run the client command of the Spark component.

      spark-beeline

    2. Access OBS in spark-beeline. For example, create table test in the obs://mrs-word001/table/ directory.

      create table test(id int) location 'obs://mrs-word001/table/';

    3. Run the following command to query all tables. If table test is displayed in the command output, OBS access is successful.

      show tables;

      Figure 3 Returned table name
    4. Press Ctrl+C to exit the Spark beeline.
  6. Verify that Presto can access OBS.
    • For normal clusters with Kerberos authentication disabled
      1. Run the following command to connect to the client:

        presto_cli.sh

      2. On the Presto client, run the following statement to create a schema and set location to an OBS path:

        CREATE SCHEMA hive.demo01 WITH (location = 'obs://mrs-word001/presto-demo002/');

      3. Create a table in the schema. The table data is stored in the OBS file system. The following is an example.

        CREATE TABLE hive.demo01.demo_table WITH (format = 'ORC') AS SELECT * FROM tpch.sf1.customer;

        Figure 4 Return result
      4. Run exit to exit the client.
    • For security clusters with Kerberos authentication enabled
      1. Log in to MRS Manager and create a role with the Hive Admin Privilege permission, for example, prestorole. For details about how to create a role, see Creating a Role.
      2. Create a user that belongs to the Presto and Hive groups and bind the role created in 6.a to the user, for example, presto001. For details about how to create a user, see Creating a User.
      3. Authenticate the current user.

        kinit presto001

      4. Download the user credential.
        1. For versions earlier than MRS 3.x, on MRS Manager, choose System > Manage User. In the row that contains the new user, choose More > Download Authentication Credential.
          Figure 5 Downloading the Presto user authentication credential
        2. For MRS 3.x or later, on FusionInsight Manager, choose System > Permission > User. In the row that contains the newly added user, choose More > Download Authentication Credential.
          Figure 6 Downloading the Presto user authentication credential
      5. Decompress the downloaded user credential file, and save the obtained krb5.conf and user.keytab files to the client directory, for example, /opt/Bigdata/client/Presto/.
      6. Run the following command to obtain a user principal:

        klist -kt /opt/Bigdata/client/Presto/user.keytab

      7. Run the following command to connect to the Presto Server of the cluster:

        presto_cli.sh --krb5-config-path {krb5.conf file path} --krb5-principal {user principal} --krb5-keytab-path {user.keytab file path} --user {presto username}

        • krb5.conf file path: Replace it with the file path set in 6.e, for example, /opt/Bigdata/client/Presto/krb5.conf.
        • user.keytab file path: Replace it with the file path set in 6.e, for example, /opt/Bigdata/client/Presto/user.keytab.
        • user principal: Replace it with the result returned in 6.f.
        • presto username: Replace it with the name of the user created in 6.b, for example, presto001.

        Example: presto_cli.sh --krb5-config-path /opt/Bigdata/client/Presto/krb5.conf --krb5-principal presto001@xxx_xxx_xxx_xxx.COM --krb5-keytab-path /opt/Bigdata/client/Presto/user.keytab --user presto001

      8. On the Presto client, run the following statement to create a schema and set location to an OBS path:

        CREATE SCHEMA hive.demo01 WITH (location = 'obs://mrs-word001/presto-demo002/');

      9. Create a table in the schema. The table data is stored in the OBS file system. The following is an example.

        CREATE TABLE hive.demo01.demo_table WITH (format = 'ORC') AS SELECT * FROM tpch.sf1.customer;

        Figure 7 Return result
      10. Run exit to exit the client.
  7. Verify that Flink can access OBS.
    1. On the Dashboard page, click Synchronize on the right of IAM User Sync to synchronize IAM users.
    2. After user synchronization is complete, choose Jobs > Create on the cluster details page to create a Flink job. In Parameters, enter parameters in --input <Job input path> --output <Job output path> format. You can click OBS to select a job input path, and enter a job output path that does not exist, for example, obs://mrs-word001/output/. See Figure 8.
      Figure 8 Creating a Flink job
    3. On OBS Console, go to the output path specified during job creation. If the output directory is automatically created and contains the job execution results, OBS access is successful.
      Figure 9 Flink job execution result
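
As an optional end-to-end check that ties the previous steps together, you can run the Hadoop wordcount example against the folders created in Step 3, reading input from and writing results to OBS. This is only a sketch: the path to the Hadoop examples JAR depends on your client installation, and the output directory (a placeholder name is used here) must not exist before the job is submitted.

  source /opt/Bigdata/client/bigdata_env
  hadoop jar <path to hadoop-mapreduce-examples JAR> wordcount obs://mrs-word001/input/ obs://mrs-word001/output-mr/
  hadoop fs -cat obs://mrs-word001/output-mr/part-r-00000

For versions earlier than MRS 3.x, source /opt/client/bigdata_env instead, as described in 2.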

Reference

For details about how to control permissions to access OBS, see Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS.
