Configuring a Storage-Compute Decoupled Cluster (Agency)
MRS allows you to store data in OBS and use an MRS cluster for data computing only. In this way, storage and compute are separated. You can create an IAM agency, which enables ECS to automatically obtain the temporary AK/SK to access OBS. This prevents the AK/SK from being exposed in the configuration file.
Binding an agency allows ECSs or BMSs to manage some of your resources. Determine whether to configure an agency based on your service scenario.
MRS provides the following configuration modes for accessing OBS. You can select one of them. The agency mode is recommended.
- Bind an agency of the ECS type to an MRS cluster to access OBS, preventing the AK/SK from being exposed in the configuration file. For details, see the following part in this section.
- Configure the AK/SK in an MRS cluster. The AK/SK will be exposed in the configuration file in plaintext. Exercise caution when performing this operation. For details, see Configuring a Storage-Compute Decoupled Cluster (AK/SK).
This function is available for components Hadoop, Hive, Spark, Presto, and Flink in clusters of .
(Optional) Step 1: Create an ECS Agency with OBS Access Permissions
- MRS presets MRS_ECS_DEFAULT_AGENCY in the IAM agency list so that you can select it when creating a cluster. This agency has the OBS OperateAccess permission, as well as the CESFullAccess (only available for users who have enabled fine-grained policies), CES Administrator, and KMS Administrator permissions in the region where the cluster is located. Do not modify MRS_ECS_DEFAULT_AGENCY in IAM.
- If you want to use the preset agency, skip the step for creating an agency. If you want to use a custom agency, perform the following steps to create an agency. (To create or modify an agency, you must have the Security Administrator permission.) If you need fine-grained permission control on specified paths in the OBS file system, see Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS and create custom role policies.
- Log in to the management console.
- Choose Service List > Management & Governance > Identity and Access Management.
- Choose Agencies. On the displayed page, click Create Agency.
- Enter an agency name, for example, mrs_ecs_obs.
- Set Agency Type to Cloud service and select ECS BMS to authorize ECS or BMS to invoke OBS.
- Set Validity Period to Unlimited and click Next.
- On the displayed page, search for the OBS OperateAccess policy and select it.
- Click Next. On the page that is displayed, select the desired scope for the permissions you selected. By default, All resources is selected. Click Show More, select Global resources, and click OK.
- In the dialog box that is displayed, click OK to start authorization. After the message "Authorization successful." is displayed, click Finish. The agency is successfully created.
Step 2: Create a Cluster with Storage and Compute Separated
You can configure an agency when creating a cluster or bind an agency to an existing cluster to separate storage and compute. This section uses a cluster with Kerberos authentication enabled as an example.
Configuring an agency when creating a cluster:
- Click the Custom Config tab.
- On the Custom Config tab page, set software parameters.
- Region: Select a region as required.
- Cluster Name: You can use the default name. However, you are advised to include a project name abbreviation or a date so that the cluster is easy to remember and distinguish.
- Cluster Version: Select a cluster version.
- Cluster Type: Select Analysis cluster or Hybrid cluster and select all components.
- Metadata: Select Local.
- Click Next and set hardware parameters.
- AZ: Use the default value.
- VPC: Use the default value.
- Subnet: Use the default value.
- Security Group: Use the default value.
- EIP: Use the default value.
- Cluster Node: Select the number of cluster nodes and node specifications based on site requirements.
- Click Next and set related parameters.
- Kerberos Authentication: This function is enabled by default. You can enable or disable it.
- Username: The default username is admin, which is used to log in to MRS Manager.
- Password: Set a password for user admin.
- Confirm Password: Enter the password of user admin again.
- Login Mode: Select a method for logging in to ECSs. In this example, select Password.
- Username: The default username is root, which is used to remotely log in to ECSs.
- Password: Set a password for user root.
- Confirm Password: Enter the password of user root again.
- In this example, configure an agency and leave other parameters blank. For details about how to configure other parameters, see Advanced Options.
Agency: Select the agency created in (Optional) Step 1: Create an ECS Agency with OBS Access Permissions or MRS_ECS_DEFAULT_AGENCY preset in IAM.
- Select the check box for secure communications. For details, see Communication Security Authorization.
- Click Apply Now and wait until the cluster is created.
If Kerberos authentication is enabled for a cluster, check whether Kerberos authentication is required. If yes, click Continue. If no, click Back to disable Kerberos authentication and then create a cluster.
Configuring an agency for an existing cluster:
- Log in to the MRS management console. In the left navigation pane, choose Clusters > Active Clusters.
- Click the name of the cluster to enter its details page.
- On the Dashboard page, click Synchronize on the right of IAM User Sync to synchronize IAM users.
- On the Dashboard tab page, click Manage Agency on the right side of Agency to select an agency and click OK to bind it. Alternatively, click Create Agency to go to the IAM console to create an agency and select it.
Step 3: Create an OBS File System for Storing Data
In storage-compute decoupled scenarios, data must be stored in an OBS parallel file system. Using common object buckets instead significantly degrades cluster performance.
- Log in to the OBS Console.
- Choose Parallel File System > Create Parallel File System.
- Enter the file system name, for example, mrs-word001.
Set other parameters as required.
- Click Create Now.
- In the parallel file system list on the OBS console, click the file system name to go to the details page.
- In the navigation pane, choose Files and create the program and input folders.
- program: Upload the program package to this folder.
- input: Upload the input data to this folder.
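As an alternative to uploading through the console, the two folders can also be populated from a cluster node with the Hadoop client. The following is a minimal sketch, not a documented procedure: it assumes the agency is already bound to the cluster, that the client environment variables are loaded (see Step 4), and that the file system is named mrs-word001 as in the example above. The names FS_NAME, DRY_RUN, run, and prepare_obs_folders are hypothetical helpers introduced for illustration. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default (DRY_RUN=1).
# FS_NAME, DRY_RUN, run and prepare_obs_folders are illustrative names,
# not part of MRS or OBS.

FS_NAME="mrs-word001"      # parallel file system name from the example above
DRY_RUN="${DRY_RUN:-1}"    # set DRY_RUN=0 to actually execute the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the program and input folders, then upload one file to each.
prepare_obs_folders() {
  local program_pkg="$1" input_file="$2"
  run hadoop fs -mkdir -p "obs://${FS_NAME}/program" "obs://${FS_NAME}/input"
  run hadoop fs -put "$program_pkg" "obs://${FS_NAME}/program/"
  run hadoop fs -put "$input_file" "obs://${FS_NAME}/input/"
}
```

For example, prepare_obs_folders /tmp/app.jar /tmp/data.txt prints the three hadoop fs commands, where both local paths are placeholders. Printing first makes it easy to review the target paths before anything is written to OBS.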
Step 4: Access the OBS File System
- Log in to a Master node as user root. For details, see Logging In to an ECS.
- Run the following command to set the environment variables:
- Verify that Hadoop can access OBS.
- View the list of files in the file system mrs-word001.
hadoop fs -ls obs://mrs-word001/
- Check whether the file list is returned. If it is returned, OBS access is successful.
Figure 1 Returned file list
- Verify that Hive can access OBS.
- If Kerberos authentication has been enabled for the cluster, run the following command to authenticate the current user. The current user must have permission to create Hive tables. If Kerberos authentication is disabled for the cluster, skip this step.
Example: kinit hiveuser
- Run the client command of the Hive component.
- Access the OBS directory in beeline. For example, run the following command to create a Hive table whose data is stored in the test_obs directory of the file system mrs-word001:
create table test_obs(a int, b string) row format delimited fields terminated by ',' stored as textfile location "obs://mrs-word001/test_obs";
- Run the following command to query all tables. If table test_obs is displayed in the command output, OBS access is successful.
show tables;
Figure 2 Returned table name
- Press Ctrl+C to exit the Hive beeline.
- Verify that Spark can access OBS.
- Run the client command of the Spark component.
- Access OBS in spark-beeline. For example, create table test in the obs://mrs-word001/table/ directory.
create table test(id int) location 'obs://mrs-word001/table/';
- Run the following command to query all tables. If table test is displayed in the command output, OBS access is successful.
show tables;
Figure 3 Returned table name
- Press Ctrl+C to exit the Spark beeline.
- Verify that Presto can access OBS.
- For normal clusters with Kerberos authentication disabled
- Run the following command to connect to the client:
- On the Presto client, run the following statement to create a schema and set location to an OBS path:
CREATE SCHEMA hive.demo WITH (location = 'obs://mrs-word001/presto-demo002/');
- Create a table in the schema. The table data is stored in the OBS file system. The following is an example.
CREATE TABLE hive.demo.demo_table WITH (format = 'ORC') AS SELECT * FROM tpch.sf1.customer;
Figure 4 Return result
- Run exit to exit the client.
- For security clusters with Kerberos authentication enabled
- Log in to MRS Manager and create a role with the Hive Admin Privilege permissions, for example, prestorole. For details about how to create a role, see Managing Roles.
- Create a user, for example, presto001, that belongs to the Presto and Hive groups, and bind the role created in 6.a to the user. For details about how to create a user, see Creating a User.
- Authenticate the current user.
- Download the user credential.
- On FusionInsight Manager, choose System > Permission > User. In the row that contains the newly added user, click More > Download Authentication Credential.
- Decompress the downloaded user credential file, and save the obtained krb5.conf and user.keytab files to the client directory, for example, /opt/Bigdata/client/Presto/.
- Run the following command to obtain a user principal:
- For clusters with Kerberos authentication enabled, run the following command to connect to the Presto Server of the cluster:
presto_cli.sh --krb5-config-path {krb5.conf file path} --krb5-principal {user principal} --krb5-keytab-path {user.keytab file path} --user {presto username}
- krb5.conf file path: Replace it with the file path set in 6.e, for example, /opt/Bigdata/client/Presto/krb5.conf.
- user.keytab file path: Replace it with the file path set in 6.e, for example, /opt/Bigdata/client/Presto/user.keytab.
- user principal: Replace it with the result returned in 6.f.
- presto username: Replace it with the name of the user created in 6.b, for example, presto001.
Example: presto_cli.sh --krb5-config-path /opt/Bigdata/client/Presto/krb5.conf --krb5-principal presto001@xxx_xxx_xxx_xxx.COM --krb5-keytab-path /opt/Bigdata/client/Presto/user.keytab --user presto001
- On the Presto client, run the following statement to create a schema and set location to an OBS path:
CREATE SCHEMA hive.demo01 WITH (location = 'obs://mrs-word001/presto-demo002/');
- Create a table in the schema. The table data is stored in the OBS file system. The following is an example.
CREATE TABLE hive.demo01.demo_table WITH (format = 'ORC') AS SELECT * FROM tpch.sf1.customer;
Figure 5 Return result
- Run exit to exit the client.
- Verify that Flink can access OBS.
- On the Dashboard page, click Synchronize on the right of IAM User Sync to synchronize IAM users.
- After user synchronization is complete, choose Jobs > Create on the cluster details page to create a Flink job. In Parameters, enter parameters in --input <Job input path> --output <Job output path> format. You can click OBS to select a job input path, and enter a job output path that does not exist, for example, obs://mrs-word001/output/.
- On OBS Console, go to the output path specified during job creation. If the output directory is automatically created and contains the job execution results, OBS access is successful.
Figure 6 Flink job execution result
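The command-line checks in this step can be collected into a small helper script. The following is a minimal sketch under stated assumptions: the client environment variables are already loaded on the Master node, and the credential files krb5.conf and user.keytab are saved in a single directory as described above. The names HADOOP_CMD, check_obs_access, and presto_kerberos_cmd are hypothetical and introduced only for illustration. The first function wraps the hadoop fs -ls check and reports the result based on the command's exit status. The second function only prints the Kerberos-mode presto_cli.sh command line and does not run it.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes the cluster client environment is already loaded
# (the "set the environment variables" step above).

# HADOOP_CMD is a hypothetical override so the function can be tried
# without a cluster; it defaults to the real hadoop client.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# List the root of an OBS file system and report whether the listing succeeded.
check_obs_access() {
  local fs_name="$1"
  if "$HADOOP_CMD" fs -ls "obs://${fs_name}/" >/dev/null 2>&1; then
    echo "OBS access OK: obs://${fs_name}/"
  else
    echo "OBS access FAILED: obs://${fs_name}/"
    return 1
  fi
}

# Print the Presto client command for a Kerberos-enabled cluster.
# cred_dir is the directory holding krb5.conf and user.keytab
# (pass it without a trailing slash).
presto_kerberos_cmd() {
  local cred_dir="$1" principal="$2" user="$3"
  echo "presto_cli.sh --krb5-config-path ${cred_dir}/krb5.conf" \
       "--krb5-principal ${principal}" \
       "--krb5-keytab-path ${cred_dir}/user.keytab" \
       "--user ${user}"
}
```

For example, check_obs_access mrs-word001 lists obs://mrs-word001/ and prints whether the listing succeeded. To assemble the client command, call presto_kerberos_cmd /opt/Bigdata/client/Presto followed by the user principal and the Presto username, then review the printed command before running it.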
Step 5: Configure a Lifecycle Rule
In MRS 3.2.0-LTS.1 and later versions, components protect against accidental deletion by default: file data deleted by component users is not removed immediately but moved to the recycle bin directory in the OBS file system.
To save OBS space, enable periodic deletion of file data from the OBS recycle bin. For details, see Configuring the Policy for Clearing Component Data in the Recycle Bin.
Reference
For details about how to control permissions to access OBS, see Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS.