Using Clusters with Kerberos Authentication Enabled

This section describes how to use security clusters to run MapReduce, Spark, and Hive programs.

You can get started by reading the following topics:

Creating a Security Cluster and Logging In to Manager
Creating a Role and a User
Running a MapReduce Program
Running a Spark Program
Running a Hive Program

Creating a Security Cluster and Logging In to Manager

Create a security cluster. Enable Kerberos Authentication, configure Password, and confirm the password. This password is used to log in to Manager. Keep it secure.
Log in to the MRS console.
In the navigation pane on the left, choose Active Clusters and click the target cluster name on the right to access the cluster details page.
Click Access Manager on the right of MRS Manager to log in to Manager.
- If you bound an EIP when creating the cluster, perform the following operations:
  1. Add a security group rule. By default, your public IP address used for accessing port 9022 is filled in the rule. If you want to view, modify, or delete a security group rule, click Manage Security Group Rule.
    It is normal for the automatically generated public IP address to differ from your local IP address. No action is required.
    
    If port 9022 is a Knox port, you need to allow access to port 9022 of Knox so that Manager can be accessed.
  2. Select I confirm that xx.xx.xx.xx is a trusted public IP address and MRS Manager can be accessed using this IP address.
- If you did not bind an EIP when creating the cluster, perform the following operations:
  1. Select an EIP from the drop-down list or click Manage EIP to create one.
  2. Add a security group rule. By default, your public IP address used for accessing port 9022 is filled in the rule. If you want to view, modify, or delete a security group rule, click Manage Security Group Rule.
    It is normal for the automatically generated public IP address to differ from your local IP address. No action is required.
    
    If port 9022 is a Knox port, you need to allow access to port 9022 of Knox so that Manager can be accessed.
  3. Select I confirm that xx.xx.xx.xx is a trusted public IP address and MRS Manager can be accessed using this IP address.
Click OK. The Manager login page is displayed. To allow other users to access Manager, add their IP addresses as trusted addresses.

Before accessing Manager, ensure that the EIP can be pinged. If the ping operation fails, contact technical support.
Enter the default username admin and the password you set when creating the cluster, and click Log In.
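The reachability check mentioned above can be scripted. The following is a minimal sketch, assuming a Linux or macOS shell with the standard ping command; the function name check_eip is hypothetical, and the address you pass in is the EIP bound to your cluster.

```shell
# Sketch: confirm that the cluster EIP answers ping before opening Manager.
# check_eip is a hypothetical helper name, not part of MRS.

check_eip() {
    local eip="$1"
    if [ -z "$eip" ]; then
        echo "usage: check_eip <EIP>" >&2
        return 2
    fi
    # Send three echo requests; a non-zero exit status means no reply arrived.
    if ping -c 3 "$eip" > /dev/null 2>&1; then
        echo "EIP $eip is reachable"
    else
        echo "EIP $eip is not reachable; contact technical support" >&2
        return 1
    fi
}
```

Call the function with your own EIP, for example check_eip xx.xx.xx.xx, and open Manager only if it reports that the EIP is reachable.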

Creating a Role and a User

For clusters with Kerberos authentication enabled, perform the following steps to create a user and assign permissions to the user to run programs.

On Manager, choose System > Permission > Role.
Click Create Role.

Specify the following information:
- Enter a role name, for example, mrrole.
- In Configure Resource Permission, select the cluster to be operated, choose Yarn > Scheduler Queue > root, and select Submit and Admin in the Permission column. When you finish, do not click OK. Instead, click the name of the target cluster shown in the following figure and continue configuring other permissions.
- Choose HBase > HBase Scope. Locate the row that contains global, and select create, read, write, and execute in the Permission column. When you finish, do not click OK. Instead, click the name of the target cluster shown in the following figure and continue configuring other permissions.
- Choose HDFS > File System > hdfs://hacluster/ and select Read, Write, and Execute in the Permission column. When you finish, do not click OK. Instead, click the name of the target cluster shown in the following figure and continue configuring other permissions.
- Choose Hive > Hive Read Write Privileges, select Select, Delete, Insert, and Create in the Permission column, and click OK.
Choose System. In the navigation pane on the left, choose Permission > User Group > Create User Group to create a user group for the sample project, for example, mrgroup.
Choose System. In the navigation pane on the left, choose Permission > User > Create to create a user for the sample project.
- Enter a username, for example, test. If you want to run a Hive program, enter hiveuser in Username.
- Set User Type to Human-Machine.
- Enter a password. This password will be used when you run the program.
- In User Group, add mrgroup and supergroup.
- Set Primary Group to supergroup and bind the mrrole role to obtain the permission.
  Click OK.
Choose System. In the navigation pane on the left, choose Permission > User, locate the row that contains user test, and select Download Authentication Credential from the More drop-down list. Save the downloaded package and decompress it to obtain the keytab and krb5.conf files.
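The decompression step can be checked from a shell. The following is a minimal sketch under stated assumptions: the downloaded package is a tar archive, and the files inside it are named user.keytab and krb5.conf. Neither detail is confirmed by this guide, so adjust the names to match your download. The function name extract_credentials is hypothetical.

```shell
# Sketch: unpack a downloaded credential package and confirm that it contains
# the two files needed later. The archive format (tar) and the file names
# user.keytab and krb5.conf are assumptions; adjust them to your download.

extract_credentials() {
    local package="$1" dest="$2"
    local f
    mkdir -p "$dest" || return 1
    tar -xf "$package" -C "$dest" || return 1
    for f in user.keytab krb5.conf; do
        if [ ! -f "$dest/$f" ]; then
            echo "missing $f in $package" >&2
            return 1
        fi
    done
    # A keytab can authenticate as the user, so restrict it to the owner.
    chmod 600 "$dest/user.keytab"
    echo "credentials extracted to $dest"
}
```

Restricting the keytab to its owner is a design choice in this sketch: anyone who can read the file can authenticate as the user without a password.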

Running a MapReduce Program

This section describes how to run a MapReduce program in security cluster mode.

Prerequisites

You have compiled the program and prepared data files, for example, mapreduce-examples-1.0.jar, input_data1.txt, and input_data2.txt.

Procedure

Use remote login software (for example, MobaXterm) to log in to the master node of the security cluster over SSH, using the EIP.
After the login is successful, run the following commands to create the test folder in the /opt/Bigdata/client directory and create the conf folder in the test directory:
```
cd /opt/Bigdata/client
mkdir test
cd test
mkdir conf
```
Use an upload tool (for example, WinSCP) to copy mapreduce-examples-1.0.jar, input_data1.txt, and input_data2.txt to the test directory, and copy the keytab and krb5.conf files obtained in step 5 of Creating a Role and a User to the conf directory.
Run the following commands to configure environment variables and authenticate the created user, for example, test:
```
cd /opt/Bigdata/client
source bigdata_env
export YARN_USER_CLASSPATH=/opt/Bigdata/client/test/conf/
kinit test
```
Enter the password as prompted. Upon the first login, you need to change the password as prompted. If no error message is displayed, Kerberos authentication is complete.
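For scripted runs, kinit can read the key from a keytab file instead of prompting for a password. The following is a minimal sketch, assuming the MIT Kerberos kinit and klist commands provided by the client environment; the function name kerberos_login and the keytab path you pass in are hypothetical.

```shell
# Sketch: authenticate with a keytab when one is available, otherwise fall
# back to the interactive password prompt. kerberos_login is a hypothetical
# helper name; pass the path of the keytab copied to the conf directory.

kerberos_login() {
    local principal="$1" keytab="$2"
    if [ -f "$keytab" ]; then
        # -k -t tells kinit to read the key from the given keytab file.
        kinit -k -t "$keytab" "$principal" || return 1
    else
        # No keytab found: kinit prompts for the password.
        kinit "$principal" || return 1
    fi
    # klist -s prints nothing and succeeds only if a valid ticket is cached.
    klist -s
}
```

A non-zero return value means that no valid ticket was obtained, so later hdfs and yarn commands would fail authentication.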

Run the following commands to import data to the HDFS:

```
cd test
hdfs dfs -mkdir /tmp/input
hdfs dfs -put input_data* /tmp/input
```
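If the import step may be repeated, the plain mkdir and put commands fail on the second run because the directory and files already exist. The following is a minimal sketch of a rerunnable variant that uses the standard -p and -f options of the HDFS shell; the function name upload_inputs is hypothetical.

```shell
# Sketch: create the HDFS input directory and upload files so that the step
# can be rerun safely. upload_inputs is a hypothetical helper name.

upload_inputs() {
    local hdfs_dir="$1"
    shift
    # -p creates parent directories and does not fail if the directory exists.
    hdfs dfs -mkdir -p "$hdfs_dir" || return 1
    # -f overwrites files that an earlier run already uploaded.
    hdfs dfs -put -f "$@" "$hdfs_dir" || return 1
    # List the directory to confirm that the files arrived.
    hdfs dfs -ls "$hdfs_dir"
}
```

For the sample data, the call would be upload_inputs /tmp/input input_data1.txt input_data2.txt, run from the test directory after authentication.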

Run the following commands to run the program:
```
yarn jar mapreduce-examples-1.0.jar com.huawei.bigdata.mapreduce.examples.FemaleInfoCollector /tmp/input /tmp/mapreduce_output
```
In the preceding command:

/tmp/input indicates the input path in HDFS.

/tmp/mapreduce_output indicates the output path in HDFS. This directory must not exist before the program runs. Otherwise, an error is reported.
After the program is executed successfully, run the hdfs dfs -ls /tmp/mapreduce_output command. The following command output is displayed.

Figure 1 Program running result
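Because the output path must not exist when the job starts, a rerun fails unless the previous output is removed first. The following is a minimal sketch that checks for the directory and deletes it before submitting the job. Deleting earlier results is an assumption about what you want, so skip the removal if they must be kept. The function name run_female_info_collector is hypothetical.

```shell
# Sketch: make sure the output path is absent, then submit the job.
# run_female_info_collector is a hypothetical helper name. Run it from the
# directory that contains mapreduce-examples-1.0.jar.

run_female_info_collector() {
    local input="$1" output="$2"
    # hdfs dfs -test -d returns 0 when the path exists and is a directory.
    if hdfs dfs -test -d "$output"; then
        echo "removing existing output directory $output"
        hdfs dfs -rm -r "$output" || return 1
    fi
    yarn jar mapreduce-examples-1.0.jar \
        com.huawei.bigdata.mapreduce.examples.FemaleInfoCollector \
        "$input" "$output"
}
```

For the sample paths, the call would be run_female_info_collector /tmp/input /tmp/mapreduce_output.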

Running a Spark Program

This section describes how to run a Spark program in security cluster mode.

Prerequisites

You have compiled the program and prepared data files, for example, FemaleInfoCollection.jar, input_data1.txt, and input_data2.txt.

Procedure

Use remote login software (for example, MobaXterm) to log in to the master node of the security cluster over SSH, using the EIP.
After the login is successful, run the following commands to create the test folder in the /opt/Bigdata/client directory and create the conf folder in the test directory:
```
cd /opt/Bigdata/client
mkdir test
cd test
mkdir conf
```
Use an upload tool (for example, WinSCP) to copy FemaleInfoCollection.jar, input_data1.txt, and input_data2.txt to the test directory, and copy the keytab and krb5.conf files obtained in step 5 of Creating a Role and a User to the conf directory.
Run the following commands to configure environment variables and authenticate the created user, for example, test:
```
cd /opt/Bigdata/client
source bigdata_env
export YARN_USER_CLASSPATH=/opt/Bigdata/client/test/conf/
kinit test
```
Enter the password as prompted. If no error message is displayed, Kerberos authentication is complete.

Run the following commands to import data to the HDFS:

```
cd test
hdfs dfs -mkdir /tmp/input
hdfs dfs -put input_data* /tmp/input
```

Run the following commands to run the program:

```
cd /opt/Bigdata/client/Spark/spark
bin/spark-submit --class com.huawei.bigdata.spark.examples.FemaleInfoCollection --master yarn-client /opt/Bigdata/client/test/FemaleInfoCollection-1.0.jar /tmp/input
```

After the program is run successfully, the following information is displayed.

Figure 2 Program running result
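On Spark 2.0 and later, the yarn-client master value is deprecated in favor of --master yarn combined with --deploy-mode client. The following is a minimal sketch that prints the equivalent command line for review before you run it. It assumes that your cluster runs such a Spark version, which this guide does not state. The function name build_spark_submit is hypothetical.

```shell
# Sketch: print a spark-submit command line that uses the newer flag spelling.
# build_spark_submit is a hypothetical helper name. It only prints the
# command; nothing is submitted to the cluster.

build_spark_submit() {
    local class="$1" jar="$2" input="$3"
    # --master yarn --deploy-mode client replaces the older yarn-client value.
    echo "bin/spark-submit --class $class --master yarn --deploy-mode client $jar $input"
}
```

Printing the command instead of running it keeps the sketch safe to try anywhere. Copy the printed line and run it from /opt/Bigdata/client/Spark/spark after authentication.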

Running a Hive Program

This section describes how to run a Hive program in security cluster mode.

Prerequisites

You have compiled the program and prepared data files, for example, hive-examples-1.0.jar, input_data1.txt, and input_data2.txt.

Procedure

Use remote login software (for example, MobaXterm) to log in to the master node of the security cluster over SSH, using the EIP.
After the login is successful, run the following commands to create the test folder in the /opt/Bigdata/client directory and create the conf folder in the test directory:
```
cd /opt/Bigdata/client
mkdir test
cd test
mkdir conf
```
Use an upload tool (for example, WinSCP) to copy hive-examples-1.0.jar, input_data1.txt, and input_data2.txt to the test directory, and copy the keytab and krb5.conf files obtained in step 5 of Creating a Role and a User to the conf directory.
Run the following commands to configure environment variables and authenticate the created user, for example, test:
```
cd /opt/Bigdata/client
source bigdata_env
export YARN_USER_CLASSPATH=/opt/Bigdata/client/test/conf/
kinit test
```
Enter the password as prompted. If no error message is displayed, Kerberos authentication is complete.

Run the following commands to run the program:

```
chmod +x /opt/hive_examples -R
cd /opt/hive_examples
java -cp .:hive-examples-1.0.jar:/opt/hive_examples/conf:/opt/Bigdata/client/Hive/Beeline/lib/*:/opt/Bigdata/client/HDFS/hadoop/lib/* com.huawei.bigdata.hive.example.ExampleMain
```
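The classpath in the java command above is long and easy to mistype. The following is a minimal sketch that assembles it from the program directory and the client directory, so that each entry is visible. The function name build_hive_classpath is hypothetical; the directory names are taken from the command above.

```shell
# Sketch: assemble the Java classpath for the Hive example from its parts.
# build_hive_classpath is a hypothetical helper name.

build_hive_classpath() {
    local app_dir="$1" client_dir="$2"
    # A trailing /* is a Java classpath wildcard that matches every JAR in the
    # directory. The string is double-quoted so the shell does not expand it.
    echo ".:hive-examples-1.0.jar:$app_dir/conf:$client_dir/Hive/Beeline/lib/*:$client_dir/HDFS/hadoop/lib/*"
}
```

From /opt/hive_examples, the program could then be started with java -cp "$(build_hive_classpath /opt/hive_examples /opt/Bigdata/client)" com.huawei.bigdata.hive.example.ExampleMain. Keep the double quotes around the substitution so that the wildcards reach Java unexpanded.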