Adding a Ranger Access Permission Policy for Spark2x

Scenario

Ranger administrators can use Ranger to set permissions for Spark2x users.

After enabling or disabling Ranger authentication for Spark2x, restart the Spark2x service.
Then download the client again, or manually update the client configuration file spark-defaults.conf in the Client installation directory/Spark2x/spark/conf directory.
To enable Ranger authentication, set both spark.ranger.plugin.authorization.enable and spark.sql.authorization.enabled to true.

To disable Ranger authentication, set spark.ranger.plugin.authorization.enable to false.
Spark2x spark-beeline, which connects to JDBCServer, supports the Ranger IP address filtering policy (Policy Conditions in a Ranger permission policy). spark-submit and spark-sql do not support this feature.
In MRS 3.3.0-LTS and later versions, the Spark2x component is renamed Spark, and the role names of this component are also changed. For example, JobHistory2x is changed to JobHistory. Refer to the descriptions and operations related to the component name and role names in the document based on your MRS version.
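
The manual client-side change to spark-defaults.conf described above can also be scripted. The following is a minimal Python sketch, not part of the product tooling, that sets the properties named above in the text of a spark-defaults.conf file. The helper names set_spark_property and set_ranger_auth are hypothetical, and the sketch assumes that each entry is written either as key=value or as a key and value separated by whitespace.

```python
import re


def set_spark_property(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with key set to value (replace an entry or append one)."""
    # Match "key=value" or "key value"; commented-out lines are left untouched.
    pattern = re.compile(r"^\s*" + re.escape(key) + r"(\s*=\s*|\s+).*$")
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def set_ranger_auth(conf_text: str, enabled: bool) -> str:
    """Apply the settings from this section for enabling or disabling Ranger."""
    if enabled:
        conf_text = set_spark_property(
            conf_text, "spark.ranger.plugin.authorization.enable", "true")
        return set_spark_property(
            conf_text, "spark.sql.authorization.enabled", "true")
    # Disabling only requires turning the plug-in switch off.
    return set_spark_property(
        conf_text, "spark.ranger.plugin.authorization.enable", "false")


# Example input (hypothetical file content, not taken from a real cluster).
sample = "spark.master yarn\nspark.ranger.plugin.authorization.enable false\n"
print(set_ranger_auth(sample, enabled=True))
```

To apply the result to a real client, read the file under the client's conf directory, pass its text through set_ranger_auth, and write it back. Restart the Spark2x service afterward, as required above.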

Prerequisites

The Ranger service has been installed and is running properly.
Ranger authentication has been enabled for Hive. Hive and then Spark have been restarted in sequence, Ranger authentication for Spark has then been enabled again, and the Spark service has been restarted afterward.
You have created users, user groups, or roles for which you want to configure permissions.
The created user has been added to the hive user group.

Procedure

Log in to the Ranger web UI as the Ranger administrator rangeradmin. For details, see Logging In to the Ranger Web UI.
On the home page, click the component plug-in name in the HADOOP SQL area, for example, Hive.
On the Access tab page, click Add New Policy to add a Spark2x permission control policy.

Configure the parameters listed in the table below based on service requirements.

**Table 1** Spark2x permission parameters
Parameter	Description
Policy Name	Policy name, which can be customized and must be unique in the service.
Policy Conditions	IP address filtering policy, which can be customized. You can enter one or more IP addresses or IP address segments. The IP address can contain the wildcard character (*), for example, 192.168.1.10,192.168.1.20, or 192.168.1.*.
Policy Label	A label specified for the current policy. You can search for reports and filter policies based on labels.
database	Name of the Spark2x database to which the policy applies. The Include policy applies to the current input object, and the Exclude policy applies to objects other than the current input object.
table	Name of the Spark2x table to which the policy applies. To add a UDF-based policy, switch to UDF and enter the UDF name. The Include policy applies to the current input object, and the Exclude policy applies to objects other than the current input object.
column	Name of the column to which the policy applies. The value * indicates all columns. The Include policy applies to the current input object, and the Exclude policy applies to objects other than the current input object.
Description	Policy description.
Audit Logging	Whether to audit the policy.
Allow Conditions	Conditions under which the policy grants access, including the allowed permissions and exceptions. In the Select Role, Select Group, and Select User columns, select the role, user group, or user to which the permission is to be granted. Click Add Conditions to add the IP address range to which the policy applies, and click Add Permissions to add the corresponding permissions: select: permission to query data; update: permission to update data; Create: permission to create data; Drop: permission to drop data; Alter: permission to alter data; Index: permission to index data; All: all permissions; Read: permission to read data; Write: permission to write data; Temporary UDF Admin: temporary UDF management permission; Select/Deselect All: select or deselect all permissions. To add multiple permission control rules, click the add icon. If users or user groups in the current condition need to manage this policy, select Delegate Admin. These users become delegated administrators, who can update and delete this policy and create sub-policies based on it.
Deny Conditions	Policy rejection condition, which is used to configure the permissions and exceptions to be denied in the policy. The configuration method is similar to that of Allow Conditions.
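
The Policy Conditions format described in Table 1 (a comma-separated list of IP addresses in which the wildcard character * may appear, for example 192.168.1.*) can be illustrated with a small Python sketch. This is only an approximation for checking a condition string before entering it in the web UI. The function name ip_matches is hypothetical, and the matching performed by Ranger itself may differ in detail.

```python
from fnmatch import fnmatchcase


def ip_matches(client_ip: str, condition: str) -> bool:
    """Return True if client_ip matches any entry of a comma-separated condition."""
    # Split the condition into individual addresses or wildcard patterns.
    patterns = [p.strip() for p in condition.split(",") if p.strip()]
    # fnmatchcase treats * as "any sequence of characters".
    return any(fnmatchcase(client_ip, p) for p in patterns)


print(ip_matches("192.168.1.10", "192.168.1.10,192.168.1.20"))
print(ip_matches("192.168.1.77", "192.168.1.*"))
print(ip_matches("192.168.2.77", "192.168.1.*"))
```

The first two calls match and the third does not, because 192.168.2.77 lies outside the 192.168.1.* segment.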

**Table 2** Setting permissions
Task	Operation
role admin operation	On the home page, click Settings and choose Roles > Add New Role. Set Role Name to admin. In the Users area, click Select User and select a username. Click Add Users, select Is Role Admin in the row where the username is located, and click Save. NOTE: After a user is bound to the Hive administrator role, the user must perform the following steps before each maintenance operation: (1) Log in to the node where the Hive client is installed as the client installation user. (2) Configure environment variables. For example, if the installation directory of the Spark2x client is /opt/client, run the source /opt/client/bigdata_env command. (3) Run the following command to perform user authentication: kinit Spark2xService user (4) Run the following command to log in to the client tool: spark-beeline (5) Run the following command to update the administrator permissions: set role admin;
Creating a database table	Enter the policy name in Policy Name. Enter and select the corresponding database on the right of database. (If you want to create a database, enter the name of the database to be created or enter * to indicate a database with any name, and then select the name.) Enter and select the corresponding table name and column name on the right of table and column. Wildcard characters (*) are supported. In the Allow Conditions area, select a user from the Select User drop-down list. Click Add Permissions and select Create.
Deleting a table	Enter the policy name in Policy Name. Enter and select the corresponding database on the right of database. (If you want to delete a database, enter the name of the database to be deleted or enter * to indicate a database with any name, and then select the name.) Enter and select the corresponding table name and column name on the right of table and column. Wildcard characters (*) are supported. In the Allow Conditions area, select a user from the Select User drop-down list. Click Add Permissions and select Drop. NOTE: For CarbonData tables, only the owner of the corresponding database or table can perform the drop operation.
ALTER operation	Enter the policy name in Policy Name. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported. In the Allow Conditions area, select a user from the Select User drop-down list. Click Add Permissions and select Alter.
LOAD operation	Enter the policy name in Policy Name. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported. In the Allow Conditions area, select a user from the Select User drop-down list. Click Add Permissions and select update.
INSERT operation	Enter the policy name in Policy Name. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported. In the Allow Conditions area, select a user from the Select User drop-down list. Click Add Permissions and select update. The user also needs the submit-app permission on the Yarn task queue. By default, the Hadoop user group has the submit-app permission on all Yarn task queues. For details, see Adding a Ranger Access Permission Policy for Yarn.
GRANT operation	Enter the policy name in Policy Name. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported. In the Allow Conditions area, select a user from the Select User drop-down list. Select Delegate Admin.
ADD JAR operation	Enter the policy name in Policy Name. Click database, and select global from the drop-down list. On the right of global, enter related information and select *. In the Allow Conditions area, select a user from the Select User drop-down list. Click Add Permissions and select Temporary UDF Admin.
VIEW and INDEX permissions	Enter the policy name in Policy Name. On the right side of database, enter the database name and select the corresponding database. (If you want to delete a database, enter the database name and select *.) On the right side of table, enter a table name and select the view and index names. On the right side of column, enter a Hive column name, and select *. In the Allow Conditions area, select a user from the Select User drop-down list. Click Add Permissions and select permissions for the user as required.
Operations on other user database tables	Perform the preceding operations to add the corresponding permissions. Grant the read, write, and execution permissions on the HDFS paths of other user database tables to the current user. For details, see Adding a Ranger Access Permission Policy for HDFS.

After adding a Spark SQL access policy in Ranger, also add access policies for the corresponding paths to the HDFS access policies. Otherwise, the data files cannot be accessed. For details, see Adding a Ranger Access Permission Policy for HDFS.

The global policy in the Ranger policy is only used to associate with the Temporary UDF Admin permission to control the upload of UDF packages.
When Ranger is used to control Spark SQL permissions, the SQL syntax for granting permissions is not supported.
Ranger policies do not support local paths or HDFS paths containing spaces.
When Ranger authentication is enabled, operations on a view require permissions on the related tables by default. To authorize a view independently of its table permissions, set spark.ranger.plugin.viewaccesscontrol.enable to true.
- When submitting jobs in a mode other than spark-beeline, set this parameter in the Client installation directory/Spark/spark/conf/spark-defaults.conf file.
- When submitting jobs in spark-beeline mode, set this parameter in the Client installation directory/Spark/spark/conf/spark-defaults.conf file, and also choose JDBCServer > Customization to add this parameter.

Click Add. The policy and its basic information appear in the policy list. After the policy takes effect, verify that the related permissions work as expected.

To disable a policy, click the edit icon of the policy and set the policy to Disabled.

If a policy is no longer used, click the delete icon to delete it.

Data Masking of the Spark2x Table

Ranger supports data masking for Spark2x data. It masks sensitive information in the results returned by select operations.

Change the value of spark.ranger.plugin.masking.enable to true on both the server and the client.
- Server: Log in to FusionInsight Manager, choose Clusters > Services and click the Spark2x component. On the displayed page, click the Configurations tab and click the All Configurations tab. Search for spark.ranger.plugin.masking.enable and change the value to true. Save the modifications, and restart the service.
- Client: Log in to the Spark client node, go to the Client installation directory/Spark/spark/conf directory, find the spark-defaults.conf file, and change the value of spark.ranger.plugin.masking.enable to true.
Log in to the Ranger WebUI and click the component plug-in name, for example, Hive, in the HADOOP SQL area on the home page.
On the Masking tab page, click Add New Policy to add a Spark2x data masking policy.

Configure the parameters listed in the table below based on service requirements.

**Table 3** Spark2x data masking parameters
Parameter	Description
Policy Name	Policy name, which can be customized and must be unique in the service.
Policy Conditions	IP address filtering policy, which can be customized. You can enter one or more IP addresses or IP address segments. The IP address can contain the wildcard character (*), for example, 192.168.1.10,192.168.1.20, or 192.168.1.*.
Policy Label	A label specified for the current policy. You can search for reports and filter policies based on labels.
Hive Database	Name of the Spark2x database to which the current policy applies.
Hive Table	Name of the Spark2x table to which the current policy applies.
Hive Column	Name of the Spark2x column to which the current policy applies.
Description	Policy description.
Audit Logging	Whether to audit the policy.
Mask Conditions	In the Select Group and Select User columns, select the user group or user to which the permission is to be granted. Click Add Conditions to add the IP address range to which the policy applies, then click Add Permissions and select select. Click Select Masking Option and select a data masking policy: Redact: use x to mask all letters and 0 to mask all digits; Partial mask: show last 4: only the last four characters are displayed; Partial mask: show first 4: only the first four characters are displayed; Hash: replace the data with its hash value; Nullify: replace the original value with NULL; Unmasked(retain original value): the original data is displayed; Date: show only year: only the year is displayed; Custom: use any valid Hive UDF that returns the same data type as the masked column to customize the policy. To add a multi-column masking policy, click the add icon.
Deny Conditions	Policy rejection condition, which is used to configure the permissions and exceptions to be denied in the policy. The configuration method is similar to that of Allow Conditions.
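
The effect of several masking options in Table 3 can be illustrated with a short Python sketch. It only mirrors the descriptions in the table and is not the implementation used by Ranger or Spark2x. All function names are hypothetical. The sketch assumes that the hidden part of a partial mask is redacted in the same way as Redact, and it uses SHA-256 as a stand-in for the unspecified hash function.

```python
import hashlib
from typing import Optional


def redact(value: str) -> str:
    """Redact: letters become x, digits become 0, other characters are kept."""
    return "".join(
        "x" if c.isalpha() else "0" if c.isdigit() else c for c in value
    )


def show_last_4(value: str) -> str:
    """Partial mask: keep the last four characters, redact the rest (assumption)."""
    return redact(value[:-4]) + value[-4:]


def show_first_4(value: str) -> str:
    """Partial mask: keep the first four characters, redact the rest (assumption)."""
    return value[:4] + redact(value[4:])


def hash_value(value: str) -> str:
    """Hash: SHA-256 is an assumption; the table does not name the algorithm."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def nullify(value: str) -> Optional[str]:
    """Nullify: the original value is replaced with NULL (None in Python)."""
    return None


print(redact("Ab12-cd"))
print(show_last_4("13812345678"))
print(show_first_4("13812345678"))
```

Under these assumptions the three calls print xx00-xx, 00000005678, and 13810000000. Comparing such expected values with the output of a select statement is one way to confirm that a masking policy has taken effect.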

Spark2x Row-Level Data Filtering

Ranger allows you to filter data at the row level when you perform the select operation on Spark2x data tables.

Change the value of spark.ranger.plugin.rowfilter.enable to true on both the server and the client.
- Server: Log in to FusionInsight Manager, choose Clusters > Services and click the Spark2x component. On the displayed page, click the Configurations tab and click the All Configurations tab. Search for spark.ranger.plugin.rowfilter.enable and change the value to true. Save the modifications, and restart the service.
- Client: Log in to the Spark client node, go to the Client installation directory/Spark/spark/conf directory, find the spark-defaults.conf file, and change the value of spark.ranger.plugin.rowfilter.enable to true.
Log in to the Ranger WebUI and click the component plug-in name, for example, Hive, in the HADOOP SQL area on the home page.
On the Row Level Filter tab page, click Add New Policy to add a row data filtering policy.

Configure the parameters listed in the table below based on service requirements.

**Table 4** Parameters for filtering Spark2x row data
Parameter	Description
Policy Name	Policy name, which can be customized and must be unique in the service.
Policy Conditions	IP address filtering policy, which can be customized. You can enter one or more IP addresses or IP address segments. The IP address can contain the wildcard character (*), for example, 192.168.1.10,192.168.1.20, or 192.168.1.*.
Policy Label	A label specified for the current policy. You can search for reports and filter policies based on labels.
Hive Database	Name of the Spark database to which the current policy applies. Note that only one database name can be used and the wildcard character (*) is not allowed.
Hive Table	Name of the Spark table to which the current policy applies. Note that only one table name can be specified and the wildcard character (*) is not permitted.
Description	Policy description.
Audit Logging	Whether to audit the policy.
Row Filter Conditions	In the Select Role, Select Group, and Select User columns, select the object to which the permission is to be granted. Click Add Conditions to add the IP address range to which the policy applies, then click Add Permissions and select select. Click Row Level Filter and enter the data filtering rule. For example, to filter out the rows of table A whose name column is zhangsan, enter the rule name <> 'zhangsan'. For more information, see the official Ranger documentation. To add more rules, click the add icon.
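
A row filter rule behaves like a WHERE condition that is applied to every select on the table. The Python sketch below illustrates this with the example rule name <> 'zhangsan', using an in-memory SQLite database as a stand-in for a Spark2x table. The table layout, the sample rows, and the function name select_with_row_filter are hypothetical and serve only to show which rows remain visible.

```python
import sqlite3


def select_with_row_filter(conn, table: str, row_filter: str) -> list:
    """Return the names visible in table once row_filter is applied."""
    # Interpolation is acceptable here only because this is a local illustration.
    query = f"SELECT name FROM {table} WHERE {row_filter} ORDER BY name"
    return [row[0] for row in conn.execute(query)]


# Build a small stand-in for table A.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [("zhangsan", "sales"), ("lisi", "sales"), ("wangwu", "hr")],
)

# The rule from the table above hides every row whose name is zhangsan.
visible = select_with_row_filter(conn, "a", "name <> 'zhangsan'")
print(visible)  # ['lisi', 'wangwu']
```

Testing a rule against sample data in this way can help catch quoting or operator mistakes before the rule is entered in the Row Level Filter field.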

Click Add to view the basic information about the policy in the policy list.
After you perform a select operation on the Spark2x client on a table configured with a row-level filtering policy, the system filters the data and displays the result.