Updated on 2024-11-29 GMT+08:00

Adding a Ranger Access Permission Policy for Spark

Scenario

Ranger administrators can use Ranger to set permissions for Spark users.

  1. After Ranger authentication is enabled or disabled for Spark, you need to restart the Spark service.
  2. Download the client again, or manually update the client configuration file Client installation directory/Spark/spark/conf/spark-defaults.conf:

    Enable Ranger: spark.ranger.plugin.authorization.enable=true

    Disable Ranger: spark.ranger.plugin.authorization.enable=false

  3. In Spark, spark-beeline (applications connected to JDBCServer) supports the Ranger IP address filtering policy (Policy Conditions in the Ranger permission policy), while spark-submit and spark-sql do not.

Prerequisites

  • The Ranger service has been installed and is running properly.
  • Ranger authentication has been enabled for Hive, and Spark has been restarted after the Hive restart.
  • You have created users, user groups, or roles for which you want to configure permissions.
  • The created user has been added to the hive user group.

Procedure

  1. Log in to the Ranger web UI as the Ranger administrator rangeradmin. For details, see Logging In to the Ranger Web UI.
  2. On the home page, click the component plug-in name in the HADOOP SQL area, for example, Hive.

  3. On the Access tab page, click Add New Policy to add a Spark permission control policy.

  4. Configure the parameters listed in the table below based on service requirements.

    Table 1 Spark permission parameters

    Parameter

    Description

    Policy Name

    Policy name, which can be customized and must be unique in the service.

    Policy Conditions

    IP address filtering policy, which can be customized. You can enter one or more IP addresses or IP address segments. An IP address can contain the wildcard character (*), for example, 192.168.1.10, 192.168.1.20, or 192.168.1.*.

    Policy Label

    A label specified for the current policy. You can search for reports and filter policies based on labels.

    database

    Name of the Spark database to which the policy applies.

    The Include policy applies to the current input object, and the Exclude policy applies to objects other than the current input object.

    table

    Name of the Spark table to which the policy applies.

    To add a UDF-based policy, switch to UDF and enter the UDF name.

    The Include policy applies to the current input object, and the Exclude policy applies to objects other than the current input object.

    column

    Name of the column to which the policy applies. The value * indicates all columns.

    The Include policy applies to the current input object, and the Exclude policy applies to objects other than the current input object.

    Description

    Policy description.

    Audit Logging

    Whether to audit the policy.

    Allow Conditions

    Policy allowed condition. You can configure permissions and exceptions allowed by the policy.

    In the Select Role, Select Group, and Select User columns, select the role, user group, or user to which the permission is to be granted, click Add Conditions, add the IP address range to which the policy applies, and click Add Permissions to add the corresponding permission.

    • select: permission to query data
    • update: permission to update data
    • Create: permission to create data
    • Drop: permission to drop data
    • Alter: permission to alter data
    • Index: permission to index data
    • All: all permissions
    • Read: permission to read data
    • Write: permission to write data
    • Temporary UDF Admin: temporary UDF management permission
    • Select/Deselect All: Select or deselect all.

    To add multiple permission control rules, click the plus (+) icon.

    If users or user groups in the current condition need to manage this policy, select Delegate Admin. These users will become the agent administrators. The agent administrators can update and delete this policy and create sub-policies based on the original policy.

    Deny Conditions

    Policy rejection condition, which is used to configure the permissions and exceptions to be denied in the policy. The configuration method is similar to that of Allow Conditions.
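Policies added on the Access tab can also be created programmatically through Ranger's public REST API (POST /service/public/v2/api/policy). The sketch below only builds the JSON payload that mirrors Table 1; the service name, user, and database names are illustrative placeholders, and the field names follow the open-source Ranger policy model, so verify them against the Ranger version in your cluster.

```python
def build_hive_policy(service, name, database, table, column, users, accesses):
    """Build a minimal Ranger public-v2 policy payload (sketch).

    Mirrors Table 1: resources map to database/table/column, and each
    Allow Condition becomes a policyItem with users plus accesses.
    """
    return {
        "service": service,            # Ranger service (repository) name
        "name": name,                  # must be unique within the service
        "isEnabled": True,
        "isAuditEnabled": True,        # "Audit Logging" in the UI
        "resources": {
            "database": {"values": [database], "isExcludes": False},
            "table":    {"values": [table],    "isExcludes": False},
            "column":   {"values": [column],   "isExcludes": False},
        },
        "policyItems": [{              # "Allow Conditions" in the UI
            "users": users,
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            "delegateAdmin": False,    # "Delegate Admin" checkbox
        }],
    }

# Example: allow a hypothetical user "developer" to query and create
# objects in a database named "sales" (all tables, all columns).
policy = build_hive_policy("hacluster_hive", "sales_rw", "sales", "*", "*",
                           ["developer"], ["select", "create"])
```

The payload would be sent to the Ranger admin service with HTTP basic authentication as the rangeradmin user; the Include/Exclude switch in the UI corresponds to the isExcludes flag on each resource.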

    Table 2 Setting permissions

    Task

    Operation

    role admin operation

    1. On the home page, click Settings and choose Roles > Add New Role.
    2. Set Role Name to admin. In the Users area, click Select User and select a username.
    3. Click Add Users, select Is Role Admin in the row where the username is located, and click Save.
    NOTE:

    After a user is bound to the Hive administrator role, perform the following operations in each maintenance session:

    1. Log in to the node where the Hive client is installed as the client installation user.
    2. Run the following command to configure environment variables:

      For example, if the Spark client installation directory is /opt/client, run source /opt/client/bigdata_env.

    3. Run the following command to perform user authentication:

      kinit Spark service user

    4. Run the following command to log in to the client tool:

      spark-beeline

    5. Run the following command to update the administrator permissions:

      set role admin;

    Creating a database table

    1. Enter the policy name in Policy Name.
    2. Enter and select the corresponding database on the right of database. (If you want to create a database, enter the name of the database to be created or enter * to indicate a database with any name, and then select the name.) Enter and select the corresponding table name on the right of table and column. Wildcard characters (*) are supported.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Click Add Permissions and select Create.

    Deleting a table

    1. Enter the policy name in Policy Name.
    2. Enter and select the corresponding database on the right of database. (If you want to delete a database, enter the name of the database to be deleted or enter * to indicate a database with any name, and then select the name.) Enter and select the corresponding table name on the right of table and column. Wildcard characters (*) are supported.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Click Add Permissions and select Drop.
      NOTE:

      For CarbonData tables, only the owner of the corresponding database or table can perform the drop operation.

    ALTER operation

    1. Enter the policy name in Policy Name.
    2. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Click Add Permissions and select Alter.

    LOAD operation

    1. Enter the policy name in Policy Name.
    2. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Click Add Permissions and select update.

    INSERT operation

    1. Enter the policy name in Policy Name.
    2. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Click Add Permissions and select update.
    5. The user also needs to have the submit-app permission on the Yarn task queue. By default, the Hadoop user group has the submit-app permission on all Yarn task queues. For details, see Adding a Ranger Access Permission Policy for Yarn.

    GRANT operation

    1. Enter the policy name in Policy Name.
    2. Enter and select the corresponding database on the right of database, enter and select the corresponding table on the right of table, and enter and select the corresponding column name on the right of column. Wildcard characters (*) are supported.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Select Delegate Admin.

    ADD JAR operation

    1. Enter the policy name in Policy Name.
    2. Click database, and select global from the drop-down list. On the right of global, enter related information and select *.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Click Add Permissions and select Temporary UDF Admin.

    VIEW and INDEX permissions

    1. Enter the policy name in Policy Name.
    2. On the right side of database, enter the database name and select the corresponding database. On the right side of table, enter a table name and select the view and index names. On the right side of column, enter a Hive column name and select *.
    3. In the Allow Conditions area, select a user from the Select User drop-down list.
    4. Click Add Permissions and select permissions for the user as required.

    Operations on other user database tables

    1. Perform the preceding operations to add the corresponding permissions.
    2. Grant the read, write, and execution permissions on the HDFS paths of other user database tables to the current user. For details, see Adding a Ranger Access Permission Policy for HDFS.

    After a Spark SQL access policy is added in Ranger, you need to add the corresponding path access policies to the HDFS access policies. Otherwise, data files cannot be accessed. For details, see Adding a Ranger Access Permission Policy for HDFS.

    • The global policy in the Ranger policy is only used to associate with the Temporary UDF Admin permission to control the upload of UDF packages.
    • When Ranger is used to control Spark SQL permissions, the empower syntax is not supported.
    • Ranger policies do not support local paths or HDFS paths containing spaces.

  5. Click Add to view the basic information about the policy in the policy list. After the policy takes effect, check whether the related permissions are normal.

    To disable a policy, click the edit icon of the policy and set the policy to Disabled.

    If a policy is no longer used, click the delete icon to delete it.

Data Masking of the Spark Table

Ranger supports data masking for Spark data. It can mask sensitive information in the results returned by select operations.

  1. Log in to the Ranger web UI and click the component plug-in name, for example, Hive, in the HADOOP SQL area on the home page.
  2. On the Masking tab page, click Add New Policy to add a Spark permission control policy.
  3. Configure the parameters listed in the table below based on service requirements.

    Table 3 Spark data masking parameters

    Parameter

    Description

    Policy Name

    Policy name, which can be customized and must be unique in the service.

    Policy Conditions

    IP address filtering policy, which can be customized. You can enter one or more IP addresses or IP address segments. An IP address can contain the wildcard character (*), for example, 192.168.1.10, 192.168.1.20, or 192.168.1.*.

    Policy Label

    A label specified for the current policy. You can search for reports and filter policies based on labels.

    Hive Database

    Name of the Spark database to which the current policy applies.

    Hive Table

    Name of the Spark table to which the current policy applies.

    Hive Column

    Name of the Spark column to which the current policy applies.

    Description

    Policy description.

    Audit Logging

    Whether to audit the policy.

    Mask Conditions

    In the Select Group and Select User columns, select the user group or user to which the permission is to be granted, click Add Conditions, add the IP address range to which the policy applies, then click Add Permissions, and select select.

    Click Select Masking Option and select a data masking policy.

    • Redact: Use x to mask all letters and 0 to mask all digits.
    • Partial mask: show last 4: Only the last four characters are displayed.
    • Partial mask: show first 4: Only the first four characters are displayed.
    • Hash: Perform hash calculation for data.
    • Nullify: Replace the original value with the NULL value.
    • Unmasked (retain original value): The original data is displayed.
    • Date: show only year: Only the year information is displayed.
    • Custom: You can use any valid Hive UDF (one that returns the same data type as the masked column) to customize the policy.

    To add a multi-column masking policy, click the plus (+) icon.
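To see what the masking options do to a value, the following plain-Python sketch approximates two of them. It is for illustration only; Ranger's real transformers are applied inside the SQL engine and may treat special characters differently.

```python
def redact(value: str) -> str:
    """Approximate "Redact": mask letters with x and digits with 0.
    Other characters are left unchanged in this sketch."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("x")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


def show_last_4(value: str) -> str:
    """Approximate "Partial mask: show last 4": mask everything
    except the last four characters."""
    if len(value) <= 4:
        return value
    return "x" * (len(value) - 4) + value[-4:]


print(redact("Card-1234"))        # xxxx-0000
print(show_last_4("1234567890"))  # xxxxxx7890
```

Hash, Nullify, and Date: show only year follow the same pattern: the stored data is untouched, and only the value returned to the matched users is transformed.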

    Deny Conditions

    Policy rejection condition, which is used to configure the permissions and exceptions to be denied in the policy. The configuration method is similar to that of Allow Conditions.

Spark Row-Level Data Filtering

Ranger allows you to filter data at the row level when you perform the select operation on Spark data tables.

  1. Change the value of spark.ranger.plugin.rowfilter.enable to true on both the server and the client.

    • Server: Log in to FusionInsight Manager, choose Clusters > Services and click the Spark component. On the displayed page, click the Configurations tab and click the All Configurations tab. Search for spark.ranger.plugin.rowfilter.enable and change the value to true. Save the modifications, and restart the service.
    • Client: Log in to the Spark client node, open the Client installation directory/Spark/spark/conf/spark-defaults.conf file, and change the value of spark.ranger.plugin.rowfilter.enable to true.
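The client-side change is a one-line edit to spark-defaults.conf. As a hedged sketch, the helper below sets (or appends) a key in a properties-style file; it assumes simple key=value or whitespace-separated entries and is not part of the product client tooling.

```python
import re


def set_conf(path, key, value):
    """Set key to value in a spark-defaults.conf-style file,
    appending the entry if the key is not present."""
    with open(path) as f:
        lines = f.readlines()
    # Match "key=value" or "key value" entries for this exact key.
    pat = re.compile(r"^\s*%s(\s*=\s*|\s+)" % re.escape(key))
    replaced = False
    for i, line in enumerate(lines):
        if pat.match(line):
            lines[i] = "%s=%s\n" % (key, value)
            replaced = True
    if not replaced:
        lines.append("%s=%s\n" % (key, value))
    with open(path, "w") as f:
        f.writelines(lines)


# Example (path is illustrative, assuming /opt/client as the install dir):
# set_conf("/opt/client/Spark/spark/conf/spark-defaults.conf",
#          "spark.ranger.plugin.rowfilter.enable", "true")
```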

  2. Log in to the Ranger web UI and click the component plug-in name, for example, Hive, in the HADOOP SQL area on the home page.
  3. On the Row Level Filter tab page, click Add New Policy to add a row data filtering policy.
  4. Configure the parameters listed in the table below based on service requirements.

    Table 4 Parameters for filtering Spark row data

    Parameter

    Description

    Policy Name

    Policy name, which can be customized and must be unique in the service.

    Policy Conditions

    IP address filtering policy, which can be customized. You can enter one or more IP addresses or IP address segments. An IP address can contain the wildcard character (*), for example, 192.168.1.10, 192.168.1.20, or 192.168.1.*.

    Policy Label

    A label specified for the current policy. You can search for reports and filter policies based on labels.

    Hive Database

    Name of the Spark database to which the current policy applies.

    Hive Table

    Name of the Spark table to which the current policy applies.

    Description

    Policy description.

    Audit Logging

    Whether to audit the policy.

    Row Filter Conditions

    In the Select Role, Select Group, and Select User columns, select the object to which the permission is to be granted, click Add Conditions, add the IP address range to which the policy applies, then click Add Permissions, and select select.

    Click Row Level Filter and enter data filtering rules.

    For example, to filter out rows whose name column is zhangsan in table A, set the filtering rule to name <> 'zhangsan'. For more information, see the official Ranger documentation.

    To add more rules, click the plus (+) icon.
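The effect of a row-filter condition is that of a WHERE clause silently appended to every query on the table. As a plain illustration (not how Ranger implements it), the sketch below applies the name <> 'zhangsan' rule to in-memory rows:

```python
# Sample rows standing in for table A (illustrative data only).
rows = [
    {"name": "zhangsan", "city": "Hangzhou"},
    {"name": "lisi", "city": "Shenzhen"},
    {"name": "wangwu", "city": "Nanjing"},
]

# Row filter "name <> 'zhangsan'": only rows satisfying the
# condition are returned to the querying user.
visible = [r for r in rows if r["name"] != "zhangsan"]

print([r["name"] for r in visible])  # ['lisi', 'wangwu']
```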

  5. Click Add to view the basic information about the policy in the policy list.
  6. After you perform the select operation on a table configured with a row-level filtering policy on the Spark client, the system filters the data before displaying it.