Updated on 2024-10-23 GMT+08:00

Configuring Security Authentication for Spark Applications

Scenario

In a secure cluster environment, components cannot simply communicate with each other. Components must authenticate each other before communicating to ensure that the communication is secure.

When developing a Spark application, you may need Spark to interact with Hadoop and HBase. In these scenarios, security authentication code must be written into the Spark application so that the application can run properly.

Three security authentication modes are available:

  • Authentication by running a command:

    Before submitting the Spark application or using the CLI to log in to Spark SQL, run the following command on the Spark client to obtain authentication:

    kinit <component service user>

  • Authentication by configuring parameters

    You can use any of the following methods to specify the security authentication information.

    • Configure the spark.kerberos.keytab and spark.kerberos.principal parameters in the spark-defaults.conf file on the client.
    • Add the following parameters to the bin/spark-submit command:

      --conf spark.kerberos.keytab=<keytab file path> --conf spark.kerberos.principal=<Principal account>

    • Add the following parameters to the bin/spark-submit command:

      --keytab <keytab file path> --principal <Principal account>

  • Authentication by adding code:

    Authenticate in the application code by using the principal and keytab file of the client user.
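If a Java launcher builds the bin/spark-submit command itself, the options of the parameter-based method can be assembled as in the following minimal sketch. The class and method names are hypothetical; only the option names (spark.kerberos.keytab, spark.kerberos.principal, --keytab, --principal) come from the list above, and the keytab path and principal are the example values from Table 2.

```java
import java.util.List;

// Hypothetical helper: builds the Kerberos options of bin/spark-submit
// for the parameter-based authentication method.
final class SparkSubmitKerberosArgs {

    private SparkSubmitKerberosArgs() {
    }

    // Form 1: generic --conf options.
    static List<String> confForm(String keytabPath, String principal) {
        return List.of(
                "--conf", "spark.kerberos.keytab=" + keytabPath,
                "--conf", "spark.kerberos.principal=" + principal);
    }

    // Form 2: dedicated --keytab / --principal options.
    static List<String> shortForm(String keytabPath, String principal) {
        return List.of("--keytab", keytabPath, "--principal", principal);
    }

    public static void main(String[] args) {
        // Example values; replace them with the keytab and principal of your own account.
        List<String> opts = confForm("/opt/FIclient/user.keytab", "sparkuser");
        System.out.println("bin/spark-submit " + String.join(" ", opts) + " ...");
    }
}
```

Returning a list instead of a single string keeps paths that contain spaces intact when the options are passed to java.lang.ProcessBuilder.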

Table 1 lists the authentication methods available for the sample code in a secure cluster environment.

Table 1 Authentication methods

| Sample Code | Mode | Authentication Method |
|---|---|---|
| sparknormal-examples | yarn-client | By running a command, configuring parameters, or adding code |
| sparknormal-examples | yarn-cluster | By running a command or configuring parameters |
| sparksecurity-examples (containing authentication code) | yarn-client | By adding code |
| sparksecurity-examples (containing authentication code) | yarn-cluster | Not supported |

  • In yarn-cluster mode, authentication code inside a Spark project is not supported. Security authentication must be completed before the application is started.
  • The Python sample projects do not provide security authentication code. Configure the security authentication parameters in the command that runs the application.
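The rule behind Table 1 can be summarized as follows: authentication code inside the application works only in yarn-client mode, while yarn-cluster mode requires authentication to be completed before the application starts. The following Java sketch encodes that reading of the table; the type and method names are hypothetical and are not part of the sample projects.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of Table 1: which authentication methods can be
// used for a given deploy mode.
final class SparkAuthMatrix {

    enum DeployMode { YARN_CLIENT, YARN_CLUSTER }

    enum AuthMethod { COMMAND_LINE, PARAMETERS, CODE }

    private SparkAuthMatrix() {
    }

    static Set<AuthMethod> supportedMethods(DeployMode mode) {
        if (mode == DeployMode.YARN_CLIENT) {
            // yarn-client: running a command, configuring parameters, or adding code.
            return EnumSet.allOf(AuthMethod.class);
        }
        // yarn-cluster: authentication must be completed before the application
        // starts, so authentication code inside the application is not supported.
        return EnumSet.of(AuthMethod.COMMAND_LINE, AuthMethod.PARAMETERS);
    }

    // An application that relies only on its own authentication code
    // (such as sparksecurity-examples) can run only where CODE is supported.
    static boolean canRunWithEmbeddedAuth(DeployMode mode) {
        return supportedMethods(mode).contains(AuthMethod.CODE);
    }
}
```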

Security Authentication Code (Java)

In the sample code, security authentication is completed by calling the LoginUtil class. For the secure login process, see the chapter Security Authentication Interfaces.

Different Spark sample projects use different authentication code: basic security authentication, or basic security authentication combined with ZooKeeper authentication. Table 2 lists the example authentication parameters used in the sample projects. Modify the parameter values as required.

Table 2 Parameter description

| Parameter | Example Parameter Value | Description |
|---|---|---|
| userPrincipal | sparkuser | User principal for authentication. Use the account prepared in Preparing User Information for Cluster Authentication. |
| userKeytabPath | /opt/FIclient/user.keytab | Keytab file used for authentication. Copy the user.keytab file of the prepared developer account to the directory indicated by the example parameter value. |
| ZKServerPrincipal | zookeeper/hadoop.<System domain name> | Principal of the ZooKeeper server. Contact the administrator to obtain it. |
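Before calling LoginUtil.login, it can help to verify the values from Table 2 and the krb5ConfPath value used in the code below, so that a wrong path fails with a clear message instead of an obscure login error. This is a minimal JDK-only sketch; the class name KerberosPreflight is hypothetical, and the useKrb5Conf method assumes (not confirmed from the LoginUtil source) that krb5ConfPath serves the same purpose as the standard JDK system property java.security.krb5.conf.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the authentication parameters,
// intended to run before LoginUtil.login.
final class KerberosPreflight {

    private KerberosPreflight() {
    }

    // Returns human-readable problems; an empty list means every check passed.
    static List<String> check(String userPrincipal, String userKeytabPath, String krb5ConfPath) {
        List<String> problems = new ArrayList<>();
        if (userPrincipal == null || userPrincipal.isBlank()) {
            problems.add("userPrincipal is empty");
        }
        requireReadableFile(userKeytabPath, "userKeytabPath", problems);
        requireReadableFile(krb5ConfPath, "krb5ConfPath", problems);
        return problems;
    }

    private static void requireReadableFile(String path, String name, List<String> problems) {
        if (path == null) {
            problems.add(name + " is not set");
            return;
        }
        File f = new File(path);
        if (!f.isFile() || !f.canRead()) {
            problems.add(name + " is not a readable file: " + path);
        }
    }

    // Assumption: points the JDK Kerberos implementation at the client's krb5.conf
    // through the standard system property java.security.krb5.conf.
    static void useKrb5Conf(String krb5ConfPath) {
        System.setProperty("java.security.krb5.conf", krb5ConfPath);
    }
}
```

A typical call is KerberosPreflight.check(userPrincipal, userKeytabPath, krb5ConfPath), stopping the program if the returned list is not empty.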

The following code snippet belongs to the main method of the FemaleInfoCollection class in the com.huawei.bigdata.spark.examples package.

  • Basic security authentication:
    Spark Core and Spark SQL programs do not need to access HBase or ZooKeeper, so they need only the basic security authentication code. Add the following code to the program and set the authentication parameters as required.
    // LoginUtil is the authentication helper class provided in the sample project.
    import org.apache.hadoop.conf.Configuration;

    String userPrincipal = "sparkuser";
    String userKeytabPath = "/opt/FIclient/user.keytab";
    String krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf";
    Configuration hadoopConf = new Configuration();
    // Log in with the user principal, keytab file, and Kerberos configuration file.
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf);
  • Basic security authentication with ZooKeeper authentication:

    The "Spark Streaming", "access Spark SQL with JDBC", and "Spark on HBase" sample projects require both basic security authentication and the principal of the ZooKeeper server. Add the following code to the project and set the authentication parameters as required.

    // LoginUtil is the authentication helper class provided in the sample project.
    import org.apache.hadoop.conf.Configuration;

    String userPrincipal = "sparkuser";
    String userKeytabPath = "/opt/FIclient/user.keytab";
    String krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf";
    String ZKServerPrincipal = "zookeeper/hadoop.<System domain name>";

    // JAAS login context name used by the ZooKeeper client, and the key of the ZooKeeper server principal.
    String ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME = "Client";
    String ZOOKEEPER_SERVER_PRINCIPAL_KEY = "zookeeper.server.principal";

    Configuration hadoopConf = new Configuration();
    // Set up the ZooKeeper JAAS configuration and server principal, then log in.
    LoginUtil.setJaasConf(ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME, userPrincipal, userKeytabPath);
    LoginUtil.setZookeeperServerPrincipal(ZOOKEEPER_SERVER_PRINCIPAL_KEY, ZKServerPrincipal);
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf);
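To illustrate what the ZooKeeper-related calls above set up, here is a JDK-only sketch of a keytab-based JAAS configuration for the "Client" login context, plus the zookeeper.server.principal system property. It is an assumption about the effect of LoginUtil.setJaasConf and LoginUtil.setZookeeperServerPrincipal, not their actual source, and the class name KeytabJaasConfiguration is hypothetical. Note that Configuration here is javax.security.auth.login.Configuration, not the Hadoop class used in the snippets.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical keytab-based JAAS configuration for a single login context.
// This is a sketch, NOT the LoginUtil implementation.
final class KeytabJaasConfiguration extends Configuration {

    // Kerberos login module of Oracle/OpenJDK; IBM JDKs use a different class name.
    static final String KRB5_LOGIN_MODULE = "com.sun.security.auth.module.Krb5LoginModule";

    private final String loginContextName;
    private final AppConfigurationEntry[] entries;

    KeytabJaasConfiguration(String loginContextName, String principal, String keytabPath) {
        this.loginContextName = loginContextName;
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");        // read credentials from the keytab file
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");         // keep the key in the Subject
        options.put("useTicketCache", "false");  // do not depend on a kinit ticket cache
        options.put("doNotPrompt", "true");      // never prompt for a password
        this.entries = new AppConfigurationEntry[] {
            new AppConfigurationEntry(KRB5_LOGIN_MODULE,
                    AppConfigurationEntry.LoginModuleControlFlag.REQUIRED, options)
        };
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // Only the configured login context is known; other names get no entry.
        return loginContextName.equals(name) ? entries.clone() : null;
    }

    // Installs the JAAS configuration for the "Client" context and the ZooKeeper
    // server principal as JVM-wide settings.
    static void install(String principal, String keytabPath, String zkServerPrincipal) {
        Configuration.setConfiguration(new KeytabJaasConfiguration("Client", principal, keytabPath));
        System.setProperty("zookeeper.server.principal", zkServerPrincipal);
    }
}
```

Because both settings are JVM-wide, this style of setup has to run before the ZooKeeper client is created, which matches the order of the calls in the snippet above.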

Security Authentication Code (Scala)

In the sample code, security authentication is completed by calling the LoginUtil class. For the secure login process, see the chapter about unified authentication.

Different Spark sample projects use different authentication code: basic security authentication, or basic security authentication combined with ZooKeeper authentication. Table 3 lists the example authentication parameters used in the sample projects. Modify the parameter values as required.

Table 3 Parameter description

| Parameter | Example Parameter Value | Description |
|---|---|---|
| userPrincipal | sparkuser | User principal for authentication. Use the account prepared in Preparing User Information for Cluster Authentication. |
| userKeytabPath | /opt/FIclient/user.keytab | Keytab file used for authentication. Copy the user.keytab file of the prepared developer account to the directory indicated by the example parameter value. |
| ZKServerPrincipal | zookeeper/hadoop.<System domain name> | Principal of the ZooKeeper server. Contact the administrator to obtain it. |

  • Basic security authentication:
    Spark Core and Spark SQL programs do not need to access HBase or ZooKeeper, so they need only the basic security authentication code. Add the following code to the program and set the authentication parameters as required.
    // LoginUtil is the authentication helper class provided in the sample project.
    import org.apache.hadoop.conf.Configuration

    val userPrincipal = "sparkuser"
    val userKeytabPath = "/opt/FIclient/user.keytab"
    val krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf"
    val hadoopConf: Configuration = new Configuration()
    // Log in with the user principal, keytab file, and Kerberos configuration file.
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf)
  • Basic security authentication with ZooKeeper authentication:

    The "Spark Streaming", "access Spark SQL with JDBC", and "Spark on HBase" sample projects require both basic security authentication and the principal of the ZooKeeper server. Add the following code to the project and set the authentication parameters as required.

    // LoginUtil is the authentication helper class provided in the sample project.
    import org.apache.hadoop.conf.Configuration

    val userPrincipal = "sparkuser"
    val userKeytabPath = "/opt/FIclient/user.keytab"
    val krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf"
    val ZKServerPrincipal = "zookeeper/hadoop.<System domain name>"

    // JAAS login context name used by the ZooKeeper client, and the key of the ZooKeeper server principal.
    val ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME: String = "Client"
    val ZOOKEEPER_SERVER_PRINCIPAL_KEY: String = "zookeeper.server.principal"
    val hadoopConf: Configuration = new Configuration()
    // Set up the ZooKeeper JAAS configuration and server principal, then log in.
    LoginUtil.setJaasConf(ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME, userPrincipal, userKeytabPath)
    LoginUtil.setZookeeperServerPrincipal(ZOOKEEPER_SERVER_PRINCIPAL_KEY, ZKServerPrincipal)
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf)