Updated on 2024-08-16 GMT+08:00

Configuring Security Authentication for Spark Applications

Prerequisites

You have enabled Kerberos authentication for the MRS cluster.

Scenario Description

In a cluster with Kerberos authentication enabled, the components must be mutually authenticated before communicating with each other to ensure communication security.

When you develop Spark applications, Spark may need to communicate with Hadoop and HBase. In that case, security authentication code must be added to the Spark applications so that they can run properly.

Three security authentication modes are available:

  • Command authentication:

    Before running the Spark applications or using the CLI to connect to Spark SQL, run the following command on the Spark client for authentication:

    kinit <component service user>

  • Configuration authentication:

    You can specify security authentication information in any of the following ways:

    1. In the spark-defaults.conf configuration file of the client, set spark.yarn.keytab and spark.yarn.principal to specify the authentication information.
    2. Pass the same two properties with --conf on the bin/spark-submit command line:

      --conf spark.yarn.keytab=<keytab file path> --conf spark.yarn.principal=<Principal account>

    3. Use the dedicated --keytab and --principal options of the bin/spark-submit command:

      --keytab <keytab file path> --principal <Principal account>

  • Code authentication:

    The application obtains the client's principal and keytab file and uses them to authenticate in code.

    The following table lists the authentication method used by the sample code in the cluster with Kerberos authentication enabled.

    Table 1 Security authentication method

    Sample Code                                                       Mode          Security Authentication Method
    ----------------------------------------------------------------  ------------  ------------------------------
    spark-examples-normal                                             yarn-client   Command authentication, configuration authentication, or code authentication
                                                                      yarn-cluster  Either command authentication or configuration authentication
    spark-examples-security (including security authentication code)  yarn-client   Code authentication
                                                                      yarn-cluster  Not supported

  • As the preceding table shows, yarn-cluster mode does not support security authentication in the Spark project code, because authentication must be completed before the application starts.
  • The Python sample project does not include security authentication code. Set the security authentication parameters in the command used to run the application instead.
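
As an illustration of the third configuration authentication option, the following Java sketch assembles a bin/spark-submit command line that passes the keytab and principal through --keytab and --principal. The class SubmitArgsSketch, the method buildSubmitArgs, the main class com.example.MySparkApp, and the JAR name are hypothetical. The sketch also assumes that yarn-client and yarn-cluster mode are selected with --master yarn and --deploy-mode, which may differ on older Spark versions.

```java
import java.util.ArrayList;
import java.util.List;

public class SubmitArgsSketch {

    /**
     * Assembles spark-submit arguments that pass the keytab and principal
     * on the command line (hypothetical helper, not part of the sample project).
     */
    public static List<String> buildSubmitArgs(String deployMode, String keytabPath,
                                               String principal, String mainClass, String appJar) {
        if (!"client".equals(deployMode) && !"cluster".equals(deployMode)) {
            throw new IllegalArgumentException("deployMode must be client or cluster: " + deployMode);
        }
        List<String> args = new ArrayList<>();
        args.add("bin/spark-submit");
        // Assumption: YARN deploy modes are expressed as --master yarn --deploy-mode <mode>.
        args.add("--master");
        args.add("yarn");
        args.add("--deploy-mode");
        args.add(deployMode);
        // Configuration authentication: credentials are supplied before the application starts.
        args.add("--keytab");
        args.add(keytabPath);
        args.add("--principal");
        args.add(principal);
        args.add("--class");
        args.add(mainClass);
        args.add(appJar);
        return args;
    }

    public static void main(String[] args) {
        // Keytab path and principal are the example values from this page;
        // the class name and JAR name are placeholders.
        List<String> cmd = buildSubmitArgs("cluster", "/opt/FIclient/user.keytab", "sparkuser",
                "com.example.MySparkApp", "my-spark-app.jar");
        System.out.println(String.join(" ", cmd));
    }
}
```

Because the credentials travel on the command line rather than in the application code, the same call shape works for both deploy modes.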

Security Authentication Code (Java)

The sample code performs all security authentication through the LoginUtil class.

Different Spark sample projects use different authentication code: either basic security authentication alone, or basic security authentication together with ZooKeeper authentication. The following table describes the example authentication parameters used in the sample projects. Change the parameter values to match your environment.

Table 2 Parameters

Parameter          Example Value                                           Description
-----------------  ------------------------------------------------------  -----------
userPrincipal      sparkuser                                               Principal account used for authentication. You can obtain the account from the administrator.
userKeytabPath     /opt/FIclient/user.keytab                               Keytab file used for authentication. You can obtain the file from the administrator.
krb5ConfPath       /opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf  Path and name of the krb5.conf file
ZKServerPrincipal  zookeeper/hadoop.hadoop.com                             Principal of the ZooKeeper server. Contact the administrator to obtain the account.
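
Before these parameters are passed to the authentication code, it can help to confirm that the files they point to actually exist on the client. The following Java sketch uses only JDK classes to check the three basic parameters. The class AuthParamCheck and its methods are hypothetical helpers and are not part of the sample project or of LoginUtil.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class AuthParamCheck {

    /** Returns a list of problems; an empty list means the parameters look usable. */
    public static List<String> check(String userPrincipal, String userKeytabPath, String krb5ConfPath) {
        List<String> problems = new ArrayList<>();
        if (userPrincipal == null || userPrincipal.trim().isEmpty()) {
            problems.add("userPrincipal is empty");
        }
        checkReadableFile("userKeytabPath", userKeytabPath, problems);
        checkReadableFile("krb5ConfPath", krb5ConfPath, problems);
        return problems;
    }

    /** Records a problem if the value is empty or does not name a readable regular file. */
    private static void checkReadableFile(String name, String value, List<String> problems) {
        if (value == null || value.trim().isEmpty()) {
            problems.add(name + " is empty");
            return;
        }
        Path path = Paths.get(value);
        if (!Files.isRegularFile(path)) {
            problems.add(name + " is not a regular file: " + value);
        } else if (!Files.isReadable(path)) {
            problems.add(name + " is not readable: " + value);
        }
    }

    public static void main(String[] args) {
        // Example values from the parameter table; replace them with your own.
        List<String> problems = check("sparkuser", "/opt/FIclient/user.keytab",
                "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf");
        if (problems.isEmpty()) {
            System.out.println("Authentication parameters look usable.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

A check like this only catches missing or unreadable files. Whether the keytab matches the principal is still decided by the Kerberos login itself.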

  • Basic security authentication:

    Spark Core and Spark SQL applications do not need to access HBase or ZooKeeper. They need only the basic authentication code. Add the following code to the applications and set security authentication parameters as required:

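    import org.apache.hadoop.conf.Configuration;

    // Example values; replace them with the values for your environment.
    String userPrincipal = "sparkuser";
    String userKeytabPath = "/opt/FIclient/user.keytab";
    String krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf";
    Configuration hadoopConf = new Configuration();
    // LoginUtil is the class that the sample code uses for authentication.
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf);

  • ZooKeeper authentication: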

    The sample projects for Spark Streaming, for accessing Spark SQL applications through JDBC, and for Spark on HBase require not only basic security authentication but also the principal of the ZooKeeper server. Add the following code to the applications and set security authentication parameters as required:

    import org.apache.hadoop.conf.Configuration;

    // Example values; replace them with the values for your environment.
    String userPrincipal = "sparkuser";
    String userKeytabPath = "/opt/FIclient/user.keytab";
    String krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf";
    String ZKServerPrincipal = "zookeeper/hadoop.hadoop.com";
    // Login context name and property key used for ZooKeeper authentication.
    String ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME = "Client";
    String ZOOKEEPER_SERVER_PRINCIPAL_KEY = "zookeeper.server.principal";
    Configuration hadoopConf = new Configuration();
    // Set the ZooKeeper-related authentication information, then log in.
    LoginUtil.setJaasConf(ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME, userPrincipal, userKeytabPath);
    LoginUtil.setZookeeperServerPrincipal(ZOOKEEPER_SERVER_PRINCIPAL_KEY, ZKServerPrincipal);
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf);
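
To make the ZooKeeper-related calls less opaque, the following Java sketch shows one way a keytab-based JAAS login context named Client can be defined with JDK classes only. It is not the LoginUtil implementation, which this page does not show. The class ZkJaasSketch and the method keytabJaasConf are hypothetical, the Krb5LoginModule class name and options assume an Oracle JDK or OpenJDK runtime, and the use of a JVM system property for the ZooKeeper server principal is an assumption based on the key name used above. Note that Configuration in this sketch is javax.security.auth.login.Configuration, not the Hadoop Configuration class.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

public class ZkJaasSketch {

    /** Builds an in-memory JAAS configuration with one keytab-based Kerberos login context. */
    public static Configuration keytabJaasConf(String contextName, String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");       // log in from a keytab instead of a password prompt
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("useTicketCache", "false"); // do not fall back to a kinit ticket cache

        // Assumption: Oracle JDK or OpenJDK; other JVM vendors ship a different login module.
        AppConfigurationEntry entry = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options);

        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                // Only the requested context name is defined; all others are unknown.
                return contextName.equals(name) ? new AppConfigurationEntry[] {entry} : null;
            }
        };
    }

    public static void main(String[] args) {
        Configuration jaas = keytabJaasConf("Client", "sparkuser", "/opt/FIclient/user.keytab");
        // Install the configuration JVM-wide so that JAAS clients can look up the "Client" context.
        Configuration.setConfiguration(jaas);
        // Assumption: the ZooKeeper client reads the server principal from this system property.
        System.setProperty("zookeeper.server.principal", "zookeeper/hadoop.hadoop.com");
        System.out.println("Entries for Client: " + jaas.getAppConfigurationEntry("Client").length);
    }
}
```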

Security Authentication Code (Scala)

The sample code performs all security authentication through the LoginUtil class.

Different Spark sample projects use different authentication code: either basic security authentication alone, or basic security authentication together with ZooKeeper authentication. The following table describes the example authentication parameters used in the sample projects. Change the parameter values to match your environment.

Table 3 Parameters

Parameter          Example Value                                           Description
-----------------  ------------------------------------------------------  -----------
userPrincipal      sparkuser                                               Principal account used for authentication. You can obtain the account from the administrator.
userKeytabPath     /opt/FIclient/user.keytab                               Keytab file used for authentication. You can obtain the file from the administrator.
krb5ConfPath       /opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf  Path and name of the krb5.conf file
ZKServerPrincipal  zookeeper/hadoop.hadoop.com                             Principal of the ZooKeeper server. Contact the administrator to obtain the account.

  • Basic security authentication:

    Spark Core and Spark SQL applications do not need to access HBase or ZooKeeper. They need only the basic authentication code. Add the following code to the applications and set security authentication parameters as required:

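    import org.apache.hadoop.conf.Configuration

    // Example values; replace them with the values for your environment.
    val userPrincipal = "sparkuser"
    val userKeytabPath = "/opt/FIclient/user.keytab"
    val krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf"
    val hadoopConf: Configuration = new Configuration()
    // LoginUtil is the class that the sample code uses for authentication.
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf)

  • ZooKeeper authentication: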

    The sample projects for Spark Streaming, for accessing Spark SQL applications through JDBC, and for Spark on HBase require not only basic security authentication but also the principal of the ZooKeeper server. Add the following code to the applications and set security authentication parameters as required:

    import org.apache.hadoop.conf.Configuration

    // Example values; replace them with the values for your environment.
    val userPrincipal = "sparkuser"
    val userKeytabPath = "/opt/FIclient/user.keytab"
    val krb5ConfPath = "/opt/FIclient/KrbClient/kerberos/var/krb5kdc/krb5.conf"
    val ZKServerPrincipal = "zookeeper/hadoop.hadoop.com"
    // Login context name and property key used for ZooKeeper authentication.
    val ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME: String = "Client"
    val ZOOKEEPER_SERVER_PRINCIPAL_KEY: String = "zookeeper.server.principal"
    val hadoopConf: Configuration = new Configuration()
    // Set the ZooKeeper-related authentication information, then log in.
    LoginUtil.setJaasConf(ZOOKEEPER_DEFAULT_LOGIN_CONTEXT_NAME, userPrincipal, userKeytabPath)
    LoginUtil.setZookeeperServerPrincipal(ZOOKEEPER_SERVER_PRINCIPAL_KEY, ZKServerPrincipal)
    LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, hadoopConf)