Updated on 2024-08-12 GMT+08:00

Scheduling Spark2x to Access HBase and Hive Using Oozie

Prerequisites

The prerequisites described in Downloading and Importing Sample Projects have been met.

Preparing a Development Environment

  1. Obtain the OozieSparkHBaseExample and OozieSparkHiveExample sample projects from the sample project folder ooziesecurity-examples in the src\oozie-examples directory where the sample code is decompressed. For details, see Obtaining the MRS Application Development Sample Project.
  2. Copy the keytab file user.keytab and user authentication credential file krb5.conf obtained in Preparing a Developer Account to the \src\main\resources directory of the OozieSparkHBaseExample and OozieSparkHiveExample sample projects.
  3. Modify the parameters in each sample project. For details, see Table 1.

    Table 1 Parameters to be modified

    | File Name | Parameter | Value | Example Value |
    |---|---|---|---|
    | src\main\resources\application.properties | submit_user | User who submits a job. | developuser |
    | src\main\resources\application.properties | oozie_url_default | https://Oozie service IP address:21003/oozie/ | https://10.10.10.176:21003/oozie/ |
    | src\main\resources\job.properties | userName | User who submits a job. | developuser |
    | src\main\resources\job.properties | examplesRoot | Use the default value or change the value based on the site requirements. | myjobs |
    | src\main\resources\job.properties | oozie.wf.application.path | Use the default value or change the value based on the site requirements. | ${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x |
    | src\main\resources\workflow.xml | <jar> </jar> | Change OoizeSparkHBase-1.0.jar to the actual JAR package name. | <jar>${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x/lib/OoizeSparkHBase-1.0.jar</jar> |

    NOTICE:

    Ensure that the value of oozie.wf.application.path is the same as the path in the <jar> and <spark-opts> tags in the src\main\resources\workflow.xml file.
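
As a rough illustration of the NOTICE in Table 1, the following sketch expands the example values from job.properties by hand and checks that a local workflow.xml uses the same path template in its <jar> tag. The nameNode value hdfs://hacluster and the helper name jar_matches_app_path are assumptions made for this example; they are not taken from the sample projects.

```shell
# Example values from Table 1. nameNode is an assumed value; replace it with your own.
nameNode="hdfs://hacluster"
userName="developuser"
examplesRoot="myjobs"

# Expanded form of oozie.wf.application.path
app_path="${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x"
echo "Application path: ${app_path}"

# Hypothetical helper: succeeds if the <jar> tag in the given workflow.xml
# starts with the same (unexpanded) path template as oozie.wf.application.path.
jar_matches_app_path() {
  workflow_file="$1"
  # Single quotes keep the ${...} placeholders literal, as they appear in the file.
  template='<jar>${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x/lib/'
  grep -F -q -- "$template" "$workflow_file"
}
```

For example, jar_matches_app_path src/main/resources/workflow.xml && echo "paths match" could be run from the project root before the package is built.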

    Go to the root directory of the project, for example, D:\sample_project\src\oozie-examples\ooziesecurity-examples\OozieSparkHBaseExample, and run the mvn clean package -DskipTests command. After the build succeeds, the JAR package is generated in the target directory.

  4. Create the following folders on the HDFS client in the configured path:

    hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/lib

    hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/hbase

    hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/hive

  5. Upload the files listed in Table 2 to the corresponding path.

    Table 2 Files to be uploaded

    | Initial File Path | File | Destination Path |
    |---|---|---|
    | Spark client directory (for example, /opt/client/Spark2x/spark/conf) | hive-site.xml | /user/developuser/myjobs/apps/spark2x directory in HDFS |
    | Spark client directory (for example, /opt/client/Spark2x/spark/conf) | hbase-site.xml | /user/developuser/myjobs/apps/spark2x directory in HDFS |
    | Authentication credential files obtained in Preparing a Developer Account | user.keytab | /user/developuser/myjobs/apps/spark2x directory in HDFS |
    | Authentication credential files obtained in Preparing a Developer Account | krb5.conf | /user/developuser/myjobs/apps/spark2x directory in HDFS |
    | Spark client directory (for example, /opt/client/Spark2x/spark/jars) | JAR package | Oozie share directory /user/oozie/share/lib/spark2x/ in HDFS |
    | JAR package of the sample projects to be used, for example, OoizeSparkHBase-1.0.jar | JAR package | /user/developuser/myjobs/apps/spark2x/lib/ directory in HDFS |
    | OozieSparkHiveExample sample project directory src\main\resources | workflow.xml | /user/developuser/myjobs/apps/spark2x/hive/ directory in HDFS |
    | OozieSparkHBaseExample sample project directory src\main\resources | workflow.xml | /user/developuser/myjobs/apps/spark2x/hbase/ directory in HDFS |

    NOTE:

    In both workflow.xml files, change the path of spark-archive-2x.zip in <spark-opts> based on the actual HDFS file path.
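
The uploads in Table 2 might be scripted roughly as follows. This is only a sketch under several assumptions: STAGE_DIR is a hypothetical local directory into which you have copied user.keytab, krb5.conf, the sample JAR package, and the two workflow.xml files (in hive/ and hbase/ subdirectories); the HDFS variable is an illustrative hook for dry runs; every JAR in the Spark client jars directory is uploaded; and the credential files are assumed to share the destination of the two site files. hdfs dfs -put -f overwrites files that already exist at the destination.

```shell
# Assumed local locations; adjust them to your environment.
CLIENT_DIR="${CLIENT_DIR:-/opt/client}"
STAGE_DIR="${STAGE_DIR:-/tmp/oozie-spark2x}"   # hypothetical staging directory
APP_DIR="/user/developuser/myjobs/apps/spark2x"
SHARE_DIR="/user/oozie/share/lib/spark2x"
HDFS="${HDFS:-hdfs}"                           # set to "echo hdfs" for a dry run

upload_table2_files() {
  # Client configuration and credential files go to the application root
  # (credential file destination is an assumption, see the text above).
  for f in "$CLIENT_DIR/Spark2x/spark/conf/hive-site.xml" \
           "$CLIENT_DIR/Spark2x/spark/conf/hbase-site.xml" \
           "$STAGE_DIR/user.keytab" \
           "$STAGE_DIR/krb5.conf"; do
    $HDFS dfs -put -f "$f" "$APP_DIR/" || return 1
  done

  # Spark client JARs go to the Oozie share directory.
  $HDFS dfs -put -f "$CLIENT_DIR"/Spark2x/spark/jars/*.jar "$SHARE_DIR/" || return 1

  # Sample project JAR and the per-project workflow.xml files.
  $HDFS dfs -put -f "$STAGE_DIR/OoizeSparkHBase-1.0.jar" "$APP_DIR/lib/" || return 1
  $HDFS dfs -put -f "$STAGE_DIR/hive/workflow.xml" "$APP_DIR/hive/" || return 1
  $HDFS dfs -put -f "$STAGE_DIR/hbase/workflow.xml" "$APP_DIR/hbase/" || return 1
}
```

Setting HDFS="echo hdfs" before calling upload_table2_files prints the commands for review; calling the function with the default value performs the uploads.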

  6. Change the value of hive.security.authenticator.manager in the hive-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of HDFS from org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator to org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator.
  7. If ZooKeeper SSL is enabled, add the following content to the hbase-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of HDFS:

    <property>
    <name>HBASE_ZK_SSL_ENABLED</name>
    <value>true</value>
    </property>
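
One possible way to apply steps 6 and 7 is to edit local copies of the two files and then upload them again. The sketch below assumes GNU sed and that hbase-site.xml contains a single closing </configuration> tag; the function names are made up for this example.

```shell
# Step 6: switch the authenticator class in a local copy of hive-site.xml.
fix_hive_authenticator() {
  sed -i 's/SessionStateUserMSGroupAuthenticator/SessionStateUserGroupAuthenticator/' "$1"
}

# Step 7: add HBASE_ZK_SSL_ENABLED before </configuration> in a local copy of
# hbase-site.xml, unless the property is already present.
enable_hbase_zk_ssl() {
  if grep -q 'HBASE_ZK_SSL_ENABLED' "$1"; then
    return 0
  fi
  # GNU sed turns \n in the replacement into a newline.
  sed -i 's|</configuration>|<property>\n<name>HBASE_ZK_SSL_ENABLED</name>\n<value>true</value>\n</property>\n</configuration>|' "$1"
}
```

The local copies can be fetched with hdfs dfs -get, edited with fix_hive_authenticator hive-site.xml and enable_hbase_zk_ssl hbase-site.xml, and written back with hdfs dfs -put -f.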

  8. Create the Hive tables and insert sample data by running the following SQL statements, for example in the Hive panel on the Hue UI:

    CREATE DATABASE test;

    CREATE TABLE IF NOT EXISTS `test`.`usr` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;

    CREATE TABLE IF NOT EXISTS `test`.`usr2` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;

    INSERT INTO TABLE test.usr partition(country='CN') VALUES(1,'maxwell',45),(2,'minwell',30),(3,'mike',22);

    INSERT INTO TABLE test.usr partition(country='USA') VALUES(4,'minbin',35);

  9. In HBase Shell, run the following commands to create an HBase table and insert sample data:

    create 'SparkHBase',{NAME=>'cf1'}

    put 'SparkHBase','01','cf1:name','Max'

    put 'SparkHBase','01','cf1:age','23'