Scheduling Spark2x to Access HBase and Hive Using Oozie
Prerequisites
The prerequisites described in Downloading and Importing Sample Projects have been met.
Preparing a Development Environment
- Obtain the OozieSparkHBaseExample and OozieSparkHiveExample sample projects from the sample project folder ooziesecurity-examples in the src\oozie-examples directory where the sample code is decompressed. For details, see Obtaining Sample Projects from Huawei Mirrors.
- Copy the keytab file user.keytab and user authentication credential file krb5.conf obtained in Preparing the Developer Account to the \src\main\resources directory of the OozieSparkHBaseExample and OozieSparkHiveExample sample projects.
- Modify the parameters in each sample project. For details, see Table 1.
    
    Table 1 Parameters to be modified

    | File Name | Parameter | Value | Example Value |
    |---|---|---|---|
    | src\main\resources\application.properties | submit_user | User who submits a job. | developuser |
    | src\main\resources\application.properties | oozie_url_default | https://Oozie service IP address:21003/oozie/ | https://10.10.10.176:21003/oozie/ |
    | src\main\resources\job.properties | userName | User who submits a job. | developuser |
    | src\main\resources\job.properties | examplesRoot | Use the default value or change the value based on the site requirements. | myjobs |
    | src\main\resources\job.properties | oozie.wf.application.path | Use the default value or change the value based on the site requirements. | ${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x NOTICE: Ensure that the path is the same as the path with the <jar> and <spark-opts> tags in the src\main\resources\workflow.xml file. |
    | src\main\resources\workflow.xml | <jar> </jar> | Change OoizeSparkHBase-1.0.jar to the actual JAR package name. | <jar>${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x/lib/OoizeSparkHBase-1.0.jar</jar> |

- Go to the root directory of the project, for example, D:\sample_project\src\oozie-examples\ooziesecurity-examples\OozieSparkHBaseExample, and run the mvn clean package -DskipTests command. After the operation is successful, the package is in the target directory.
- Create the following folders on the HDFS client in the configured path:
    
    hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/lib
    hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/hbase
    hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/hive
- Upload the files listed in Table 2 to the corresponding paths.
    
    Table 2 Files to be uploaded

    | Initial File Path | File | Destination Path |
    |---|---|---|
    | Spark client directory (for example, /opt/client/Spark2x/spark/conf) | hive-site.xml | /user/developuser/myjobs/apps/spark2x directory in the HDFS. |
    | Spark client directory (for example, /opt/client/Spark2x/spark/conf) | hbase-site.xml | /user/developuser/myjobs/apps/spark2x directory in the HDFS. |
    | Keytab file obtained in Preparing the Developer Account | user.keytab | /user/developuser/myjobs/apps/spark2x directory in the HDFS. |
    | Keytab file obtained in Preparing the Developer Account | krb5.conf | /user/developuser/myjobs/apps/spark2x directory in the HDFS. |
    | Spark client directory (for example, /opt/client/Spark2x/spark/jars) | JAR package | Share HDFS /user/oozie/share/lib/spark2x/ directory of Oozie. |
    | JAR package of the sample projects to be used, for example, OoizeSparkHBase-1.0.jar | JAR package | /user/developuser/myjobs/apps/spark2x/lib/ directory in the HDFS. |
    | OozieSparkHiveExample sample project directory src\main\resources | workflow.xml | /user/developuser/myjobs/apps/spark2x/hive/ directory in the HDFS. NOTE: Change the path of spark-archive-2x.zip in <spark-opts> based on the actual HDFS file path. |
    | OozieSparkHBaseExample sample project directory src\main\resources | workflow.xml | /user/developuser/myjobs/apps/spark2x/hbase/ directory in the HDFS. NOTE: Change the path of spark-archive-2x.zip in <spark-opts> based on the actual HDFS file path. |
- Change the value of hive.security.authenticator.manager in the hive-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of HDFS from org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator to org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator.
- If ZooKeeper SSL is enabled, add the following content to the hbase-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of the HDFS:
    
    <property>
      <name>HBASE_ZK_SSL_ENABLED</name>
      <value>true</value>
    </property>
- Run the following commands to create a Hive table:
    
    You can enter the following SQL statements in the Hive panel on the Hue UI:

    CREATE DATABASE test;
    CREATE TABLE IF NOT EXISTS `test`.`usr` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;
    CREATE TABLE IF NOT EXISTS `test`.`usr2` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;
    INSERT INTO TABLE test.usr partition(country='CN') VALUES(1,'maxwell',45),(2,'minwell',30),(3,'mike',22);
    INSERT INTO TABLE test.usr partition(country='USA') VALUES(4,'minbin',35);
- Use HBase Shell to run the following commands to create an HBase table:
    
    create 'SparkHBase',{NAME=>'cf1'}
    put 'SparkHBase','01','cf1:name','Max'
    put 'SparkHBase','01','cf1:age','23'
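
The HDFS staging steps above (creating the folders, uploading the files in Table 2, and switching the authenticator class in hive-site.xml) can be sketched as one shell script. This is a minimal sketch under stated assumptions, not the documented procedure. The helper names (run, fix_authenticator, stage_files) and the variables (DRY_RUN, SPARK_CONF, CRED_DIR, HBASE_PROJECT, HIVE_PROJECT, JAR, WORK) are hypothetical. It assumes the client is installed in /opt/client, that user.keytab and krb5.conf are in the current directory, and that the script is started from the folder containing both sample projects. It patches a local copy of hive-site.xml before uploading it instead of editing the file already in HDFS, which should leave the same file in place. Copying the Spark client JARs to the Oozie share lib and the optional ZooKeeper SSL property are left out. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: stage the job files in HDFS for the Oozie Spark2x samples.
# All helper and variable names are hypothetical; the paths are the example
# values used in the steps above.
set -eu

DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands, 0 = execute them
APP_DIR="/user/developuser/myjobs/apps/spark2x"
SPARK_CONF="${SPARK_CONF:-/opt/client/Spark2x/spark/conf}"
CRED_DIR="${CRED_DIR:-.}"                     # assumed location of user.keytab and krb5.conf
HBASE_PROJECT="${HBASE_PROJECT:-OozieSparkHBaseExample}"
HIVE_PROJECT="${HIVE_PROJECT:-OozieSparkHiveExample}"
JAR="${JAR:-$HBASE_PROJECT/target/OoizeSparkHBase-1.0.jar}"  # use the actual JAR name
WORK="${WORK:-./staging}"                     # local scratch directory

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Write a copy of hive-site.xml ($1) to $2 with the authenticator class swapped.
fix_authenticator() {
  sed 's/SessionStateUserMSGroupAuthenticator/SessionStateUserGroupAuthenticator/' "$1" > "$2"
}

stage_files() {
  # Folders for the shared lib and the two workflows.
  for sub in lib hbase hive; do
    run hdfs dfs -mkdir -p "$APP_DIR/$sub"
  done

  # Patch hive-site.xml locally, then upload the configuration and credentials.
  run mkdir -p "$WORK"
  run fix_authenticator "$SPARK_CONF/hive-site.xml" "$WORK/hive-site.xml"
  run hdfs dfs -put -f "$WORK/hive-site.xml" "$SPARK_CONF/hbase-site.xml" \
      "$CRED_DIR/user.keytab" "$CRED_DIR/krb5.conf" "$APP_DIR/"

  # Sample JAR and the workflow definition of each sample project.
  run hdfs dfs -put -f "$JAR" "$APP_DIR/lib/"
  run hdfs dfs -put -f "$HBASE_PROJECT/src/main/resources/workflow.xml" "$APP_DIR/hbase/"
  run hdfs dfs -put -f "$HIVE_PROJECT/src/main/resources/workflow.xml" "$APP_DIR/hive/"
}

stage_files
```

Running the script as is prints the commands for review. Rerunning it with DRY_RUN=0 executes them, and is expected to work only in a shell that can already run hdfs dfs against the cluster.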