Scheduling Spark2x to Access HBase and Hive Using Oozie

Prerequisites

The prerequisites described in Downloading and Importing Sample Projects have been met.

Preparing a Development Environment

Obtain the OozieMapReduceExample, OozieSparkHBaseExample, and OozieSparkHiveExample sample projects from the sample project folder oozienormal-examples in the src\oozie-examples directory where the sample code is decompressed. For details, see Obtaining the MRS Application Development Sample Project.

Modify the parameters in each sample project. For details, see Table 1.

**Table 1** Parameters to be modified
File Name	Parameter	Value	Example Value
src\main\resources\application.properties	submit_user	User who submits a job.	developuser
src\main\resources\application.properties	oozie_url_default	https://Oozie service IP address:21003/oozie/	https://10.10.10.233:21003/oozie/
src\main\resources\job.properties	userName	User who submits a job.	developuser
src\main\resources\job.properties	examplesRoot	Use the default value or change the value based on the site requirements.	myjobs
src\main\resources\job.properties	oozie.wf.application.path	Use the default value or change the value based on the site requirements.	${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x NOTICE: Ensure that this path matches the paths in the <jar> and <spark-opts> tags in the src\main\resources\workflow.xml file.
src\main\resources\workflow.xml	<jar> </jar>	Change OoizeSparkHBase-1.0.jar to the actual JAR package name.	<jar>${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x/lib/OoizeSparkHBase-1.0.jar</jar>

Go to the root directory of the project, for example, D:\sample_project\src\oozie-examples\oozienormal-examples\OozieSparkHBaseExample, and run the mvn clean package -DskipTests command. After the build succeeds, the JAR package is generated in the target directory.

Create the following folders on the HDFS client in the configured path:

/user/developuser/myjobs/apps/spark2x/lib

/user/developuser/myjobs/apps/spark2x/hbase

/user/developuser/myjobs/apps/spark2x/hive
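The three directories above can be created in one pass. The following is a minimal sketch, assuming the example values from Table 1 (developuser, myjobs) and a node where the HDFS client is installed and authenticated. The RUN variable is a hypothetical dry-run switch added for illustration: by default the commands are only printed.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to run them on an HDFS client node.
RUN="${RUN-echo}"

# Assumed values, matching the example values in Table 1.
USER_NAME="developuser"
EXAMPLES_ROOT="myjobs"
APP_ROOT="/user/${USER_NAME}/${EXAMPLES_ROOT}/apps/spark2x"

for sub in lib hbase hive; do
    # -p creates missing parent directories and does not fail if the directory exists.
    $RUN hdfs dfs -mkdir -p "${APP_ROOT}/${sub}"
done
```

To execute the commands instead of printing them, run the script with RUN set to an empty value, for example `RUN= sh create_dirs.sh` (the script name is only an example).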

Upload the files listed in Table 2 to the corresponding paths.

**Table 2** Files to be uploaded
Initial File Path	File	Destination Path
Spark client directory (for example, /opt/client/Spark2x/spark/conf)	hive-site.xml	/user/developuser/myjobs/apps/spark2x directory in the HDFS.
Spark client directory (for example, /opt/client/Spark2x/spark/conf)	hbase-site.xml	/user/developuser/myjobs/apps/spark2x directory in the HDFS.
Spark client directory (for example, /opt/client/Spark2x/spark/jars)	JAR package	/user/oozie/share/lib/spark2x directory of the Oozie share library in the HDFS. NOTE: The JAR package must be uploaded as user oozie. Run the su - oozie command to switch to user oozie. After the upload is complete, restart the Oozie service.
JAR package of the sample projects to be used	JAR package	/user/developuser/myjobs/apps/spark2x/lib/ directory in the HDFS.
OozieSparkHiveExample sample project directory src\main\resources	workflow.xml	/user/developuser/myjobs/apps/spark2x/hive directory in the HDFS. NOTE: Change the path of spark-archive-2x.zip in <spark-opts> based on the actual HDFS file path.
OozieSparkHBaseExample sample project directory src\main\resources	workflow.xml	/user/developuser/myjobs/apps/spark2x/hbase directory in the HDFS. NOTE: Change the path of spark-archive-2x.zip in <spark-opts> based on the actual HDFS file path.
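The uploads in Table 2 that go to the application directory could be scripted roughly as follows. This is a sketch under assumptions: SAMPLES is a hypothetical local path to the decompressed sample projects, the JAR name is the one shown in Table 1, and RUN is a hypothetical dry-run switch that prints the commands by default. The upload to the Oozie share library is left out because it must be done as user oozie.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to run them on an HDFS client node.
RUN="${RUN-echo}"

CLIENT_CONF="/opt/client/Spark2x/spark/conf"
APP_ROOT="/user/developuser/myjobs/apps/spark2x"
# Hypothetical local location of the sample projects; adjust as needed.
SAMPLES="${SAMPLES:-./oozienormal-examples}"

# Configuration files go to the application root (-f overwrites existing files).
$RUN hdfs dfs -put -f "${CLIENT_CONF}/hive-site.xml" "${CLIENT_CONF}/hbase-site.xml" "${APP_ROOT}/"

# Sample project JAR (name taken from Table 1; use the actual file name).
$RUN hdfs dfs -put -f "${SAMPLES}/OozieSparkHBaseExample/target/OoizeSparkHBase-1.0.jar" "${APP_ROOT}/lib/"

# Each project's workflow.xml goes to its own directory.
$RUN hdfs dfs -put -f "${SAMPLES}/OozieSparkHiveExample/src/main/resources/workflow.xml" "${APP_ROOT}/hive/"
$RUN hdfs dfs -put -f "${SAMPLES}/OozieSparkHBaseExample/src/main/resources/workflow.xml" "${APP_ROOT}/hbase/"
```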

Change the value of hive.security.authenticator.manager in the hive-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of HDFS from org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator to org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator.
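One possible way to make this change is to download the file, rewrite the class name locally, and upload it again. The sketch below assumes the HDFS path used above; fix_authenticator and RUN are hypothetical names introduced for illustration, and the hdfs commands are only printed unless RUN is set to an empty value.

```shell
#!/bin/sh
# Dry run by default: hdfs commands are printed, not executed.
RUN="${RUN-echo}"

APP_ROOT="/user/developuser/myjobs/apps/spark2x"
OLD="org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator"
NEW="org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator"

# Replace the authenticator class name in a local copy of hive-site.xml.
fix_authenticator() {
    # Escape the dots so sed matches them literally.
    old_re=$(printf '%s' "$OLD" | sed 's/\./\\./g')
    sed "s/${old_re}/${NEW}/g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

$RUN hdfs dfs -get "${APP_ROOT}/hive-site.xml" ./hive-site.xml
# In a dry run nothing is downloaded, so the edit is skipped.
if [ -f ./hive-site.xml ]; then
    fix_authenticator ./hive-site.xml
fi
$RUN hdfs dfs -put -f ./hive-site.xml "${APP_ROOT}/"
```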

Create the Hive database and tables and insert sample data:

Enter the following SQL statements in the Hive panel on the Hue UI:

CREATE DATABASE test;

CREATE TABLE IF NOT EXISTS `test`.`usr` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;

CREATE TABLE IF NOT EXISTS `test`.`usr2` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;

INSERT INTO TABLE test.usr partition(country='CN') VALUES(1,'maxwell',45),(2,'minwell',30),(3,'mike',22);

INSERT INTO TABLE test.usr partition(country='USA') VALUES(4,'minbin',35);

Use HBase Shell to run the following commands to create an HBase table and insert sample data:

create 'SparkHBase',{NAME=>'cf1'}

put 'SparkHBase','01','cf1:name','Max'

put 'SparkHBase','01','cf1:age','23'