Scheduling Spark2x to Access HBase and Hive Using Oozie

Prerequisites

The prerequisites described in Downloading and Importing Sample Projects have been met.

Preparing a Development Environment

Obtain the OozieMapReduceExample, OozieSparkHBaseExample, and OozieSparkHiveExample sample projects from the oozienormal-examples folder, which is in the src\oozie-examples directory of the decompressed sample code. For details, see Obtaining the MRS Application Development Sample Project.

Modify the parameters in each sample project. For details, see Table 1.

**Table 1** Parameters to be modified
File Name	Parameter	Value	Example Value
src\main\resources\application.properties	submit_user	User who submits a job.	developuser
src\main\resources\application.properties	oozie_url_default	https://Oozie service IP address:21003/oozie/	https://10.10.10.233:21003/oozie/
src\main\resources\job.properties	userName	User who submits a job.	developuser
src\main\resources\job.properties	examplesRoot	Use the default value or change the value based on the site requirements.	myjobs
src\main\resources\job.properties	oozie.wf.application.path	Use the default value or change the value based on the site requirements.	${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x NOTICE: Ensure that this path matches the path used in the <jar> and <spark-opts> tags in the src\main\resources\workflow.xml file.
src\main\resources\workflow.xml	<jar> </jar>	Change OoizeSparkHBase-1.0.jar to the actual JAR package name.	<jar>${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x/lib/OoizeSparkHBase-1.0.jar</jar>

Go to the root directory of the project, for example, D:\sample_project\src\oozie-examples\oozienormal-examples\OozieSparkHBaseExample, and run the mvn clean package -DskipTests command. After the build succeeds, the JAR package is generated in the target directory.

On the HDFS client, create the following directories under the configured path:

/user/developuser/myjobs/apps/spark2x/lib

/user/developuser/myjobs/apps/spark2x/hbase

/user/developuser/myjobs/apps/spark2x/hive
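
The three directories above can be created in one pass with hdfs dfs -mkdir -p. The following is a minimal sketch, assuming the example values developuser and myjobs from Table 1 and an authenticated HDFS client on the PATH. The run helper and the DRY_RUN switch are illustrative additions and are not part of the sample project.

```shell
# Sketch only: USER_NAME and EXAMPLES_ROOT mirror the example values in Table 1.
USER_NAME="developuser"
EXAMPLES_ROOT="myjobs"
APP_ROOT="/user/${USER_NAME}/${EXAMPLES_ROOT}/apps/spark2x"

# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the lib, hbase, and hive directories under the application root.
create_app_dirs() {
  for sub in lib hbase hive; do
    run hdfs dfs -mkdir -p "${APP_ROOT}/${sub}"
  done
}

create_app_dirs
```

With the default settings the script prints the three mkdir commands for review. Running it with DRY_RUN=0 executes them against the cluster.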

Upload the files listed in Table 2 to the corresponding paths.

**Table 2** Files to be uploaded
Initial File Path	File	Destination Path
Spark client directory (for example, /opt/client/Spark2x/spark/conf)	hive-site.xml	/user/developuser/myjobs/apps/spark2x directory in HDFS.
Spark client directory (for example, /opt/client/Spark2x/spark/conf)	hbase-site.xml	/user/developuser/myjobs/apps/spark2x directory in HDFS.
Spark client directory (for example, /opt/client/Spark2x/spark/jars)	JAR package	/user/oozie/share/lib/spark2x directory in HDFS (the Oozie share library). NOTE: This file must be uploaded as user oozie. Run the su - oozie command to switch to user oozie. After the file is uploaded, restart the Oozie service.
JAR package of the sample projects to be used	JAR package	/user/developuser/myjobs/apps/spark2x/lib/ directory in HDFS.
OozieSparkHiveExample sample project directory src\main\resources	workflow.xml	/user/developuser/myjobs/apps/spark2x/hive directory in HDFS. NOTE: Change the path of spark-archive-2x.zip in <spark-opts> based on the actual HDFS file path.
OozieSparkHBaseExample sample project directory src\main\resources	workflow.xml	/user/developuser/myjobs/apps/spark2x/hbase directory in HDFS. NOTE: Change the path of spark-archive-2x.zip in <spark-opts> based on the actual HDFS file path.
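
The uploads in Table 2 that go to the developuser application directory can be scripted with hdfs dfs -put. This is a sketch under several assumptions: the script runs on a Linux host from the oozienormal-examples folder, the client is installed at the example path /opt/client, and the built JAR is named as in the Table 1 example. The CLIENT_SPARK, SAMPLE_JAR, and DRY_RUN variables and the run helper are hypothetical names introduced for illustration. The Spark client JAR upload to /user/oozie/share/lib/spark2x is left out because it must be done as user oozie and is followed by an Oozie restart.

```shell
# Assumed locations; adjust to the actual client and build output paths.
CLIENT_SPARK="${CLIENT_SPARK:-/opt/client/Spark2x/spark}"
SAMPLE_JAR="${SAMPLE_JAR:-OozieSparkHBaseExample/target/OoizeSparkHBase-1.0.jar}"
APP_ROOT="/user/developuser/myjobs/apps/spark2x"

# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upload_app_files() {
  # Client configuration files go to the application root.
  run hdfs dfs -put -f "${CLIENT_SPARK}/conf/hive-site.xml" "${APP_ROOT}/"
  run hdfs dfs -put -f "${CLIENT_SPARK}/conf/hbase-site.xml" "${APP_ROOT}/"
  # The sample project JAR goes to lib/.
  run hdfs dfs -put -f "${SAMPLE_JAR}" "${APP_ROOT}/lib/"
  # Each workflow.xml goes to the directory of its own sample.
  run hdfs dfs -put -f OozieSparkHiveExample/src/main/resources/workflow.xml "${APP_ROOT}/hive/"
  run hdfs dfs -put -f OozieSparkHBaseExample/src/main/resources/workflow.xml "${APP_ROOT}/hbase/"
}

upload_app_files
```

The -f option overwrites an existing file at the destination, so the script can be rerun after workflow.xml is edited.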

Change the value of hive.security.authenticator.manager in the hive-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of HDFS from org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator to org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator.
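
Because files in HDFS cannot be edited in place, one way to make this change is to download hive-site.xml, rewrite the class name locally, and upload it again. The sketch below shows the local rewrite as a hypothetical patch_hive_site function. The surrounding hdfs commands are shown as comments and assume an authenticated HDFS client.

```shell
OLD="org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator"
NEW="org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator"

# Replace the authenticator class in a local copy of hive-site.xml.
# $1: path to the local file; it is rewritten through a temporary file.
patch_hive_site() {
  # Escape the dots so that sed matches them literally.
  old_re=$(printf '%s' "$OLD" | sed 's/\./\\./g')
  sed "s/${old_re}/${NEW}/g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Possible usage on a cluster client:
#   hdfs dfs -get /user/developuser/myjobs/apps/spark2x/hive-site.xml .
#   patch_hive_site hive-site.xml
#   hdfs dfs -put -f hive-site.xml /user/developuser/myjobs/apps/spark2x/
```

The file can also be edited before the upload in Table 2, which avoids the download step.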

Create the Hive tables by entering the following SQL statements in the Hive panel on the Hue UI:

CREATE DATABASE test;

CREATE TABLE IF NOT EXISTS `test`.`usr` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;

CREATE TABLE IF NOT EXISTS `test`.`usr2` (user_id int comment 'userID', user_name string comment 'userName', age int comment 'age') PARTITIONED BY (country string) STORED AS PARQUET;

INSERT INTO TABLE test.usr partition(country='CN') VALUES(1,'maxwell',45),(2,'minwell',30),(3,'mike',22);

INSERT INTO TABLE test.usr partition(country='USA') VALUES(4,'minbin',35);

In HBase Shell, run the following commands to create an HBase table and insert sample data:

create 'SparkHBase',{NAME=>'cf1'}

put 'SparkHBase','01','cf1:name','Max'

put 'SparkHBase','01','cf1:age','23'
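
The HBase commands above can also be run non-interactively. The sketch below wraps them in a hypothetical hbase_setup_commands function and appends a scan so the inserted row can be checked. Piping the output to hbase shell -n assumes that the hbase command is on the PATH and that the current user is authenticated.

```shell
# Emit the HBase Shell commands from the step above, one per line,
# followed by a scan to display the inserted row.
hbase_setup_commands() {
  cat <<'EOF'
create 'SparkHBase',{NAME=>'cf1'}
put 'SparkHBase','01','cf1:name','Max'
put 'SparkHBase','01','cf1:age','23'
scan 'SparkHBase'
EOF
}

# Possible usage on a cluster client:
#   hbase_setup_commands | hbase shell -n
```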
