Scheduling Spark2x to Access HBase and Hive Using Oozie
Prerequisites
The prerequisites described in Downloading and Importing Sample Projects have been met.
Preparing a Development Environment
- Obtain the OozieSparkHBaseExample and OozieSparkHiveExample sample projects from the ooziesecurity-examples folder in the src\oozie-examples directory of the decompressed sample code package. For details, see Obtaining Sample Projects from Huawei Mirrors.
- Copy the keytab file user.keytab and the Kerberos configuration file krb5.conf obtained in Preparing the Developer Account to the src\main\resources directory of both the OozieSparkHBaseExample and OozieSparkHiveExample sample projects.
- Modify the parameters in each sample project. For details, see Table 1.
Table 1 Parameters to be modified

| File Name | Parameter | Value | Example Value |
| --- | --- | --- | --- |
| src\main\resources\application.properties | submit_user | User who submits the job. | developuser |
| src\main\resources\application.properties | oozie_url_default | https://Oozie service IP address:21003/oozie/ | https://10.10.10.176:21003/oozie/ |
| src\main\resources\job.properties | userName | User who submits the job. | developuser |
| src\main\resources\job.properties | examplesRoot | Use the default value or change it based on the site requirements. | myjobs |
| src\main\resources\job.properties | oozie.wf.application.path | Use the default value or change it based on the site requirements. NOTICE: Ensure that this path matches the paths in the `<jar>` and `<spark-opts>` tags in the src\main\resources\workflow.xml file. | ${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x |
| src\main\resources\workflow.xml | `<jar></jar>` | Change OoizeSparkHBase-1.0.jar to the actual JAR package name. | `<jar>${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x/lib/OoizeSparkHBase-1.0.jar</jar>` |
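Putting the job.properties values from Table 1 together, the file might look like the following sketch. The nameNode address hdfs://hacluster is an assumed value, not stated in this guide; replace it with your cluster's actual NameNode address.

```properties
# Sketch of src\main\resources\job.properties using the example values from Table 1.
# nameNode=hdfs://hacluster is an assumption; use your cluster's NameNode address.
nameNode=hdfs://hacluster
userName=developuser
examplesRoot=myjobs
oozie.wf.application.path=${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x
```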
Go to the root directory of the project, for example, D:\sample_project\src\oozie-examples\ooziesecurity-examples\OozieSparkHBaseExample, and run the mvn clean package -DskipTests command. After the build succeeds, the JAR package is generated in the target directory.
- Run the following commands on the HDFS client to create the required directories:
hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/lib
hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/hbase
hdfs dfs -mkdir -p /user/developuser/myjobs/apps/spark2x/hive
- Upload the files listed in Table 2 to the corresponding paths.
Table 2 Files to be uploaded

| Initial File Path | File | Destination Path |
| --- | --- | --- |
| Spark client directory (for example, /opt/client/Spark2x/spark/conf) | hive-site.xml | /user/developuser/myjobs/apps/spark2x directory in HDFS |
| Spark client directory (for example, /opt/client/Spark2x/spark/conf) | hbase-site.xml | /user/developuser/myjobs/apps/spark2x directory in HDFS |
| Keytab file obtained in Preparing the Developer Account | user.keytab | /user/developuser/myjobs/apps/spark2x directory in HDFS |
| Keytab file obtained in Preparing the Developer Account | krb5.conf | /user/developuser/myjobs/apps/spark2x directory in HDFS |
| Spark client directory (for example, /opt/client/Spark2x/spark/jars) | JAR packages | Oozie share library directory /user/oozie/share/lib/spark2x/ in HDFS |
| JAR package of the sample project to be used, for example, OoizeSparkHBase-1.0.jar | JAR package | /user/developuser/myjobs/apps/spark2x/lib/ directory in HDFS |
| src\main\resources directory of the OozieSparkHiveExample sample project | workflow.xml | /user/developuser/myjobs/apps/spark2x/hive/ directory in HDFS. NOTE: Change the path of spark-archive-2x.zip in `<spark-opts>` based on the actual HDFS file path. |
| src\main\resources directory of the OozieSparkHBaseExample sample project | workflow.xml | /user/developuser/myjobs/apps/spark2x/hbase/ directory in HDFS. NOTE: Change the path of spark-archive-2x.zip in `<spark-opts>` based on the actual HDFS file path. |
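For orientation, the workflow.xml files uploaded above define an Oozie Spark action roughly along these lines. This is a sketch only: the element names follow the Oozie workflow and Spark action schemas, while the action name, main class, and spark-opts path are placeholders, not the sample project's actual contents.

```xml
<workflow-app xmlns="uri:oozie:workflow:1.0" name="SparkHBaseWorkflow">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <name>Spark2xAccessHBase</name>
            <!-- Placeholder class; use the main class of the sample JAR. -->
            <class>com.example.SparkHBaseMain</class>
            <!-- Must match the upload path in Table 2. -->
            <jar>${nameNode}/user/${userName}/${examplesRoot}/apps/spark2x/lib/OoizeSparkHBase-1.0.jar</jar>
            <!-- Placeholder archive path; point at the actual spark-archive-2x.zip in HDFS. -->
            <spark-opts>--conf spark.yarn.archive=${nameNode}/path/to/spark-archive-2x.zip</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```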
- Change the value of hive.security.authenticator.manager in the hive-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of HDFS from org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator to org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator.
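In XML terms, the step above amounts to editing this property in the downloaded hive-site.xml:

```xml
<property>
    <name>hive.security.authenticator.manager</name>
    <!-- Changed from org.apache.hadoop.hive.ql.security.SessionStateUserMSGroupAuthenticator -->
    <value>org.apache.hadoop.hive.ql.security.SessionStateUserGroupAuthenticator</value>
</property>
```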
- If ZooKeeper SSL is enabled, add the following content to the hbase-site.xml file in the /user/developuser/myjobs/apps/spark2x directory of the HDFS:
<property>
    <name>HBASE_ZK_SSL_ENABLED</name>
    <value>true</value>
</property>
- Run the following statements to create Hive tables and insert data:
You can enter the following SQL statements in the Hive panel on the Hue UI:
CREATE DATABASE test;
CREATE TABLE IF NOT EXISTS `test`.`usr` (user_id int COMMENT 'userID', user_name string COMMENT 'userName', age int COMMENT 'age') PARTITIONED BY (country string) STORED AS PARQUET;
CREATE TABLE IF NOT EXISTS `test`.`usr2` (user_id int COMMENT 'userID', user_name string COMMENT 'userName', age int COMMENT 'age') PARTITIONED BY (country string) STORED AS PARQUET;
INSERT INTO TABLE test.usr partition(country='CN') VALUES(1,'maxwell',45),(2,'minwell',30),(3,'mike',22);
INSERT INTO TABLE test.usr partition(country='USA') VALUES(4,'minbin',35);
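As a quick sanity check after the inserts (not part of the original procedure), querying the table from the same Hue Hive panel should return the four rows written above, three in partition country='CN' and one in country='USA':

```sql
SELECT user_id, user_name, age, country FROM test.usr;
```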
- Use HBase Shell to run the following commands to create an HBase table and write sample data:
create 'SparkHBase',{NAME=>'cf1'}
put 'SparkHBase','01','cf1:name','Max'
put 'SparkHBase','01','cf1:age','23'
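To verify the writes, a scan in the same HBase Shell session should list row 01 with the two cells written above:

```
scan 'SparkHBase'
```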