Updated on 2022-07-11 GMT+08:00

Instance

Scenario

Spark on HBase allows users to create HBase tables on the JDBCServer, store data in JDBCServer tables by running HBase commands, and perform other operations on these tables.

Data Preparation

  1. Ensure that the JDBCServer service has been started in multi-active instance HA mode and that at least one instance is available for client connections. Create the /home/data file on every available JDBCServer instance node. The file content is as follows:

    Miranda,32
    Karlie,23
    Candice,27

  2. Ensure that the user who starts the JDBCServer has read and write permissions on the file.
  3. Ensure that the hive-site.xml file exists in the classpath and that the parameters required for client connections are set in it. For details about the parameters required for the JDBCServer, see JDBCServer Interface.
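Before running the sample, the data file can be sanity-checked on each JDBCServer node. The following Java sketch is not part of the sample project and the class name DataFileCheck is hypothetical. It assumes each line of /home/data is a name,age pair, as in the example content above, and it can only check read and write access for the user who runs it, so run it as the user who starts the JDBCServer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: confirms that the data file holds "name,age" rows
// that match a two-column (string, int) table before the sample loads it.
class DataFileCheck {

    // One parsed row of the data file.
    static final class Row {
        final String name;
        final int age;

        Row(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Parses comma-delimited lines; blank lines are skipped, malformed lines are rejected.
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected name,age but got: " + line);
            }
            // Integer.parseInt throws NumberFormatException if the age is not numeric.
            rows.add(new Row(fields[0].trim(), Integer.parseInt(fields[1].trim())));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path dataFile = Paths.get(args.length > 0 ? args[0] : "/home/data");
        // Checks access only for the current user, so run this as the JDBCServer user.
        if (!Files.isReadable(dataFile) || !Files.isWritable(dataFile)) {
            System.err.println("Current user cannot read and write " + dataFile);
            return;
        }
        List<Row> rows = parse(Files.readAllLines(dataFile, StandardCharsets.UTF_8));
        System.out.println(dataFile + " contains " + rows.size() + " valid rows");
    }
}
```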

Development Idea

  1. Create a table named child in the default database.
  2. Load the data in /home/data into the child table.
  3. Query the data in the child table.
  4. Drop the child table.
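The four steps map naturally onto SQL statements sent through a JDBC connection. The sketch below illustrates this and is not the sample's actual source. The class name ChildTableSteps, the column names and types (name STRING, age INT, inferred from the data file), and the use of LOAD DATA LOCAL INPATH are assumptions. LOCAL reads the path on the JDBCServer node that executes the statement, which is consistent with the requirement to create /home/data on every instance node. Only the standard java.sql API is used; the JDBC driver (for example, org.apache.hive.jdbc.HiveDriver) and the connection URL must come from your client environment. If the driver does not register itself, load it with Class.forName before connecting.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Sketch of the four development steps as SQL statements sent over JDBC.
// Table schema and statement text are assumptions based on the data file format.
class ChildTableSteps {

    static List<String> buildStatements(String dataPath) {
        return Arrays.asList(
            // 1. Create the child table in the default database.
            "CREATE TABLE default.child (name STRING, age INT) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
            // 2. Load the file from the JDBCServer node's local file system.
            "LOAD DATA LOCAL INPATH '" + dataPath + "' INTO TABLE default.child",
            // 3. Query the table.
            "SELECT * FROM default.child",
            // 4. Drop the table.
            "DROP TABLE default.child");
    }

    // Executes each statement in order and prints the rows of any result set.
    static void runAll(Connection conn, List<String> sqls) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : sqls) {
                System.out.println("Executing: " + sql);
                // execute() returns true when the statement produced a result set (the SELECT).
                if (stmt.execute(sql)) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        ResultSetMetaData meta = rs.getMetaData();
                        while (rs.next()) {
                            StringBuilder row = new StringBuilder();
                            for (int i = 1; i <= meta.getColumnCount(); i++) {
                                if (i > 1) {
                                    row.append('\t');
                                }
                                row.append(rs.getString(i));
                            }
                            System.out.println(row);
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("Usage: ChildTableSteps <jdbc-url>");
            return;
        }
        // DriverManager locates a self-registering JDBC 4 driver on the classpath.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            runAll(conn, buildStatements("/home/data"));
        }
    }
}
```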

Configuration Operations Before Running

In security mode, the sample code needs to read two authentication files: user.keytab and krb5.conf. Obtain them by downloading the authentication credentials of the user principal from FusionInsight Manager. The user in the sample code is sparkuser; change it to the name of the development user you have prepared.
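In security mode, a simple pre-flight check can confirm that both authentication files are in place before any connection is attempted. This sketch is hypothetical: the class name and the /opt/female paths are placeholders. It uses only the standard JDK system property java.security.krb5.conf, which tells the Java Kerberos implementation where krb5.conf is; the actual Kerberos login in the sample may be performed differently.

```java
import java.io.File;

// Hypothetical pre-flight step for security mode: verify the two authentication
// files exist and point the JVM's Kerberos implementation at krb5.conf.
class KerberosFilesCheck {

    // Placeholder values: replace with your development user and real file paths.
    static final String USER_NAME = "sparkuser";
    static final String USER_KEYTAB = "/opt/female/user.keytab";
    static final String KRB5_CONF = "/opt/female/krb5.conf";

    // Returns true only when both paths are readable regular files.
    static boolean filesReadable(String keytabPath, String krb5Path) {
        File keytab = new File(keytabPath);
        File krb5 = new File(krb5Path);
        return keytab.isFile() && keytab.canRead() && krb5.isFile() && krb5.canRead();
    }

    // java.security.krb5.conf is the standard JDK property naming the Kerberos
    // configuration file; set it before any Kerberos login is attempted.
    static void useKrb5Conf(String krb5Path) {
        System.setProperty("java.security.krb5.conf", krb5Path);
    }

    public static void main(String[] args) {
        if (!filesReadable(USER_KEYTAB, KRB5_CONF)) {
            System.err.println("Missing user.keytab or krb5.conf for " + USER_NAME);
            return;
        }
        useKrb5Conf(KRB5_CONF);
        System.out.println("Kerberos config: " + System.getProperty("java.security.krb5.conf"));
    }
}
```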

Packaging the Project

  • Upload the krb5.conf and user.keytab files to the server where the client is located.
  • Use Maven in IntelliJ IDEA to package the project into a JAR file. For details, see Compiling and Running the Application.
    • Before compiling and packaging, change the user.keytab and krb5.conf paths in the sample code to the actual paths of these files on the client server, for example, /opt/female/user.keytab and /opt/female/krb5.conf.
  • Upload the JAR file to any directory (for example, /opt/female/) on the server where the Spark client is located.
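Because the authentication file paths are compiled into the sample, moving the files means rebuilding the JAR. One possible alternative, shown here only as a sketch, is to read the paths from -D system properties at run time and fall back to compiled-in defaults. The class name AuthFilePaths and the property names example.user.keytab and example.krb5.conf are hypothetical and are not read by the sample project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical alternative to editing paths before each build: read them from
// -D system properties at run time, falling back to compiled-in defaults.
class AuthFilePaths {

    // Illustrative property names, not part of the sample project.
    static final String KEYTAB_PROPERTY = "example.user.keytab";
    static final String KRB5_PROPERTY = "example.krb5.conf";

    // Returns the property value as a path, or the default when the property is unset.
    static Path resolve(String propertyName, String defaultPath) {
        return Paths.get(System.getProperty(propertyName, defaultPath));
    }

    public static void main(String[] args) {
        Path keytab = resolve(KEYTAB_PROPERTY, "/opt/female/user.keytab");
        Path krb5 = resolve(KRB5_PROPERTY, "/opt/female/krb5.conf");
        for (Path p : new Path[] {keytab, krb5}) {
            System.out.println(p + (Files.isReadable(p) ? " found" : " NOT found"));
        }
    }
}
```

With this approach, a client built from the sketch could be started with, for example, -Dexample.user.keytab=/opt/female/user.keytab to override the default without recompiling.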

Running Tasks

Go to the Spark client directory and run the code with the java -cp command. The class name and JAR file name must match those in your actual code; the following commands are only examples.

  • Run the Java example code:

    java -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/hive/*:$SPARK_HOME/conf:/opt/female/SparkThriftServerJavaExample-1.0.jar com.huawei.bigdata.spark.examples.ThriftServerQueriesTest $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/conf/spark-defaults.conf

  • Run the Scala example code:

    java -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/hive/*:$SPARK_HOME/conf:/opt/female/SparkThriftServerExample-1.0.jar com.huawei.bigdata.spark.examples.ThriftServerQueriesTest $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/conf/spark-defaults.conf
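Both commands pass the paths of hive-site.xml and spark-defaults.conf as program arguments. The sketch below shows one way a client could read these two files using only JDK classes; it is not the sample's code, and the class name ClientConfigLoader is hypothetical. It relies on spark-defaults.conf using "key value" lines with # comments, which java.util.Properties can parse, and on hive-site.xml using the Hadoop configuration layout of property elements with name and value children.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical loader for the two configuration files passed as program arguments.
class ClientConfigLoader {

    // spark-defaults.conf uses "key value" lines and # comments, which Properties can read.
    static Properties loadSparkDefaults(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // hive-site.xml uses <configuration><property><name/><value/></property></configuration>.
    static Map<String, String> loadHiveSite(InputStream in) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse hive-site.xml", e);
        }
        Map<String, String> conf = new HashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            // Skip incomplete entries instead of failing.
            if (names.getLength() > 0 && values.getLength() > 0) {
                conf.put(names.item(0).getTextContent().trim(), values.item(0).getTextContent().trim());
            }
        }
        return conf;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: ClientConfigLoader <hive-site.xml> <spark-defaults.conf>");
            return;
        }
        try (InputStream hive = Files.newInputStream(Paths.get(args[0]));
             Reader spark = Files.newBufferedReader(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            System.out.println("hive-site.xml entries: " + loadHiveSite(hive).size());
            System.out.println("spark-defaults.conf entries: " + loadSparkDefaults(spark).size());
        }
    }
}
```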

    If SSL is enabled for ZooKeeper in the cluster (check the ssl.enabled parameter of the ZooKeeper service), add the -Dzookeeper.client.secure=true and -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty parameters to the command, for example:

    java -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/hive/*:$SPARK_HOME/conf:/opt/female/SparkThriftServerJavaExample-1.0.jar com.huawei.bigdata.spark.examples.ThriftServerQueriesTest $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/conf/spark-defaults.conf
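If changing the launch command is inconvenient, the same two ZooKeeper client properties can in principle be set from code. This is a sketch under the assumption that it runs before the JDBC driver creates its first ZooKeeper client, because these properties are typically read when the client is created; the -D options in the command remain the documented approach. The class name ZkSslProperties is hypothetical.

```java
// Hypothetical sketch: set the ZooKeeper SSL client properties from code instead of
// the command line. Must run before any ZooKeeper client is created in this JVM.
class ZkSslProperties {

    static final String SECURE = "zookeeper.client.secure";
    static final String CNXN_SOCKET = "zookeeper.clientCnxnSocket";
    static final String NETTY_SOCKET = "org.apache.zookeeper.ClientCnxnSocketNetty";

    // Mirrors -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=...Netty when
    // sslEnabled is true (the ZooKeeper ssl.enabled value); otherwise leaves defaults.
    static void apply(boolean sslEnabled) {
        if (sslEnabled) {
            System.setProperty(SECURE, "true");
            System.setProperty(CNXN_SOCKET, NETTY_SOCKET);
        }
    }

    public static void main(String[] args) {
        // Pass "true" when ssl.enabled is true for the ZooKeeper service.
        boolean sslEnabled = args.length > 0 && Boolean.parseBoolean(args[0]);
        apply(sslEnabled);
        System.out.println(SECURE + "=" + System.getProperty(SECURE));
        System.out.println(CNXN_SOCKET + "=" + System.getProperty(CNXN_SOCKET));
    }
}
```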