Updated on 2024-10-23 GMT+08:00

Development Plan

Scenarios

You can develop custom JDBCServer clients that use JDBC connections to create tables, load data into them, query them, and delete them.

Data Preparation

Upload the data file to HDFS.

  1. Ensure that the JDBCServer service has been started in multi-active instance HA mode and that at least one instance accepts client connections. On the HDFS client of the Linux OS, create a text file named data with the following content:

    Miranda,32
    Karlie,23
    Candice,27

  2. Create a directory in HDFS, for example, /home, and upload the data file to it:

    1. Log in to the HDFS client node and run the following commands:

      cd <Client installation directory>

      source bigdata_env

      kinit <Service user for authentication>

    2. Run the following command to create the /home directory:

      hdfs dfs -mkdir /home

    3. Run the following command to upload the data file:

      hdfs dfs -put data /home

  3. Ensure that the user who starts the JDBCServer has read and write permissions on the file.
  4. Ensure that the hive-site.xml file exists in the classpath and that it sets the parameters required for the client connection. For details about the parameters required by the JDBCServer, see Spark JDBCServer APIs.

Development Idea

  1. Create a child table in the default database.
  2. Add data in /home/data to the child table.
  3. Query data in the child table.
  4. Delete the child table.
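
The four steps above can be sketched with plain JDBC as follows. This is a minimal illustration, not the shipped sample code: the class name ChildTableSteps, the column names (name, age), and the comma delimiter are assumptions inferred from the sample rows in the data file, and the JDBC URL must be supplied by you as the first argument. Without an argument, the program only prints the SQL it would run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the create / load / query / delete flow.
class ChildTableSteps {

    // Step 1: table layout is an assumption based on rows such as "Miranda,32".
    static String createSql() {
        return "CREATE TABLE IF NOT EXISTS child (name STRING, age INT) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','";
    }

    // Step 2: load the file that was uploaded to HDFS.
    static String loadSql(String hdfsPath) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE child";
    }

    // Step 3: read back every row.
    static String querySql() {
        return "SELECT name, age FROM child";
    }

    // Step 4: remove the table again.
    static String dropSql() {
        return "DROP TABLE IF EXISTS child";
    }

    // Runs the four steps in order on an already opened connection.
    static void run(Connection conn, String hdfsPath) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(createSql());
            stmt.execute(loadSql(hdfsPath));
            try (ResultSet rs = stmt.executeQuery(querySql())) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "," + rs.getInt(2));
                }
            }
            stmt.execute(dropSql());
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // No JDBC URL given: show the statements instead of connecting.
            System.out.println(createSql());
            System.out.println(loadSql("/home/data"));
            System.out.println(querySql());
            System.out.println(dropSql());
            return;
        }
        // args[0] is the JDBC URL of your JDBCServer (not shown in this document).
        try (Connection conn = DriverManager.getConnection(args[0])) {
            run(conn, "/home/data");
        }
    }
}
```

Keeping the SQL in small methods makes each step easy to check in isolation before pointing the program at a real cluster. A matching JDBC driver must be on the classpath for the connection to succeed.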

Configuration Operations Before Running

In security mode, the Spark Core sample code needs to read two authentication files: user.keytab and krb5.conf. To obtain them, download the authentication credentials of the user principal from FusionInsight Manager. The user in the sample code is sparkuser; change it to the name of the development user you have prepared.
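
Before running the sample, it can help to confirm that both authentication files are actually readable on the client server. The following is a minimal pre-flight sketch using only the JDK; the class name AuthFilesCheck is hypothetical, and the default paths are the example locations used later in this section. It does not perform the Kerberos login itself; it only checks the files and sets the standard JDK property java.security.krb5.conf.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: verifies that the authentication files are in place.
class AuthFilesCheck {

    // Returns the given paths that are missing or cannot be read.
    static List<String> missing(String... paths) {
        List<String> result = new ArrayList<>();
        for (String p : paths) {
            Path path = Paths.get(p);
            if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example paths; replace them with the actual locations on your client server.
        String keytab = args.length > 0 ? args[0] : "/opt/female/user.keytab";
        String krb5 = args.length > 1 ? args[1] : "/opt/female/krb5.conf";

        List<String> notFound = missing(keytab, krb5);
        if (!notFound.isEmpty()) {
            System.err.println("Missing or unreadable: " + notFound);
            return;
        }
        // Standard JDK system property that selects the Kerberos configuration file.
        System.setProperty("java.security.krb5.conf", krb5);
        System.out.println("Authentication files found; krb5.conf set to " + krb5);
    }
}
```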

Packaging the Project

  • Upload the krb5.conf and user.keytab files to the server where the client is located.
  • Use the Maven tool provided by IDEA to package the project and generate a JAR file. For details, see Writing and Running the Spark Program in the Linux Environment.

    Before compilation and packaging, change the paths of the user.keytab and krb5.conf files in the sample code to the actual paths on the client server where the files are located. Example: /opt/female/user.keytab and /opt/female/krb5.conf.

  • Upload the JAR file to any directory (for example, /opt/female/) on the server where the Spark client is located.

Running Tasks

Go to the Spark client directory and run the code with the java -cp command. The commands below are examples only; the class name and file name must match those in your actual code.

  • Run the Java sample code:

    java -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/hive/*:$SPARK_HOME/conf:/opt/female/SparkThriftServerJavaExample-1.0.jar com.huawei.bigdata.spark.examples.ThriftServerQueriesTest $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/conf/spark-defaults.conf

  • Run the Scala sample code:

    java -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/hive/*:$SPARK_HOME/conf:/opt/female/SparkThriftServerExample-1.0.jar com.huawei.bigdata.spark.examples.ThriftServerQueriesTest $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/conf/spark-defaults.conf

    If SSL is enabled for ZooKeeper in the cluster (check the ssl.enabled parameter of the ZooKeeper service), add the -Dzookeeper.client.secure=true and -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty parameters to the command:

    java -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/hive/*:$SPARK_HOME/conf:/opt/female/SparkThriftServerJavaExample-1.0.jar com.huawei.bigdata.spark.examples.ThriftServerQueriesTest $SPARK_HOME/conf/hive-site.xml $SPARK_HOME/conf/spark-defaults.conf
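
The structure of these commands can be summarized in a short sketch: JVM options come first, then the classpath, then the main class and its two configuration-file arguments. The class LaunchCommand and its build method are hypothetical helpers written for illustration only; they assemble the same argument list as the commands above and add the two ZooKeeper options only when SSL is enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the java -cp command shown above.
class LaunchCommand {

    static List<String> build(String sparkHome, String jar, String mainClass, boolean zkSsl) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (zkSsl) {
            // JVM options must appear before -cp and the main class.
            cmd.add("-Dzookeeper.client.secure=true");
            cmd.add("-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty");
        }
        cmd.add("-cp");
        cmd.add(String.join(":",
                sparkHome + "/jars/*",
                sparkHome + "/jars/hive/*",
                sparkHome + "/conf",
                jar));
        cmd.add(mainClass);
        // The two program arguments passed to the sample's main class.
        cmd.add(sparkHome + "/conf/hive-site.xml");
        cmd.add(sparkHome + "/conf/spark-defaults.conf");
        return cmd;
    }

    public static void main(String[] args) {
        String sparkHome = System.getenv("SPARK_HOME");
        if (sparkHome == null) {
            sparkHome = "$SPARK_HOME"; // keep the placeholder when the variable is unset
        }
        List<String> cmd = build(sparkHome,
                "/opt/female/SparkThriftServerJavaExample-1.0.jar",
                "com.huawei.bigdata.spark.examples.ThriftServerQueriesTest",
                args.length > 0 && "ssl".equals(args[0]));
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing ssl as the first argument prints the variant with the ZooKeeper options; the list form can also be handed to java.lang.ProcessBuilder if you prefer to start the process from code.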