Overview
Scenarios
This section describes how to use Spark to operate on Hudi tables: inserting data, querying, updating, running incremental queries, querying the table as of a specific point in time, and deleting data.
For details, see the sample code.
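As a rough illustration of how these operations map to Hudi DataSource options, the following Python sketch builds the option sets for each operation. It is not taken from the sample project. The field names uuid, partitionpath, and ts are assumptions borrowed from the Hudi quickstart, and the helper names write_options, incremental_options, and run are hypothetical.

```python
# Sketch only. The field names (uuid, partitionpath, ts) are assumptions taken
# from the Hudi quickstart; replace them with the fields of your own table.


def write_options(table_name, operation="upsert"):
    """Options for a write; operation is "insert", "upsert" (update), or "delete"."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": "uuid",
        "hoodie.datasource.write.partitionpath.field": "partitionpath",
        "hoodie.datasource.write.precombine.field": "ts",
        "hoodie.datasource.write.operation": operation,
    }


def incremental_options(begin_time, end_time=None):
    """Options for an incremental read. Passing end_time bounds the read,
    which gives a query as of a specific point in time."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_time,
    }
    if end_time is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end_time
    return opts


def run(spark, df, base_path, table_name, commit_time):
    """Run the operations in sequence. Needs a Spark session with Hudi on the
    classpath; df is a DataFrame containing the assumed fields."""
    writer = lambda op: df.write.format("hudi").options(**write_options(table_name, op))
    # Insert the records, then write them again as an update (upsert).
    writer("insert").mode("overwrite").save(base_path)
    writer("upsert").mode("append").save(base_path)
    # Snapshot query of the current table state.
    spark.read.format("hudi").load(base_path).show()
    # Incremental query starting from the earliest possible commit ("000").
    spark.read.format("hudi").options(**incremental_options("000")).load(base_path).show()
    # Point-in-time query: same read, bounded by a known commit time.
    spark.read.format("hudi").options(
        **incremental_options("000", commit_time)).load(base_path).show()
    # Delete the records identified by the keys in df.
    writer("delete").mode("append").save(base_path)
```

The option builders are kept separate from run so they can be inspected without a Spark session. Only run touches Spark, and it assumes commit_time is a commit instant that already exists on the table.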
Packaging the Project
- Upload the user.keytab and krb5.conf files to the server where the client is located.
- Before compiling and packaging, change the paths of the user.keytab and krb5.conf files in the sample code to their actual paths on the client server.
- Use the Maven tool provided by IDEA to package the project and generate the JAR file. For details, see Compiling and Running the Application.
- Upload the generated JAR file to any directory (for example, /opt/example/) on the server where the Spark client is located.
- The Python sample code does not need to be packaged using Maven.
Running Tasks
After compiling and building the sample code, you can use the spark-submit command to perform the write, update, query, and delete operations in sequence.
- Run the Java sample program.
spark-submit --keytab <user_keytab_path> --principal=<principal_name> --class com.huawei.bigdata.hudi.examples.HoodieWriteClientExample /opt/example/hudi-java-security-examples-1.0.jar hdfs://hacluster/tmp/example/hoodie_java hoodie_java
<user_keytab_path> indicates the authentication file path, <principal_name> indicates the authentication user name, /opt/example/hudi-java-security-examples-1.0.jar indicates the JAR file path, hdfs://hacluster/tmp/example/hoodie_java indicates the storage path of the Hudi table, and hoodie_java indicates the name of the Hudi table.
- Run the Scala sample program.
spark-submit --keytab <user_keytab_path> --principal=<principal_name> --class com.huawei.bigdata.hudi.examples.HoodieDataSourceExample /opt/example/hudi-scala-security-examples-1.0.jar hdfs://hacluster/tmp/example/hoodie_scala hoodie_scala
<user_keytab_path> indicates the authentication file path, <principal_name> indicates the authentication user name, /opt/example/hudi-scala-security-examples-1.0.jar indicates the JAR file path, hdfs://hacluster/tmp/example/hoodie_scala indicates the storage path of the Hudi table, and hoodie_scala indicates the name of the Hudi table.
- Run the Python sample program.
spark-submit /opt/example/HudiPythonExample.py hdfs://hacluster/tmp/huditest/example/python hudi_trips_cow
hdfs://hacluster/tmp/huditest/example/python indicates the storage path of the Hudi table, and hudi_trips_cow indicates the name of the Hudi table.
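The three commands above share one shape: optional authentication flags, an optional main class, then the application file, the table storage path, and the table name. The Python sketch below assembles that argument list, which can help when scripting the runs. The helper name build_submit_command is hypothetical, and the keytab path and principal in the usage lines are placeholder assumptions, not values from the sample project.

```python
import shlex


def build_submit_command(app_path, table_path, table_name,
                         main_class=None, keytab=None, principal=None):
    """Return the spark-submit argument list for one of the sample programs."""
    cmd = ["spark-submit"]
    # Authentication flags are added only when both values are supplied.
    if keytab and principal:
        cmd += ["--keytab", keytab, f"--principal={principal}"]
    # JAR-based samples (Java, Scala) need a main class; the Python sample does not.
    if main_class:
        cmd += ["--class", main_class]
    # Positional arguments: application file, table storage path, table name.
    cmd += [app_path, table_path, table_name]
    return cmd


# Usage: the keytab path and principal below are placeholders (assumptions).
java_cmd = build_submit_command(
    "/opt/example/hudi-java-security-examples-1.0.jar",
    "hdfs://hacluster/tmp/example/hoodie_java",
    "hoodie_java",
    main_class="com.huawei.bigdata.hudi.examples.HoodieWriteClientExample",
    keytab="/opt/example/user.keytab",
    principal="hudiuser",
)
python_cmd = build_submit_command(
    "/opt/example/HudiPythonExample.py",
    "hdfs://hacluster/tmp/huditest/example/python",
    "hudi_trips_cow",
)
print(shlex.join(java_cmd))
print(shlex.join(python_cmd))
```

Returning a list rather than a single string means the result can be passed directly to subprocess.run without shell quoting concerns. shlex.join is used only to display the command.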