Spark Client CLI
For details about how to use the Spark CLIs, see the official documentation at http://spark.apache.org/docs/3.1.1/quick-start.html.
Common CLIs
Common Spark CLIs are as follows:
- spark-shell
This command provides an easy way to get familiar with the Spark APIs, similar to other interactive data-analysis tools. It supports two languages: Scala and Python. In the Spark directory, run ./bin/spark-shell to open an interactive Scala shell, read data from HDFS, and then run RDD operations on it.
Example: A single line of code counts the occurrences of each word in a file.
scala> sc.textFile("hdfs://10.96.1.57:9000//wordcount_data.txt").flatMap(l => l.split(" ")).map(w => (w,1)).reduceByKey(_+_).collect()
You can specify the keytab file and principal directly on the command line to obtain authentication; the keytab and delegation tokens are then renewed periodically so that authentication does not expire. The following command is an example.
spark-shell --principal spark2x/hadoop.<System domain name>@<System domain name> --keytab ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/keytab/spark2x/SparkResource/spark2x.keytab --master yarn
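The word-count chain in the spark-shell example above can be illustrated with a short sketch in plain Python. This is not the RDD API; it only emulates what flatMap, map, and reduceByKey compute, using hypothetical in-memory lines in place of the HDFS file.

```python
# Plain-Python emulation of the spark-shell word-count chain.
# The input lines are hypothetical stand-ins for textFile(...).
from collections import Counter
from itertools import chain

lines = ["hello spark", "hello world"]

# flatMap(l => l.split(" ")): split every line and flatten into one word stream
words = chain.from_iterable(l.split(" ") for l in lines)

# map(w => (w, 1)): pair each word with a count of 1
pairs = ((w, 1) for w in words)

# reduceByKey(_ + _): sum the counts per word
counts = Counter()
for w, n in pairs:
    counts[w] += n

print(dict(counts))  # → {'hello': 2, 'spark': 1, 'world': 1}
```

In the real RDD chain, each stage runs distributed across the cluster and reduceByKey shuffles pairs by key before summing; the per-word result is the same.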
- spark-submit
This command submits a Spark application to a Spark cluster for execution and returns the result. The class, master, JAR file, and input parameters must be specified.
Example: Run the GroupByTest class in the sample JAR file with four input parameters, in local mode with a single core.
./bin/spark-submit --class org.apache.spark.examples.GroupByTest --master local[1] examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar 6 10 10 3
You can specify the keytab file and principal directly on the command line to obtain authentication; the keytab and delegation tokens are then renewed periodically so that authentication does not expire. The following command is an example.
spark-submit --class org.apache.spark.examples.GroupByTest --master yarn --principal spark2x/hadoop.<System domain name>@<System domain name> --keytab ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/keytab/spark2x/SparkResource/spark2x.keytab examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar 6 10 10 3
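The GroupByTest example above generates random key-value pairs and groups them by key. The core operation can be sketched in plain Python; the parameter names and key range here are illustrative assumptions, not the example's actual contract, and the real job distributes the grouping across the cluster.

```python
# Plain-Python sketch of the groupByKey step that GroupByTest exercises.
# num_mappers and num_pairs mirror the first two CLI arguments (6 and 10);
# the key range and dummy values are illustrative assumptions.
import random
from collections import defaultdict

random.seed(0)
num_mappers, num_pairs = 6, 10

# Generate num_mappers * num_pairs random key-value pairs
pairs = [(random.randint(0, 9), bytes(10))
         for _ in range(num_mappers * num_pairs)]

# groupByKey: collect all values that share a key
groups = defaultdict(list)
for k, v in pairs:
    groups[k].append(v)

print(len(groups))  # number of distinct keys after grouping
```

A run of the real example similarly reports a count derived from the grouped pairs; spark-submit prints the job's result to the console after the application finishes.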
- spark-sql
This command runs SQL queries from the command line, using the Hive metastore, in local or cluster mode. To view the logical plan of a statement, add "explain extended" before the SQL statement.
The following is an example:
SELECT key FROM src GROUP BY key;
You can specify the keytab file and principal directly on the command line to obtain authentication; the keytab and delegation tokens are then renewed periodically so that authentication does not expire. The following command is an example.
spark-sql --principal spark2x/hadoop.<System domain name>@<System domain name> --keytab ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/keytab/spark2x/SparkResource/spark2x.keytab --master yarn
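To view the plan for the query above, the statement can be prefixed with explain extended inside the spark-sql shell. This is a sketch; it assumes the src table already exists in the Hive metastore.

```sql
-- Prints the parsed, analyzed, optimized logical plans and the physical plan
explain extended SELECT key FROM src GROUP BY key;
```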
- run-example
This command is used to run or debug the built-in examples shipped with open-source Spark.
Example: Run SparkPi.
./bin/run-example SparkPi 100