Spark Client CLI

Common Spark CLIs are described as follows:

spark-shell
It provides an interactive shell that is an easy way to learn the Spark APIs and to analyze data interactively. Spark offers such a shell for two languages, Scala and Python (the Python shell is started with ./bin/pyspark). In the Spark directory, run the ./bin/spark-shell command to open the Scala interactive interface, where you can read data from HDFS and perform RDD operations on it.

For example, a single line of code can count the occurrences of every word in a file.

scala> sc.textFile("hdfs://10.96.1.57:9000//wordcount_data.txt").flatMap(l => l.split(" ")).map(w => (w,1)).reduceByKey(_+_).collect()
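
To sanity-check a result like this on a small sample, the same counting can be reproduced locally without Spark. The following is a minimal shell sketch and not part of Spark: the sample text and the variable names are made up for illustration, and tr, sort, uniq, and awk only stand in for the flatMap, map, and reduceByKey stages.

```shell
#!/bin/sh
# Hypothetical sample input; replace it with a small extract of the real file.
sample=$(mktemp)
printf 'a b a\nb c\n' > "$sample"

# tr: put one word on each line (the flatMap step).
# sort + uniq -c: group identical words and count them (map + reduceByKey).
# awk: print each pair as word=count.
counts=$(tr -s ' ' '\n' < "$sample" | LC_ALL=C sort | uniq -c | awk '{print $2 "=" $1}')

echo "$counts"
rm -f "$sample"
```

Note that tr -s squeezes runs of spaces, whereas split(" ") in the Scala line emits empty strings for them, so the two can differ on input that contains repeated spaces.
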
spark-submit
It submits a Spark application to a Spark cluster for execution and returns the result. You must specify the main class (--class), the master (--master), the application JAR file, and any input parameters of the application.

For example, run the GroupByTest example from the examples JAR file with four input parameters. The master is local[1], that is, local mode with a single worker thread.

./bin/spark-submit --class org.apache.spark.examples.GroupByTest --master local[1] examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar 6 10 10 3
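
The pieces of that command can be spelled out one by one. The following is a minimal sketch that only reuses the values from the example above; the variable names are made up for illustration, and the meaning given for the four input parameters is taken from the upstream Spark GroupByTest example, so treat it as an assumption for this build.

```shell
#!/bin/sh
# Values copied from the example above; adjust them for your installation.
SPARK_SUBMIT="./bin/spark-submit"
APP_CLASS="org.apache.spark.examples.GroupByTest"   # --class: main class to run
MASTER="local[1]"                                   # --master: local mode, one thread
APP_JAR="examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar"
# Input parameters passed to the application. Upstream Spark reads them as
# numMappers, numKVPairs, valSize, numReducers (assumption: this build of
# the example is unchanged).
APP_ARGS="6 10 10 3"

# Options come before the JAR; everything after the JAR goes to the application.
# $APP_ARGS is left unquoted on purpose so that it splits into four arguments.
set -- --class "$APP_CLASS" --master "$MASTER" "$APP_JAR" $APP_ARGS

cmd="$SPARK_SUBMIT $*"
echo "$cmd"

# Run only when the launcher and the JAR are actually present.
if [ -x "$SPARK_SUBMIT" ] && [ -f "$APP_JAR" ]; then
    "$SPARK_SUBMIT" "$@"
fi
```

Quoting "$MASTER" keeps the shell from treating the square brackets in local[1] as a file name pattern, which some shells do when the value is left unquoted.
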
spark-sql
It runs the Hive metastore service and executes the queries entered on its command line, in local or cluster mode. To view the logical plan of a query, add EXPLAIN EXTENDED before the SQL statement.

For example:

SELECT key FROM src GROUP BY key;
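
To see the plans for a statement instead of its result, the statement can be prefixed with EXPLAIN EXTENDED and passed to the CLI non-interactively. The following is a minimal sketch, assuming that ./bin/spark-sql exists in the current directory and that the src table from the example has been created; the variable names are made up for illustration.

```shell
#!/bin/sh
SPARK_SQL="./bin/spark-sql"
QUERY="SELECT key FROM src GROUP BY key"

# Prefix the statement to get its plans instead of its result.
EXPLAIN_QUERY="EXPLAIN EXTENDED $QUERY"
echo "$EXPLAIN_QUERY"

# -e passes a quoted statement on the command line.
# Run only when the launcher is actually present.
if [ -x "$SPARK_SQL" ]; then
    "$SPARK_SQL" -e "$EXPLAIN_QUERY"
fi
```

The output of EXPLAIN EXTENDED lists the parsed, analyzed, and optimized logical plans, followed by the physical plan.
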
run-example
It is used to run or debug the example programs that ship with open-source Spark.

For example, run the SparkPi example with 100 as its argument (the number of slices that the computation is split into).

./bin/run-example SparkPi 100

