Common CLIs

For details about how to use the Spark CLIs, see the official documentation at http://spark.apache.org/docs/3.1.1/quick-start.html.

Common Spark CLIs are described as follows:

spark-shell
It provides an easy way to learn the APIs and also serves as a tool for interactive data analysis. The Spark shell is available in two languages, Scala and Python (the Python shell is started with ./bin/pyspark). In the Spark directory, run ./bin/spark-shell to open the interactive Scala shell, where you can read data from HDFS and run RDD operations on it.

For example, a single line of code counts the occurrences of each word in a file:

scala> sc.textFile("hdfs://10.96.1.57:9000//wordcount_data.txt").flatMap(l => l.split(" ")).map(w => (w,1)).reduceByKey(_+_).collect()
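
To sanity-check the word count above on a small file, the same result can be reproduced locally with standard Unix tools. This is only a rough sketch: the function name count_words and the sample text are made up for illustration, and unlike split(" ") in the Scala example, which may yield empty strings when words are separated by several spaces, this version drops empty words.

```shell
# Count words read from standard input and print "word<TAB>count" lines.
count_words() {
  # Put one word per line, drop empty lines, then count each distinct word.
  tr ' ' '\n' | grep -v '^$' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Made-up sample input, used here in place of the HDFS file.
printf 'spark hive spark\nhdfs hive spark\n' | count_words
```

For the sample input this prints hdfs 1, hive 2 and spark 3, which corresponds to the (word, count) pairs that collect() returns in the Spark shell.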
spark-submit
It submits a Spark application to a Spark cluster for execution and returns the results. You must specify the main class (--class), the master (--master), the application JAR, and any input parameters of the application.

For example, run the GroupByTest example from the examples JAR with four input parameters. The master local[1] runs the application locally with a single worker thread.

./bin/spark-submit --class org.apache.spark.examples.GroupByTest --master local[1] examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001.jar 6 10 10 3
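
The four parts of the command above (class, master, JAR, input parameters) can be made explicit with a small helper that only assembles and prints the command line. This is a sketch for illustration; the function name build_submit_cmd is hypothetical and nothing is submitted to a cluster.

```shell
# Print a spark-submit command line assembled from its parts.
# $1 = main class, $2 = master URL, $3 = application JAR,
# remaining arguments = input parameters of the application.
build_submit_cmd() {
  main_class=$1
  master=$2
  app_jar=$3
  shift 3
  echo "./bin/spark-submit --class $main_class --master $master $app_jar $*"
}

# Quote local[1] so that the shell does not treat the brackets as a glob pattern.
build_submit_cmd org.apache.spark.examples.GroupByTest 'local[1]' \
  examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001.jar 6 10 10 3
```

The printed line matches the example command and can be reviewed before it is run from the Spark directory.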
spark-sql
It runs the Hive metastore service in local mode and executes queries entered on the command line. To view the logical plan of a query, add "explain extended" before the SQL statement.

For example:

SELECT key FROM src GROUP BY key;
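
To look at the logical plan of the query above without opening the interactive prompt, the statement can be prefixed with "explain extended" and passed to spark-sql with the -e option. The sketch below only builds and prints that command; the helper name explain_stmt is hypothetical, and the table src is assumed to exist already.

```shell
# Prefix a SQL statement with EXPLAIN EXTENDED.
explain_stmt() {
  echo "EXPLAIN EXTENDED $1"
}

QUERY="SELECT key FROM src GROUP BY key"
PLAN_QUERY=$(explain_stmt "$QUERY")

# Print the command; run the printed line from the Spark directory to see the plan.
echo "./bin/spark-sql -e \"$PLAN_QUERY\""
```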
run-example
It runs or debugs the example programs bundled with the open-source Spark distribution.

For example, run SparkPi:

./bin/run-example SparkPi 100