Updated on 2024-10-23 GMT+08:00

Spark Client CLI

For details about how to use the Spark CLIs, see the official Spark documentation at http://archive.apache.org/dist/spark/docs/3.3.1/quick-start.html.

Common CLI

Common Spark CLIs are described as follows:

  • spark-shell

    It provides a simple way to learn the APIs and also serves as a tool for interactive data analysis. The interactive shell is available in Scala or Python (the Python shell is started with ./bin/pyspark). In the Spark directory, run the ./bin/spark-shell command to enter the Scala interactive shell, where you can read data from HDFS and perform RDD operations.

    For example, a single line of code can count the occurrences of every word in a file:

    scala> sc.textFile("hdfs://10.96.1.57:9000//wordcount_data.txt").flatMap(l => l.split(" ")).map(w => (w,1)).reduceByKey(_+_).collect()
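    To make the stages of the one-liner above concrete, the same count can be reproduced on a small local sample with standard Unix tools. This is only an analogy and does not use Spark; the sample text is made up for illustration and stands in for wordcount_data.txt.

```shell
# Hypothetical sample input standing in for wordcount_data.txt.
sample=$(mktemp)
printf 'spark hive spark\nhdfs spark hive\n' > "$sample"

# tr puts every word on its own line (the flatMap step); sort | uniq -c
# groups identical words and counts them (the map and reduceByKey steps);
# awk only reorders each result to "word count".
word_counts=$(tr -s ' ' '\n' < "$sample" | sort | uniq -c | awk '{print $2, $1}')
rm -f "$sample"

echo "$word_counts"
```

    For this sample the pipeline prints hdfs 1, hive 2, and spark 3, one pair per line. The Spark version returns the same pairs as an array of (word, count) tuples, in no guaranteed order.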

  • spark-submit

    It submits a Spark application to a Spark cluster, runs it, and returns the result. You need to specify the main class (--class), the master (--master), the application JAR file, and the input parameters of the application.

    For example, run the GroupByTest example in the JAR file with four input parameters. The --master local[1] option runs the application locally with a single worker thread.

    ./bin/spark-submit --class org.apache.spark.examples.GroupByTest --master local[1] examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar 6 10 10 3
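    The layout of the command above can be sketched as a small wrapper. Options for spark-submit come before the JAR file, and everything after the JAR file is passed to the application. The run helper and the DRY_RUN variable are hypothetical names introduced for this sketch, not part of Spark. By default the sketch only prints the command so that it can be reviewed first.

```shell
# DRY_RUN is a hypothetical switch for this sketch: by default the command is
# only printed; set DRY_RUN=0 to execute it from the Spark directory.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

submit_group_by_test() {
  # --class and --master are spark-submit options; the JAR file and the four
  # trailing values are handed to the application unchanged.
  run ./bin/spark-submit \
    --class org.apache.spark.examples.GroupByTest \
    --master 'local[1]' \
    examples/jars/spark-examples_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar \
    6 10 10 3
}

submit_group_by_test
```

    Quoting local[1] keeps the shell from treating the square brackets as a file name pattern.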

  • spark-sql

    It runs the Hive metastore service and executes SQL queries entered on the command line, in local or cluster mode. To view the logical plan of a query, add "explain extended" before the SQL statement.

    For example:

    SELECT key FROM src GROUP BY key;
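    As a sketch of the "explain extended" usage described above, the query can also be passed non-interactively with the -e option of spark-sql. The example assumes that it is run from the Spark directory and that the src table exists. The run helper and the DRY_RUN variable are hypothetical names for this sketch; by default the command is only printed.

```shell
# DRY_RUN is a hypothetical switch for this sketch: by default the command is
# only printed; set DRY_RUN=0 to execute it from the Spark directory.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

explain_extended() {
  # Prefix the statement with EXPLAIN EXTENDED so that spark-sql returns the
  # query plans instead of the query result.
  run ./bin/spark-sql -e "EXPLAIN EXTENDED $1"
}

explain_extended "SELECT key FROM src GROUP BY key"
```

    In dry-run mode the printed line omits the quotation marks around the statement. When the command is executed, the whole statement is still passed to -e as a single argument.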

  • run-example

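    It is used to run or debug the built-in examples provided by the open-source Spark community.

    For example, run SparkPi with 100 as its input parameter, which sets the number of partitions used for the calculation: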

    ./run-example SparkPi 100