Using Spark on CCE
Running SparkPi on CCE
The following describes how to submit a SparkPi job to a CCE cluster.
spark-submit \
--master k8s://https://aa.bb.cc.dd:5443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
Configuration description:
- aa.bb.cc.dd is the master address specified in ~/.kube/config. You can run the kubectl cluster-info command to obtain the master address.
- spark.kubernetes.container.image is the address of the pushed image. If the image is a private image, you also need to configure spark.kubernetes.container.image.pullSecrets.
- All parameters that can be specified with --conf are, by default, read from the ~/spark-obs/conf/spark-defaults.conf file. Common settings, such as the OBS access configuration, can therefore be written to that file once and omitted from the command line.
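As a sketch of the points above: the recurring --conf settings from the SparkPi example can be written once to a defaults file. The file name below is local to the sketch (the real location is ~/spark-obs/conf/spark-defaults.conf), and the kubectl command for finding the master address is shown only as a comment.

```shell
# The master address can be read from the current kubeconfig, e.g.:
#   kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
# (the same address is printed by `kubectl cluster-info`).

# Write the recurring --conf settings as defaults. A local file name is
# used here for illustration only.
cat > spark-defaults.conf <<'EOF'
spark.kubernetes.authenticate.driver.serviceAccountName  spark
spark.kubernetes.container.image  swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs
EOF

# Each line is a key-value pair; keys keep their spark.* names.
grep -c '^spark\.' spark-defaults.conf
```

With this file in place, the spark-submit command only needs --master, --deploy-mode, --name, --class, and the JAR path.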
Accessing OBS
Use spark-submit to submit an HdfsTest job. Change obs://bucket-name/filename at the end of the command to the actual path of your file in OBS.
spark-submit \
--master k8s://https://aa.bb.cc.dd:5443 \
--deploy-mode cluster \
--name spark-hdfs-test \
--class org.apache.spark.examples.HdfsTest \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar obs://bucket-name/filename
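Note that the OBS path is simply the job's application argument, placed after the JAR path. A hypothetical wrapper sketch (the function name is invented, and the echo prints the command instead of running it; drop the echo to actually submit):

```shell
# Hypothetical wrapper around the spark-submit call above. The master
# address and image are the placeholder values used in this article.
submit_hdfs_test() {
  obs_path="$1"
  case "$obs_path" in
    obs://*) ;;                      # accept only OBS URIs
    *) echo "expected obs://bucket-name/filename" >&2; return 1 ;;
  esac
  # echo makes this a dry run; remove it to submit for real.
  echo spark-submit \
    --master "k8s://https://aa.bb.cc.dd:5443" \
    --deploy-mode cluster \
    --name spark-hdfs-test \
    --class org.apache.spark.examples.HdfsTest \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar \
    "$obs_path"
}

submit_hdfs_test obs://bucket-name/filename
```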
Using Spark Shell Commands to Interact with Spark in Scala
spark-shell \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs \
--master k8s://https://aa.bb.cc.dd:5443
Run the following commands to define linecount and wordcount, the functions used by the Spark computing jobs:
def linecount(input:org.apache.spark.sql.Dataset[String]):Long=input.filter(line => line.length()>0).count()
def wordcount(input:org.apache.spark.sql.Dataset[String]):Long=input.flatMap(value => value.split("\\s+")).groupByKey(value => value).count().count()
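As a quick sanity check of what these functions compute, here are illustrative shell equivalents run on a made-up sample file: linecount counts non-empty lines, and wordcount returns the number of distinct whitespace-separated words, because the final .count() counts the groups produced by groupByKey.

```shell
# A tiny sample file (contents invented for illustration).
printf 'spark on cce\nspark on kubernetes\n\n' > sample.txt

# linecount: lines with length > 0
grep -c . sample.txt                                                  # -> 2

# wordcount: distinct whitespace-separated words
tr -s '[:space:]' '\n' < sample.txt | grep -v '^$' | sort -u | wc -l  # -> 4
```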
Run the following commands to define data sources:
var alluxio = spark.read.textFile("alluxio://alluxio-master:19998/sample-1g")
var obs = spark.read.textFile("obs://gene-container-gtest/sample-1g")
var hdfs = spark.read.textFile("hdfs://192.168.1.184:9000/user/hadoop/books/sample-1g")
Run the following command to start computing jobs:
spark.time(wordcount(obs))
spark.time(linecount(obs))