Installing Spark
Prerequisites
JDK 1.8 or later must be configured in the environment.
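As a quick sanity check (a sketch, assuming `java` is expected on PATH), you can confirm that a JDK is visible before proceeding:

```shell
# Quick check (sketch): confirm that a JDK `java` binary is on PATH.
# The version printed should be 1.8 or later.
if command -v java >/dev/null 2>&1; then
  JAVA_FOUND=yes
  java -version
else
  JAVA_FOUND=no
  echo "java not found on PATH; install JDK 1.8 or later and set JAVA_HOME"
fi
```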
Obtaining the SDK Package
The OBS adapter supports Hadoop 2.8.3 and 3.1.1, so the spark-2.4.5-bin-hadoop2.8.tgz package is used in this example. Run the following commands to build it from the Spark source code:
git clone -b v2.4.5 https://github.com/apache/spark.git
./dev/make-distribution.sh --name hadoop2.8 --tgz -Pkubernetes -Pyarn -Dhadoop.version=2.8.3
Obtaining the HUAWEI CLOUD OBS JAR Package
The hadoop-huaweicloud-2.8.3-hw-40.jar package is used, which can be obtained from https://github.com/huaweicloud/obsa-hdfs/tree/master/release.
Configuring Spark Running Environment
For simplicity, log in as the root user and place spark-2.4.5-bin-hadoop2.8.tgz in the /root directory of the operation node.
Run the following command to install Spark:
tar -zxvf spark-2.4.5-bin-hadoop2.8.tgz
mv spark-2.4.5-bin-hadoop2.8 spark-obs
cat >> ~/.bashrc <<EOF
PATH=/root/spark-obs/bin:\$PATH
PATH=/root/spark-obs/sbin:\$PATH
export SPARK_HOME=/root/spark-obs
EOF
source ~/.bashrc
The spark-submit script is now available. Run the spark-submit --version command to check the Spark version.

Interconnecting Spark with OBS
- Copy the HUAWEI CLOUD OBS JAR package to the jars directory of the Spark installation (~/spark-obs/jars).
- Modify the Spark configuration.
To interconnect Spark with OBS, add the following configuration items to spark-defaults.conf:
cp ~/spark-obs/conf/spark-defaults.conf.template ~/spark-obs/conf/spark-defaults.conf
cat >> ~/spark-obs/conf/spark-defaults.conf <<EOF
spark.hadoop.fs.obs.readahead.inputstream.enabled=true
spark.hadoop.fs.obs.buffer.max.range=6291456
spark.hadoop.fs.obs.buffer.part.size=2097152
spark.hadoop.fs.obs.threads.read.core=500
spark.hadoop.fs.obs.threads.read.max=1000
spark.hadoop.fs.obs.write.buffer.size=8192
spark.hadoop.fs.obs.read.buffer.size=8192
spark.hadoop.fs.obs.connection.maximum=1000
spark.hadoop.fs.obs.access.key=******
spark.hadoop.fs.obs.secret.key=******
spark.hadoop.fs.obs.endpoint=******
spark.hadoop.fs.obs.buffer.dir=/root/hadoop-obs/obs-cache
spark.hadoop.fs.obs.impl=org.apache.hadoop.fs.obs.OBSFileSystem
spark.hadoop.fs.obs.connection.ssl.enabled=false
spark.hadoop.fs.obs.fast.upload=true
spark.hadoop.fs.obs.socket.send.buffer=65536
spark.hadoop.fs.obs.socket.recv.buffer=65536
spark.hadoop.fs.obs.max.total.tasks=20
spark.hadoop.fs.obs.threads.max=20
EOF
vim ~/spark-obs/conf/spark-defaults.conf
Set spark.hadoop.fs.obs.access.key, spark.hadoop.fs.obs.secret.key, and spark.hadoop.fs.obs.endpoint (masked as ****** above) to the AK, SK, and OBS endpoint of your account.
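The first bullet above, copying the connector JAR into the Spark installation, can be sketched as follows (the file location and target directory are assumptions based on the layout used in this guide):

```shell
# Sketch: copy the OBS connector JAR into Spark's jar directory.
# Adjust JAR and SPARK_HOME if your paths differ from this guide.
JAR=hadoop-huaweicloud-2.8.3-hw-40.jar
SPARK_HOME=${SPARK_HOME:-/root/spark-obs}
if [ -f "$JAR" ] && [ -d "$SPARK_HOME/jars" ]; then
  cp "$JAR" "$SPARK_HOME/jars/"
  COPIED=yes
else
  COPIED=no
  echo "Expected $JAR in the current directory and $SPARK_HOME/jars to exist"
fi
```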
Pushing an Image to SWR
Running Spark on Kubernetes requires a Spark image of the matching version. A Dockerfile was generated during compilation; use it to build the image and push it to SWR.
- Create an image.
Run the following commands in the Spark installation directory, because the Dockerfile path is relative to it:
cd /root/spark-obs
docker build -t spark:2.4.5-obs -f kubernetes/dockerfiles/spark/Dockerfile .
- Push the image.
Log in to the SWR console and obtain the login command.

Log in to the node where the image is created and run the login command.

docker tag [{Image name}:{Image tag}] swr.cn-east-3.myhuaweicloud.com/{Organization name}/{Image name}:{Image tag}
docker push swr.cn-east-3.myhuaweicloud.com/{Organization name}/{Image name}:{Image tag}
For example:

docker tag spark:2.4.5-obs swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs
docker push swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs
Record the image access address for later use.
For example, swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs.
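As a hedged sketch of how the recorded address is typically used later, the following submits the SparkPi example bundled with Spark to a Kubernetes cluster. The API server address, executor count, and the example jar path inside the image are assumptions, not part of this guide; substitute your own values:

```shell
# Sketch: submit the bundled SparkPi example to Kubernetes using the SWR image.
# APISERVER is a placeholder (e.g. https://192.168.0.1:5443); IMAGE defaults
# to the example address recorded above.
APISERVER=${APISERVER:-}
IMAGE=${IMAGE:-swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs}
if [ -n "$APISERVER" ] && command -v spark-submit >/dev/null 2>&1; then
  spark-submit \
    --master "k8s://$APISERVER" \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image="$IMAGE" \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
else
  echo "Set APISERVER and ensure spark-submit is on PATH before running."
fi
```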
Configuring Spark History Server
cat >> ~/spark-obs/conf/spark-defaults.conf <<EOF
spark.eventLog.enabled=true
spark.eventLog.dir=obs://******
EOF
Ensure that the bucket name and directory in the preceding command are valid.
For example, obs://spark-sh1/history-obs/ is a valid OBS directory.
Modify the ~/spark-obs/conf/spark-env.sh file. If it does not exist, create it from the template:
cp ~/spark-obs/conf/spark-env.sh.template ~/spark-obs/conf/spark-env.sh
cat >> ~/spark-obs/conf/spark-env.sh <<EOF
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=obs://******"
EOF
The OBS directory must be the same as the spark.eventLog.dir configured in spark-defaults.conf.
Start Spark History Server:

start-history-server.sh

After the startup, you can access the server over port 18080.
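A quick way to confirm the server is up (a sketch, assuming `curl` is available on the node and the server listens on the default port):

```shell
# Quick reachability check (sketch): the History Server web UI listens on
# port 18080 by default.
if command -v curl >/dev/null 2>&1 && curl -sf http://localhost:18080 >/dev/null; then
  HS_REACHABLE=yes
  echo "Spark History Server UI is reachable on port 18080"
else
  HS_REACHABLE=no
  echo "UI not reachable yet; check the logs under \$SPARK_HOME/logs"
fi
```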
