Installing Spark

Prerequisites

JDK 1.8 or later must be configured in the environment.
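As a quick sanity check, a small helper (illustrative only, not part of any Spark tooling) can parse the version string printed by java -version and confirm the JDK is 1.8 or later:

```shell
#!/bin/sh
# jdk_major: extract the major version from a JDK version string.
# Pre-JDK 9 strings look like "1.8.0_292"; JDK 9+ strings look like "11.0.2".
jdk_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;   # "1.8.0_292" -> 8
        *)   echo "$1" | cut -d. -f1 ;;   # "11.0.2"    -> 11
    esac
}

# On the operation node, check the installed JDK (java must be on PATH):
# ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
# [ "$(jdk_major "$ver")" -ge 8 ] && echo "JDK OK" || echo "JDK too old"
```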

Obtaining the SDK Package

The OBS adapter supports Hadoop 2.8.3 and 3.1.1, so the spark-2.4.5-bin-hadoop2.8.tgz package (built against Hadoop 2.8.3) is used in this example. Build it from the Spark source as follows:

    git clone -b v2.4.5 https://github.com/apache/spark.git

    cd spark

    ./dev/make-distribution.sh --name hadoop2.8 --tgz -Pkubernetes -Pyarn -Dhadoop.version=2.8.3

Obtaining the HUAWEI CLOUD OBS JAR Package

The hadoop-huaweicloud-2.8.3-hw-40.jar package is used, which can be obtained from https://github.com/huaweicloud/obsa-hdfs/tree/master/release.

Configuring Spark Running Environment

To simplify the operation, use the root user to place spark-2.4.5-bin-hadoop2.8.tgz in the /root directory on the operation node.

Run the following command to install Spark:

    tar -zxvf spark-2.4.5-bin-hadoop2.8.tgz
    mv spark-2.4.5-bin-hadoop2.8 spark-obs
    cat >> ~/.bashrc <<EOF
PATH=/root/spark-obs/bin:\$PATH
PATH=/root/spark-obs/sbin:\$PATH
export SPARK_HOME=/root/spark-obs
EOF
 
    source ~/.bashrc

The spark-submit script is now available. Run the spark-submit --version command to check the Spark version.
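As a sanity check, a tiny helper (illustrative only) confirms that a command resolved on PATH after sourcing ~/.bashrc:

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves on PATH; used here to confirm
# that the Spark bin/sbin directories were added correctly.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# After `source ~/.bashrc` on the operation node:
# have_cmd spark-submit && spark-submit --version
```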

Interconnecting Spark with OBS

  1. Copy the HUAWEI CLOUD OBS JAR package to the corresponding directory.

    cp hadoop-huaweicloud-2.8.3-hw-40.jar /root/spark-obs/jars/

  2. Modify the Spark configuration.

    To interconnect Spark with OBS, append the following OBS settings to spark-defaults.conf:

        cp ~/spark-obs/conf/spark-defaults.conf.template ~/spark-obs/conf/spark-defaults.conf
        cat >> ~/spark-obs/conf/spark-defaults.conf <<EOF
spark.hadoop.fs.obs.readahead.inputstream.enabled=true
spark.hadoop.fs.obs.buffer.max.range=6291456
spark.hadoop.fs.obs.buffer.part.size=2097152
spark.hadoop.fs.obs.threads.read.core=500
spark.hadoop.fs.obs.threads.read.max=1000
spark.hadoop.fs.obs.write.buffer.size=8192
spark.hadoop.fs.obs.read.buffer.size=8192
spark.hadoop.fs.obs.connection.maximum=1000
spark.hadoop.fs.obs.access.key=******
spark.hadoop.fs.obs.secret.key=******
spark.hadoop.fs.obs.endpoint=******
spark.hadoop.fs.obs.buffer.dir=/root/hadoop-obs/obs-cache
spark.hadoop.fs.obs.impl=org.apache.hadoop.fs.obs.OBSFileSystem
spark.hadoop.fs.obs.connection.ssl.enabled=false
spark.hadoop.fs.obs.fast.upload=true
spark.hadoop.fs.obs.socket.send.buffer=65536
spark.hadoop.fs.obs.socket.recv.buffer=65536
spark.hadoop.fs.obs.max.total.tasks=20
spark.hadoop.fs.obs.threads.max=20
EOF
        
        vim ~/spark-obs/conf/spark-defaults.conf

    Replace the ****** placeholders of spark.hadoop.fs.obs.access.key, spark.hadoop.fs.obs.secret.key, and spark.hadoop.fs.obs.endpoint with your actual AK, SK, and OBS endpoint.
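Instead of editing the file by hand in vim, the placeholders can be filled in with a small idempotent helper (a hypothetical sketch, not part of Spark): it replaces a key's value if the key already exists in the file and appends it otherwise.

```shell
#!/bin/sh
# set_conf FILE KEY VALUE: set KEY=VALUE in a spark-defaults style file,
# replacing an existing line for KEY or appending a new one.
set_conf() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage (substitute your real AK, SK, and endpoint):
# set_conf ~/spark-obs/conf/spark-defaults.conf spark.hadoop.fs.obs.access.key "<your-ak>"
# set_conf ~/spark-obs/conf/spark-defaults.conf spark.hadoop.fs.obs.secret.key "<your-sk>"
# set_conf ~/spark-obs/conf/spark-defaults.conf spark.hadoop.fs.obs.endpoint  "<your-obs-endpoint>"
```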

Pushing an Image to SWR

Running Spark on Kubernetes requires a Spark image of the same version. A Dockerfile is generated during compilation (kubernetes/dockerfiles/spark/Dockerfile). Use it to build an image and push it to SWR.

  1. Create an image.

    cd ~/spark-obs

    docker build -t spark:2.4.5-obs -f kubernetes/dockerfiles/spark/Dockerfile .

  2. Push the image.

    Log in to the SWR console and obtain the login command.

    Log in to the node where the image is created and run the login command.

    docker tag {Image name}:{Image tag} swr.cn-east-3.myhuaweicloud.com/{Organization name}/{Image name}:{Image tag}

    docker push swr.cn-east-3.myhuaweicloud.com/{Organization name}/{Image name}:{Image tag}

    Record the image access address for later use, for example, swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs.
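The pushed image is typically consumed by spark-submit in cluster mode on Kubernetes. The sketch below assembles such a command as a dry run and prints it for review; the API server address is a placeholder, and the service account and namespace settings may need to be added for your cluster.

```shell
#!/bin/sh
# Assemble a spark-submit invocation for Kubernetes as a dry run.
# APISERVER is a placeholder; IMAGE is the address recorded above.
APISERVER="k8s://https://<apiserver-ip>:<port>"
IMAGE="swr.cn-east-3.myhuaweicloud.com/batch/spark:2.4.5-obs"

CMD="spark-submit \
 --master $APISERVER \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=2 \
 --conf spark.kubernetes.container.image=$IMAGE \
 local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar"

# Print the command for review; run it with `eval "$CMD"` once the
# placeholder values are filled in.
echo "$CMD"
```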

Configuring Spark History Server

    cat >> ~/spark-obs/conf/spark-defaults.conf <<EOF
spark.eventLog.enabled=true
spark.eventLog.dir=obs://******
EOF

Ensure that the bucket name and directory in the preceding command are valid.

For example, obs://spark-sh1/history-obs/ is a valid OBS directory.

Modify the ~/spark-obs/conf/spark-env.sh file. If it does not exist, copy the template and append the history option:

    cp ~/spark-obs/conf/spark-env.sh.template ~/spark-obs/conf/spark-env.sh

    cat >> ~/spark-obs/conf/spark-env.sh <<EOF
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=obs://******"
EOF

The OBS directory must be the same as the one configured in spark-defaults.conf.

Run the following command to start Spark History Server:

    start-history-server.sh

After the startup, you can access the server over port 18080.
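To verify that the server is up, the History Server also exposes a REST API under /api/v1 on the same port. A small helper (illustrative only) builds the endpoint URL:

```shell
#!/bin/sh
# history_api_url HOST [PORT]: URL of the History Server applications
# endpoint. The REST API lives under /api/v1 on the web UI port
# (18080 by default).
history_api_url() {
    printf 'http://%s:%s/api/v1/applications' "$1" "${2:-18080}"
}

# Once the server is running, list the logged applications:
# curl -s "$(history_api_url localhost)"
```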