Submitting a Spark Jar Job Using Livy
Introduction to DLI Livy
DLI Livy is an Apache Livy-based client tool used to submit Spark jobs to DLI.
Preparations
- Create an elastic resource pool and create queues within it. When creating a queue, select General-purpose, which provides the compute resources used to run Spark jobs.
- Prepare a Linux ECS for installing DLI Livy.
- Enable ports 30000 to 32767 and port 8998 on the ECS. For details, see Adding a Security Group Rule.
- Install a JDK on the ECS. JDK 1.8 is recommended. Configure the JAVA_HOME environment variable (see the verification sketch after this list).
- View the ECS details to obtain its private IP address.
- Use an enhanced datasource connection to connect the DLI queue to the VPC where the Livy instance is located.
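If you want to confirm that the JDK and JAVA_HOME are set up correctly before installing DLI Livy, the following sketch shows one way to do so. The JDK path /usr/lib/jvm/java-1.8.0 is an example; replace it with the actual path on your ECS.
# Check the installed Java version (1.8 is recommended).
java -version
# Example path only: point JAVA_HOME at your actual JDK installation.
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile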
Step 1: Download and Install DLI Livy

The software package used in the following operations is apache-livy-0.7.2.0107-bin.tar.gz. Replace it with the latest version as required.
- Download the DLI Livy software package.
- Use WinSCP to upload the obtained software package to the prepared ECS directory.
- Log in to the ECS as user root and perform the following steps to install DLI Livy:
- Run the following command to create an installation directory:
mkdir <Livy installation directory>
For example, to create the /opt/livy directory, run the mkdir /opt/livy command. The following operations use the /opt/livy installation directory as an example. Replace it as required.
- Run the following command to decompress the software package to the installation directory:
tar --extract --file apache-livy-0.7.2.0107-bin.tar.gz --directory /opt/livy --strip-components 1 --no-same-owner
- Run the following commands to go to the conf directory of the installation path and rename the configuration template files:
cd /opt/livy/conf
mv livy-client.conf.template livy-client.conf
mv livy.conf.template livy.conf
mv livy-env.sh.template livy-env.sh
mv log4j.properties.template log4j.properties
mv spark-blacklist.conf.template spark-blacklist.conf
touch spark-defaults.conf
Step 2: Modify the DLI Livy Configuration File
- Upload the DLI Livy JAR packages to the OBS bucket directory.
- Log in to the OBS console and create a directory for storing the DLI Livy JAR packages in the specified OBS bucket, for example, obs://bucket/livy/jars/.
- Go to the installation directory on the ECS where DLI Livy was installed, obtain the Livy JAR packages, and upload them to the OBS bucket directory created in the previous step:
For example, if the installation path is /opt/livy, the JAR packages you need to upload are as follows:
/opt/livy/rsc-jars/livy-api-0.7.2.0107.jar
/opt/livy/rsc-jars/livy-rsc-0.7.2.0107.jar
/opt/livy/repl_2.11-jars/livy-core_2.11-0.7.2.0107.jar
/opt/livy/repl_2.11-jars/livy-repl_2.11-0.7.2.0107.jar
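If you prefer the command line to the OBS console, the packages can also be uploaded with obsutil, assuming the tool is already installed and configured with your credentials. The bucket path below is an example.
# Upload each Livy JAR package to the OBS directory created earlier.
obsutil cp /opt/livy/rsc-jars/livy-api-0.7.2.0107.jar obs://bucket/livy/jars/
obsutil cp /opt/livy/rsc-jars/livy-rsc-0.7.2.0107.jar obs://bucket/livy/jars/
obsutil cp /opt/livy/repl_2.11-jars/livy-core_2.11-0.7.2.0107.jar obs://bucket/livy/jars/
obsutil cp /opt/livy/repl_2.11-jars/livy-repl_2.11-0.7.2.0107.jar obs://bucket/livy/jars/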
- Modify the DLI Livy configuration file.
- Run the following command to modify the /opt/livy/conf/livy-client.conf configuration file:
vi /opt/livy/conf/livy-client.conf
Add the following content to the file and modify the configuration items as required:
# Set the private IP address of the ECS, which can be obtained by running the ifconfig command.
livy.rsc.launcher.address = X.X.X.X
# Set the ports enabled on the ECS.
livy.rsc.launcher.port.range = 30000~32767
- Run the following command to modify the /opt/livy/conf/livy.conf configuration file:
vi /opt/livy/conf/livy.conf
Add the following content to the file and modify the configuration items as required:
livy.server.port = 8998
livy.spark.master = yarn
livy.server.contextLauncher.custom.class=org.apache.livy.rsc.DliContextLauncher
livy.server.batch.custom.class=org.apache.livy.server.batch.DliBatchSession
livy.server.interactive.custom.class=org.apache.livy.server.interactive.DliInteractiveSession
livy.server.sparkApp.custom.class=org.apache.livy.utils.SparkDliApp
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
# Change the following file directory of DLI Livy as needed:
livy.server.recovery.state-store.url = file:///opt/livy/store/
livy.server.session.timeout-check = true
livy.server.session.timeout = 1800s
livy.server.session.state-retain.sec = 1800s
livy.dli.spark.version = 2.3.2
livy.dli.spark.scala-version = 2.11
# Enter the OBS bucket path that stores the Livy JAR files.
livy.repl.jars = obs://bucket/livy/jars/livy-core_2.11-0.7.2.0107.jar, obs://bucket/livy/jars/livy-repl_2.11-0.7.2.0107.jar
livy.rsc.jars = obs://bucket/livy/jars/livy-api-0.7.2.0107.jar, obs://bucket/livy/jars/livy-rsc-0.7.2.0107.jar
- Run the following command to modify the /opt/livy/conf/spark-defaults.conf configuration file:
vi /opt/livy/conf/spark-defaults.conf
Add the following content to the file. Set the parameters based on Table 1.
# The following parameters can be overwritten when a job is submitted.
spark.yarn.isPython=true
spark.pyspark.python=python3
# Enter the production environment URL of DLI.
spark.dli.user.uiBaseAddress=https://console.huaweicloud.com/dli/web
# Set the region where the queue is located.
spark.dli.user.regionName=XXXX
# Set the DLI endpoint address.
spark.dli.user.dliEndPoint=XXXX
# Enter the name of the created DLI queue.
spark.dli.user.queueName=XXXX
# Set the project ID used for submitting a job.
spark.dli.user.projectId=XXXX
Table 1 Mandatory parameters in spark-defaults.conf

Parameter | Description
spark.dli.user.regionName | Name of the region where the DLI queue is located.
spark.dli.user.dliEndPoint | Endpoint of the region where the DLI queue is located.
spark.dli.user.queueName | Name of the DLI queue.
spark.dli.user.access.key / spark.dli.user.secret.key | User's AK/SK. The user must have Spark job permissions. For details, see Permissions Management. For how to obtain the AK/SK, see Obtaining the AK/SK.
spark.dli.user.projectId | Project ID. Obtain it by referring to Obtaining a Project ID.
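For example, the AK/SK entries are added to /opt/livy/conf/spark-defaults.conf in the same key=value format as the other parameters. The values below are placeholders; replace them with your own credentials and keep real credentials out of version control.
# Placeholders only: replace with your own AK/SK.
spark.dli.user.access.key=YOUR_ACCESS_KEY
spark.dli.user.secret.key=YOUR_SECRET_KEY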
The following parameters are optional. Set them based on the parameter description and site requirements. For details about these parameters, see Spark Configuration.
Table 2 Optional parameters in spark-defaults.conf

Spark Job Parameter | Spark Batch Processing Parameter | Remarks
spark.dli.user.file | file | Not required for connecting to the notebook tool
spark.dli.user.className | class_name | Not required for connecting to the notebook tool
spark.dli.user.scType | sc_type | Same as the native Livy configuration
spark.dli.user.args | args | Same as the native Livy configuration
spark.submit.pyFiles | python_files | Same as the native Livy configuration
spark.files | files | Same as the native Livy configuration
spark.dli.user.modules | modules | -
spark.dli.user.image | image | Custom image used for submitting a job. This parameter is available for container clusters only and is not set by default.
spark.dli.user.autoRecovery | auto_recovery | -
spark.dli.user.maxRetryTimes | max_retry_times | -
spark.dli.user.catalogName | catalog_name | To access metadata, set this parameter to dli.
Step 3: Start DLI Livy
- Run the following command to go to the DLI Livy installation directory:
Example: cd /opt/livy
- Run the following command to start DLI Livy:
./bin/livy-server start
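To verify that the server started successfully, you can check the server log and query the REST API. The log file name below assumes the /opt/livy installation directory and the root user; adjust it to your environment.
# Check the server log for startup errors.
tail -n 50 /opt/livy/logs/livy-root-server.out
# A freshly started server should return an empty session list.
curl http://localhost:8998/sessions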
Step 4: Submit a Spark Job to DLI Using DLI Livy
The following demonstrates how to use DLI Livy to submit a Spark job to DLI by running the curl command.
- Upload the JAR file of the developed Spark job program to the OBS directory.
For example, upload spark-examples_2.11-XXXX.jar to the obs://bucket/path directory.
To write the output data of a Spark Jar job to OBS, an AK/SK is required for accessing OBS. To keep the AK/SK secure, use Data Encryption Workshop (DEW) and Cloud Secret Management Service (CSMS) to manage them centrally, avoiding the sensitive information leakage and business risks caused by hard-coded or plaintext configuration.
For details, see Obtaining Temporary Credentials from a Spark Job's Agency for Accessing Other Cloud Services.
- Log in to the ECS server where DLI Livy is installed as user root.
- Run the curl command to submit a Spark job request to DLI using DLI Livy.
ECS_IP indicates the private IP address of the ECS where DLI Livy is installed.
curl --location --request POST 'http://ECS_IP:8998/batches' \
--header 'Content-Type: application/json' \
--data '{
  "driverMemory": "3G",
  "driverCores": 1,
  "executorMemory": "2G",
  "executorCores": 1,
  "numExecutors": 1,
  "args": ["1000"],
  "file": "obs://bucket/path/spark-examples_2.11-XXXX.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "conf": {
    "spark.dynamicAllocation.minExecutors": 1,
    "spark.executor.instances": 1,
    "spark.dynamicAllocation.initialExecutors": 1,
    "spark.dynamicAllocation.maxExecutors": 2
  }
}'
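The response to this POST request contains the ID of the new batch. You can then track the job through Livy's standard batch REST endpoints; the batch ID 0 below is an example, so replace it with the ID returned for your submission.
# Query the state of batch 0 (for example, starting, running, success, or dead).
curl http://ECS_IP:8998/batches/0/state
# Fetch up to 100 driver log lines for batch 0.
curl 'http://ECS_IP:8998/batches/0/log?from=0&size=100'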