Submitting a Spark Jar Job Using Livy
Introduction to DLI Livy
DLI Livy is an Apache Livy-based client tool used to submit Spark jobs to DLI.
Preparations
- Create a queue. Set Queue Usage to For general purpose so that the queue provides the computing resources for the Spark job.
- Prepare a Linux ECS for installing DLI Livy.
- Enable ports 30000 to 32767 and port 8998 on the ECS. For details, see Adding a Security Group Rule.
- Install a JDK on the ECS. JDK 1.8 is recommended. Configure the JAVA_HOME environment variable.
- View the ECS details to obtain its private IP address.
- Use an enhanced datasource connection to connect the DLI queue to the VPC where the Livy instance is located.
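The JDK prerequisite can be checked before installation with a small shell sketch like the one below. The helper name check_java_home is hypothetical and not part of DLI Livy. It only confirms that JAVA_HOME (or a path passed as the first argument) contains an executable bin/java.

```shell
#!/usr/bin/env bash
# Sketch: verify that JAVA_HOME points at a usable JDK before installing DLI Livy.
# check_java_home is a hypothetical helper name, not something DLI Livy ships.
check_java_home() {
    # Use the first argument if given, otherwise fall back to $JAVA_HOME.
    local java_home="${1:-${JAVA_HOME:-}}"
    if [ -z "$java_home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$java_home/bin/java" ]; then
        echo "No executable java found under $java_home/bin" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $java_home"
}
```

For example, running check_java_home && "$JAVA_HOME/bin/java" -version prints the JDK version only when the variable is configured correctly.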
Downloading and Installing DLI Livy
The software package used in the following operations is apache-livy-0.7.2.0103-bin.tar.gz. If you download a later version, replace the file name in the commands accordingly.
- Download the DLI Livy software package.
- Use WinSCP to upload the obtained software package to the prepared ECS directory.
- Log in to the ECS as user root and perform the following steps to install DLI Livy:
- Run the following command to create an installation directory:
mkdir Livy installation directory
For example, to create the /opt/livy directory, run the mkdir /opt/livy command. The following operations use the /opt/livy installation directory as an example. Replace it as required.
- Run the following command to decompress the software package to the installation directory:
tar --extract --file apache-livy-0.7.2.0103-bin.tar.gz --directory /opt/livy --strip-components 1 --no-same-owner
- Go to the conf directory under the installation directory and run the following commands to rename the configuration file templates and create an empty spark-defaults.conf file:
cd /opt/livy/conf
mv livy-client.conf.template livy-client.conf
mv livy.conf.template livy.conf
mv livy-env.sh.template livy-env.sh
mv log4j.properties.template log4j.properties
mv spark-blacklist.conf.template spark-blacklist.conf
touch spark-defaults.conf
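The renaming step can also be scripted. The sketch below assumes that the conf directory contains only the template files listed above, because it renames every file ending in .template. The helper name rename_livy_templates is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rename every *.template file in a Livy conf directory and create an
# empty spark-defaults.conf. rename_livy_templates is a hypothetical helper.
rename_livy_templates() {
    # Default to the example installation path used in this document.
    local conf_dir="${1:-/opt/livy/conf}"
    local template
    for template in "$conf_dir"/*.template; do
        # Skip the unexpanded pattern when no template files are present.
        [ -e "$template" ] || continue
        # Strip the .template suffix, for example livy.conf.template -> livy.conf.
        mv -- "$template" "${template%.template}"
    done
    touch "$conf_dir/spark-defaults.conf"
}
```

Calling rename_livy_templates /opt/livy/conf should have the same effect as the individual mv and touch commands.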
Modifying the DLI Livy Configuration File
- Upload the specified DLI Livy JAR package to the OBS bucket directory.
- Log in to the OBS console and create a directory for storing the DLI Livy JAR packages in the specified OBS bucket, for example: obs://bucket/livy/jars/.
- On the ECS, go to the DLI Livy installation directory created earlier, obtain the Livy JAR packages, and upload them to the OBS bucket directory created in the previous step.
For example, if the installation path is /opt/livy, the JAR packages you need to upload are as follows:
/opt/livy/rsc-jars/livy-api-0.7.2.0103.jar
/opt/livy/rsc-jars/livy-rsc-0.7.2.0103.jar
/opt/livy/repl_2.11-jars/livy-core_2.11-0.7.2.0103.jar
/opt/livy/repl_2.11-jars/livy-repl_2.11-0.7.2.0103.jar
- Modify the DLI Livy configuration file.
- Run the following command to modify the /opt/livy/conf/livy-client.conf configuration file:
vi /opt/livy/conf/livy-client.conf
Add the following content to the file and modify the configuration items as required:
# Set the private IP address of the ECS, which can be obtained by running the ifconfig command.
livy.rsc.launcher.address = X.X.X.X
# Set the ports enabled on the ECS.
livy.rsc.launcher.port.range = 30000~32767
- Run the following command to modify the /opt/livy/conf/livy.conf configuration file:
vi /opt/livy/conf/livy.conf
Add the following content to the file and modify the configuration items as required:
livy.server.port = 8998
livy.spark.master = yarn
livy.server.contextLauncher.custom.class=org.apache.livy.rsc.DliContextLauncher
livy.server.batch.custom.class=org.apache.livy.server.batch.DliBatchSession
livy.server.interactive.custom.class=org.apache.livy.server.interactive.DliInteractiveSession
livy.server.sparkApp.custom.class=org.apache.livy.utils.SparkDliApp
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
# Change the following file directory of DLI Livy as needed:
livy.server.recovery.state-store.url = file:///opt/livy/store/
livy.server.session.timeout-check = true
livy.server.session.timeout = 1800s
livy.server.session.state-retain.sec = 1800s
livy.dli.spark.version = 2.3.2
livy.dli.spark.scala-version = 2.11
# Enter the OBS bucket directory that you created for storing the Livy JAR packages.
livy.repl.jars = obs://bucket/livy/jars/livy-core_2.11-0.7.2.0103.jar, obs://bucket/livy/jars/livy-repl_2.11-0.7.2.0103.jar
livy.rsc.jars = obs://bucket/livy/jars/livy-api-0.7.2.0103.jar, obs://bucket/livy/jars/livy-rsc-0.7.2.0103.jar
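Because the livy.repl.jars and livy.rsc.jars values repeat the OBS directory and the package version, they are easy to mistype. The following sketch prints both lines from an OBS directory, a Livy version, and a Scala version. The helper name livy_jar_conf is hypothetical, and the JAR file naming pattern is assumed to match the 0.7.2.0103 example above.

```shell
#!/usr/bin/env bash
# Sketch: print the livy.repl.jars and livy.rsc.jars lines for livy.conf.
# livy_jar_conf is a hypothetical helper; the JAR naming pattern is assumed
# to follow the 0.7.2.0103 example in this document.
livy_jar_conf() {
    local obs_dir="${1%/}"     # OBS directory, with one trailing slash removed
    local version="$2"         # Livy package version, for example 0.7.2.0103
    local scala="${3:-2.11}"   # Scala version used in the JAR file names
    printf 'livy.repl.jars = %s/livy-core_%s-%s.jar, %s/livy-repl_%s-%s.jar\n' \
        "$obs_dir" "$scala" "$version" "$obs_dir" "$scala" "$version"
    printf 'livy.rsc.jars = %s/livy-api-%s.jar, %s/livy-rsc-%s.jar\n' \
        "$obs_dir" "$version" "$obs_dir" "$version"
}
```

For instance, livy_jar_conf obs://bucket/livy/jars/ 0.7.2.0103 prints the two lines shown in the configuration above, which can then be pasted into livy.conf.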
- Run the following command to modify the /opt/livy/conf/spark-defaults.conf configuration file:
vi /opt/livy/conf/spark-defaults.conf
Add the following parameters to the file. For details about the parameter configurations, see Table 1.
# The following parameters can be overwritten when a job is submitted.
spark.yarn.isPython=true
spark.pyspark.python=python3
# Enter the production environment URL of DLI.
spark.dli.user.uiBaseAddress=https://console.huaweicloud.com/dli/web
# Set the region where the queue is located.
spark.dli.user.regionName=XXXX
# Set the DLI endpoint address.
spark.dli.user.dliEndPoint=XXXX
# Enter the name of the created DLI queue.
spark.dli.user.queueName=XXXX
# Set the AK used for submitting a job.
spark.dli.user.access.key=XXXX
# Set the SK used for submitting a job.
spark.dli.user.secret.key=XXXX
# Set the project ID used for submitting a job.
spark.dli.user.projectId=XXXX
Table 1 Mandatory parameters in spark-defaults.conf

| Parameter | Description |
| --- | --- |
| spark.dli.user.regionName | Name of the region where the DLI queue is located. You can obtain the region name from . |
| spark.dli.user.dliEndPoint | Endpoint where the DLI queue is located. You can obtain the endpoint from . |
| spark.dli.user.queueName | Queue name. |
| spark.dli.user.access.key | Access key (AK) of a user. The user must have the Spark job permissions. For details, see . |
| spark.dli.user.secret.key | Secret key (SK) of the same user. For details about how to obtain the AK/SK, see . |
| spark.dli.user.projectId | Project ID, which can be obtained by referring to "Obtaining a Project ID". |
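
Before starting DLI Livy, it may help to confirm that none of the mandatory parameters in Table 1 is missing or still set to the XXXX placeholder. The following sketch does this with standard tools. The helper name check_spark_defaults is hypothetical, and the check assumes one key=value pair per line.

```shell
#!/usr/bin/env bash
# Sketch: report mandatory spark-defaults.conf keys that are missing or still
# set to the XXXX placeholder. check_spark_defaults is a hypothetical helper.
check_spark_defaults() {
    local conf_file="${1:-/opt/livy/conf/spark-defaults.conf}"
    local key value status=0
    if [ ! -r "$conf_file" ]; then
        echo "Cannot read $conf_file" >&2
        return 1
    fi
    for key in spark.dli.user.regionName spark.dli.user.dliEndPoint \
               spark.dli.user.queueName spark.dli.user.access.key \
               spark.dli.user.secret.key spark.dli.user.projectId; do
        # Take the text after the first "=" on the last line that sets the key,
        # with whitespace removed. A missing key yields an empty value.
        value=$(grep -E "^${key}[[:space:]]*=" "$conf_file" | tail -n 1 \
            | cut -d= -f2- | tr -d '[:space:]') || true
        if [ -z "$value" ] || [ "$value" = "XXXX" ]; then
            echo "Missing or placeholder value: $key" >&2
            status=1
        fi
    done
    return "$status"
}
```

The function returns a non-zero status and lists the affected keys on standard error when any mandatory parameter still needs to be filled in.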
The following parameters are optional. Set them based on the parameter description and site requirements. For details about these parameters, see Spark Configuration.
Table 2 Optional parameters in spark-defaults.conf

| Spark Job Parameter | Spark Batch Processing Parameter | Remarks |
| --- | --- | --- |
| spark.dli.user.file | file | Not required for connecting to the notebook tool |
| spark.dli.user.className | class_name | Not required for connecting to the notebook tool |
| spark.dli.user.scType | sc_type | Same as the native Livy configuration |
| spark.dli.user.args | args | Same as the native Livy configuration |
| spark.submit.pyFiles | python_files | Same as the native Livy configuration |
| spark.files | files | Same as the native Livy configuration |
| spark.dli.user.modules | modules | - |
| spark.dli.user.image | image | Custom image used for submitting a job. This parameter is available for container clusters only and is not set by default. |
| spark.dli.user.autoRecovery | auto_recovery | - |
| spark.dli.user.maxRetryTimes | max_retry_times | - |
| spark.dli.user.catalogName | catalog_name | To access metadata, set this parameter to dli. |
Starting DLI Livy
- Run the following command to go to the DLI Livy installation directory:
Example: cd /opt/livy
- Run the following command to start DLI Livy:
./bin/livy-server start
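After startup, one way to confirm that the server is listening on port 8998 is to request the batch list over HTTP. The sketch below uses the GET /batches endpoint of the standard Apache Livy REST API. That DLI Livy answers it in the same way is an assumption, and the helper names livy_url and livy_is_up are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether the Livy server answers HTTP requests.
# livy_url and livy_is_up are hypothetical helpers. GET /batches is part of the
# standard Apache Livy REST API; DLI Livy is assumed to expose it unchanged.
livy_url() {
    # Arguments: host, optional port (default 8998), optional path (default /batches).
    printf 'http://%s:%s%s\n' "$1" "${2:-8998}" "${3:-/batches}"
}

livy_is_up() {
    # Succeeds only if the server returns a successful HTTP status within 5 seconds.
    curl --silent --fail --max-time 5 --output /dev/null \
        "$(livy_url "$1" "${2:-8998}" /batches)"
}
```

For example, livy_is_up ECS_IP && echo "DLI Livy is reachable" prints the message once the server accepts requests, where ECS_IP is the private IP address of the ECS.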
Submitting a Spark Job Using DLI Livy
The following example uses the curl command to submit a Spark job to DLI through DLI Livy.
- Upload the JAR file of the developed Spark job program to the OBS directory.
For example, upload spark-examples_2.11-XXXX.jar to the obs://bucket/path directory.
- Log in as user root to the ECS where DLI Livy is installed.
- Run the curl command to submit a Spark job request to DLI using DLI Livy.
ECS_IP indicates the private IP address of the ECS where DLI Livy is installed.
curl --location --request POST 'http://ECS_IP:8998/batches' \
--header 'Content-Type: application/json' \
--data-raw '{
    "driverMemory": "3G",
    "driverCores": 1,
    "executorMemory": "2G",
    "executorCores": 1,
    "numExecutors": 1,
    "args": [
        "1000"
    ],
    "file": "obs://bucket/path/spark-examples_2.11-XXXX.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "conf": {
        "spark.dynamicAllocation.minExecutors": 1,
        "spark.executor.instances": 1,
        "spark.dynamicAllocation.initialExecutors": 1,
        "spark.dynamicAllocation.maxExecutors": 2
    }
}'
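The submission can be wrapped in small shell functions so that the JAR path, class name, and argument are passed as parameters. The sketch below builds a reduced request body and adds a state query. The helper names are hypothetical, the values are not JSON-escaped, and GET /batches/{batchId}/state comes from the standard Apache Livy REST API, so its availability in DLI Livy is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: parameterized batch submission and state query through DLI Livy.
# All function names are hypothetical. Values are inserted without JSON
# escaping, so they must not contain double quotes or backslashes.
build_batch_payload() {
    # Arguments: JAR file on OBS, main class name, single program argument.
    local file="$1" class_name="$2" arg="$3"
    printf '{"file": "%s", "className": "%s", "args": ["%s"], "numExecutors": 1}\n' \
        "$file" "$class_name" "$arg"
}

submit_batch() {
    # Arguments: ECS private IP, then the arguments of build_batch_payload.
    local ecs_ip="$1"
    shift
    curl --silent --show-error --request POST "http://${ecs_ip}:8998/batches" \
        --header 'Content-Type: application/json' \
        --data-raw "$(build_batch_payload "$@")"
}

get_batch_state() {
    # Arguments: ECS private IP, batch ID returned in the submission response.
    # Assumes the standard Livy endpoint GET /batches/{batchId}/state is available.
    curl --silent --show-error "http://$1:8998/batches/$2/state"
}
```

A call such as submit_batch ECS_IP obs://bucket/path/spark-examples_2.11-XXXX.jar org.apache.spark.examples.SparkPi 1000 sends a reduced form of the request above. In standard Livy, the response contains the batch ID that get_batch_state expects.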