Updated on 2024-08-20 GMT+08:00

Overview

DLI Job Types

DLI provides the following job types:

  • SQL job: SQL jobs let you run standard SQL statements and are compatible with Spark SQL and Presto SQL. You can query and analyze heterogeneous data sources on the cloud through visualized APIs, JDBC, ODBC, or Beeline. SQL jobs are compatible with mainstream data formats such as CSV, JSON, Parquet, Carbon, and ORC.
  • Flink job: Flink jobs perform real-time streaming big data analysis on the public cloud. In fully managed mode, you only need to focus on your Stream SQL services and can run jobs immediately without having to be aware of the underlying compute clusters. Flink jobs are fully compatible with Apache Flink APIs.
  • Spark job: Spark jobs provide fully managed Spark compute services. You can submit jobs through the GUI or RESTful APIs. Full-stack Spark jobs, such as Spark Core, DataSet, Streaming, MLlib, and GraphX jobs, are supported.

Constraints

  • DLI supports the following types of jobs: Spark SQL, Spark Jar, Flink SQL, and Flink Jar.
  • DLI supports the following Spark versions: Spark 3.3.1, Spark 3.1.1 (EOM), Spark 2.4.5 (EOM), and Spark 2.3 (EOS).
  • DLI supports the following Flink versions: Flink Jar 1.15, Flink 1.12 (EOM), Flink 1.10 (EOS), and Flink 1.7 (EOS).
  • SQL jobs support the Spark and Trino engines.
    • Spark: displays jobs whose execution engine is Spark.
    • Trino: displays jobs whose execution engine is Trino.
  • SparkUI can only display the latest 100 jobs.
  • A maximum of 1,000 job results can be displayed on the console. To view more or all jobs, export the job data to OBS.
  • To export job run logs, you must have the permission to access OBS buckets. You need to configure a DLI job bucket on the Global Configuration > Project page in advance.
  • The View Log button is not available for synchronization jobs and jobs running on the default queue.
  • Only Spark jobs support custom images.
  • An elastic resource pool supports a maximum of 32,000 CUs.
  • Minimum CUs of a queue that can be created in an elastic resource pool:
    • General purpose queue: 4 CUs
    • SQL queue: 8 CUs for a Spark SQL queue; 16 CUs for a Trino SQL queue
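
The CU limits above can be checked before you create queues. The following Python sketch is only an illustration and is not part of any DLI SDK or API: the function name, the queue type keys, and the input format are hypothetical. It also assumes that the CUs of all queues in an elastic resource pool together must not exceed the pool maximum, which the constraints above do not state explicitly.

```python
# Limits copied from the constraints above. All names here are hypothetical
# and exist only for this illustration; they are not DLI identifiers.
MAX_POOL_CUS = 32_000
MIN_QUEUE_CUS = {
    "general": 4,     # general purpose queue
    "spark_sql": 8,   # SQL queue running the Spark engine
    "trino_sql": 16,  # SQL queue running the Trino engine
}


def check_queue_plan(queues):
    """Return a list of problems found in a planned set of queues.

    `queues` is a list of (queue_type, cus) tuples. An empty list means
    the plan satisfies the limits encoded above.
    """
    problems = []
    total = 0
    for queue_type, cus in queues:
        minimum = MIN_QUEUE_CUS.get(queue_type)
        if minimum is None:
            problems.append(f"unknown queue type: {queue_type}")
            continue
        if cus < minimum:
            problems.append(
                f"{queue_type} queue needs at least {minimum} CUs, got {cus}"
            )
        total += cus
    # Assumption: the sum of queue CUs is bounded by the pool maximum.
    if total > MAX_POOL_CUS:
        problems.append(
            f"total of {total} CUs exceeds the pool maximum of {MAX_POOL_CUS}"
        )
    return problems


if __name__ == "__main__":
    plan = [("general", 4), ("spark_sql", 8), ("trino_sql", 8)]
    for problem in check_queue_plan(plan):
        print(problem)
```

In this sample plan, the Trino SQL queue is reported because 8 CUs is below its 16 CU minimum. The other two queues pass.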