Updated on 2024-08-20 GMT+08:00

Notes and Constraints

On Jobs

  • DLI supports the following types of jobs: Spark SQL, Spark Jar, Flink SQL, and Flink Jar.
  • DLI supports the following Spark versions: Spark 3.3.1, Spark 3.1.1 (EOM), Spark 2.4.5 (EOM), and Spark 2.3 (EOS).
  • DLI supports the following Flink versions: Flink 1.15, Flink 1.12 (EOM), Flink 1.10 (EOS), and Flink 1.7 (EOS).
  • SQL jobs support two execution engines: Spark and Trino.
  • SparkUI can display only the latest 100 jobs.
  • A maximum of 1,000 job results can be displayed on the console. To view more or all jobs, export the job data to OBS.
  • To export job run logs, you must have permission to access OBS buckets, and you need to configure a DLI job bucket on the Global Configuration > Project page in advance.
  • The View Log button is not available for synchronization jobs and jobs running on the default queue.
  • Only Spark jobs support custom images.
  • An elastic resource pool supports a maximum of 32,000 CUs.
  • Minimum CUs of a queue that can be created in an elastic resource pool:
    • General-purpose queue: 4 CUs
    • SQL queue: 8 CUs for a Spark SQL queue; 16 CUs for a Trino SQL queue

For more notes and constraints on jobs, see Job Management.

On Queues

  • A queue named default is preset in DLI for trial use. Resources are allocated on demand, and you are billed based on the amount of data scanned by each job (unit: GB).
  • Queue types:
    • For SQL: used to run Spark SQL jobs.
    • For general purpose: used to run Spark programs, Flink SQL jobs, and Flink Jar jobs.

    The queue type cannot be changed. If you want to use another queue type, purchase a new queue.

  • The billing mode of a queue cannot be changed.
  • The region of a queue cannot be changed.
  • Queues with 16 CUs do not support scale-out or scale-in.
  • Queues with 64 CUs do not support scale-in.
  • When creating a queue, cross-AZ active-active can be selected only for yearly/monthly queues and pay-per-use dedicated queues. The price of a cross-AZ queue is twice that of a single-AZ queue.
  • A newly created queue must run a job before it can be scaled in or out.
  • DLI queues cannot access the Internet.

    For how to access the Internet from an elastic resource pool, see Configuring the Connection Between a DLI Queue and a Data Source on the Internet.

For more notes and constraints on using a DLI queue, see Notes and Constraints on Using a Queue.

On DLI Storage Resources

DLI can store databases and tables. DLI storage is billed based on the amount of stored data.

On Resources

  • Database
    • default is the built-in database of DLI. You cannot create a database named default.
    • DLI supports a maximum of 50 databases.
  • Table
    • DLI supports a maximum of 5,000 tables.
    • DLI supports the following table types:
      • MANAGED: Data is stored in a DLI table.
      • EXTERNAL: Data is stored in an OBS table.
      • View: A view can only be created using SQL statements.
      • Datasource table: The table type is also EXTERNAL.
    • You cannot specify a storage path when creating a DLI table.
  • Data import
    • Data can be imported only from OBS, into either DLI tables or OBS tables.
    • You can import data in CSV, Parquet, ORC, JSON, or Avro format from OBS to tables created on DLI.
    • To import CSV data into a partitioned table, the partition column must be the last column in the data source file (see the sketch after this list).
    • The encoding format of imported data can only be UTF-8.
  • Data export
    • Data in DLI tables (whose table type is MANAGED) can only be exported to OBS buckets, and the export path must contain a folder.
    • Exported files are in JSON format, and the text encoding can only be UTF-8.
    • Data can be exported across accounts: after account B grants account A read access to the metadata and permission information of account B's OBS bucket, as well as read and write permissions on the export path, account A can export data to that OBS path.
  • Package
    • A package can be deleted, but a package group cannot be deleted.
    • The following types of packages can be uploaded:
      • JAR: JAR file
      • PyFile: User Python file
      • File: User file
      • ModelFile: User AI model file
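
The following is a minimal Python sketch, not DLI functionality, of preparing a CSV file for import into a partitioned table: it moves the partition column to the last position, as required above. The file names and the partition column name dt are placeholder assumptions, and the sketch assumes the source file has a header row.

    import csv

    # Reorder a CSV so that the partition column becomes the last column,
    # as required when importing CSV data into a partitioned DLI table.
    def move_partition_column_last(src, dst, partition_col):
        with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
            reader = csv.reader(fin)
            writer = csv.writer(fout)
            header = next(reader)              # assumes the file has a header row
            idx = header.index(partition_col)  # locate the partition column
            order = [i for i in range(len(header)) if i != idx] + [idx]
            writer.writerow([header[i] for i in order])
            for row in reader:
                writer.writerow([row[i] for i in order])

    # Hypothetical file names and partition column name:
    move_partition_column_last("input.csv", "output.csv", "dt")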

For more notes and constraints on resources, see Data Management.

On Enhanced Datasource Connections

  • Datasource connections cannot be created for the default queue.
  • Flink jobs can directly access DIS, OBS, and SMN data sources without using datasource connections.
  • Enhanced connections can only be created for yearly/monthly and pay-per-use queues.
  • VPC Administrator permissions are required for enhanced connections to use VPCs, subnets, routes, and VPC peering connections.

    You can set these permissions by referring to Service Authorization.

  • If you use an enhanced datasource connection, the CIDR block of the elastic resource pool or queue cannot overlap with that of the data source (see the overlap-check sketch after this list).
  • Only queues bound with datasource connections can access datasource tables.
  • Datasource tables do not support the preview function.
  • When checking the connectivity of a datasource connection, the following constraints apply to IP addresses (see the validation sketch after this list):
    • The IP address must be valid: four decimal numbers separated by periods (.), each ranging from 0 to 255.
    • During the test, you can append a port to the IP address, separated by a colon (:). The port can contain a maximum of five digits and ranges from 0 to 65535.

      For example, 192.168.xx.xx or 192.168.xx.xx:8181.

  • When checking the connectivity of a datasource connection, the following constraints apply to domain names:
    • The domain name can contain 1 to 255 characters. Only letters, numbers, underscores (_), and hyphens (-) are allowed.
    • The top-level domain must contain at least two letters, for example, .com, .net, and .cn.
    • During the test, you can append a port to the domain name, separated by a colon (:). The port can contain a maximum of five digits and ranges from 0 to 65535.

      For example, example.com:8080.
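
The following Python sketch mirrors the IP address and domain name rules above for illustration only; it is not DLI code, and the sample addresses are placeholders.

    import re

    IP_PORT = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?::(\d{1,5}))?$")
    DOMAIN_PORT = re.compile(
        r"^(?=.{1,255}(?::|$))"                  # 1 to 255 characters before any port
        r"[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*"   # labels: letters, numbers, _ and -
        r"\.[A-Za-z]{2,}"                        # top-level domain: at least two letters
        r"(?::(\d{1,5}))?$"                      # optional port, at most five digits
    )

    def is_valid_target(addr: str) -> bool:
        m = IP_PORT.match(addr)
        if m:
            octets_ok = all(int(g) <= 255 for g in m.groups()[:4])
            port = m.group(5)
            return octets_ok and (port is None or int(port) <= 65535)
        m = DOMAIN_PORT.match(addr)
        if m:
            port = m.group(1)
            return port is None or int(port) <= 65535
        return False

    print(is_valid_target("192.168.0.1:8181"))  # True
    print(is_valid_target("example.com:8080"))  # True
    print(is_valid_target("256.0.0.1"))         # False: octet out of range

The CIDR overlap rule above can be checked with the Python standard library, as sketched below; the CIDR blocks are placeholders.

    import ipaddress

    # The elastic resource pool/queue CIDR must not overlap the data source's CIDR.
    pool_cidr = ipaddress.ip_network("172.16.0.0/18")     # placeholder pool/queue CIDR
    source_cidr = ipaddress.ip_network("192.168.0.0/24")  # placeholder data source CIDR

    if pool_cidr.overlaps(source_cidr):
        raise ValueError("CIDR blocks overlap; choose a different pool/queue CIDR")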

For more notes and constraints on enhanced datasource connections, see Enhanced Datasource Connection Overview.

On Datasource Authentication

  • Only Spark SQL and Flink OpenSource SQL 1.12 jobs support datasource authentication.
  • Flink jobs can use datasource authentication only on queues created after May 1, 2023.
  • DLI supports four types of datasource authentication. Select an authentication type specific to each data source.
    • CSS: applies to CSS clusters of version 6.5.4 or later with the security mode enabled.
    • Kerberos: applies to MRS security clusters with Kerberos authentication enabled.
    • Kafka_SSL: applies to Kafka with SSL enabled.
    • Password: applies to GaussDB(DWS), RDS, DDS, and DCS.

For more notes and constraints on datasource authentication, see Datasource Authentication Introduction.

On SQL Syntax

  • Constraints on the SQL syntax:
    • You are not allowed to specify a storage path when creating a DLI table using SQL statements.
  • Constraints on the size of SQL statements (see the sketch below):
    • Each SQL statement must contain fewer than 500,000 characters.
    • The size of each SQL statement must be less than 1 MB.
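
The following is an illustrative client-side guard for these limits; it is a sketch, not part of any DLI SDK.

    # Check the documented SQL statement size limits before submission.
    MAX_CHARS = 500_000          # fewer than 500,000 characters
    MAX_BYTES = 1024 * 1024      # less than 1 MB

    def check_sql_size(sql: str) -> None:
        if len(sql) >= MAX_CHARS:
            raise ValueError(f"SQL has {len(sql)} characters; limit is {MAX_CHARS}")
        nbytes = len(sql.encode("utf-8"))
        if nbytes >= MAX_BYTES:
            raise ValueError(f"SQL is {nbytes} bytes; limit is {MAX_BYTES}")

    check_sql_size("SELECT 1")   # passes silently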

Other

  • For quota notes and constraints, see Quotas.
  • Recommended browsers for logging in to DLI:
    • Google Chrome 43.0 or later
    • Mozilla Firefox 38.0 or later
    • Internet Explorer 9.0 or later

    For details about the compatibility list of more browsers, see Which Browsers Are Supported?