- What's New
- Function Overview
- Product Bulletin
- Service Overview
-
Billing
- Billing Overview
- Billing for Compute Resources
- Billing for Storage Resources
- Billing for Scanned Data
- Package Billing
- Billing Examples
- Renewing Subscriptions
- Bills
- Arrears
- Billing Termination
-
Billing FAQ
- What Billing Modes Does DLI Offer?
- When Is a Data Lake Queue Idle?
- How Do I Troubleshoot DLI Billing Issues?
- Why Am I Still Being Billed on a Pay-per-Use Basis After I Purchased a Package?
- How Do I View the Usage of a Package?
- How Do I View a Job's Scanned Data Volume?
- Would a Pay-Per-Use Elastic Resource Pool Not Be Billed if No Job Is Submitted for Execution?
- Do I Need to Pay Extra Fees for Purchasing a Queue Billed Based on the Scanned Data Volume?
- How Is the Usage Beyond the Package Limit Billed?
- What Are the Actual CUs, CU Range, and Specifications of an Elastic Resource Pool?
- Change History
- Getting Started
-
User Guide
- DLI Job Development Process
- Preparations
-
Creating an Elastic Resource Pool and Queues Within It
- Overview of DLI Elastic Resource Pools and Queues
- Creating an Elastic Resource Pool and Creating Queues Within It
- Managing Elastic Resource Pools
-
Managing Queues
- Viewing Basic Information About a Queue
- Queue Permission Management
- Allocating a Queue to an Enterprise Project
- Creating an SMN Topic
- Managing Queue Tags
- Setting Queue Properties
- Testing Address Connectivity
- Deleting a Queue
- Auto Scaling of Standard Queues
- Setting a Scheduled Auto Scaling Task for a Standard Queue
- Changing the CIDR Block for a Standard Queue
- Example Use Case: Creating an Elastic Resource Pool and Running Jobs
- Example Use Case: Configuring Scaling Policies for Queues in an Elastic Resource Pool
- Creating a Non-Elastic Resource Pool Queue (Discarded and Not Recommended)
- Creating Databases and Tables
-
Data Migration and Transmission
- Overview
-
Migrating Data from External Data Sources to DLI
- Overview of Data Migration Scenarios
- Using CDM to Migrate Data to DLI
- Example Typical Scenario: Migrating Data from Hive to DLI
- Example Typical Scenario: Migrating Data from Kafka to DLI
- Example Typical Scenario: Migrating Data from Elasticsearch to DLI
- Example Typical Scenario: Migrating Data from RDS to DLI
- Example Typical Scenario: Migrating Data from GaussDB(DWS) to DLI
-
Configuring DLI to Read and Write Data from and to External Data Sources
- Configuring DLI to Read and Write External Data Sources
- Configuring the Network Connection Between DLI and Data Sources (Enhanced Datasource Connection)
- Using DEW to Manage Access Credentials for Data Sources
- Using DLI Datasource Authentication to Manage Access Credentials for Data Sources
-
Managing Enhanced Datasource Connections
- Viewing Basic Information About an Enhanced Datasource Connection
- Enhanced Connection Permission Management
- Binding an Enhanced Datasource Connection to an Elastic Resource Pool
- Unbinding an Enhanced Datasource Connection from an Elastic Resource Pool
- Adding a Route for an Enhanced Datasource Connection
- Deleting the Route for an Enhanced Datasource Connection
- Modifying Host Information in an Elastic Resource Pool
- Enhanced Datasource Connection Tag Management
- Deleting an Enhanced Datasource Connection
- Example Typical Scenario: Connecting DLI to a Data Source on a Private Network
- Example Typical Scenario: Connecting DLI to a Data Source on a Public Network
- Configuring an Agency to Allow DLI to Access Other Cloud Services
- Submitting a SQL Job Using DLI
- Submitting a Flink Job Using DLI
- Submitting a Spark Job Using DLI
- Submitting a DLI Job Using a Notebook Instance
- Using Cloud Eye to Monitor DLI
- Using CTS to Audit DLI
- Permissions Management
- Common DLI Management Operations
-
Best Practices
- Overview
- Analyzing Driving Behavior Data in IoV Scenarios Using DLI
- Converting Data Format from CSV to Parquet
- Analyzing E-Commerce BI Reports Using DLI
- Analyzing Billing Consumption Data Using DLI
- Analyzing Real-time E-Commerce Business Data Using DLI
-
Connecting BI Tools to DLI for Data Analysis
- Overview
- Connecting DBeaver to DLI for Data Analysis and Query
- Connecting DBT to DLI for Data Scheduling and Analysis
- Connecting Yonghong BI to DLI for Data Query and Analysis
- Connecting Power BI to DLI Using Kyuubi for Data Analysis and Query
- Connecting FineBI to DLI Using Kyuubi for Data Analysis and Query
- Connecting Superset to DLI Using Kyuubi for Data Analysis and Query
- Connecting Tableau to DLI Using Kyuubi for Data Analysis and Query
- Connecting Beeline to DLI Using Kyuubi for Data Analysis and Query
-
Developer Guide
- Connecting to DLI Using a Client
- SQL Jobs
-
Flink Jobs
- Stream Ecosystem
-
Flink OpenSource SQL Jobs
- Reading Data from Kafka and Writing Data to RDS
- Reading Data from Kafka and Writing Data to GaussDB(DWS)
- Reading Data from Kafka and Writing Data to Elasticsearch
- Reading Data from MySQL CDC and Writing Data to GaussDB(DWS)
- Reading Data from PostgreSQL CDC and Writing Data to GaussDB(DWS)
- Configuring High-Reliability Flink Jobs (Automatic Restart upon Exceptions)
- Flink Jar Job Examples
- Writing Data to OBS Using Flink Jar
- Using Flink Jar to Connect to Kafka that Uses SASL_SSL Authentication
- Using Flink Jar to Read and Write Data from and to DIS
- Flink Job Agencies
-
Spark Jar Jobs
- Using Spark Jar Jobs to Read and Query OBS Data
- Using the Spark Job to Access DLI Metadata
- Using Spark Jobs to Access Data Sources of Datasource Connections
- Spark Jar Jobs Using DEW to Acquire Access Credentials for Reading and Writing Data from and to OBS
- Obtaining Temporary Credentials from a Spark Job's Agency for Accessing Other Cloud Services
-
Spark SQL Syntax Reference
- Common Configuration Items
- Spark SQL Syntax
- Spark Open Source Commands
- Databases
-
Tables
- Creating an OBS Table
- Creating a DLI Table
- Deleting a Table
- Viewing a Table
- Modifying a Table
-
Partition-related Syntax
- Adding Partition Data (Only OBS Tables Supported)
- Renaming a Partition (Only OBS Tables Supported)
- Deleting a Partition
- Deleting Partitions by Specifying Filter Criteria (Only Supported on OBS Tables)
- Altering the Partition Location of a Table (Only OBS Tables Supported)
- Updating Partitioned Table Data (Only OBS Tables Supported)
- Updating Table Metadata with REFRESH TABLE
- Backing Up and Restoring Data of Multiple Versions
- Table Lifecycle Management
- Data
- Exporting Query Results
-
Datasource Connections
- Creating a Datasource Connection with an HBase Table
- Creating a Datasource Connection with an OpenTSDB Table
- Creating a Datasource Connection with a DWS Table
- Creating a Datasource Connection with an RDS Table
- Creating a Datasource Connection with a CSS Table
- Creating a Datasource Connection with a DCS Table
- Creating a Datasource Connection with a DDS Table
- Creating a Datasource Connection with an Oracle Table
- Views
- Viewing the Execution Plan
- Data Permissions
- Data Types
- User-Defined Functions
-
Built-In Functions
-
Date Functions
- Overview
- add_months
- current_date
- current_timestamp
- date_add
- dateadd
- date_sub
- date_format
- datediff
- datediff1
- datepart
- datetrunc
- day/dayofmonth
- from_unixtime
- from_utc_timestamp
- getdate
- hour
- isdate
- last_day
- lastday
- minute
- month
- months_between
- next_day
- quarter
- second
- to_char
- to_date
- to_date1
- to_utc_timestamp
- trunc
- unix_timestamp
- weekday
- weekofyear
- year
-
String Functions
- Overview
- ascii
- concat
- concat_ws
- char_matchcount
- encode
- find_in_set
- get_json_object
- instr
- instr1
- initcap
- keyvalue
- length
- lengthb
- levenshtein
- locate
- lower/lcase
- lpad
- ltrim
- parse_url
- printf
- regexp_count
- regexp_extract
- replace
- regexp_replace
- regexp_replace1
- regexp_instr
- regexp_substr
- repeat
- reverse
- rpad
- rtrim
- soundex
- space
- substr/substring
- substring_index
- split_part
- translate
- trim
- upper/ucase
- Mathematical Functions
- Aggregate Functions
- Window Functions
- Other Functions
-
Date Functions
- SELECT
-
Identifiers
- aggregate_func
- alias
- attr_expr
- attr_expr_list
- attrs_value_set_expr
- boolean_expression
- class_name
- col
- col_comment
- col_name
- col_name_list
- condition
- condition_list
- cte_name
- data_type
- db_comment
- db_name
- else_result_expression
- file_format
- file_path
- function_name
- groupby_expression
- having_condition
- hdfs_path
- input_expression
- input_format_classname
- jar_path
- join_condition
- non_equi_join_condition
- number
- num_buckets
- output_format_classname
- partition_col_name
- partition_col_value
- partition_specs
- property_name
- property_value
- regex_expression
- result_expression
- row_format
- select_statement
- separator
- serde_name
- sql_containing_cte_name
- sub_query
- table_comment
- table_name
- table_properties
- table_reference
- view_name
- view_properties
- when_expression
- where_condition
- window_function
- Operators
-
Flink SQL Syntax Reference
-
Flink OpenSource SQL 1.15 Syntax Reference
- Constraints and Definitions
- Overview
- Flink OpenSource SQL 1.15 Usage
- Formats
- Connectors
- DML Snytax
-
Functions
- UDFs
- Type Inference
- Parameter Transfer
-
Built-In Functions
- Comparison Functions
- Logical Functions
- Arithmetic Functions
- String Functions
- Temporal Functions
- Conditional Functions
- Type Conversion Functions
- Collection Functions
- JSON Functions
- Value Construction Functions
- Value Retrieval Functions
- Grouping Functions
- Hash Functions
- Aggregate Functions
- Table-Valued Functions
- Flink OpenSource SQL 1.12 Syntax Reference
-
Flink Opensource SQL 1.10 Syntax Reference
- Constraints and Definitions
- Flink OpenSource SQL 1.10 Syntax
-
Data Definition Language (DDL)
- Creating a Source Table
-
Creating a Result Table
- ClickHouse Result Table
- Kafka Result Table
- Upsert Kafka Result Table
- DIS Result Table
- JDBC Result Table
- GaussDB(DWS) Result Table
- Redis Result Table
- SMN Result Table
- HBase Result Table
- Elasticsearch Result Table
- OpenTSDB Result Table
- User-defined Result Table
- Print Result Table
- File System Result Table
- Creating a Dimension Table
- Data Manipulation Language (DML)
- Functions
-
Flink OpenSource SQL 1.15 Syntax Reference
-
HetuEngine SQL Syntax Reference
-
HetuEngine SQL Syntax
- Before You Start
- Data Type
-
DDL Syntax
- CREATE SCHEMA
- CREATE TABLE
- CREATE TABLE AS
- CREATE TABLE LIKE
- CREATE VIEW
- ALTER TABLE
- ALTER VIEW
- ALTER SCHEMA
- DROP SCHEMA
- DROP TABLE
- DROP VIEW
- TRUNCATE TABLE
- COMMENT
- VALUES
- SHOW Syntax Overview
- SHOW SCHEMAS (DATABASES)
- SHOW TABLES
- SHOW TBLPROPERTIES TABLE|VIEW
- SHOW TABLE/PARTITION EXTENDED
- SHOW FUNCTIONS
- SHOW PARTITIONS
- SHOW COLUMNS
- SHOW CREATE TABLE
- SHOW VIEWS
- SHOW CREATE VIEW
- DML Syntax
- DQL Syntax
- Auxiliary Command Syntax
- Reserved Keywords
-
SQL Functions and Operators
- Logical Operators
- Comparison Functions and Operators
- Condition Expression
- Lambda Expression
- Conversion Functions
- Mathematical Functions and Operators
- Bitwise Functions
- Decimal Functions and Operators
- String Functions and Operators
- Regular Expressions
- Binary Functions and Operators
- JSON Functions and Operators
- Date and Time Functions and Operators
- Aggregate Functions
- Window Functions
- Array Functions and Operators
- Map Functions and Operators
- URL Function
- UUID Function
- Color Function
- Teradata Function
- Data Masking Functions
- IP Address Functions
- Quantile Digest Functions
- T-Digest Functions
- Implicit Data Type Conversion
- Appendix
-
HetuEngine SQL Syntax
- Delta SQL Syntax Reference
-
API Reference
- Before You Start
- Overview
- Calling APIs
- Getting Started
- Permission-related APIs
- Global Variable-related APIs
- APIs Related to Resource Tags
-
APIs Related to Enhanced Datasource Connections
- Creating an Enhanced Datasource Connection
- Deleting an Enhanced Datasource Connection
- Listing Enhanced Datasource Connections
- Querying an Enhanced Datasource Connection
- Binding a Queue
- Unbinding a Queue
- Modifying Host Information
- Querying Authorization of an Enhanced Datasource Connection
- Creating a Route
- Deleting a Route
- Datasource Authentication-related APIs
-
APIs Related to Elastic Resource Pools
- Creating an Elastic Resource Pool
- Querying All Elastic Resource Pools
- Deleting an Elastic Resource Pool
- Modifying Elastic Resource Pool Information
- Querying All Queues in an Elastic Resource Pool
- Associating a Queue with an Elastic Resource Pool
- Viewing Scaling History of an Elastic Resource Pool
- Modifying the Scaling Policy of a Queue Associated with an Elastic Resource Pool
- Queue-related APIs (Recommended)
- SQL Job-related APIs
- SQL Template-related APIs
-
Flink Job-related APIs
- Creating a SQL Job
- Updating a SQL Job
- Creating a Flink Jar job
- Updating a Flink Jar Job
- Running Jobs in Batches
- Listing Jobs
- Querying Job Details
- Querying the Job Execution Plan
- Stopping Jobs in Batches
- Deleting a Job
- Deleting Jobs in Batches
- Exporting a Flink Job
- Importing a Flink Job
- Generating a Static Stream Graph for a Flink SQL Job
- APIs Related to Flink Job Templates
- Flink Job Management APIs
- Spark Job-related APIs
- APIs Related to Spark Job Templates
- Permissions Policies and Supported Actions
-
Out-of-Date APIs
- Agency-related APIs (Discarded)
-
Package Group-related APIs (Discarded)
- Uploading a Package Group (Discarded)
- Listing Package Groups (Discarded)
- Uploading a JAR Package Group (Discarded)
- Uploading a PyFile Package Group (Discarded)
- Uploading a File Package Group (Discarded)
- Querying Resource Packages in a Group (Discarded)
- Deleting a Resource Package from a Group (Discarded)
- Changing the Owner of a Group or Resource Package (Discarded)
- APIs Related to Spark Batch Processing (Discarded)
- SQL Job-related APIs (Discarded)
- Resource-related APIs (Discarded)
- Permission-related APIs (Discarded)
- Queue-related APIs (Discarded)
- Datasource Authentication-related APIs (Discarded)
- APIs Related to Enhanced Datasource Connections (Discarded)
- Template-related APIs (Discarded)
- Table-related APIs (Discarded)
- APIs Related to SQL Jobs (Discarded)
- APIs Related to Data Upload (Discarded)
- Cluster-related APIs
- APIs Related to Flink Jobs (Discarded)
- Public Parameters
- SDK Reference
-
FAQs
-
DLI Basics
- What Are the Differences Between DLI Flink and MRS Flink?
- What Are the Differences Between MRS Spark and DLI Spark?
- How Do I Upgrade the Engine Version of a DLI Job?
- Where Can Data Be Stored in DLI?
- Can I Import OBS Bucket Data Shared by Other Tenants into DLI?
- Regions and AZs
- Can a Member Account Use Global Variables Created by Other Member Accounts?
- Is DLI Affected by the Apache Spark Command Injection Vulnerability (CVE-2022-33891)?
- How Do I Manage Jobs Running on DLI?
- How Do I Change the Field Names of an Existing Table on DLI?
-
DLI Elastic Resource Pools and Queues
- How Can I Check the Actual and Used CUs for an Elastic Resource Pool as Well as the Required CUs for a Job?
- How Do I Check for a Backlog of Jobs in the Current DLI Queue?
- How Do I View the Load of a DLI Queue?
- How Do I Monitor Job Exceptions on a DLI Queue?
- How Do I Migrate an Old Version Spark Queue to a General-Purpose Queue?
- How Do I Do If I Encounter a Timeout Exception When Executing DLI SQL Statements on the default Queue?
-
DLI Databases and Tables
- Why Am I Unable to Query a Table on the DLI Console?
- How Do I Do If the Compression Rate of an OBS Table Is High?
- How Do I Do If Inconsistent Character Encoding Leads to Garbled Characters?
- Do I Need to to Regrant Permissions to Users and Projects After Deleting and Recreating a Table With the Same Name?
- How Do I Do If Files Imported Into a DLI Partitioned Table Lack Data for the Partition Columns, Causing Query Failures After the Import Is Completed?
- How Do I Fix Incorrect Data in an OBS Foreign Table Caused by Newline Characters in OBS File Fields?
- How Do I Prevent a Cartesian Product Query and Resource Overload Due to Missing "ON" Conditions in Table Joins?
- How Do I Do If I Can't Query Data After Manually Adding It to the Partition Directory of an OBS Table?
- Why Does the "insert overwrite" Operation Affect All Data in a Partitioned Table Instead of Just the Targeted Partition?
- Why Does the "create_date" Field in an RDS Table (Datetime Data Type) Appear as a Timestamp in DLI Queries?
- How Do I Do If Renaming a Table After a SQL Job Causes Incorrect Data Size?
- How Can I Resolve Data Inconsistencies When Importing Data from DLI to OBS?
-
Enhanced Datasource Connections
- How Do I Do If I Can't Bind an Enhanced Datasource Connection to a Queue?
- How Do I Resolve a Failure in Connecting DLI to GaussDB(DWS) Through an Enhanced Datasource Connection?
- How Do I Do If the Datasource Connection Is Successfully Created but the Network Connectivity Test Fails?
- How Do I Configure Network Connectivity Between a DLI Queue and a Data Source?
- Why Is Creating a VPC Peering Connection Necessary for Enhanced Datasource Connections in DLI?
- How Do I Do If Creating a Datasource Connection in DLI Gets Stuck in the "Creating" State When Binding It to a Queue?
- How Do I Resolve the "communication link failure" Error When Using a Newly Created Datasource Connection That Appears to Be Activated?
- How Do I Troubleshoot a Connection Timeout Issue That Isn't Recorded in Logs When Accessing MRS HBase Through a Datasource Connection?
- How Do I Fix the "Failed to get subnet" Error When Creating a Datasource Connection in DLI?
- How Do I Do If I Encounter the "Incorrect string value" Error When Executing insert overwrite on a Datasource RDS Table?
- How Do I Resolve the Null Pointer Error When Creating an RDS Datasource Table?
- Error Message "org.postgresql.util.PSQLException: ERROR: tuple concurrently updated" Is Displayed When the System Executes insert overwrite on a Datasource GaussDB(DWS) Table
- RegionTooBusyException Is Reported When Data Is Imported to a CloudTable HBase Table Through a Datasource Table
- How Do I Do If A Null Value Is Written Into a Non-Null Field When Using a DLI Datasource Connection to Connect to a GaussDB(DWS) Table?
- How Do I Do If an Insert Operation Failed After the Schema of the GaussDB(DWS) Source Table Is Updated?
- How Do I Insert Data into an RDS Table with an Auto-Increment Primary Key Using DLI?
-
SQL Jobs
-
SQL Job Development
- SQL Jobs
- How Do I Merge Small Files?
- How Do I Use DLI to Access Data in an OBS Bucket?
- How Do I Specify an OBS Path When Creating an OBS Table?
- How Do I Create a Table Using JSON Data in an OBS Bucket?
- How Can I Use the count Function to Perform Aggregation?
- How Do I Synchronize DLI Table Data Across Regions?
- How Do I Insert Table Data into Specific Fields of a Table Using a SQL Job?
- How Do I Troubleshoot Slow SQL Jobs?
- How Do I View DLI SQL Logs?
- How Do I View SQL Execution Records in DLI?
- How Do I Do When Data Skew Occurs During the Execution of a SQL Job?
- Why Does a SQL Job That Has Join Operations Stay in the Running State?
- Why Is a SQL Job Stuck in the Submitting State?
-
SQL Job O&M
- Why Is Error "path obs://xxx already exists" Reported When Data Is Exported to OBS?
- Why Is Error "SQL_ANALYSIS_ERROR: Reference 't.id' is ambiguous, could be: t.id, t.id.;" Displayed When Two Tables Are Joined?
- Why Is Error "The current account does not have permission to perform this operation,the current account was restricted. Restricted for no budget." Reported when a SQL Statement Is Executed?
- Why Is Error "There should be at least one partition pruning predicate on partitioned table XX.YYY" Reported When a Query Statement Is Executed?
- Why Is Error "IllegalArgumentException: Buffer size too small. size" Reported When Data Is Loaded to an OBS Foreign Table?
- Why Is Error "DLI.0002 FileNotFoundException" Reported During SQL Job Running?
- Why Is a Schema Parsing Error Reported When I Create a Hive Table Using CTAS?
- Why Is Error "org.apache.hadoop.fs.obs.OBSIOException" Reported When I Run DLI SQL Scripts on DataArts Studio?
- Why Is Error "UQUERY_CONNECTOR_0001:Invoke DLI service api failed" Reported in the Job Log When I Use CDM to Migrate Data to DLI?
- Why Is Error "File not Found" Reported When I Access a SQL Job?
- Why Is Error "DLI.0003: AccessControlException XXX" Reported When I Access a SQL Job?
- Why Is Error "DLI.0001: org.apache.hadoop.security.AccessControlException: verifyBucketExists on {{bucket name}}: status [403]" Reported When I Access a SQL Job?
- Why Am I Seeing the Error Message "The current account does not have permission to perform this operation,the current account was restricted. Restricted for no budget" When Executing a SQL Statement?
-
SQL Job Development
-
Flink Jobs
-
Flink Job Consulting
- How Do I Authorize a Subuser to View Flink Jobs?
- How Do I Configure Auto Restart upon Exception for a Flink Job?
- How Do I Save Logs for Flink Jobs?
- Why Is Error "No such user. userName:xxxx." Reported on the Flink Job Management Page When I Grant Permission to a User?
- How Do I Restore a Flink Job from a Specific Checkpoint After Manually Stopping the Job?
- Why Is a Message Displayed Indicating That the SMN Topic Does Not Exist When I Use the SMN Topic in DLI?
-
Flink SQL Jobs
- How Do I Map an OBS Table to a DLI Partitioned Table?
- How Do I Change the Number of Kafka Partitions in a Flink SQL Job Without Stopping It?
- How Do I Fix the DLI.0005 Error When Using EL Expressions to Create a Table in a Flink SQL Job?
- Why Is No Data Queried in the DLI Table Created Using the OBS File Path When Data Is Written to OBS by a Flink Job Output Stream?
- Why Does a Flink SQL Job Fails to Be Executed, and Is "connect to DIS failed java.lang.IllegalArgumentException: Access key cannot be null" Displayed in the Log?
- Data Writing Fails After a Flink SQL Job Consumed Kafka and Sank Data to the Elasticsearch Cluster
- How Does Flink Opensource SQL Parse Nested JSON?
- Why Is the Time Read by a Flink OpenSource SQL Job from the RDS Database Is Different from the RDS Database Time?
- Why Does Job Submission Fail When the failure-handler Parameter of the Elasticsearch Result Table for a Flink Opensource SQL Job Is Set to retry_rejected?
- How Do I Configure Connection Retries for Kafka Sink If it is Disconnected?
- How Do I Write Data to Different Elasticsearch Clusters in a Flink Job?
- Why Does DIS Stream Not Exist During Job Semantic Check?
- Why Is Error "Timeout expired while fetching topic metadata" Repeatedly Reported in Flink JobManager Logs?
-
Flink Jar Jobs
- Can I Upload Configuration Files for Flink Jar Jobs?
- Why Does a Flink Jar Package Conflict Result in Job Submission Failure?
- Why Does a Flink Jar Job Fail to Access GaussDB(DWS) and a Message Is Displayed Indicating Too Many Client Connections?
- Why Is Error Message "Authentication failed" Displayed During Flink Jar Job Running?
- Why Is Error Invalid OBS Bucket Name Reported After a Flink Job Submission Failed?
- Why Does the Flink Submission Fail Due to Hadoop JAR File Conflict?
- How Do I Locate a Flink Job Submission Error?
-
Flink Job Performance Tuning
- What Is the Recommended Configuration for a Flink Job?
- Flink Job Performance Tuning
- How Do I Prevent Data Loss After Flink Job Restart?
- How Do I Locate a Flink Job Running Error?
- How Can I Check if a Flink Job Can Be Restored From a Checkpoint After Restarting It?
- Why Are Logs Not Written to the OBS Bucket After a DLI Flink Job Fails to Be Submitted for Running?
- Why Is the Flink Job Abnormal Due to Heartbeat Timeout Between JobManager and TaskManager?
-
Flink Job Consulting
-
Spark Jobs
-
Spark Job Development
- Spark Jobs
- How Do I Use Spark to Write Data into a DLI Table?
- How Do I Set Up AK/SK So That a General Queue Can Access Tables Stored in OBS?
- How Do I View the Resource Usage of DLI Spark Jobs?
- How Do I Use Python Scripts to Access the MySQL Database If the pymysql Module Is Missing from the Spark Job Results Stored in MySQL?
- How Do I Run a Complex PySpark Program in DLI?
- How Do I Use JDBC to Set the spark.sql.shuffle.partitions Parameter to Improve the Task Concurrency?
- How Do I Read Uploaded Files for a Spark Jar Job?
- Why Can't I Find the Specified Python Environment After Adding the Python Package?
- Why Is a Spark Jar Job Stuck in the Submitting State?
-
Spark Job O&M
- What Can I Do When Receiving java.lang.AbstractMethodError in the Spark Job?
- Why Do I Get "ResponseCode: 403" and "ResponseStatus: Forbidden" Errors When a Spark Job Accesses OBS Data?
- Why Do I Encounter the Error "verifyBucketExists on XXXX: status [403]" When Using a Spark Job to Access an OBS Bucket That I Have Permission to Access?
- Why Does a Job Running Timeout Occur When Processing a Large Amount of Data with a Spark Job?
- Why Does a Spark Job Fail to Execute with an Abnormal Access Directory Error When Accessing Files in SFTP?
- Why Does the Job Fail to Be Executed Due to Insufficient Database and Table Permissions?
- Why Is the global_temp Database Missing in the Job Log of Spark 3.x?
- Why Does Using DataSource Syntax to Create an OBS Table of Avro Type Fail When Accessing Metadata With Spark 2.3.x?
-
Spark Job Development
- DLI Resource Quotas
-
DLI Permissions Management
- How Do I Do If I Receive an Error Message Stating That I Do Not Have Sufficient Permissions When Creating a Table After Upgrading the Engine Version of a Queue?
- What Is Column-Level Authorization for DLI Partitioned Tables?
- How Do I Do If I Encounter Insufficient Permissions While Updating Packages?
- Why Is Error "DLI.0003: Permission denied for resource..." Reported When I Run a SQL Statement?
- How Do I Do If I Can't Query Table Data After Being Granted Table Permissions?
- Will Granting Duplicate Permissions to a Table After Inheriting Database Permissions Cause an Error?
- Why Can't I Query a View After I'm Granted the Select Table Permission on the View?
- How Do I Do If I Receive a Message Saying I Don't Have Sufficient Permissions to Submit My Jobs to the Job Bucket?
- How Do I Resolve an Unauthorized OBS Bucket Error?
- DLI APIs
-
DLI Basics
-
More Documents
-
User Guide (ME-Abu Dhabi Region)
- DLI Introduction
- Getting Started
- DLI Console Overview
- SQL Editor
- Job Management
- Queue Management
- Data Management
- Job Templates
- Datasource Connections
- Global Configuration
- UDFs
- Permissions Management
- Change History
-
API Reference (ME-Abu Dhabi Region)
- Before You Start
- Overview
- Calling APIs
- Getting Started
- Permission-related APIs
- Queue-related APIs (Recommended)
- APIs Related to SQL Jobs
- Package Group-related APIs
-
APIs Related to Flink Jobs
- Granting OBS Permissions to DLI
- Creating a SQL Job
- Updating a SQL Job
- Creating a Flink Jar job
- Updating a Flink Jar Job
- Running Jobs in Batches
- Querying the Job List
- Querying Job Details
- Querying the Job Execution Plan
- Stopping Jobs in Batches
- Deleting a Job
- Deleting Jobs in Batches
- Exporting a Flink Job
- Importing a Flink Job
- APIs Related to Spark jobs
- APIs Related to Flink Job Templates
-
APIs Related to Enhanced Datasource Connections
- Creating an Enhanced Datasource Connection
- Deleting an Enhanced Datasource Connection
- Querying an Enhanced Datasource Connection List
- Querying an Enhanced Datasource Connection
- Binding a Queue
- Unbinding a Queue
- Modifying the Host Information
- Querying Authorization of an Enhanced Datasource Connection
- Global Variable-related APIs
- Public Parameters
- Change History
-
SQL Syntax Reference (ME-Abu Dhabi Region)
-
Spark SQL Syntax Reference
- Common Configuration Items of Batch SQL Jobs
- SQL Syntax Overview of Batch Jobs
- Spark Open Source Commands
- Databases
- Creating an OBS Table
- Creating a DLI Table
- Deleting a Table
- Checking Tables
- Modifying a Table
-
Syntax for Partitioning a Table
- Adding Partition Data (Only OBS Tables Supported)
- Renaming a Partition (Only OBS Tables Supported)
- Deleting a Partition
- Deleting Partitions by Specifying Filter Criteria (Only Supported on OBS Tables)
- Altering the Partition Location of a Table (Only OBS Tables Supported)
- Updating Partitioned Table Data (Only OBS Tables Supported)
- Updating Table Metadata with REFRESH TABLE
- Importing Data to the Table
- Inserting Data
- Clearing Data
- Exporting Search Results
- Backing Up and Restoring Data of Multiple Versions
- Table Lifecycle Management
- Creating a Datasource Connection with an HBase Table
- Creating a Datasource Connection with an OpenTSDB Table
- Creating a Datasource Connection with a DWS table
- Creating a Datasource Connection with an RDS Table
- Creating a Datasource Connection with a CSS Table
- Creating a Datasource Connection with a DCS Table
- Creating a Datasource Connection with a DDS Table
- Creating a Datasource Connection with an Oracle Table
- Views
- Checking the Execution Plan
- Data Permissions Management
- Data Types
- User-Defined Functions
-
Built-in Functions
-
Date Functions
- Overview
- add_months
- current_date
- current_timestamp
- date_add
- dateadd
- date_sub
- date_format
- datediff
- datediff1
- datepart
- datetrunc
- day/dayofmonth
- from_unixtime
- from_utc_timestamp
- getdate
- hour
- isdate
- last_day
- lastday
- minute
- month
- months_between
- next_day
- quarter
- second
- to_char
- to_date
- to_date1
- to_utc_timestamp
- trunc
- unix_timestamp
- weekday
- weekofyear
- year
-
String Functions
- Overview
- ascii
- concat
- concat_ws
- char_matchcount
- encode
- find_in_set
- get_json_object
- instr
- instr1
- initcap
- keyvalue
- length
- lengthb
- levenshtein
- locate
- lower/lcase
- lpad
- ltrim
- parse_url
- printf
- regexp_count
- regexp_extract
- replace
- regexp_replace
- regexp_replace1
- regexp_instr
- regexp_substr
- repeat
- reverse
- rpad
- rtrim
- soundex
- space
- substr/substring
- substring_index
- split_part
- translate
- trim
- upper/ucase
- Mathematical Functions
- Aggregate Functions
- Window Functions
- Other Functions
-
Date Functions
- Basic SELECT Statements
- Filtering
- Sorting
- Grouping
- JOIN
- Subquery
- Alias
- Set Operations
- WITH...AS
- CASE...WHEN
- OVER Clause
- Flink OpenSource SQL 1.12 Syntax Reference
-
Flink Opensource SQL 1.10 Syntax Reference
- Constraints and Definitions
- Flink OpenSource SQL 1.10 Syntax
-
Data Definition Language (DDL)
- Creating a Source Table
-
Creating a Result Table
- ClickHouse Result Table
- Kafka Result Table
- Upsert Kafka Result Table
- DIS Result Table
- JDBC Result Table
- GaussDB(DWS) Result Table
- Redis Result Table
- SMN Result Table
- HBase Result Table
- Elasticsearch Result Table
- OpenTSDB Result Table
- User-defined Result Table
- Print Result Table
- File System Result Table
- Creating a Dimension Table
- Data Manipulation Language (DML)
- Functions
-
Historical Versions
-
Flink SQL Syntax
- SQL Syntax Constraints and Definitions
- SQL Syntax Overview of Stream Jobs
- Creating a Source Stream
-
Creating a Sink Stream
- MRS OpenTSDB Sink Stream
- CSS Elasticsearch Sink Stream
- DCS Sink Stream
- DDS Sink Stream
- DIS Sink Stream
- DMS Sink Stream
- DWS Sink Stream (JDBC Mode)
- DWS Sink Stream (OBS-based Dumping)
- MRS HBase Sink Stream
- MRS Kafka Sink Stream
- Open-Source Kafka Sink Stream
- File System Sink Stream (Recommended)
- OBS Sink Stream
- RDS Sink Stream
- SMN Sink Stream
- Creating a Temporary Stream
- Creating a Dimension Table
- Custom Stream Ecosystem
- Data Type
- Built-In Functions
- User-Defined Functions
- Geographical Functions
- SELECT
- Condition Expression
- Window
- JOIN Between Stream Data and Table Data
- Configuring Time Models
- Pattern Matching
- StreamingML
- Reserved Keywords
-
Flink SQL Syntax
-
Identifiers
- aggregate_func
- alias
- attr_expr
- attr_expr_list
- attrs_value_set_expr
- boolean_expression
- col
- col_comment
- col_name
- col_name_list
- condition
- condition_list
- cte_name
- data_type
- db_comment
- db_name
- else_result_expression
- file_format
- file_path
- function_name
- groupby_expression
- having_condition
- input_expression
- join_condition
- non_equi_join_condition
- number
- partition_col_name
- partition_col_value
- partition_specs
- property_name
- property_value
- regex_expression
- result_expression
- select_statement
- separator
- sql_containing_cte_name
- sub_query
- table_comment
- table_name
- table_properties
- table_reference
- when_expression
- where_condition
- window_function
- Operators
-
Spark SQL Syntax Reference
- User Guide (Paris Region)
-
API Reference (Paris Region)
- Before You Start
- Overview
- Calling APIs
- Getting Started
- Permission-related APIs
- Queue-related APIs (Recommended)
- APIs Related to SQL Jobs
- Package Group-related APIs
-
APIs Related to Flink Jobs
- Granting OBS Permissions to DLI
- Creating a SQL Job
- Updating a SQL Job
- Creating a Flink Jar job
- Updating a Flink Jar Job
- Running Jobs in Batches
- Querying the Job List
- Querying Job Details
- Querying the Job Execution Plan
- Stopping Jobs in Batches
- Deleting a Job
- Deleting Jobs in Batches
- Exporting a Flink Job
- Importing a Flink Job
- APIs Related to Spark jobs
- APIs Related to Flink Job Templates
-
APIs Related to Enhanced Datasource Connections
- Creating an Enhanced Datasource Connection
- Deleting an Enhanced Datasource Connection
- Querying an Enhanced Datasource Connection List
- Querying an Enhanced Datasource Connection
- Binding a Queue
- Unbinding a Queue
- Modifying the Host Information
- Querying Authorization of an Enhanced Datasource Connection
- Global Variable-related APIs
- Public Parameters
- Change History
-
SQL Syntax Reference (Paris Region)
-
Spark SQL Syntax Reference
- Common Configuration Items of Batch SQL Jobs
- SQL Syntax Overview of Batch Jobs
- Spark Open Source Commands
- Databases
- Creating an OBS Table
- Creating a DLI Table
- Deleting a Table
- Checking Tables
- Modifying a Table
-
Syntax for Partitioning a Table
- Adding Partition Data (Only OBS Tables Supported)
- Renaming a Partition (Only OBS Tables Supported)
- Deleting a Partition
- Deleting Partitions by Specifying Filter Criteria (Only Supported on OBS Tables)
- Altering the Partition Location of a Table (Only OBS Tables Supported)
- Updating Partitioned Table Data (Only OBS Tables Supported)
- Updating Table Metadata with REFRESH TABLE
- Importing Data to the Table
- Inserting Data
- Clearing Data
- Exporting Search Results
- Table Lifecycle Management
- Creating a Datasource Connection with an HBase Table
- Creating a Datasource Connection with an OpenTSDB Table
- Creating a Datasource Connection with a DWS table
- Creating a Datasource Connection with an RDS Table
- Creating a Datasource Connection with a CSS Table
- Creating a Datasource Connection with a DCS Table
- Creating a Datasource Connection with a DDS Table
- Creating a Datasource Connection with an Oracle Table
- Views
- Checking the Execution Plan
- Data Permissions Management
- Data Types
- User-Defined Functions
-
Built-in Functions
-
Date Functions
- Overview
- add_months
- current_date
- current_timestamp
- date_add
- dateadd
- date_sub
- date_format
- datediff
- datediff1
- datepart
- datetrunc
- day/dayofmonth
- from_unixtime
- from_utc_timestamp
- getdate
- hour
- isdate
- last_day
- lastday
- minute
- month
- months_between
- next_day
- quarter
- second
- to_char
- to_date
- to_date1
- to_utc_timestamp
- trunc
- unix_timestamp
- weekday
- weekofyear
- year
-
String Functions
- Overview
- ascii
- concat
- concat_ws
- char_matchcount
- encode
- find_in_set
- get_json_object
- instr
- instr1
- initcap
- keyvalue
- length
- lengthb
- levenshtein
- locate
- lower/lcase
- lpad
- ltrim
- parse_url
- printf
- regexp_count
- regexp_extract
- replace
- regexp_replace
- regexp_replace1
- regexp_instr
- regexp_substr
- repeat
- reverse
- rpad
- rtrim
- soundex
- space
- substr/substring
- substring_index
- split_part
- translate
- trim
- upper/ucase
- Mathematical Functions
- Aggregate Functions
- Window Functions
- Other Functions
-
Date Functions
- Basic SELECT Statements
- Filtering
- Sorting
- Grouping
- JOIN
- Subquery
- Alias
- Set Operations
- WITH...AS
- CASE...WHEN
- OVER Clause
- Flink OpenSource SQL 1.12 Syntax Reference
-
Flink Opensource SQL 1.10 Syntax Reference
- Constraints and Definitions
- Flink OpenSource SQL 1.10 Syntax
-
Data Definition Language (DDL)
- Creating a Source Table
-
Creating a Result Table
- ClickHouse Result Table
- Kafka Result Table
- Upsert Kafka Result Table
- DIS Result Table
- JDBC Result Table
- GaussDB(DWS) Result Table
- Redis Result Table
- SMN Result Table
- HBase Result Table
- Elasticsearch Result Table
- OpenTSDB Result Table
- User-defined Result Table
- Print Result Table
- File System Result Table
- Creating a Dimension Table
- Data Manipulation Language (DML)
- Functions
-
Historical Versions
-
Flink SQL Syntax
- SQL Syntax Constraints and Definitions
- SQL Syntax Overview of Stream Jobs
- Creating a Source Stream
-
Creating a Sink Stream
- MRS OpenTSDB Sink Stream
- CSS Elasticsearch Sink Stream
- DCS Sink Stream
- DDS Sink Stream
- DIS Sink Stream
- DMS Sink Stream
- DWS Sink Stream (JDBC Mode)
- DWS Sink Stream (OBS-based Dumping)
- MRS HBase Sink Stream
- MRS Kafka Sink Stream
- Open-Source Kafka Sink Stream
- File System Sink Stream (Recommended)
- OBS Sink Stream
- RDS Sink Stream
- SMN Sink Stream
- Creating a Temporary Stream
- Creating a Dimension Table
- Custom Stream Ecosystem
- Data Type
- Built-In Functions
- User-Defined Functions
- Geographical Functions
- SELECT
- Condition Expression
- Window
- JOIN Between Stream Data and Table Data
- Configuring Time Models
- Pattern Matching
- StreamingML
- Reserved Keywords
-
Flink SQL Syntax
-
Identifiers
- aggregate_func
- alias
- attr_expr
- attr_expr_list
- attrs_value_set_expr
- boolean_expression
- col
- col_comment
- col_name
- col_name_list
- condition
- condition_list
- cte_name
- data_type
- db_comment
- db_name
- else_result_expression
- file_format
- file_path
- function_name
- groupby_expression
- having_condition
- input_expression
- join_condition
- non_equi_join_condition
- number
- partition_col_name
- partition_col_value
- partition_specs
- property_name
- property_value
- regex_expression
- result_expression
- select_statement
- separator
- sql_containing_cte_name
- sub_query
- table_comment
- table_name
- table_properties
- table_reference
- when_expression
- where_condition
- window_function
- Operators
-
Spark SQL Syntax Reference
-
User Guide (Kuala Lumpur Region)
- DLI Introduction
- Getting Started
- DLI Console Overview
- SQL Editor
- Job Management
- Queue Management
- Data Management
- Job Templates
- Datasource Connections
- Global Configuration
- UDFs
- Permissions Management
- Change History
-
API Reference (Kuala Lumpur Region)
- Before You Start
- Overview
- Calling APIs
- Getting Started
- Permission-related APIs
- Queue-related APIs (Recommended)
- APIs Related to SQL Jobs
- Package Group-related APIs
-
APIs Related to Flink Jobs
- Granting OBS Permissions to DLI
- Creating a SQL Job
- Updating a SQL Job
- Creating a Flink Jar job
- Updating a Flink Jar Job
- Running Jobs in Batches
- Querying the Job List
- Querying Job Details
- Querying the Job Execution Plan
- Stopping Jobs in Batches
- Deleting a Job
- Deleting Jobs in Batches
- Exporting a Flink Job
- Importing a Flink Job
- APIs Related to Spark jobs
- APIs Related to Flink Job Templates
-
APIs Related to Enhanced Datasource Connections
- Creating an Enhanced Datasource Connection
- Deleting an Enhanced Datasource Connection
- Querying an Enhanced Datasource Connection List
- Querying an Enhanced Datasource Connection
- Binding a Queue
- Unbinding a Queue
- Modifying the Host Information
- Querying Authorization of an Enhanced Datasource Connection
- Global Variable-related APIs
- Permissions Policies and Supported Actions
- Public Parameters
- Change History
-
SQL Syntax Reference (Kuala Lumpur Region)
-
Spark SQL Syntax Reference
- Common Configuration Items of Batch SQL Jobs
- SQL Syntax Overview of Batch Jobs
- Spark Open Source Commands
- Databases
- Creating an OBS Table
- Creating a DLI Table
- Deleting a Table
- Checking Tables
- Modifying a Table
-
Syntax for Partitioning a Table
- Adding Partition Data (Only OBS Tables Supported)
- Renaming a Partition (Only OBS Tables Supported)
- Deleting a Partition
- Deleting Partitions by Specifying Filter Criteria (Only Supported on OBS Tables)
- Altering the Partition Location of a Table (Only OBS Tables Supported)
- Updating Partitioned Table Data (Only OBS Tables Supported)
- Updating Table Metadata with REFRESH TABLE
- Importing Data to the Table
- Inserting Data
- Clearing Data
- Exporting Search Results
- Backing Up and Restoring Data of Multiple Versions
- Table Lifecycle Management
- Creating a Datasource Connection with an HBase Table
- Creating a Datasource Connection with an OpenTSDB Table
- Creating a Datasource Connection with a DWS table
- Creating a Datasource Connection with an RDS Table
- Creating a Datasource Connection with a CSS Table
- Creating a Datasource Connection with a DCS Table
- Creating a Datasource Connection with a DDS Table
- Creating a Datasource Connection with an Oracle Table
- Views
- Checking the Execution Plan
- Data Permissions Management
- Data Types
- User-Defined Functions
-
Built-in Functions
-
Date Functions
- Overview
- add_months
- current_date
- current_timestamp
- date_add
- dateadd
- date_sub
- date_format
- datediff
- datediff1
- datepart
- datetrunc
- day/dayofmonth
- from_unixtime
- from_utc_timestamp
- getdate
- hour
- isdate
- last_day
- lastday
- minute
- month
- months_between
- next_day
- quarter
- second
- to_char
- to_date
- to_date1
- to_utc_timestamp
- trunc
- unix_timestamp
- weekday
- weekofyear
- year
-
String Functions
- Overview
- ascii
- concat
- concat_ws
- char_matchcount
- encode
- find_in_set
- get_json_object
- instr
- instr1
- initcap
- keyvalue
- length
- lengthb
- levenshtein
- locate
- lower/lcase
- lpad
- ltrim
- parse_url
- printf
- regexp_count
- regexp_extract
- replace
- regexp_replace
- regexp_replace1
- regexp_instr
- regexp_substr
- repeat
- reverse
- rpad
- rtrim
- soundex
- space
- substr/substring
- substring_index
- split_part
- translate
- trim
- upper/ucase
- Mathematical Functions
- Aggregate Functions
- Window Functions
- Other Functions
-
Date Functions
- Basic SELECT Statements
- Filtering
- Sorting
- Grouping
- JOIN
- Subquery
- Alias
- Set Operations
- WITH...AS
- CASE...WHEN
- OVER Clause
- Flink OpenSource SQL 1.12 Syntax Reference
-
Flink Opensource SQL 1.10 Syntax Reference
- Constraints and Definitions
- Flink OpenSource SQL 1.10 Syntax
-
Data Definition Language (DDL)
- Creating a Source Table
-
Creating a Result Table
- ClickHouse Result Table
- Kafka Result Table
- Upsert Kafka Result Table
- DIS Result Table
- JDBC Result Table
- GaussDB(DWS) Result Table
- Redis Result Table
- SMN Result Table
- HBase Result Table
- Elasticsearch Result Table
- OpenTSDB Result Table
- User-defined Result Table
- Print Result Table
- File System Result Table
- Creating a Dimension Table
- Data Manipulation Language (DML)
- Functions
-
Historical Versions
-
Flink SQL Syntax
- SQL Syntax Constraints and Definitions
- SQL Syntax Overview of Stream Jobs
- Creating a Source Stream
-
Creating a Sink Stream
- CloudTable HBase Sink Stream
- CloudTable OpenTSDB Sink Stream
- MRS OpenTSDB Sink Stream
- CSS Elasticsearch Sink Stream
- DCS Sink Stream
- DDS Sink Stream
- DIS Sink Stream
- DMS Sink Stream
- DWS Sink Stream (JDBC Mode)
- DWS Sink Stream (OBS-based Dumping)
- MRS HBase Sink Stream
- MRS Kafka Sink Stream
- Open-Source Kafka Sink Stream
- File System Sink Stream (Recommended)
- OBS Sink Stream
- RDS Sink Stream
- SMN Sink Stream
- Creating a Temporary Stream
- Creating a Dimension Table
- Custom Stream Ecosystem
- Data Type
- Built-In Functions
- User-Defined Functions
- Geographical Functions
- SELECT
- Condition Expression
- Window
- JOIN Between Stream Data and Table Data
- Configuring Time Models
- Pattern Matching
- StreamingML
- Reserved Keywords
-
Flink SQL Syntax
-
Identifiers
- aggregate_func
- alias
- attr_expr
- attr_expr_list
- attrs_value_set_expr
- boolean_expression
- col
- col_comment
- col_name
- col_name_list
- condition
- condition_list
- cte_name
- data_type
- db_comment
- db_name
- else_result_expression
- file_format
- file_path
- function_name
- groupby_expression
- having_condition
- input_expression
- join_condition
- non_equi_join_condition
- number
- partition_col_name
- partition_col_value
- partition_specs
- property_name
- property_value
- regex_expression
- result_expression
- select_statement
- separator
- sql_containing_cte_name
- sub_query
- table_comment
- table_name
- table_properties
- table_reference
- when_expression
- where_condition
- window_function
- Operators
-
Spark SQL Syntax Reference
-
User Guide (ME-Abu Dhabi Region)
- Videos
- General Reference
- Scenario
- Procedure
- Preparations
- Step 1: Develop a JAR File and Upload It to OBS
- Step 2: Buy an Elastic Resource Pool and Add Queues to the Pool
- Step 3: Use DEW to Manage Access Credentials
- Step 4: Create a Custom Agency to Allow DLI to Access DEW and Read Credentials
- Step 5: Create a Flink Jar Job and Configure Job Information
Show all
Copied.
Using DLI to Submit a Flink Jar Job
Scenario
Flink Jar jobs are suitable for data analysis scenarios that require custom stream processing logic, complex state management, or integration with specific libraries. You need to write and build a Jar job package. Before submitting a Flink Jar job, upload the Jar job package to OBS and submit it together with the data and job parameters to run the job.
This example introduces the basic process of submitting a Flink Jar job package through the DLI console. Due to different service requirements, the specific writing of the Jar package may vary. It is recommended that you refer to the sample code provided by DLI and edit and customize it according to your actual business scenario. Get DLI Sample Code.
Procedure
Table 1 describes the procedure for submitting a Flink Jar job using DLI.
Complete the preparations in Preparations before performing the following operations.
Procedure |
Description |
---|---|
Prepare a Flink Jar job package and upload it to OBS. |
|
Step 2: Buy an Elastic Resource Pool and Add Queues to the Pool |
Create compute resources required for submitting the Flink Jar job. |
In cross-source analysis scenarios, use DEW to manage access credentials of data sources. |
|
Step 4: Create a Custom Agency to Allow DLI to Access DEW and Read Credentials |
Create an agency to allow DLI to access DEW. |
Step 5: Create a Flink Jar Job and Configure Job Information |
Create a Flink Jar job to analyze data. |
Preparations
- Register a Huawei ID and enable Huawei Cloud services. Make sure your account is not in arrears or frozen.
- Configure an agency for DLI.
To use DLI, you need to access services such as Object Storage Service (OBS), Virtual Private Cloud (VPC), and Simple Message Notification (SMN). If it is your first time using DLI, you will need to configure an agency to allow access to these dependent services.
- Log in to the DLI management console using your account. In the navigation pane on the left, choose Global Configuration > Service Authorization.
- On the agency settings page, select the agency permissions under Basic Usage, Datasource, and O&M and click Update.
- View and understand the notes for updating the agency and click OK. The DLI agency permissions are updated.
Figure 1 Configuring an agency for DLI
- Once configured, you can check the agency dli_management_agency in the agency list on the IAM console.
Step 1: Develop a JAR File and Upload It to OBS
Develop a JAR file offline as the DLI console does not support this capability. For development examples, refer to "Flink Jar Job Examples".
Develop a Flink Jar job program by referring to Flink Job Sample Code, compile it, and pack it into flink-examples.jar. Perform the following steps to upload the program:
Before submitting a Flink job, upload data files to OBS.
- Log in to the DLI console.
- In the service list, click Object Storage Service under Storage.
- Create a bucket. In this example, name it dli-test-obs01.
- On the displayed Buckets page, click Create Bucket in the upper right corner.
- On the displayed Create Bucket page, set Region and Bucket Name. Retain the default values for other parameters or modify them as needed.
NOTE:
Select a region that matches the location of the DLI console.
- Click Create Now.
- In the bucket list, click the name of the dli-test-obs01 bucket you just created to access its Objects tab.
- Click Upload Object. In the displayed dialog box, drag or add files or folders, for example, flink-examples.jar, to the upload area. Then, click Upload.
In this example, the path after upload is obs://dli-test-obs01/flink-examples.jar.
For more operations on the OBS console, see the Object Storage Service User Guide.
Step 2: Buy an Elastic Resource Pool and Add Queues to the Pool
To execute SQL jobs in datasource scenarios, you must use your own SQL queue as the existing default queue cannot be used. In this example, create an elastic resource pool named dli_resource_pool and a queue named dli_queue_01.
- Log in to the DLI management console.
- In the navigation pane on the left, choose Resources > Resource Pool.
- On the displayed page, click Buy Resource Pool in the upper right corner.
- On the displayed page, set the parameters.
In this example, we will buy the resource pool in the CN East-Shanghai2 region. Table 2 describes the parameters.
Table 2 Parameters Parameter
Description
Example Value
Region
Select a region where you want to buy the elastic resource pool.
CN East-Shanghai2
Project
Project uniquely preset by the system for each region
Default
Name
Name of the elastic resource pool
dli_resource_pool
Specifications
Specifications of the elastic resource pool
Standard
CU Range
The maximum and minimum CUs allowed for the elastic resource pool
64-64
CIDR Block
CIDR block the elastic resource pool belongs to. If you use an enhanced datasource connection, this CIDR block cannot overlap that of the data source. Once set, this CIDR block cannot be changed.
172.16.0.0/19
Enterprise Project
Select an enterprise project for the elastic resource pool.
default
- Click Buy.
- Click Submit.
- In the elastic resource pool list, locate the pool you just created and click Add Queue in the Operation column.
- Set the basic parameters listed below.
Table 3 Basic parameters for adding a queue Parameter
Description
Example Value
Name
Name of the queue to add
dli_queue_01
Type
Type of the queue
- To execute SQL jobs, select For SQL.
- To execute Flink or Spark jobs, select For general purpose.
_
Engine
SQL queue engine. The options are Spark and HetuEngine.
_
Enterprise Project
Select an enterprise project.
default
- Click Next and configure scaling policies for the queue.
Click Create to add a scaling policy with varying priority, period, minimum CUs, and maximum CUs.
Figure 2 shows the scaling policy configured in this example.Table 4 Scaling policy parameters Parameter
Description
Example Value
Priority
Priority of the scaling policy in the current elastic resource pool. A larger value indicates a higher priority. In this example, only one scaling policy is configured, so its priority is set to 1 by default.
1
Period
The first scaling policy is the default policy, and its Period parameter configuration cannot be deleted or modified.
The period for the scaling policy is from 00 to 24.
00–24
Min CU
Minimum number of CUs allowed by the scaling policy
16
Max CU
Maximum number of CUs allowed by the scaling policy
64
- Click OK.
Step 3: Use DEW to Manage Access Credentials
In cross-source analysis scenarios, you need to set attributes such as the username and password in the connector. However, information such as usernames and passwords is highly sensitive and needs to be encrypted to ensure user data privacy.
Data Encryption Workshop (DEW) is a secure, reliable, and easy-to-use solution for encrypting and decrypting private data while ensuring its security.
This example introduces how to create a shared secret in DEW.
- Log in to the DEW management console.
- In the navigation pane on the left, choose Cloud Secret Management Service > Secrets.
- Click Create Secret. On the displayed page, configure basic secret information.
- In this example, the key in the first line is the user's access key ID (AK).
- In this example, the key in the second line is the user's secret access key (SK).
Figure 3 Configuring access credentials in DEW
- Set access credential parameters on the DLI Flink Jar job editing page.
flink.hadoop.fs.obs.bucket.USER_BUCKET_NAME.dew.access.key=USER_AK_CSMS_KEY_obstest1 flink.hadoop.fs.obs.bucket.USER_BUCKET_NAME.dew.secret.key=USER_SK_CSMS_KEY_obstest1 flink.hadoop.fs.obs.security.provider=com.dli.provider.UserObsBasicCredentialProvider flink.hadoop.fs.dew.csms.secretName=obsAksKflink.hadoop.fs.dew.endpoint=kmsendpoint flink.hadoop.fs.dew.csms.version=v6flink.hadoop.fs.dew.csms.cache.time.second=3600flink.dli.job.agency.name=agencyname
Step 4: Create a Custom Agency to Allow DLI to Access DEW and Read Credentials
- Log in to the management console.
- In the upper right corner of the page, hover over the username and select Identity and Access Management.
- In the navigation pane of the IAM console, choose Agencies.
- On the displayed page, click Create Agency.
- On the Create Agency page, set the following parameters:
- Agency Name: Enter an agency name, for example, dli_dew_agency_access.
- Agency Type: Select Cloud service.
- Cloud Service: This parameter is available only when you select Cloud service for Agency Type. Select Data Lake Insight (DLI) from the drop-down list.
- Validity Period: Select Unlimited.
- Description: You can enter Agency with OBS OperateAccess permissions. This parameter is optional.
- Click Next.
- Click the agency name. On the displayed page, click the Permissions tab. Click Authorize. On the displayed page, click Create Policy.
- Configure policy information.
- Enter a policy name, for example, dli-dew-agency.
- Select JSON.
- In the Policy Content area, paste a custom policy.
{ "Version": "1.1", "Statement": [ { "Effect": "Allow", "Action": [ "csms:secretVersion:get", "csms:secretVersion:list", "kms:dek:decrypt" ] } ] }
- Enter a policy description as required.
- Click Next.
- On the Select Policy/Role page, select Custom policy from the first drop-down list and select the custom policy created in 8.
- Click Next. On the Select Scope page, set the authorization scope. In this example, select All resources.
For details about authorization operations, see Creating a User Group and Assigning Permissions.
- Click OK.
It takes 15 to 30 minutes for the authorization to be in effect.
Step 5: Create a Flink Jar Job and Configure Job Information
- Create a Flink Jar job.
- In the navigation pane on the left of the DLI management console, choose Job Management > Flink Jobs.
- On the displayed page, click Create Job in the upper right corner.
In this example, set Type to Flink Jar and Name to Flink_Jar_for_test.Figure 4 Creating a Flink Jar job
- Click OK.
- Configure basic job information.
Configure basic job information based on Table 5.
Table 5 Parameters Parameter
Mandatory
Description
Queue
Yes
Select a queue where you want to run your job.
Application
Yes
Select the custom package in Step 1: Develop a JAR File and Upload It to OBS.
Main Class
Yes
Class name of the JAR file to load
This parameter specifies the entry for the Flink job, that is, the class that contains the main method. This is the class that is executed first when a Flink job is started.
If the application program is of type .jar, the main class name must be provided.
The main class name is case-sensitive and must be correct.
- Default: Specified based on the Manifest file in the JAR file.
- Manually assign: You must enter the class name and confirm the class arguments (separated by spaces).
NOTE:
When a class belongs to a package, the package path must be carried, for example, packagePath.KafkaMessageStreaming.
Flink Version
Yes
Flink version used for job running
If you choose to use Flink 1.15, make sure to configure the agency information for the cloud service that DLI is allowed to access in the job.
Agency
No
If you choose to use Flink 1.15, configure the agency name yourself to ensure smooth job running.
- Configure advanced settings for the Flink Jar job.
Configure the Flink Jar job based on Table 6.
Table 6 Advanced settings for the Flink Jar job Parameter
Mandatory
Description
CUs
Yes
One CU consists of one vCPU and 4 GB of memory. The number of CUs ranges from 2 to 400.
Job Manager CUs
Yes
Number of CUs allowed for the job manager. The value ranges from 1 to 4. The default value is 1.
Parallelism
Yes
Maximum number of parallel operators in a job
NOTE:
- The value cannot exceed four times the number of compute units (CUs – Job Manager CUs).
- You are advised to set this parameter to a value greater than that configured in the code. Otherwise, job submission may fail.
Task Manager Config
No
Whether Task Manager resource parameters are set
If this option is selected, you need to set the following parameters:
- CU(s) per TM: Number of resources occupied by each Task Manager.
- Slot(s) per TM: Number of slots contained in each Task Manager.
Save Job Log
No
Whether job running logs are saved to OBS
If this option is selected, you need to set the following parameters:
OBS Bucket: Select an OBS bucket to store job logs. If the OBS bucket you selected is unauthorized, click Authorize.
Alarm on Job Exception
No
Whether to notify users of any job exceptions, such as running exceptions or arrears, via SMS or email.
If this option is selected, you need to set the following parameters:
SMN Topic
Select a custom SMN topic. For how to create a custom SMN topic, see "Creating a Topic" in the Simple Message Notification User Guide.
Auto Restart on Exception
No
Whether automatic restart is enabled. If enabled, jobs will be automatically restarted and restored when exceptions occur.
If this option is selected, you need to set the following parameters:
- Max. Retry Attempts: maximum number of retries upon an exception. The unit is times/hour.
- Unlimited: The number of retries is unlimited.
- Limited: The number of retries is user-defined.
- Restore Job from Checkpoint: Restore the job from the latest checkpoint.
If you select this parameter, you also need to set Checkpoint Path.
Checkpoint Path: Select a path for storing checkpoints. This path must match that configured in the application package. Each job must have a unique checkpoint path, or, you will not be able to obtain the checkpoint.
- Click Save in the upper right of the page.
- Click Start in the upper right corner.
- On the displayed Start Flink Job page, confirm the job specificationand fee and click Start Now to start the job.
Once the job is started, the system automatically switches to the Flink Jobs page. Locate the job you created and check its status in the Status column.
Once a job is successfully submitted, its status changes from Submitting to Running. After the execution is complete, the status changes to Completed.
If the job status is Submission failed or Running exception, the job fails to submit or run. In this case, you can hover over the status icon in the Status column of the job list to view the error details. You can click
to copy these details. Rectify the fault based on the error information and resubmit the job.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot