Step 5: Configure the Parameters
Before migrating CS services to DLI, configure the parameters to account for the changes in APIs, SDKs, and Flink versions between CS and DLI.
API
For services that call CS APIs, you are advised to switch to the DLI APIs for Flink job management. (In DLI, streaming is added to the original CS URL path.)
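For example (illustrative URIs, assuming a job list query; check the CS and DLI API references for the exact paths), a CS request such as:
GET /v1.0/{project_id}/jobs
maps to the following in DLI:
GET /v1.0/{project_id}/streaming/jobs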
SDK
Services developed with the CS SDK can be migrated to the DLI SDK. For details about how to use the DLI SDK, see Data Lake Insight SDK Reference.
Component Version
After CS services are migrated to DLI, the Flink version is upgraded from 1.5.3 to 1.7.2. Flink 1.7.2 APIs are largely compatible with Flink 1.5.3 APIs, but you should still test your custom JAR jobs to ensure that they run properly in DLI. The version change affects only user-defined Flink JAR jobs; SQL jobs do not need to be modified. For details about Flink version changes, see Flink 1.7 Release Notes.
In Flink 1.7.2, the legacy mode is removed. As a result, the CUs and Parallelism parameters in a user-defined job configuration no longer control the resources occupied by the job or the number of concurrent operators. Instead, they control the number of slots started by each TaskManager (the taskmanager.numberOfTaskSlots parameter). For a custom job, set the parallelism in the code; at runtime, Flink requests resources based on the total operator parallelism and the number of slots started by each TaskManager. Flink SQL jobs are not affected.
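The following is a minimal sketch of setting the parallelism in the code of a user-defined Flink 1.7.2 JAR job (the job logic itself is illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Job-level default parallelism; Flink derives the required
        // resources from this and taskmanager.numberOfTaskSlots.
        env.setParallelism(4);

        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)
           // Operator-level parallelism overrides the job default.
           .setParallelism(2)
           .print();

        env.execute("parallelism-example");
    }
}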
Security Authentication for SQL Kafka
In CS, the SSL certificate of Kafka is uploaded through Cluster Management. In DLI, datasource authentication is used instead: choose Datasource Connections > Datasource Authentication > Create to create the Kafka_SSL authentication information, including the certificate and password required for Kafka authentication. When using Kafka, replace parameters such as ssl.truststore.location and ssl.truststore.password with the kafka_certificate_name parameter (set to the authentication name) in the WITH configuration.
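A minimal sketch of a DLI Flink SQL source using Kafka_SSL authentication follows. The value my_kafka_ssl is a hypothetical authentication name, and the other connection parameters are illustrative; check the DLI SQL reference for the exact parameter names supported by your version:

CREATE SOURCE STREAM kafka_source (
  name STRING,
  score INT
)
WITH (
  type = "kafka",
  kafka_bootstrap_servers = "192.168.0.1:9093",
  kafka_topic = "test_topic",
  encode = "json",
  -- Replaces ssl.truststore.location and ssl.truststore.password:
  kafka_certificate_name = "my_kafka_ssl"
);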
SQL Character Escape
In DLI, escape characters do not need to be escaped a second time. For example, if the character ^ is used as the delimiter, it is written in CS as follows:
string_to_array($field, '\\\\^')
In DLI, it is simplified to:
string_to_array($field, '\\^')
Resource Package Management
DLI provides package management. Before creating a custom job, choose Data Management > Package Management > Create to upload the job JAR package, dependency packages, and configuration files to DLI. The packages can then be referenced by the job.
If a resource package is updated, create a package with the same name on the Package Management page to overwrite the existing one. Note that when a job is submitted, DLI does not reference the resource file directly from its OBS path; therefore, whenever the resource file is modified, you must re-upload it as a package with the same name on the Package Management page.
Configuration File
In CS, files for user-defined jobs can be uploaded to cluster nodes and referenced in the code through the absolute path /opt/cs/user_files/fileName. In DLI, specify the resource package in the Other Dependencies parameter of the job, and reference it in the code using ClassName.class.getClassLoader().getResource("userData/fileName").
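A minimal sketch of loading such a file in a DLI custom job follows. The file name config.properties is hypothetical; replace it with the name of the file you uploaded in Other Dependencies:

import java.io.InputStream;
import java.util.Properties;

public class ConfigLoader {
    public static Properties load() throws Exception {
        Properties props = new Properties();
        // getResourceAsStream reads the packaged dependency file directly,
        // without converting the classloader URL to a file path.
        try (InputStream in = ConfigLoader.class.getClassLoader()
                .getResourceAsStream("userData/config.properties")) {
            if (in == null) {
                throw new IllegalStateException("userData/config.properties not found");
            }
            props.load(in);
        }
        return props;
    }
}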
Custom Job Dependencies
Flink 1.7.2 and Hadoop 3.1.1 in DLI correspond to Flink 1.5.3 and Hadoop 2.8.3 in CS, so the versions of the built-in dependency JAR packages have changed. The built-in JAR packages used by a custom CS job may therefore conflict with those built into DLI. In this case, exclude the JAR packages that DLI already provides, or set the dependency scope to provided when packaging the custom job project.
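For example, with Maven, a minimal sketch of marking a dependency that DLI already provides as provided (the coordinates shown are the standard Flink 1.7.2 streaming API; adjust them to your project):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.7.2</version>
    <!-- provided: compile against the API but do not bundle it;
         the built-in DLI JAR is used at runtime -->
    <scope>provided</scope>
</dependency>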
For details about the built-in dependency packages in DLI, see Data Lake Insight User Guide.