
Using Kettle to Import Data

Kettle is an open-source ETL tool that you can use to extract, transform, and load data.

When migrating large volumes of data, the data import plug-in provided by Kettle imports about 1,500 records per second, which makes migration slow. In the same environment, the custom import plug-in integrated with dws-client imports about 22,000 records per second, roughly 15 times faster.

Therefore, when migrating data with Kettle, use the custom import plug-in integrated with dws-client to greatly shorten the migration time.

Currently, dws-kettle-plugin supports only version pdi-ce-9.4.0.0-343. Compatibility with newer versions depends on verification. Version pdi-ce-9.4.0.0-343 is recommended.

Preparing the Kettle Environment

  1. Install JDK 1.8 and configure the related environment variables. (A quick way to confirm the active JDK version is sketched after this list.)
  2. Visit Download Address to download Kettle, and decompress it.
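The following minimal Java sketch prints the JDK version and installation path that the environment resolves to; the class name JdkCheck is illustrative. After the environment variables are configured, the reported version should start with 1.8.

// Minimal sketch to confirm which JDK the environment resolves to.
// The class name JdkCheck is illustrative; compile and run it with the configured JDK.
public class JdkCheck {
    public static void main(String[] args) {
        // Expect a value starting with "1.8" after JDK 1.8 is installed and configured.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}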

Installing the dws-kettle-plugin Custom Plug-in

  1. Download the dws-client plug-in and its dependencies.

    1. dws-kettle-plugin.jar: Visit Download Address to obtain the latest version.
    2. dws-client.jar: Visit Download Address to obtain the latest version.

    3. caffeine.jar: Visit Download Address and select the version that the downloaded dws-client version depends on.

      You must select the version on which dws-client depends. Otherwise, compatibility issues may occur.

    4. huaweicloud-dws-jdbc.jar: Visit Download Address and select the version that the downloaded dws-client version depends on.

  2. Create a directory, for example, dws-plugin, in the Kettle installation directory data-integration\plugins.
  3. Place dws-kettle-plugin.jar in the data-integration\plugins\dws-plugin directory.
  4. Place dws-client.jar and caffeine.jar in the data-integration\plugins\dws-plugin\lib directory.
  5. Place huaweicloud-dws-jdbc.jar in the data-integration\lib directory. The resulting directory layout is shown below.
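After these steps, the plug-in files are arranged as follows (dws-plugin is the example directory name created in step 2):

data-integration\
├── lib\
│   └── huaweicloud-dws-jdbc.jar
└── plugins\
    └── dws-plugin\
        ├── dws-kettle-plugin.jar
        └── lib\
            ├── dws-client.jar
            └── caffeine.jar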

Using Kettle to Import Data

  1. Double-click Spoon.bat in the Kettle installation directory pdi-ce-9.4.0.0-343\data-integration to open Spoon, and create a new transformation.
  2. Add the Table Input node and configure the database connection, the tables to be migrated, and the table fields. When creating a database connection, click Test on the corresponding page to check whether the connection parameters are set correctly. After configuring the SQL statements for the tables to be migrated and the table fields, click Preview to preview the data to be migrated.

  3. Add the DWS Table Output node, and configure the database connection, destination table, mapping between destination table fields and data source fields, and other parameters. The DWS Table Output node supports only PostgreSQL as the data source.

  4. Save the transformation and run it.

  5. View the execution result and check whether the total number of migrated records and the detailed data in the destination table match those in the source table before migration, for example with a count comparison like the sketch below.
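A minimal JDBC sketch for such a count comparison is shown below. The connection URLs, credentials, and the table name public.t_migrate are hypothetical placeholders; it assumes both databases are reachable over their PostgreSQL-compatible JDBC interfaces and that suitable JDBC drivers are on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch for comparing row counts after migration.
// URLs, credentials, and the table name below are hypothetical placeholders.
public class CountCheck {
    private static long count(String url, String user, String password, String table) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws Exception {
        String table = "public.t_migrate";  // hypothetical table name
        long source = count("jdbc:postgresql://source-host:5432/sourcedb", "src_user", "src_pwd", table);
        long target = count("jdbc:postgresql://dws-host:8000/gaussdb", "dws_user", "dws_pwd", table);
        System.out.println("source=" + source + ", target=" + target
                + (source == target ? " (match)" : " (MISMATCH)"));
    }
}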