Updated on 2025-04-30 GMT+08:00

dws-client

Description

dws-client is a high-performance, convenient data import tool based on the GaussDB(DWS) JDBC driver. Before using dws-client, ensure that a JDBC connection to the database can be established. Using dws-client to import data has the following advantages:

  1. dws-client bounds its cache by both size and time and supports batch import, improving import performance and meeting real-time data import requirements during both peak and off-peak hours.

    In scenarios without strict real-time requirements, single-record operations are cached until they form a batch, and the batch is then executed at once. This improves write performance.

  2. dws-client supports concurrent data import.
  3. dws-client supports multiple high-performance import modes and primary key conflict policies to meet import requirements in various scenarios.
  4. dws-client supports API-based interaction, making it easy to use.

Dependency

dws-client has been added to the Maven repository. You can select the latest version from the repository. For details, visit https://mvnrepository.com/artifact/com.huaweicloud.dws/dws-client.
<dependency>
   <groupId>com.huaweicloud.dws</groupId>
   <artifactId>dws-client</artifactId>
   <version>${version}</version>
</dependency>

Core Features

The following examples use version 2.x. The examples marked "1.x compatibility" show the equivalent usage in version 1.x.

Initializing client

Initialize the client to create an instance for tasks like importing data into the database.

All dws-client parameters are defined in com.huaweicloud.dws.client.config.DwsClientConfigs, where each ConfigOp constant represents one parameter. Internally, parameters are stored in a map keyed by the ConfigOp key; when a configuration file is used, the same key is used as the property name in the file.

  1. The following is a simple example. You only need to configure the database connection. Retain the default values for other parameters.
    public DwsClient getClient() throws Exception {
        DwsConfig config = DwsConfig.of()
            .with(DwsClientConfigs.JDBC_URL, System.getenv("db_url"))
            .with(DwsClientConfigs.JDBC_PASSWORD, System.getenv("db_pwd"))
            .with(DwsClientConfigs.JDBC_USERNAME, System.getenv("db_username"));
        return new DwsClient(config);
    }
  2. Use the configuration file.
    Create the client.properties configuration file.
    dws.client.jdbc.url=jdbc:gaussdb://xxxx:8000/gaussdb
    dws.client.jdbc.password=****
    dws.client.jdbc.username=dbadmin
    Initialize the client from the configuration file.
    public DwsClient getClientByProperties() throws Exception {
        URL resource = this.getClass().getClassLoader().getResource("client.properties");
        DwsConfig config = new DwsConfig(resource.getFile());
        return new DwsClient(config);
    }
  3. Use the map parameter.
    public DwsClient getClientByMap() throws Exception {
        Map<String, Object> config = new HashMap<>();
        config.put(DwsClientConfigs.JDBC_URL.key(), System.getenv("db_url"));
        config.put(DwsClientConfigs.JDBC_PASSWORD.key(), System.getenv("db_pwd"));
        config.put(DwsClientConfigs.JDBC_USERNAME.key(), System.getenv("db_username"));
        return new DwsClient(new DwsConfig(config));
    }
  4. Configure to be compatible with version 1.x.
    public DwsClient getClient() {
        DwsConfig config = DwsConfig
                .builder()
                .withUrl("jdbc:gaussdb://***/gaussdb")
                .withUsername("***")
                .withPassword("****")
                .build();
        return new DwsClient(config);
    }
    
  5. Configure table-level parameters.
    When a client imports data into multiple tables, you can set parameters for a specific table. Call withTable("xxx") on the global configuration to obtain a table-level parameter builder; the table-level parameters are initialized from the global settings, and any parameter set afterwards overrides the global value for that table. Calling build() merges the table-level settings back into the global configuration and returns the global builder, allowing chained calls.
    public DwsClient getClientTable() throws Exception {
        DwsConfig config = DwsConfig.of()
            .with(DwsClientConfigs.JDBC_URL, System.getenv("db_url"))
            .with(DwsClientConfigs.JDBC_PASSWORD, System.getenv("db_pwd"))
            .with(DwsClientConfigs.JDBC_USERNAME, System.getenv("db_username"))
            .with(DwsClientConfigs.WRITE_AUTO_FLUSH_BATCH_SIZE, 10000)
            .withTable("test")
            .with(DwsClientConfigs.WRITE_CONFLICT_STRATEGY, ConflictStrategy.INSERT_OR_IGNORE)
            .build()
            .withTable("test1")
            .with(DwsClientConfigs.WRITE_AUTO_FLUSH_BATCH_SIZE, 200)
            .build();
        return new DwsClient(config);
    }
  6. 1.x compatibility
    return DwsConfig.builder()
            .withUrl(System.getenv("db_url"))
            .withPassword(System.getenv("db_pwd"))
            .withUsername(System.getenv("db_username"))
            .withAutoFlushBatchSize(1000) // The default batch size is 1000.
            .withTableConfig("test.t_c_batch_size_2", new TableConfig()
                    .withAutoFlushBatchSize(500)); // The batch size is 500 for table test.t_c_batch_size_2.
    

Using a database connection to execute SQL statements

This API is mainly intended for special services that the currently supported functions cannot cover. For example, to query data, you can use the native JDBC connection to operate the database directly.

The API parameter is a functional interface that is handed a database connection. The return value can be of any type; it is determined by the return type of the service code.
public void sql() throws DwsClientException {
    Integer id = getClient().sql(connection -> {
        try (ResultSet resultSet = connection.createStatement().executeQuery("select id from test.user where name = 'zhangsan'")) {
            if (resultSet.next()) {
                return resultSet.getInt("id");
            }
        }
        return null;
    });
    System.out.println("zhangsan id = " + id);
}

Obtaining table information

This API obtains the table structure (cached) based on a schema-qualified table name. The table structure definition includes all columns and primary keys.
public void getTableSchema() throws DwsClientException {
    TableSchema tableSchema = getClient().getTableSchema(TableName.valueOf("test.test"));
}

Data import

The client provides the write API for importing data into the database, and the Operate API for setting table columns. Once the operation is submitted, the column operations are complete and the client starts importing the data. Submission can be synchronous or asynchronous. When setting a field, you can choose to ignore that setting when a primary key conflict occurs.
public void write() throws DwsClientException {
    getClient().write("test.test")
            .setObject("id", 1)
            .setObject("name", "test")
            // This setting takes effect only when data is inserted. If a primary key conflict occurs, the setting is not updated.
            .setObject("age", 38, true)
            // Asynchronously save the data to the database. The result is returned after data is stored in the background cache.
            //.commit()
            // The result is returned after data is successfully saved to the database.
            .syncCommit();
}
If the table structure does not change, reusing a cached table schema reduces metadata queries against the database. This eases the load on GaussDB(DWS) and speeds up data import.
public void testWrite() throws Exception {
    try (DwsClient client = getClient()) {
        client.sql((conn) -> {
            conn.createStatement().execute("DROP TABLE IF EXISTS test.dws_client_test;"
                + "create table test.dws_client_test (id integer, name varchar(10), age int);");
            return null;
        });
        TableSchema tableSchema = client.getTableSchema(TableName.valueOf("test.dws_client_test"));
        log.info("table schema {}", tableSchema);
        for (int i = 0; i < 100; i++) {
            Operate operate = client.write(tableSchema)
                .setObject("id", i)
                .setObject("name", "name_" + i)
                .setObject("age", i);
            operate.commit();
        }
    }
}

Writing by column index reduces hash calculations on the client, relieving CPU pressure and increasing write throughput.

A column's index is its position in the table definition, which matches its position in the tableSchema.getColumns() list.
public void testWrite() throws Exception {
    try (DwsClient client = getClient()) {
        client.sql((conn) -> {
            conn.createStatement().execute("DROP TABLE IF EXISTS test.dws_client_test;"
                + "create table test.dws_client_test (id integer, name varchar(10), age int);");
            return null;
        });
        TableSchema tableSchema = client.getTableSchema(TableName.valueOf("test.dws_client_test"));
        log.info("table schema {}", tableSchema);
        for (int i = 0; i < 100; i++) {
            Operate operate = client.write(tableSchema)
                .setObject(0, i)
                .setObject(1, "name_" + i)
                .setObject(2, i);
            operate.commit();
        }
    }
}

Data deletion

The deletion API, like the import API, is carried by Operate. However, the primary key columns must be set during deletion, and the "ignore the setting on primary key conflict" option is ignored.
public void delete() throws DwsClientException {
    getClient().delete("test.test")
            .setObject("id", 1)
            // Asynchronously save the data to the database. The result is returned after data is stored in the background cache.
            //.commit()
            // The result is returned after data is successfully saved to the database.
            .syncCommit();
}
Forcibly flushing the cache to the database
public void flush() throws DwsClientException {
    getClient().flush();
}

Closing resources

When close is called, the cache is flushed to the database. After close, APIs such as data import, data deletion, and SQL execution can no longer be used.
public void close() throws IOException {
    getClient().close();
}

Listening to Data Import Events

In asynchronous import scenarios, if you want to know which data has been imported into the database, you can bind the flushSuccess callback. It is invoked with the import information after the database transaction is committed.
public DwsClient getClient() throws Exception {
    DwsConfig config = DwsConfig.of()
        .with(DwsClientConfigs.JDBC_URL, System.getenv("db_url"))
        .with(DwsClientConfigs.JDBC_PASSWORD, System.getenv("db_pwd"))
        .with(DwsClientConfigs.JDBC_USERNAME, System.getenv("db_username"))
        .onFlushSuccess(records -> {
            for (Record record : records) {
                log.info("flush success. value = {}, pk = {}", RecordUtil.toMap(record), RecordUtil.getRecordPrimaryKeyValue(record));
            }
        });
    return new DwsClient(config);
}

1.x compatibility

public DwsClient getClient() {
    DwsConfig config = DwsConfig
            .builder()
            .withUrl("jdbc:postgresql://***/gaussdb")
            .withUsername("***")
            .withPassword("****")
            .onFlushSuccess(records -> {
                for (Record record : records) {
                    log.info("flush success. value = {}, pk = {}", RecordUtil.toMap(record), RecordUtil.getRecordPrimaryKeyValue(record));
                }
            })
            .build();
    return new DwsClient(config);
}

Listening to Abnormal Background Tasks

During asynchronous import, a background task writes the data to the database. Bind the onError callback to catch background task failures; without it, a failure only surfaces the next time data is submitted. If the bound callback does not rethrow the exception, the error is treated as handled; if it rethrows, the service receives the exception on the next data submission.
public DwsClient getClient() throws Exception {
    DwsConfig config = DwsConfig.of()
        .with(DwsClientConfigs.JDBC_URL, System.getenv("db_url"))
        .with(DwsClientConfigs.JDBC_PASSWORD, System.getenv("db_pwd"))
        .with(DwsClientConfigs.JDBC_USERNAME, System.getenv("db_username"))
        .onError((clientException, client) -> {
            if (clientException instanceof DwsClientRecordException) {
                DwsClientRecordException recordException = (DwsClientRecordException) clientException;
                List<Record> records = recordException.getRecords();
                List<DwsClientException> exceptions = recordException.getExceptions();
                for (int i = 0; i < records.size(); i++) {
                    log.error("pk = {} . error = {}", RecordUtil.getRecordPrimaryKeyValue(records.get(i)), exceptions.get(i));
                }
            }
            if (clientException.getCode() != ExceptionCode.CONNECTION_ERROR && clientException.getCode() != ExceptionCode.LOCK_ERROR) {
                throw clientException;
            }
            log.error("code = {}", clientException.getCode(), clientException.getOriginal());
            return null;
        });
    return new DwsClient(config);
}

1.x compatibility

public DwsClient getClient() {
    DwsConfig config = DwsConfig
            .builder()
            .withUrl("jdbc:postgresql://***/gaussdb")
            .withUsername("***")
            .withPassword("****")
            .onError((clientException, client) -> {
                if (clientException instanceof DwsClientRecordException) {
                    DwsClientRecordException recordException = (DwsClientRecordException) clientException;
                    List<Record> records = recordException.getRecords();
                    List<DwsClientException> exceptions = recordException.getExceptions();
                    for (int i = 0; i < records.size(); i++) {
                        log.error("pk = {} . error = {}", RecordUtil.getRecordPrimaryKeyValue(records.get(i)), exceptions.get(i));
                    }
                }
                if (clientException.getCode() != ExceptionCode.CONNECTION_ERROR && clientException.getCode() != ExceptionCode.LOCK_ERROR) {
                    throw clientException;
                }
                log.error("code = {}", clientException.getCode(), clientException.getOriginal());
                return null;
            })
            .build();
    return new DwsClient(config);
}

Exception Handling

Exceptions are classified into three types:

  1. InvalidException is thrown when a request parameter is invalid.
  2. DwsClientException encapsulates all exceptions and includes the parsed error code and the original exception.
  3. DwsClientRecordException extends DwsClientException. It includes the records whose write failed and the corresponding DwsClientException for each record.

The following enumeration lists the exception codes.

public enum ExceptionCode {
    /** Invalid parameter. */
    INVALID_CONFIG(1),

    /** Connection exception. */
    CONNECTION_ERROR(100),
    /** Read-only. */
    READ_ONLY(101),
    /** Timeout. */
    TIMEOUT(102),
    /** Too many connections. */
    TOO_MANY_CONNECTIONS(103),
    /** Locking exception. */
    LOCK_ERROR(104),

    /** Authentication failed. */
    AUTH_FAIL(201),
    /** The client is already closed. */
    ALREADY_CLOSE(202),
    /** No permission. */
    PERMISSION_DENY(203),
    /** Syntax error. */
    SYNTAX_ERROR(204),
    /** Internal exception. */
    INTERNAL_ERROR(205),
    /** Interruption exception. */
    INTERRUPTED(206),
    /** The table is not found. */
    TABLE_NOT_FOUND(207),
    /** Constraint violation. */
    CONSTRAINT_VIOLATION(208),
    /** Data type error. */
    DATA_TYPE_ERROR(209),
    /** Data value error. */
    DATA_VALUE_ERROR(210),

    /** Exceptions that cannot be parsed. */
    UNKNOWN_ERROR(500);

    private final int code;
}
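
For illustration, here is a minimal sketch of handling these exception types around a synchronous commit. It reuses the getClient() helper from the initialization examples; the catch order matters because DwsClientRecordException extends DwsClientException.

public void writeWithHandling() throws Exception {
    try {
        getClient().write("test.test")
                .setObject("id", 1)
                .setObject("name", "test")
                .syncCommit();
    } catch (DwsClientRecordException e) {
        // Each failed record is paired with the exception that caused it.
        for (int i = 0; i < e.getRecords().size(); i++) {
            log.error("failed pk = {}", RecordUtil.getRecordPrimaryKeyValue(e.getRecords().get(i)), e.getExceptions().get(i));
        }
        throw e;
    } catch (DwsClientException e) {
        // The parsed code distinguishes retryable errors (for example, connection loss) from fatal ones.
        if (e.getCode() == ExceptionCode.CONNECTION_ERROR) {
            log.error("connection error, consider retrying", e.getOriginal());
        } else {
            throw e;
        }
    }
}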

Detailed Configuration

The list includes only public parameters; do not configure parameters that are not on this list.

When configuring via file, use the key shown for each parameter. Duration values support the following units:

Day: d, day(s)

Hour: h, hour(s)

Minute: m, min(s), minute(s)

Second: s, sec(s), second(s)

Millisecond: ms, milli(s), millisecond(s)

Memory size values support the following units:

Byte: b

Kilobyte: k, kb

Megabyte: m, mb

Gigabyte: g, gb
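
For example, a client.properties sketch that combines connection settings with duration and memory values (the keys come from the list below; the values are illustrative):

dws.client.jdbc.url=jdbc:gaussdb://xxxx:8000/gaussdb
dws.client.jdbc.username=dbadmin
dws.client.jdbc.password=****
# Duration values accept the units listed above.
dws.client.jdbc.max.use-time=1h
dws.client.write.auto-flush-max-interval=3s
# Memory values accept b/k/m/g.
dws.client.write.buffer.all-max-bytes=1g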

Each parameter below is listed in the following format, where "--" means there is no corresponding key or 1.x parameter:

PARAMETER (key: configuration file key; 1.x: 1.x parameter name)
Description
Default: default value

JDBC_URL (key: dws.client.jdbc.url; 1.x: url)
JDBC connection address of the GaussDB(DWS) database. The client uses the jdbc:gaussdb://*** prefix; jdbc:postgresql is also accepted and behaves identically.
Default: -

JDBC_USERNAME (key: dws.client.jdbc.username; 1.x: username)
GaussDB(DWS) database username.
Default: -

JDBC_PASSWORD (key: dws.client.jdbc.password; 1.x: password)
GaussDB(DWS) database user password.
Default: -

JDBC_CONNECTION_MAX_USE_TIME (key: dws.client.jdbc.max.use-time; 1.x: connectionMaxUseTimeSeconds)
Maximum lifetime of a connection. When exceeded, the current connection is forcibly closed and a new one is obtained. COPY_MERGE/COPY_UPSERT use a temporary table whose schema is cleared only when the connection is closed, so this limit also bounds how long those temporary schemas live.
Default: 3600s

JDBC_CONNECTION_MAX_IDLE (key: dws.client.jdbc.max.idle; 1.x: connectionMaxIdleMs)
Maximum connection idle time. When there is no data, the connection stays idle; once the maximum idle time is reached, the connection is released.
Default: 60s

WRITE_PARTITION_POLICY (key: dws.client.write.partition-policy; 1.x: --)
Partition policy for the partition cache (multiple cache copies under one table):
DYNAMIC: The number of partitions adjusts between WRITE_PARTITION_MIN and WRITE_PARTITION_MAX based on resource usage. Currently, only initialization based on WRITE_PARTITION_MIN is implemented.
DN: Data is partitioned using the DN distribution policy of the GaussDB(DWS) cluster, so each cache copy maps to one DN, and imports use an internal protocol to write directly to the database through that DN. This policy needs extra cluster setup and should be used under guidance. Only int* and text columns can be used as distribution columns.
Default: DYNAMIC

CACHE_TABLE_METADATA (key: dws.client.cache.table-metadata; 1.x: metadataCacheSeconds)
Table metadata cache time. To improve performance, data that in theory does not change, such as table structures, is kept cached. If your system does not change table structures online, leave this unset to ease the load on the GaussDB(DWS) cluster. A value of 0 or less means the cache never expires.
Default: -1 (the default in 1.x is 180s)

RETRY_SLEEP_BASE_TIME (key: dws.client.retry.sleep-base-time; 1.x: retryBaseTime)
Base sleep time for retries. The sleep time for a retry is calculated as RETRY_SLEEP_BASE_TIME x retry count + (0 to RETRY_SLEEP_RANDOM_TIME).
Default: 1000ms

RETRY_SLEEP_RANDOM_TIME (key: dws.client.retry.sleep-random-time; 1.x: retryRandomTime)
Random component of the retry sleep time, calculated as RETRY_SLEEP_BASE_TIME x retry count + (0 to RETRY_SLEEP_RANDOM_TIME). The randomness helps stagger task execution times in deadlock scenarios.
Default: 300ms

RETRY_MAX_TIMES (key: dws.client.retry.max-times; 1.x: maxFlushRetryTimes)
Maximum number of attempts for a database flush task.
Default: 3

WRITE_AUTO_FLUSH_BATCH_SIZE (key: dws.client.write.auto-flush-size; 1.x: autoFlushBatchSize)
A background flush task is triggered when the number of cached records reaches WRITE_AUTO_FLUSH_BATCH_SIZE or the cache age reaches WRITE_AUTO_FLUSH_MAX_INTERVAL. This parameter limits the maximum number of cached records.
Default: 30000

WRITE_FORCE_FLUSH_BATCH_SIZE (key: dws.client.write.force-flush-size; 1.x: --)
Used to boost throughput: even if the data in the buffer does not yet meet the auto-flush criteria, the service thread can still write it to the cache so that a background thread submits it to the database, reducing service-thread delays. When this limit is reached, the service thread submits the data itself only if the thread pool has free resources. This limit is typically higher than WRITE_AUTO_FLUSH_BATCH_SIZE.
Default: 40000

WRITE_AUTO_FLUSH_MAX_INTERVAL (key: dws.client.write.auto-flush-max-interval; 1.x: autoFlushMaxIntervalMs)
A background flush task is triggered when the number of cached records reaches WRITE_AUTO_FLUSH_BATCH_SIZE or the cache age reaches WRITE_AUTO_FLUSH_MAX_INTERVAL. This parameter limits the maximum cache duration.
Default: 3s

WRITE_FIXED_COPY_CACHE_SIZE (key: dws.client.write.fixed-copy.cache-size; 1.x: --)
Size of the buffer queue used for streaming writes in fixed copy mode.
Default: 1000

WRITE_USE_COPY_BATCH_SIZE (key: dws.client.write.use-copy-size; 1.x: copyWriteBatchSize)
When the write mode is AUTO and the data volume is less than this value, UPSERT is used to import data. Otherwise, COPY or COPY+UPSERT is used, depending on whether the table has a primary key.
Default: 1000

WRITE_BUFFER_ALL_MAX_BYTES (key: dws.client.write.buffer.all-max-bytes; 1.x: --)
When the batched data held by the client instance reaches this memory threshold, a flush is triggered to prevent out-of-memory errors. If unset, the threshold defaults to Maximum JVM memory x WRITE_BUFFER_JVM_PROCESSORS.
NOTE: The size is estimated from the object types during batching and is not an exact value.
Default: 1G

WRITE_BUFFER_TABLE_MAX_BYTES (key: dws.client.write.buffer.table-max-bytes; 1.x: --)
Memory threshold at which a single table's cache is flushed, so that one table cannot exhaust the memory budget. Consider adjusting this parameter when importing many tables.
Default: 500m

WRITE_BUFFER_PARTITION_MAX_BYTES (key: dws.client.write.buffer.partition-max-bytes; 1.x: --)
Memory threshold at which a single partition's cache is flushed to prevent out-of-memory issues. The default is calculated as WRITE_BUFFER_TABLE_MAX_BYTES / WRITE_PARTITION_MIN.
Default: 200m

WRITE_BUFFER_JVM_PROCESSORS (key: dws.client.write.buffer.jvm-processors; 1.x: --)
If WRITE_BUFFER_ALL_MAX_BYTES is not configured, its default is calculated as Maximum JVM memory x the value of this parameter. For example, with a 4 GB maximum heap and the default factor 0.4, the threshold is about 1.6 GB.
Default: 0.4

WRITE_PARTITION_MIN (key: dws.client.write.partition-min; 1.x: --)
Number of cache partitions per table. If WRITE_PARTITION_POLICY is DN, the number of cache partitions is the number of DNs multiplied by this value. With multiple partitions, data is distributed among them by hashing the distribution column, so writes can land in a partition without waiting for a database import.
Default: 1

WRITE_PARTITION_MAX (key: dws.client.write.partition-max; 1.x: --)
If WRITE_PARTITION_POLICY is DYNAMIC, the number of partitions is dynamically adjusted between WRITE_PARTITION_MIN and WRITE_PARTITION_MAX based on resource usage (not implemented currently).
Default: 1

WRITE_FIXED_COPY (key: dws.client.write.fixed-copy; 1.x: --)
Fixed copy mode, available when WRITE_MODE is set to a copy* mode. A direct connection to the database is opened as soon as the first record arrives, and data is streamed straight to the database's I/O stream instead of the cache. For deduplication, duplicate records are held for the next batch, so a retry is needed to write all data; in such cases some data still passes through the cache.
Note:
1. Only inserts are supported, not deletes, and all records must set the same fields.
2. If RETRY_MAX_TIMES is 1 and no client-side retries occur, data streams directly to GaussDB(DWS), saving client memory by avoiding batching. In write-only cases, such as tables without primary keys, tables with auto-increment keys, or tables with primary keys where WRITE_MODE is copy and the service guarantees uniqueness, you can set WRITE_MEMORY_DUPLICATE_REMOVAL to false to turn off in-memory deduplication.
3. A connection pool resource is held open for the lifetime of this mode.
Default: false
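
As a minimal sketch, fixed copy mode can be enabled together with a copy* write mode as shown below. The ConfigOp constants match the parameter names in this list; the WriteMode enum name is an assumption for illustration.

public DwsClient getFixedCopyClient() throws Exception {
    DwsConfig config = DwsConfig.of()
        .with(DwsClientConfigs.JDBC_URL, System.getenv("db_url"))
        .with(DwsClientConfigs.JDBC_PASSWORD, System.getenv("db_pwd"))
        .with(DwsClientConfigs.JDBC_USERNAME, System.getenv("db_username"))
        // Fixed copy requires a copy* write mode (see WRITE_MODE below).
        .with(DwsClientConfigs.WRITE_MODE, WriteMode.COPY_UPSERT)
        .with(DwsClientConfigs.WRITE_FIXED_COPY, true);
    return new DwsClient(config);
}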

WRITE_MODE (key: dws.client.write.mode; 1.x: writeMode)
Data write method:
  • AUTO: If the data volume is less than WRITE_USE_COPY_BATCH_SIZE, UPSERT is used; otherwise, COPY_UPSERT is used.
  • COPY_MERGE: If there is a primary key, COPY + MERGE is used; if not, COPY is used.
  • COPY_UPSERT: If there is a primary key, COPY + UPSERT is used; if not, COPY is used.
  • UPSERT: If there is a primary key, UPSERT is used; if not, INSERT INTO is used.
  • UPDATE: The UPDATE WHERE syntax updates data. If the table has no primary key, you can specify unique keys. A column used as a unique key does not need a unique index, but a non-unique index may impact performance.
  • COPY_UPDATE: Data is imported into a temporary table using COPY, and the temporary table accelerates the update via UPDATE FROM WHERE.
  • UPDATE_AUTO: If the batch size is less than WRITE_USE_COPY_BATCH_SIZE, UPDATE is used; otherwise, COPY_UPDATE is used.
Default: AUTO

WRITE_CONFLICT_STRATEGY (key: dws.client.write.update.conflict-policy; 1.x: conflictStrategy)
Primary key conflict policy when the table has a primary key:
  • INSERT_OR_IGNORE: Ignore the new data when a primary key conflict occurs.
  • INSERT_OR_UPDATE: Update the conflicting columns with the new data when a primary key conflict occurs.
  • INSERT_OR_REPLACE: Replace the whole row with the new data when a primary key conflict occurs; database columns not present in the new data are set to null. When the new data includes all columns, this behaves the same as INSERT_OR_UPDATE.
Default: INSERT_OR_UPDATE

WRITE_THREAD_SIZE (key: dws.client.write.thread-size; 1.x: threadSize)
Size of the thread pool that flushes data to the database. Each thread holds one connection, so this also sizes the connection pool, which serves data imports, queries, and SQL events. When a cache meets the flush conditions, it is handed to the thread pool for import; if no thread is free, it waits until one becomes available. Once the import finishes, the cache can accept new data again; until then, services are blocked. Increase WRITE_PARTITION_MIN to raise the number of cache copies per table and speed up imports.
Note: In version 1.x, each table had one cache and its data had to be written to the database in order. This rule no longer applies. If database imports are slower than cache writes, concurrent imports can cause lock conflicts; set this parameter to 1 to avoid them.
Default: 3

WRITE_NOT_WRITE (key: dws.client.write.disable; 1.x: --)
When set to true, the client follows the regular import process but skips the actual database writes. Use this to test overall link performance without GaussDB(DWS) and isolate performance issues on the client side.
Default: false

WRITE_TABLE_COMPARE_FIELD (key: dws.client.write.table.compare-field; 1.x: compareField)
Configures a comparison field. A row is updated only if its current value of this field is smaller than the incoming value; a WHERE condition is added during import. Example:
INSERT INTO "test"."compare_test"("id", "age", "update_time") VALUES (?, ?, ?) ON CONFLICT ("id") DO UPDATE SET "id"=EXCLUDED."id", "age"=EXCLUDED."age", "update_time"=EXCLUDED."update_time" WHERE "update_time"< EXCLUDED."update_time"
Default: -

LOG_SWITCH (key: dws.client.log.enable; 1.x: logSwitch)
Log switch. When enabled, detailed process logs are recorded for debugging or fault locating.
Default: false

onFlushSuccess (key: --; 1.x: onFlushSuccess)
Callback function invoked after data is imported into the database.
Default: -

onError (key: --; 1.x: onError)
Callback function invoked when a background task fails.
Default: -

WRITE_TABLE_UNIQUE_KEY (key: dws.client.write.table.unique-key; 1.x: uniqueKeys)
Marks fields as a unique constraint when the table has no primary key but does have a unique index. For updates, the fields must form a unique index or primary key.
Default: -

WRITE_COPY_FORMAT (key: dws.client.write.copy-format; 1.x: copyMode)
Format of the data imported via copy:
CSV: Data is concatenated into a string with each field in double quotes and separated by commas; line breaks separate records. Import uses the JDBC copy API. This is less efficient than DELIMITER mode but stable and reliable.
DELIMITER: Import uses the copy API with fields separated by 0X1E and records by 0X1F. The data must not contain these delimiters, or the import fails. An empty string is treated as null.
Default: CSV

WRITE_TABLE_FIELD_CASE_SENSITIVE (key: dws.client.write.table.case-sensitive; 1.x: caseSensitive)
Whether table column names are case sensitive.
Default: false

WRITE_TABLE_CREATE_TEMP_MODE (key: dws.client.write.table.create-temp-mode; 1.x: createTempTableMode)
How the temporary table is created when copy merge or copy upsert is used:
  • AS: Uses create temp table *** as select * from ***. Tables with auto-increment fields are supported, but performance may be lower.
  • LIKE: Uses CREATE TEMP TABLE *** LIKE to copy the table structure. Fields such as auto-increment columns are not supported.
  • CUSTOM (supported only in 2.x): Creates a custom temporary table based on the LIKE mode; the SQL statement is assembled from several parameters. (Template: create WRITE_TABLE_TEMP_TYPE temp table name_tmp like name WRITE_TABLE_TEMP_INCLUDING with ( WRITE_TABLE_TEMP_WITH ) DISTRIBUTE BY WRITE_TABLE_TEMP_DISTRIBUTE)
Default: AS

WRITE_TABLE_TEMP_TYPE (key: dws.client.write.table.temp-type; 1.x: --)
Temporary table type. Mandatory when WRITE_TABLE_CREATE_TEMP_MODE is CUSTOM. The options are:
COMMON: default type.
VOLATILE: volatile type, which requires kernel support.
Default: COMMON

WRITE_TABLE_TEMP_INCLUDING (key: dws.client.write.table.temp-including; 1.x: --)
Mandatory when WRITE_TABLE_CREATE_TEMP_MODE is CUSTOM.
Default: including defaults

WRITE_TABLE_TEMP_WITH (key: dws.client.write.table.temp-with; 1.x: --)
Mandatory when WRITE_TABLE_CREATE_TEMP_MODE is CUSTOM.
Default: -

WRITE_TABLE_TEMP_DISTRIBUTE (key: dws.client.write.table.temp-distribute; 1.x: --)
Mandatory when WRITE_TABLE_CREATE_TEMP_MODE is CUSTOM.
Default: -

WRITE_FORMAT_NUMBER_TO_DATE (key: dws.client.write.format.number-to-date; 1.x: numberAsEpochMsForDatetime)
Whether to apply number-to-time conversion when the database field is a date, time, or timestamp and the source value is a number. The number is converted to a standard timestamp in milliseconds, and a Java object of the matching type is created.
NOTE:
  • In version 1.x, this setting has no effect when data is imported via copy.
  • In versions before 1.0.9, it is always enabled, and numeric strings are also treated as timestamps.
Default: false

WRITE_FORMAT_INT_TO_DATE_TYPE (key: dws.client.write.format.number-to-date.int-type; 1.x: --)
Conversion logic when WRITE_FORMAT_NUMBER_TO_DATE is true and the value is an int:
days: the number is treated as days elapsed since the epoch.
seconds: the number is treated as seconds elapsed since the epoch.
Default: -

WRITE_FORMAT_LONG_TO_DATE_TYPE (key: dws.client.write.format.number-to-date.long-type; 1.x: --)
Conversion logic when WRITE_FORMAT_NUMBER_TO_DATE is true and the value is a long:
days: the number is treated as days elapsed since the epoch.
seconds: the number is treated as seconds elapsed since the epoch.
ms: the number is treated as milliseconds elapsed since the epoch.
Default: -

WRITE_FORMAT_NUMBER_STRING_TO_DATE (key: dws.client.write.format.number-string-date; 1.x: --)
Whether a numeric string written to a date/time/timestamp field is first converted to a number, so that WRITE_FORMAT_NUMBER_TO_DATE can then apply.
Default: false

WRITE_FORMAT_STRING_TO_DATE (key: dws.client.write.format.string-date; 1.x: stringToDatetimeFormat)
Global setting. If the database field is a date and the source value is a string, the string is parsed with SimpleDateFormat using this pattern, and the resulting timestamp is used to build a value of the correct database type.
NOTE: In versions before 2.0.0-r3, this setting also applies to time and timestamp types.
Default: -

WRITE_FORMAT_STRING_TO_TIMESTAMP (key: dws.client.write.format.string-timestamp; 1.x: --)
Global setting. If the database field is a timestamp and the source value is a string, the string is parsed with SimpleDateFormat using this pattern, and the resulting timestamp is used to build a value of the correct database type.
Default: -

WRITE_FORMAT_STRING_TO_TIME (key: dws.client.write.format.string-time; 1.x: --)
Global setting. If the database field is a time and the source value is a string, the string is parsed with SimpleDateFormat using this pattern, and the resulting timestamp is used to build a value of the correct database type.
Default: -

WRITE_FORMAT_DATE_STRING_FORMAT (key: dws.client.write.format.date-string-formats; 1.x: --)
If the database field is a date, time, or timestamp and the source value is a string, specifies the time format per field in key-value form: field name:format. If the format contains special characters such as colons, enclose it in double quotes, for example, update_time:"yyyy-MM-dd hh:mm:ss".
Default: -

WRITE_FORMAT_DATE_0000 (key: dws.client.write.format.date-0000; 1.x: --)
Whether to convert the value to timestamp 0 when the database field is a date/time/timestamp type and the source value is the string 0000-00-00 00:00:00.
Default: false

WRITE_UPDATE_INCLUDE_PRIMARY_KEY (key: dws.client.write.update.include-pk; 1.x: updateAll)
Whether the SET clause includes the primary key columns during an UPSERT. If an HStore table updates all columns (the SET clause covers every database field), the database does not need to be queried, which improves performance.
Default: true

JDBC_AUTO_COMMIT (key: dws.client.jdbc.auto-commit; 1.x: autoCommit)
Whether to use automatic transactions when importing data into the database.
Default: false

WRITE_FORMAT_STRING_U0000 (key: dws.client.write.format.string-u0000; 1.x: --)
Whether to remove the special character \u0000 from strings. If it is not removed, copy-based imports fail with an error, and only UPSERT can import the string.
Default: true

WRITE_FORMAT_DECIMAL_DEF_TYPE (key: dws.client.write.format.decimal-def.type; 1.x: --)
Default value for NUMERIC or DECIMAL fields when the input is empty:
null: the value is null.
zero: the value is 0.
custom: the value is set by WRITE_FORMAT_DECIMAL_DEF_CUSTOM.
Default: null

WRITE_FORMAT_DECIMAL_DEF_CUSTOM (key: dws.client.write.format.decimal-def.custom; 1.x: --)
Custom default value; mandatory when WRITE_FORMAT_DECIMAL_DEF_TYPE is custom.
Default: -

NETWORK_INTERNAL_PRIVATE_IP (key: dws.client.network.internal-privateIp; 1.x: --)
Maps internal to external IP addresses when WRITE_PARTITION_POLICY is DN and the client and the GaussDB(DWS) cluster are on different networks. The format is internal IP:external IP; separate multiple mappings with semicolons, for example, 192.168.0.1:10.10.0.1;192.168.0.2:10.10.0.2 (illustrative addresses).
Default: -

TIMEOUT_TASK (key: dws.client.timeout.task; 1.x: --)
Timeout for executing a task, including retries.
Default: 10m

TIMEOUT_SQL_STATEMENT (key: dws.client.timeout.statement; 1.x: --)
Timeout for executing a SQL statement.
Default: 5m

TRACE_SWITCH (key: dws.client.trace.enable; 1.x: --)
Whether to print trace records. The logs record the main steps and their time consumption, for analyzing performance problems.
Default: false

SYSTEM_TIMEZONE (key: dws.client.system.timezone; 1.x: --)
System time zone used for time conversions. It changes the time zone of the whole Java process and should ideally match the GaussDB(DWS) time zone. If unset, the client's default time zone is used.
Default: -

PRINT_METRICS_NAMES (key: dws.client.print-metrics.names; 1.x: --)
Names of the metrics to print, as a regular expression; only matching metrics are printed. Set it to .* to print all metrics and inspect the available names. Available metrics include:
rps_.*: data import rate for each table
write_rpc: write speed of the entire instance
action_submit_wait: waiting time of the submission thread pool
buffer_size_.*: number of records per batch in each flush
action_cost_.*: time needed for importing data to the database in each flush
addBatch_cost_.*: time taken by the addBatch call
executeBatch_cost_.*: time taken by the executeBatch call
buildBuffer_cost_.*: time required to construct the copy buffer
copy_cost_.*: time taken by the copy operation
Default: -

PRINT_METRICS_PERIOD (key: dws.client.print-metrics.period; 1.x: --)
Metric printing interval.
Default: 30s
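
For example, to print only the per-table import rate every 30 seconds, a client.properties sketch using the keys from the two entries above could contain:

dws.client.print-metrics.names=rps_.*
dws.client.print-metrics.period=30s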

Deleted parameters in version 2.x

logDataTables
Description: Tables whose data is printed during import so that it can be compared during fault locating.
Reason for removal: No application scenario; debugging on the development node is easier. This parameter can no longer be configured.

batchOutWeighRatio
Description: To improve overall throughput when the autoFlushBatchSize requirement is not strict, a flush task is submitted once the buffered data volume exceeds batchOutWeighRatio x autoFlushBatchSize. This favors background threads over service threads for submitting import tasks.
Reason for removal: WRITE_FORCE_FLUSH_BATCH_SIZE provides the same function and is more straightforward.