Updated on 2024-03-05 GMT+08:00

Configuring the DIS Flume Plugin

The DIS Flume Plugin consists of a DIS Source and a DIS Sink. The dis-flume-plugin.conf.template file in the installation package provides a sample configuration. This section describes the configuration items of DIS Source and DIS Sink.

dis-flume-plugin.conf.template is only a configuration sample for the DIS plugin; Flume does not read it at run time. Flume provides its own configuration sample file at FLUME_HOME/conf/flume-conf.properties.template, where FLUME_HOME is the installation path of Flume. You can modify the configuration file based on site requirements.

SK is sensitive information. To encrypt the SK, perform the following steps:

  1. Run the following command to go to the dis-flume-plugin/ directory:

    cd /dis-flume-plugin

  2. Run the encryption script, enter the password, and press Enter.

    bash dis-encrypt.sh

  3. View the encryption result. The character string following "Encrypt result:" on the console is the ciphertext. Use this method to encrypt the MySQL password and the SK separately, and record each ciphertext in the configuration file.

Configuring DIS Source

Table 1 DIS Source configuration parameters

| Parameter | Mandatory | Description | Default Value |
|---|---|---|---|
| channels | Yes | Name of the Flume channel. | - |
| type | Yes | DIS Source type. | com.cloud.dis.adapter.flume.source.DISSource |
| streams | Yes | DIS stream name. It must be the same as the stream name specified when the stream was created on the DIS console. | - |
| ak | Yes | User's AK. For details about how to obtain an AK, see Checking Authentication Information. | - |
| sk | Yes | User's SK. For details about how to obtain an SK, see Checking Authentication Information. | - |
| region | Yes | Region in which DIS is located. | - |
| projectId | Yes | Project ID specific to your region. For details about how to obtain a project ID, see Checking Authentication Information. | - |
| endpoint | Yes | Data API address of the region where DIS resides. | - |
| group.id | Yes | Application name, which identifies a consumer group. It consists of letters, digits, hyphens (-), and underscores (_). | - |
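As an illustration of how the parameters in Table 1 might appear in a Flume configuration file, the following is a minimal sketch and not a copy of dis-flume-plugin.conf.template. It assumes the standard Flume property layout (agent.sources.<name>.<parameter>). The agent name (agent), source name (dissource), and channel name (memchannel) are hypothetical, and every value in angle brackets is a placeholder to replace with your own.

```properties
# Hypothetical component names: agent "agent", source "dissource", channel "memchannel".
agent.sources = dissource
agent.channels = memchannel

# Standard Flume in-memory channel, used here only so that the source has a channel to write to.
agent.channels.memchannel.type = memory

# DIS Source parameters from Table 1.
agent.sources.dissource.channels = memchannel
agent.sources.dissource.type = com.cloud.dis.adapter.flume.source.DISSource
# Must be the same as the stream name created on the DIS console.
agent.sources.dissource.streams = <stream-name>
agent.sources.dissource.ak = <ak>
# Assumption: the ciphertext printed after "Encrypt result:" by dis-encrypt.sh goes here.
agent.sources.dissource.sk = <sk-ciphertext>
agent.sources.dissource.region = <region>
agent.sources.dissource.projectId = <project-id>
agent.sources.dissource.endpoint = <endpoint>
# Application name that identifies the consumer group: letters, digits, hyphens, underscores.
agent.sources.dissource.group.id = <application-name>
```

Note that a Flume source uses the plural key channels, whereas a sink uses the singular key channel, which matches the first rows of Table 1 and Table 2.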

Configuring DIS Sink

Table 2 DIS Sink configuration parameters

| Parameter | Mandatory | Description | Default Value |
|---|---|---|---|
| channel | Yes | Name of the Flume channel. | - |
| type | Yes | Sink type. | com.cloud.dis.adapter.flume.sink.DISSink |
| streamName | Yes | Name of the DIS stream. It must be the same as the stream name specified when the stream was created on the DIS console. | - |
| ak | Yes | User's AK. For details about how to obtain an AK, see Checking Authentication Information. | - |
| sk | Yes | User's SK. For details about how to obtain an SK, see Checking Authentication Information. | - |
| region | Yes | Region in which DIS is located. | - |
| projectId | Yes | Project ID specific to your region. For details about how to obtain a project ID, see Checking Authentication Information. | - |
| endpoint | Yes | Data API address of the region where DIS resides. | - |
| partitionNumber | No | Number of partitions in the chosen DIS stream. The value is used to calculate batchSize. | 1 |
| batchSize | No | Number of data records that can be batch processed in a single Flume transaction. | batchSize = partitionNumber * 250 |
| sendingThreadSize | No | Number of sender threads. By default, there is only one sender thread. NOTE: If multiple sender threads are used, the order in which data is sent is not guaranteed, and certain data will be resent if the Flume application restarts after it stops abruptly. | 1 |
| sendingRecordSize | No | Number of data records that can be sent in a single call to the DIS API that puts data into DIS streams. NOTE: batchSize is the number of data records batch processed in a single Flume transaction, whereas sendingRecordSize is the number of data records batch processed in a single API call. For example, if batchSize is 1000 and sendingRecordSize is 250, four API calls are made to complete the Flume transaction. A Flume transaction is completed and submitted only after batchSize records have been sent successfully. If the application restarts before a Flume transaction is submitted, data will be resent. If sendingThreadSize is set to 1, sendingRecordSize and batchSize will have the same value, which prevents unnecessary data resending. | 250 |
| retrySize | No | Maximum number of times that the DIS Flume Sink retries a DIS API call after the initial call fails. The default value 2147483647 is recommended; it means that the Sink retries the API call an unlimited number of times. Exponential backoff incrementally increases the wait between retry attempts to reduce server load and increase the likelihood that repeated requests succeed. | 2147483647 |
| resultLogLevel | No | Level of the log that prints the latest sequenceNumber at the end of each DIS API call. Log levels, from low to high: OFF < DEBUG < INFO < WARN < ERROR. The value OFF indicates that no logs will be generated. If the log level of Flume log4j is higher than resultLogLevel, no logs will be generated. | OFF |
| maxBufferAgeMillis | No | Maximum time, in milliseconds, that data waits in the buffer before it is uploaded to DIS. If the buffer fills up with data waiting to be uploaded, the data is uploaded immediately. If the buffer is not full, the data is uploaded after the specified number of milliseconds elapses. | 5000 |
| connectionTimeOutSeconds | No | Amount of time that must elapse before a DIS API call times out. Unit: second | 30 |
| socketTimeOutSeconds | No | Amount of time that must elapse before a response to a DIS API call times out. Unit: second | 60 |
| dataEncryptEnabled | No | Whether data is encrypted using the Advanced Encryption Standard (AES) algorithm. Possible values: true, false. | false |
| dataPassword | No | Password used to encrypt or decrypt data. This parameter is mandatory if dataEncryptEnabled is set to true. | - |
| bodySerializeType | No | Upload format of the DIS data packet (not the original data format). Possible values: json (the DIS data packet is encapsulated in JSON format) and protobuf (the DIS data packet is encapsulated in a binary format, which reduces the volume of the data packet by 1/3; recommended when a massive amount of data is generated). | json |
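To show how the parameters in Table 2 fit together, the following is a minimal sketch of a DIS Sink definition, assuming the standard Flume property layout (agent.sinks.<name>.<parameter>). It is not taken from dis-flume-plugin.conf.template. The agent name (agent), sink name (dissink), and channel name (memchannel) are hypothetical, values in angle brackets are placeholders, and the optional parameters are set to the default values listed in Table 2, so omitting them should give the same behavior.

```properties
# Hypothetical component names: agent "agent", sink "dissink", channel "memchannel".
agent.channels = memchannel
agent.sinks = dissink

# Standard Flume in-memory channel that the sink reads from.
agent.channels.memchannel.type = memory

# Mandatory DIS Sink parameters from Table 2.
agent.sinks.dissink.channel = memchannel
agent.sinks.dissink.type = com.cloud.dis.adapter.flume.sink.DISSink
# Must be the same as the stream name created on the DIS console.
agent.sinks.dissink.streamName = <stream-name>
agent.sinks.dissink.ak = <ak>
# Assumption: the ciphertext printed after "Encrypt result:" by dis-encrypt.sh goes here.
agent.sinks.dissink.sk = <sk-ciphertext>
agent.sinks.dissink.region = <region>
agent.sinks.dissink.projectId = <project-id>
agent.sinks.dissink.endpoint = <endpoint>

# Optional parameters, shown with the defaults from Table 2.
agent.sinks.dissink.partitionNumber = 1
# Default formula: batchSize = partitionNumber * 250 = 1 * 250.
agent.sinks.dissink.batchSize = 250
agent.sinks.dissink.sendingThreadSize = 1
# batchSize / sendingRecordSize = 250 / 250, so one API call completes each Flume transaction.
agent.sinks.dissink.sendingRecordSize = 250
agent.sinks.dissink.retrySize = 2147483647
agent.sinks.dissink.resultLogLevel = OFF
agent.sinks.dissink.maxBufferAgeMillis = 5000
agent.sinks.dissink.connectionTimeOutSeconds = 30
agent.sinks.dissink.socketTimeOutSeconds = 60
# dataPassword is omitted because it is mandatory only when dataEncryptEnabled is true.
agent.sinks.dissink.dataEncryptEnabled = false
agent.sinks.dissink.bodySerializeType = json
```

Comments are placed on their own lines because the Java properties format treats text after a value as part of that value.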