Configuration Table

Enhanced Open Source Feature: Configuration Table

In some scenarios, users have fixed configuration tables that store basic information. After Flink receives stream data, Flink needs to be configured to match configuration tables. Redis is recommended for storage because the configuration table may be of large size. Redis is a high-performance key-value database with low query latency for stream data.

The detailed process is as follows:

Figure 1 Process flow

Data Stored on Redis

Redis is a data structure server supporting various types of values, in addition to key-value storage. The following data types are supported:

Binary-safe string.
List: A collection of string elements sorted by their insertion order. It is basically a linked list.
Sets: Disordered collection of character string elements without repetition.
Sorted sets: Each string element is associated with a score floating number value. Elements are sorted by score and can be searched.
Hashes: The map that consists of fields and related values. Fields and values are strings.
Bit arrays: You can process strings as a series of bits by running certain commands. For example, you are allowed to configure and clear certain bits, calculate the number of bits that are configured to 1, and find the first bit that is configured to 1 or 0.
HyperLogLogs: A probabilistic data structure which is used to estimate the cardinality of a set.

Redis clusters are used to store configuration tables containing a maximum of 500 million pieces of data, enabling quick query response. Asynchronous I/Os of streams are used to query messages, improving throughput of the data processing.

Redis cluster: In a Redis cluster, Redis is deployed on all nodes in the cluster and data is stored on all nodes with high storage capacity. MRS provides Redis.
Asynchronous I/O: Asynchronous I/O is used to processes data with maximized data processing throughput, improving the processing efficiency.

Operations on Redis are as follows:

Install Redis.
When installing clusters, you can select Redis provided by MRS.
Import configuration tables to Redis.
You are allowed to select the main key or multiple key columns as the keys based on the feature of the configuration table. If to-be-stored configuration tables contain a large number of attributes, you are advised to storage them in the Hashes data format.

The Redis provided by MRS provides Jedis client for inserting queries. For details, see Redis sample code.

For details about Redis data types, visit the official website at https://redis.io/topics/data-types-intro.

Asynchronous I/Os

When Flink interacts with external systems, such as external databases, the waiting time for responses is too long, reducing data processing efficiency. In asynchronous I/O mode, other requests can be sent without waiting for the response to the previous request, improving data throughput.

The following requirements are required for achieving the API of asynchronous I/O:

You need to rewrite the asyncInvoke method of the AsyncFunction function to implement asynchronous data processing.
Callback function obtains operator results and AsyncCollector collects the obtained results.
Figure 2 Comparison of Async.I/O
You need to configure the timeout period and maximum capacity.
Timeout period defines the maximum allowed period for an asynchronous request. The maximum capacity refers to the maximum concurrent number of asynchronous requests. You are advised to configure maximum capacity based on data source features, because an improperly large value will cause high resources consumption and an improperly small value will reduce the throughput.