Updated on 2024-08-06 GMT+08:00

Using Parallel SCAN to Accelerate Full Database Scanning

When an instance holds a large number of keys, a full scan with the Redis SCAN command takes a long time. GeminiDB Redis API uses a distributed architecture, so multiple data partitions can be scanned concurrently to speed up the scan.

Precautions

  • This solution applies only to GeminiDB Redis cluster instances.
  • When you use the SCAN command with the PARTITION parameter, pass each returned cursor back to the same partition to continue the scan. Do not change the PARTITION value partway through a scan; otherwise, the scan may return unexpected results or an error may occur.

Procedure

  1. Obtain information about all partitions of a GeminiDB Redis instance for subsequent parallel scanning.

    Data partitioning: A GeminiDB Redis cluster instance stores data in many underlying data partitions, which are distributed across nodes. Each partition name is a 16-character ID. The names and the total number of data partitions are fixed and do not change when the instance is modified.

    Obtain the data partition list: Run the INFO ROUTE command to obtain all data partitions of the GeminiDB Redis instance. In the following example, the instance has four data partitions: efb06d5c7a4ecb31, c7a36e9eee0103c1, 6fd3dfdbcca37686, 7f7666870a88501b.

    127.0.0.1:6379> info route
    # Route
    server: 127.0.0.1:16379 // Data partitions on the first node.
        efb06d5c7a4ecb31 // Data partition.
        c7a36e9eee0103c1 // Data partition.
    server: 127.0.0.1:26379 // Data partitions on the second node.
        6fd3dfdbcca37686 // Data partition.
        7f7666870a88501b // Data partition.
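
As a rough illustration of this step, the following Python sketch extracts the partition IDs from the raw text of an INFO ROUTE reply and counts them. The function names are hypothetical, and the sketch assumes that partition IDs always appear as 16-character lowercase hexadecimal tokens, as in the example output. It also sorts the IDs, on the assumption that a partition's position in lexicographic order is what the partition_index value of the PARTITION parameter refers to.

```python
import re

# Assumption: partition IDs appear in the reply as 16-character lowercase
# hexadecimal tokens, as in the example output above.
PARTITION_ID = re.compile(r"\b[0-9a-f]{16}\b")


def parse_partition_ids(info_route_text):
    """Return the partition IDs found in INFO ROUTE text, sorted lexicographically."""
    return sorted(set(PARTITION_ID.findall(info_route_text)))


def partition_index_map(partition_ids):
    """Map each partition ID to its zero-based position in sorted order."""
    return {pid: index for index, pid in enumerate(sorted(partition_ids))}
```

For the four-partition example, the sorted list starts with 6fd3dfdbcca37686 and ends with efb06d5c7a4ecb31, and its length (4) is the number of SCAN tasks that can run in parallel.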

  2. Start multiple SCAN tasks to scan different data partitions.

    GeminiDB Redis API adds a PARTITION parameter to the SCAN command. With this parameter, you can scan a specific data partition while otherwise keeping the open-source syntax. You can therefore write a parallel scanning script that runs SCAN on multiple data partitions at the same time, which shortens a full database scan.

    • For details about the standard SCAN command syntax, see SCAN.
    • Syntax with the optional PARTITION parameter added by GeminiDB Redis API:
      SCAN cursor [MATCH pattern] [COUNT count] [TYPE type] [PARTITION partition_index]

      The syntax of the MATCH, COUNT, and TYPE parameters is the same as in open-source Redis.

      • PARTITION: specifies the data partition to scan. When the cursor returned by the SCAN command is 0, the scan of that data partition is complete.
      • partition_index: the zero-based position of a data partition when all data partition IDs are sorted in dictionary (lexicographic) order. For example, if an instance has four data partitions, partition_index ranges from 0 to 3. If an instance has 240 data partitions, partition_index ranges from 0 to 239. For example:
        127.0.0.1:6379> scan 0 count 2 partition 1
        1) "1125900712148994"
        2) 1) "memtier-1"
           2) "memtier-12
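
The parallel scan described in step 2 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: send is a hypothetical callable that sends one command to the instance and returns the reply as a (cursor, keys) pair, and scan_partition and parallel_scan are illustrative names. Each worker keeps its own cursor and never changes its partition index, in line with the Precautions.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(send, partition_index, count=100):
    """Scan one data partition to completion and return its keys."""
    keys = []
    cursor = 0
    while True:
        # The cursor returned for this partition is only ever reused
        # with the same partition index.
        cursor, batch = send(
            "SCAN", cursor, "COUNT", count, "PARTITION", partition_index
        )
        keys.extend(batch)
        if int(cursor) == 0:  # Cursor 0 means the partition is fully scanned.
            return keys


def parallel_scan(send, partition_count, count=100, max_workers=8):
    """Scan partitions 0..partition_count-1 concurrently.

    Returns one list of keys per partition, ordered by partition index.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(scan_partition, send, index, count)
            for index in range(partition_count)
        ]
        return [future.result() for future in futures]
```

With the redis-py client, the execute_command method of a client object could plausibly serve as send, assuming the client is safe to share across threads and returns the SCAN reply as a cursor and a list of keys. The sketch collects every key in memory; for a very large keyspace, processing each batch as it arrives is likely the better choice.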