Updated on 2024-12-02 GMT+08:00

GeminiDB Redis API for Instant Messaging

Context

Instant messaging (IM) connects two or more people through a messaging platform over a network. Once connected, users can send text messages and files, and even make voice and video calls. In today's information-driven mobile Internet era, IM products (such as WeChat and QQ) have become a necessity in daily life. The core of an IM system is its messaging system, which handles the synchronization, storage, and retrieval of messages.

  • Message synchronization: Transmitting complete messages from the sender to the recipient quickly. The most important metrics of a message synchronization system are the timeliness, ordering, and integrity of transmitted messages, and the maximum message size supported.
  • Message storage: The persistent storage of messages. Conventional messaging systems store messages locally on the client, where data is not reliable. Modern messaging systems store messages in the cloud, which enables "message roaming": you can log in to your account on any terminal to view all historical messages.
  • Message retrieval: Messages are generally text, so full-text retrieval is also a mandatory capability. Conventional messaging systems usually build indexes on local messages and support only local retrieval. Modern messaging systems store messages online and build indexes as data is stored, providing comprehensive retrieval functions.

Application Scenarios

IM systems are used in many scenarios, such as chatting, gaming, and intelligent customer service. Each scenario has different requirements for the cost, performance, reliability, and latency of the IM system, and the architecture design must balance these requirements.

Figure 1 IM application scenarios

IM System Architecture

The basic concepts involved in IM system architecture design are as follows:

  • Comparison between conventional and modern architectures
    Figure 2 Comparison between conventional and modern architectures

    Conventional architecture:

    • Messages are synchronized before being stored.
    • Messages are synchronized online and cached offline.
    • Servers do not persist messages or support message roaming.

    Modern architecture:

    • Messages are stored before being synchronized.
    • Messages are stored and synchronized in different libraries. The storage library stores all conversations and supports message roaming. The synchronization library stores synchronized messages by receiver.
    • Full-text retrieval is supported.
  • Comparison between read fan-out and write fan-out

    A suitable read/write model ensures message reliability and consistency and effectively reduces workloads of servers or clients, which is critical to an IM system. This section describes two models: read fan-out and write fan-out.

    Figure 3 Read fan-out

    Messages from users A1, A2, and A3 are stored in three different mailboxes (an abstract data structure used to store messages) of user B. User B has to read new messages from all the mailboxes every time. In read fan-out mode, each pair of associated users shares one mailbox.

    Advantages of read fan-out:
    • Whether a message belongs to a one-on-one chat or a group chat, it needs to be written only once, into the corresponding mailbox.
    • Each mailbox stores two users' chat records, which can be easily viewed and searched for.

    Disadvantages of read fan-out:

    • As the volume of read operations increases, the system may face challenges in scaling to handle the load efficiently.

    Figure 4 Write fan-out

    Users B1, B2, and B3 read messages only from their own mailboxes. Messages are written differently for one-on-one chats and group chats.

    • One-on-one chat: A message is written into both the sender's and the recipient's mailboxes. To support viewing the chat history, an additional copy of the message needs to be written.
    • Group chat: The sender's message must be written to the mailboxes of all group members. Because a group chat in write fan-out mode consumes enormous resources, group size is usually capped. For example, a WeChat group can hold a maximum of 500 members.

    Advantages of write fan-out:

    • Users only need to read their own mailboxes.
    • It is convenient to synchronize messages between multiple terminals.

    Disadvantages of write fan-out:

    • The system is subjected to heavy write loads, especially for group chats.
  • Comparison among the push, pull, and push-pull modes
    Figure 5 Push, pull, and push-pull modes

    In the IM system, messages can be obtained in the following modes:

    • Push: The server instantly pushes a new message to all clients. A persistent connection needs to be established between the client and the server to ensure real-time performance. The client only needs to receive and process the message. However, the server does not know the message processing capability of the client, which may cause a data backlog.
    • Pull: The client requests messages from the server. This mode is used to obtain historical messages. The interval at which the client should poll for new messages is hard to choose in advance. If the interval is too short, a large number of requests return no data. If the interval is too long, messages are not received in time.
    • Push-pull: This hybrid mode combines the advantages of push and pull. The server pushes a new message notification to the client. After receiving the notification, the client pulls the message from the server.
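
The concepts above (store before synchronizing, write fan-out, and push-pull) can be tied together in a small in-memory sketch. It is illustrative only and is not how any particular IM product is implemented: the class `MessagingServer`, its attributes `storage`, `sync`, and `notified`, and the cursor-based `pull` method are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class MessagingServer:
    """Toy in-memory server: store first, then write fan-out, then notify."""

    def __init__(self):
        # Storage library: conversation ID -> every message (enables roaming).
        self.storage = defaultdict(list)
        # Synchronization library: receiver -> that receiver's own mailbox.
        self.sync = defaultdict(list)
        # Conversation ID -> set of member names.
        self.members = {}
        # Lightweight notifications "pushed" to clients: (receiver, mailbox size).
        self.notified = []

    def create_conversation(self, conversation_id, users):
        self.members[conversation_id] = set(users)

    def send(self, conversation_id, sender, text):
        message = {"conversation": conversation_id, "sender": sender, "text": text}
        # 1. Persist to the storage library before any synchronization.
        self.storage[conversation_id].append(message)
        # 2. Write fan-out: one copy per member mailbox. The sender is included
        #    so that the sender's other terminals can synchronize as well.
        for user in sorted(self.members[conversation_id]):
            self.sync[user].append(message)
            # 3. Push only a notification; the client pulls the body later.
            self.notified.append((user, len(self.sync[user])))

    def pull(self, user, cursor):
        """Return the messages after `cursor` and the new cursor for `user`."""
        mailbox = self.sync[user]
        return mailbox[cursor:], len(mailbox)


server = MessagingServer()
server.create_conversation("group-1", ["A", "B1", "B2"])
server.send("group-1", "A", "hello")
server.send("group-1", "B1", "hi")

# B2 was notified twice and now pulls everything after cursor 0.
new_messages, cursor = server.pull("B2", 0)
print([m["text"] for m in new_messages], cursor)  # ['hello', 'hi'] 2
```

Each `send` performs one write to the storage library plus one write per group member to the synchronization library, which is why write fan-out becomes expensive for large groups, while each reader touches only its own mailbox.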

IM Technology Challenges

Figure 6 IM system architecture

Messages between the clients are forwarded by servers. Core functions of IM are implemented by the message storage and synchronization libraries, which have high requirements on storage layer performance.

  • Massive data storage: If messages need to be stored permanently, the data volume will grow gradually. The message storage library must support unlimited capacity expansion to cope with the increasing data volume.
  • Low storage cost: Messages contain both hot and cold data. Most queries access hot data, so cold data can be kept in a lower-cost tier to contain storage costs as the data volume grows.
  • Data lifecycle management: A lifecycle must be defined for data in both the storage library and the synchronization library. The storage library stores data online, so a long retention period is generally specified. The synchronization library is used for online or offline push in write fan-out mode, so its data is stored for a short period.
  • High write throughput: The write fan-out mode is used in most IM systems, so the storage layer must offer high write throughput to cope with message floods.
  • Low-latency read: The messaging system is usually used online with high real-time performance. The read latency must be as low as possible.

Advantages of GeminiDB Redis in IM Scenarios

At the heart of the IM system lies the storage layer, whose performance directly affects user experience. Currently, there are many database products at the storage layer, such as HBase and open-source Redis, which can be selected based on the business scale, cost, and performance. GeminiDB Redis API is an in-house NoSQL database service. It can meet strict requirements of IM systems on the storage layer in terms of performance and scale, including massive data storage, low storage cost, lifecycle management, high write throughput, and low read latency.

With a cloud native distributed architecture, GeminiDB Redis API is compatible with Redis 5.0 and adopts decoupled storage and compute. In-house storage systems ensure unlimited capacity expansion, strong consistency, and high reliability. The compute layer leverages LSM-based storage engines. A large number of random writes are converted into sequential writes, which greatly enhances write performance. In addition, read performance is greatly improved by read caches and Bloom filters.
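
To illustrate why an LSM-based engine turns random writes into sequential ones, here is a toy sketch. It is a generic teaching model, not GeminiDB's actual engine: the class `TinyLSM`, the `memtable_limit` parameter, and the in-memory "runs" standing in for on-disk files are all assumptions made for this example. Read caches and Bloom filters are omitted.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: writes go to a memtable, which is flushed as a sorted run."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}                    # absorbs random writes in memory
        self.memtable_limit = memtable_limit
        self.runs = []                        # sorted (key, value) lists, newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Keys arrive in random order, but each run is written out in sorted
        # order in a single pass, which corresponds to one sequential write.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search newer runs first so the latest value wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("msg:3", "c")
db.put("msg:1", "a")    # memtable is full: flushed as one sorted run
db.put("msg:1", "a2")   # newer value stays in the memtable
print(db.get("msg:1"), db.get("msg:3"), len(db.runs))  # a2 c 1
```

A read may have to check several runs, which is the cost that read caches and Bloom filters are meant to reduce in a real engine.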

Figure 7 Advantages of GeminiDB Redis API

Application Cases of GeminiDB Redis API in IM Scenarios

The following figure shows an IM system based on GeminiDB Redis API, with a stream as the basic data structure. A Redis stream acts as a message container and allows data exchange between producers and consumers. It provides the basic functions of an IM system, such as message subscription, message distribution, and adding consumers, so users can quickly build an IM system using GeminiDB Redis API. When a group chat is created, a stream queue is also created for it on a GeminiDB Redis instance. Each sender adds messages to the stream queue in chronological order. A stream is a persistent queue, which ensures that no messages are lost.

Figure 8 Redis stream-based IM system
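
A minimal sketch of the per-group stream described above, written against the `xadd` and `xread` methods of the redis-py client. The key pattern `im:group:<id>`, the field names `sender` and `text`, and the helper function names are assumptions for illustration. The `client` argument is expected to be a redis-py connection created with `decode_responses=True` and the default RESP2 reply format; the instance address is a placeholder you would supply yourself.

```python
# Assumed setup (not executed here):
#   import redis
#   client = redis.Redis(host="<instance address>", port=6379,
#                        decode_responses=True)


def group_stream_key(group_id):
    # Hypothetical key naming: one stream per group chat.
    return f"im:group:{group_id}"


def send_group_message(client, group_id, sender, text):
    """Append a message to the group's stream and return its entry ID.

    XADD with the default "*" ID lets the server assign increasing IDs,
    so entries are kept in the order in which they were added.
    """
    return client.xadd(group_stream_key(group_id),
                       {"sender": sender, "text": text})


def fetch_messages_after(client, group_id, last_id="0-0", count=100):
    """Return up to `count` (entry_id, fields) pairs newer than `last_id`.

    XREAD returns only entries whose ID is greater than the given ID,
    so a client can resume from the last entry it has already seen.
    """
    reply = client.xread({group_stream_key(group_id): last_id}, count=count)
    messages = []
    for _stream, entries in reply:
        messages.extend(entries)
    return messages
```

Each client keeps the ID of the last entry it processed and passes it as `last_id` on the next call, which matches the pull step of the push-pull mode.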

GeminiDB Redis API uses a series of innovative technologies to improve read and write performance, scale up storage in seconds, and automatically back up data. It serves as the storage layer of the IM system, where its read and write performance and advanced features greatly facilitate IM applications. In addition, GeminiDB Redis API balances performance and cost relative to open-source Redis and can be widely used in fields such as smart healthcare, traffic control, and counters.