Updated on 2022-06-14 GMT+08:00

Basic Concepts

  • DIS stream

    A DIS stream is an ordered sequence of streaming data records.

    Streams are distinguished from each other by the names assigned when they are created. When reading or writing streaming data, you must specify the name of the DIS stream to read from or write to.

  • Partition

    Data records in a DIS stream are distributed across partitions. A partition is the basic throughput unit of a DIS stream, and the total capacity of a stream is the sum of the capacities of its partitions.

    When creating a DIS stream, you must specify the number of partitions the stream needs.

  • Data record

    A data record is the unit of data stored in a DIS stream. A data record is composed of a sequence number, partition key, and data blob.

    A data blob is the payload that a data producer adds to a DIS stream. A data blob can be up to 1 MB before Base64 encoding.

  • Sequence number

    Each data record has a sequence number that is unique within its partition. The sequence number is assigned by DIS when a data producer calls PutRecords to add data to a DIS stream.

    Sequence numbers for the same partition key generally increase over time; the longer the time period between write requests (PutRecords requests), the larger the sequence numbers become.

  • DIS application

    DIS applications write, read, and process data in DIS streams. You can develop DIS applications using the client library software development kit (SDK).

    DIS applications are classified into producer and consumer applications.

  • SDK

    The DIS SDK is a Java-based client library. With the SDK, you can easily build DIS applications that write, read, and process data in DIS streams.

  • Project

    Projects are used to group and isolate OpenStack resources (computing, storage, and network resources). A project can correspond to a department or a project team. Multiple projects can be created under one tenant account.

    A region can contain multiple projects, but each project belongs to exactly one region.

  • Checkpoint: When an application consumes data, the sequence number of the latest consumed record is recorded as a checkpoint. When consumption resumes, the application can continue from this checkpoint instead of starting over.
  • App: When multiple applications consume data from the same stream, an App serves as the identifier that keeps the consumption checkpoints of different applications separate.
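To make the data record structure concrete, here is a minimal sketch of a record with the three parts listed above (sequence number, partition key, data blob) and a check on the 1 MB blob limit. The classes `DataRecord` and `DataRecordDemo` are hypothetical and are not part of the DIS SDK. The sketch also assumes that "1 MB" means 1,048,576 bytes of raw payload and represents the sequence number as a `String`; both are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Entry point declared first so the file can be run directly with `java DataRecordDemo.java`.
class DataRecordDemo {
    public static void main(String[] args) {
        // The sequence number is null because DIS assigns it only when the record is written.
        DataRecord small = new DataRecord(null, "sensor-42", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(small.encodedData()); // prints aGVsbG8=

        // A blob exactly at the (assumed) limit is accepted even though its Base64 form is larger.
        DataRecord atLimit = new DataRecord(null, "sensor-42", new byte[DataRecord.MAX_BLOB_BYTES]);
        System.out.println(atLimit.encodedData().length()); // prints 1398104

        try {
            new DataRecord(null, "sensor-42", new byte[DataRecord.MAX_BLOB_BYTES + 1]);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical model of a DIS data record (not the DIS SDK's record type).
final class DataRecord {
    // Assumption: "1 MB" is read as 1,048,576 bytes, measured on the raw blob before Base64 encoding.
    static final int MAX_BLOB_BYTES = 1024 * 1024;

    final String sequenceNumber; // assigned by DIS on write; modeled as a String (assumption)
    final String partitionKey;
    final byte[] data;           // the data blob, unencoded

    DataRecord(String sequenceNumber, String partitionKey, byte[] data) {
        if (data.length > MAX_BLOB_BYTES) {
            throw new IllegalArgumentException("data blob is " + data.length
                + " bytes; the limit is " + MAX_BLOB_BYTES + " bytes before Base64 encoding");
        }
        this.sequenceNumber = sequenceNumber;
        this.partitionKey = partitionKey;
        this.data = data;
    }

    // Base64 output is about 4/3 the size of the raw blob; the size check above applies to the raw bytes.
    String encodedData() {
        return Base64.getEncoder().encodeToString(data);
    }
}
```

Under this reading of the limit, a blob of exactly 1,048,576 bytes is accepted and encodes to 1,398,104 Base64 characters.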
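The relationship between streams, partitions, partition keys, and sequence numbers can be sketched as a small in-memory model. `StreamModel` is hypothetical and reflects neither the DIS SDK nor the service's internal design: it assumes a record is routed to a partition by hashing its partition key, and it models sequence numbers as per-partition counters so that each one is unique within its partition and increases with every write. Partition capacity is a plain number in whatever unit the caller chooses.

```java
import java.util.Arrays;

// Entry point declared first so the file can be run directly with `java StreamModelDemo.java`.
class StreamModelDemo {
    public static void main(String[] args) {
        StreamModel stream = new StreamModel("my-stream", 3);

        // Same partition key -> same partition (under the assumed routing) -> increasing sequence numbers.
        System.out.println(stream.partitionFor("user-1") == stream.partitionFor("user-1")); // prints true
        System.out.println(stream.put("user-1")); // prints 0
        System.out.println(stream.put("user-1")); // prints 1

        // Stream capacity is the sum of its partitions' capacities (units chosen by the caller).
        System.out.println(StreamModel.totalCapacity(new double[] {1.0, 1.0, 1.0})); // prints 3.0
    }
}

// Hypothetical model of a DIS stream; routing and numbering here are illustrative assumptions.
final class StreamModel {
    final String name;
    private final long[] nextSequence; // one sequence counter per partition

    StreamModel(String name, int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("a stream needs at least one partition");
        }
        this.name = name;
        this.nextSequence = new long[partitionCount];
    }

    int partitionCount() {
        return nextSequence.length;
    }

    // Assumed routing: hash the partition key onto one of the partitions.
    int partitionFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), nextSequence.length);
    }

    // Simulates the service assigning a sequence number on write: unique and increasing within the partition.
    long put(String partitionKey) {
        return nextSequence[partitionFor(partitionKey)]++;
    }

    // The total capacity of a stream is the sum of the capacities of its partitions.
    static double totalCapacity(double[] partitionCapacities) {
        return Arrays.stream(partitionCapacities).sum();
    }
}
```

Because the counters are kept per partition, records in different partitions can carry the same sequence number; a sequence number identifies a record only together with its partition.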
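The checkpoint and App concepts can be illustrated with a minimal checkpoint store. `CheckpointStore` is hypothetical and does not describe how DIS actually stores checkpoints. It assumes checkpoints are tracked per App, stream, and partition, and that sequence numbers can be compared as `long` values. Each App keeps its own checkpoint, so a consumer resumes after the last record it committed, independently of other Apps reading the same stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Entry point declared first so the file can be run directly with `java CheckpointDemo.java`.
class CheckpointDemo {
    public static void main(String[] args) {
        CheckpointStore store = new CheckpointStore();
        List<Long> partition0 = List.of(5L, 6L, 7L, 8L); // sequence numbers present in partition 0

        // Two Apps consume the same stream and keep separate checkpoints.
        store.commit("billing-app", "orders", 0, 6L);
        store.commit("audit-app", "orders", 0, 7L);

        System.out.println(store.remaining("billing-app", "orders", 0, partition0)); // prints [7, 8]
        System.out.println(store.remaining("audit-app", "orders", 0, partition0));   // prints [8]
        System.out.println(store.remaining("new-app", "orders", 0, partition0));     // prints [5, 6, 7, 8]
    }
}

// Hypothetical checkpoint store keyed by (App, stream, partition); not the DIS checkpoint API.
final class CheckpointStore {
    private final Map<List<Object>, Long> checkpoints = new HashMap<>();

    private static List<Object> key(String app, String stream, int partition) {
        return List.of(app, stream, partition);
    }

    // Records the sequence number of the latest record this App has consumed.
    void commit(String app, String stream, int partition, long sequenceNumber) {
        checkpoints.put(key(app, stream, partition), sequenceNumber);
    }

    // The App's checkpoint, or empty if it has not consumed anything from this partition yet.
    OptionalLong checkpoint(String app, String stream, int partition) {
        Long last = checkpoints.get(key(app, stream, partition));
        return last == null ? OptionalLong.empty() : OptionalLong.of(last);
    }

    // Sequence numbers the App still has to consume: those after its checkpoint, or all if it has none.
    List<Long> remaining(String app, String stream, int partition, List<Long> sequenceNumbers) {
        OptionalLong cp = checkpoint(app, stream, partition);
        List<Long> out = new ArrayList<>();
        for (long seq : sequenceNumbers) {
            if (!cp.isPresent() || seq > cp.getAsLong()) {
                out.add(seq);
            }
        }
        return out;
    }
}
```

In this sketch, committing a checkpoint only after a record has been fully processed means a restarted consumer re-reads only the records it had not yet committed.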