Updated on 2025-08-12 GMT+08:00

Doris Data Model

In Doris, data is logically described in the form of tables. A table is a collection of homogeneous data with the same schema. A table consists of rows and columns: a row is one record of user data, and a column describes one field of that record. Columns can use different data types (such as integers, strings, and Boolean values) as required.

In OLAP scenarios, columns are divided into two categories, Key and Value, which correspond to dimension columns and metric (indicator) columns, respectively.

Doris data models are classified into the following types:

Aggregate Model

This section illustrates what an Aggregate model is and how to use it correctly with practical examples.

  • Example 1: Aggregating data during import

    Assume that the business has the following data table schema:

    Table 1 Data description

    | ColumnName | Type | AggregationType | Comment |
    | --- | --- | --- | --- |
    | user_id | LARGEINT | - | User ID |
    | date | DATE | - | Data import date |
    | city | VARCHAR(20) | - | City where a user is located |
    | age | SMALLINT | - | User age |
    | sex | TINYINT | - | User gender |
    | last_visit_date | DATETIME | REPLACE | Last visit date |
    | cost | BIGINT | SUM | Total consumption |
    | max_dwell_time | INT | MAX | Maximum residence time |
    | min_dwell_time | INT | MIN | Minimum residence time |

    The corresponding CREATE TABLE statement would be as follows:

    CREATE TABLE IF NOT EXISTS demo.example_tbl
    (
        `user_id` LARGEINT NOT NULL COMMENT "User ID",
        `date` DATE NOT NULL COMMENT "Data import date and time",
        `city` VARCHAR(20) COMMENT "City where the user is located",
        `age` SMALLINT COMMENT "User age",
        `sex` TINYINT COMMENT "User gender",
        `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "Last visit date of the user",
        `cost` BIGINT SUM DEFAULT "0" COMMENT "Total consumption",
        `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum residence time",
        `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum residence time"
    )
    AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
    DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
    PROPERTIES (
        "replication_allocation" = "tag.location.default: 3"
    );

    This is a typical fact table of user information and visit behaviors. In a star schema, user information and visit behaviors are usually stored in a dimension table and a fact table, respectively. Here, for easier explanation of the Doris data models, both are stored in a single table.

    The columns in the table are divided into Key (dimension) columns and Value (metric) columns based on whether an AggregationType is set. Columns without an AggregationType (user_id, date, city, age, and sex) are Key columns; the rest are Value columns.

    When data is imported, rows with identical values in all Key columns are aggregated into one row, and their Value columns are combined as their AggregationType specifies. AggregationType has the following modes:
    • SUM: Sum up the values in multiple rows.
    • REPLACE: Replace the previous value with the newly imported value.
    • MAX: Keep the maximum value.
    • MIN: Keep the minimum value.
      Table 2 User information table

      | user_id | date | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
      | --- | --- | --- | --- | --- | --- | --- | --- | --- |
      | 10000 | 2017-10-01 | A | 20 | 0 | 2017-10-01 06:00:00 | 20 | 10 | 10 |
      | 10000 | 2017-10-01 | A | 20 | 0 | 2017-10-01 07:00:00 | 15 | 2 | 2 |
      | 10001 | 2017-10-01 | A | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
      | 10002 | 2017-10-02 | B | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
      | 10003 | 2017-10-02 | C | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
      | 10004 | 2017-10-01 | D | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
      | 10004 | 2017-10-03 | D | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 |

    Assume that this is a table recording user behaviors when they are visiting a certain commodity page. The first row of data, for example, is explained as follows:

    Table 3 Parameters

    | Value | Description |
    | --- | --- |
    | 10000 | User ID, which uniquely identifies a user |
    | 2017-10-01 | Date when the data was imported (accurate to the day) |
    | A | City where the user is located |
    | 20 | User age |
    | 0 | User gender (0: male; 1: female) |
    | 2017-10-01 06:00:00 | Time when the user visited the page (accurate to the second) |
    | 20 | Consumption generated by this visit |
    | 10 | Dwell time of this visit |
    | 10 | Dwell time of this visit (redundant column) |

    After this batch of data is imported into Doris correctly, it will be stored in Doris as follows:

    Table 4 Stored data

    | user_id | date | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 10000 | 2017-10-01 | A | 20 | 0 | 2017-10-01 07:00:00 | 35 | 10 | 2 |
    | 10001 | 2017-10-01 | A | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
    | 10002 | 2017-10-02 | B | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
    | 10003 | 2017-10-02 | C | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
    | 10004 | 2017-10-01 | D | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
    | 10004 | 2017-10-03 | D | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 |

    As shown above, the data of user 10000 has been aggregated into one row, while the rows of the other users remain unchanged. The aggregated data of user 10000 is explained as follows.

    The first five columns (the Key columns) remain unchanged, so the explanation starts from the sixth column, last_visit_date.

    • 2017-10-01 07:00:00: The last_visit_date column is aggregated by REPLACE, so 2017-10-01 07:00:00 replaces 2017-10-01 06:00:00.

      When REPLACE aggregates data from the same import batch, the order of replacement is not guaranteed. That means the value eventually saved in Doris could also be 2017-10-01 06:00:00. Across different import batches, however, data from the newer batch always replaces data from the older batch.

    • 35: The cost column is aggregated by SUM, so the stored value 35 is the result of 20 + 15.
    • 10: The max_dwell_time column is aggregated by MAX, so 10 is saved as it is the maximum between 10 and 2.
    • 2: The min_dwell_time column is aggregated by MIN, so 2 is saved as it is the minimum between 10 and 2.

      After aggregation, Doris only stores the aggregated data. The detailed raw data is not retained.
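
    The aggregation behavior described above can be sketched in Python (an illustrative simulation of the Aggregate model semantics, not Doris code; the column names follow this example):

```python
# Simulate Doris Aggregate-model semantics for the two rows of user 10000
# in Table 2. Rows sharing the same Key tuple are merged; each Value
# column is combined by its own aggregation function.
AGGS = {
    "last_visit_date": lambda old, new: new,  # REPLACE
    "cost": lambda old, new: old + new,       # SUM
    "max_dwell_time": max,                    # MAX
    "min_dwell_time": min,                    # MIN
}

def aggregate(rows):
    table = {}
    for key, values in rows:
        if key not in table:
            table[key] = dict(values)
        else:
            cur = table[key]
            for col, fn in AGGS.items():
                cur[col] = fn(cur[col], values[col])
    return table

key = (10000, "2017-10-01", "A", 20, 0)  # user_id, date, city, age, sex
rows = [
    (key, {"last_visit_date": "2017-10-01 06:00:00", "cost": 20,
           "max_dwell_time": 10, "min_dwell_time": 10}),
    (key, {"last_visit_date": "2017-10-01 07:00:00", "cost": 15,
           "max_dwell_time": 2, "min_dwell_time": 2}),
]
merged = aggregate(rows)[key]
# merged: last_visit_date "2017-10-01 07:00:00", cost 35,
# max_dwell_time 10, min_dwell_time 2 (matching the aggregated row above)
```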

  • Example 2: Keeping detailed data

    Here is a modified version of the table schema in Example 1:

    Table 5 Data description

    | ColumnName | Type | AggregationType | Comment |
    | --- | --- | --- | --- |
    | user_id | LARGEINT | - | User ID |
    | date | DATE | - | Data import date |
    | timestamp | DATETIME | - | Date and time when the data is imported (accurate to the second) |
    | city | VARCHAR(20) | - | City where the user is located |
    | age | SMALLINT | - | User age |
    | sex | TINYINT | - | User gender |
    | last_visit_date | DATETIME | REPLACE | Last visit date |
    | cost | BIGINT | SUM | Total consumption |
    | max_dwell_time | INT | MAX | Maximum residence time |
    | min_dwell_time | INT | MIN | Minimum residence time |

    A new column timestamp (accurate to seconds) has been added to record the date and time when the data is imported.

    In addition, AGGREGATE KEY is set to AGGREGATE KEY(user_id, date, timestamp, city, age, sex).

    Suppose that the imported data is as follows:

    Table 6 User information table

    | user_id | date | timestamp | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 10000 | 2017-10-01 | 2017-10-01 08:00:05 | A | 20 | 0 | 2017-10-01 06:00:00 | 20 | 10 | 10 |
    | 10000 | 2017-10-01 | 2017-10-01 09:00:05 | A | 20 | 0 | 2017-10-01 07:00:00 | 15 | 2 | 2 |
    | 10001 | 2017-10-01 | 2017-10-01 18:12:10 | A | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
    | 10002 | 2017-10-02 | 2017-10-02 13:10:00 | B | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
    | 10003 | 2017-10-02 | 2017-10-02 13:15:00 | C | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
    | 10004 | 2017-10-01 | 2017-10-01 12:12:48 | D | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
    | 10004 | 2017-10-03 | 2017-10-03 12:38:20 | D | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 |

    After this batch of data is imported into Doris correctly, it will be stored in Doris as follows:

    Table 7 Stored data

    | user_id | date | timestamp | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 10000 | 2017-10-01 | 2017-10-01 08:00:05 | A | 20 | 0 | 2017-10-01 06:00:00 | 20 | 10 | 10 |
    | 10000 | 2017-10-01 | 2017-10-01 09:00:05 | A | 20 | 0 | 2017-10-01 07:00:00 | 15 | 2 | 2 |
    | 10001 | 2017-10-01 | 2017-10-01 18:12:10 | A | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
    | 10002 | 2017-10-02 | 2017-10-02 13:10:00 | B | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
    | 10003 | 2017-10-02 | 2017-10-02 13:15:00 | C | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
    | 10004 | 2017-10-01 | 2017-10-01 12:12:48 | D | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
    | 10004 | 2017-10-03 | 2017-10-03 12:38:20 | D | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 |

    Because the timestamp column makes every Key combination unique, no rows are aggregated and all detailed data is retained.

  • Example 3: Aggregating imported data with existing data

    Based on the table in Example 1, suppose that you have the following data stored in Doris:

    Table 8 User information table

    | user_id | date | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 10000 | 2017-10-01 | A | 20 | 0 | 2017-10-01 07:00:00 | 35 | 10 | 2 |
    | 10001 | 2017-10-01 | A | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
    | 10002 | 2017-10-02 | B | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
    | 10003 | 2017-10-02 | C | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
    | 10004 | 2017-10-01 | D | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
    | 10004 | 2017-10-03 | D | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 |

    Now you need to import a new batch of data:

    Table 9 New data

    | user_id | date | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 10004 | 2017-10-03 | D | 35 | 0 | 2017-10-03 11:22:00 | 44 | 19 | 19 |
    | 10005 | 2017-10-03 | E | 29 | 1 | 2017-10-03 18:11:02 | 3 | 1 | 1 |

    After this batch of data is imported into Doris correctly, the data stored in Doris will be updated as follows:

    Table 10 Updated data

    | user_id | date | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 10000 | 2017-10-01 | A | 20 | 0 | 2017-10-01 07:00:00 | 35 | 10 | 2 |
    | 10001 | 2017-10-01 | A | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
    | 10002 | 2017-10-02 | B | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
    | 10003 | 2017-10-02 | C | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
    | 10004 | 2017-10-01 | D | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
    | 10004 | 2017-10-03 | D | 35 | 0 | 2017-10-03 11:22:00 | 55 | 19 | 6 |
    | 10005 | 2017-10-03 | E | 29 | 1 | 2017-10-03 18:11:02 | 3 | 1 | 1 |

    In this table, the existing data and the newly imported data of User 10004 have been aggregated. Meanwhile, the new data of User 10005 has been added.

    In Doris, data aggregation happens in the following three stages:

    • The ETL stage of each import batch, where the rows within the batch are aggregated internally.
    • The data compaction stage on the BE nodes, where data from different import batches is aggregated.
    • The query stage, where the data involved in the query is aggregated accordingly.

    At different stages, data is aggregated to varying degrees. For example, a freshly imported batch may not yet be aggregated with the existing data. However, users can only ever query fully aggregated results, and should not assume that what they see might be unaggregated or only partially aggregated data.
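
    The staged aggregation works because SUM (like MAX and MIN) is associative: merging per-batch partial aggregates gives the same answer as aggregating all raw rows at once. A small Python check (illustrative only, not Doris code; the batch data is hypothetical):

```python
from collections import Counter

# Two hypothetical import batches keyed by (user_id, date), value = cost.
batch1 = [((10001, "2017-11-20"), 50), ((10002, "2017-11-21"), 39)]
batch2 = [((10001, "2017-11-20"), 1), ((10001, "2017-11-21"), 5),
          ((10003, "2017-11-22"), 22)]

def agg(rows):
    """SUM-aggregate rows by key, as the ETL stage does within a batch."""
    c = Counter()
    for key, cost in rows:
        c[key] += cost
    return c

# Compaction/query stage: merge the per-batch partial aggregates.
merged = agg(batch1) + agg(batch2)
# Aggregating all raw rows in one pass gives the identical result.
full = agg(batch1 + batch2)
```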

Unique Model

In some multi-dimensional analysis scenarios, users care about enforcing a uniqueness constraint on the primary key. The Unique model is introduced to solve this problem. It is a special case of the Aggregate model and a simplified representation of its table schema. Assume that the business has the following data table schema:

For Unique model tables, you are advised not to enable the merge-on-write attribute; the merge-on-read attribute is used by default.

Table 11 Data description

| ColumnName | Type | IsKey | Comment |
| --- | --- | --- | --- |
| user_id | BIGINT | Yes | User ID |
| username | VARCHAR(50) | Yes | Username |
| city | VARCHAR(20) | No | City where the user is located |
| age | SMALLINT | No | User age |
| sex | TINYINT | No | User gender |
| phone | LARGEINT | No | User phone number |
| address | VARCHAR(500) | No | User address |
| register_time | DATETIME | No | User registration time |

This is a table that contains the basic information of a user. There is no aggregation requirement for such data. The only concern is to ensure the uniqueness of the primary key (user_id+username). The CREATE TABLE statement would be as follows:

CREATE TABLE IF NOT EXISTS example_db.expamle_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "User ID",
    `username` VARCHAR(50) NOT NULL COMMENT "Username",
    `city` VARCHAR(20) COMMENT "City where the user is located",
    `age` SMALLINT COMMENT "User age",
    `sex` TINYINT COMMENT "User gender",
    `phone` LARGEINT COMMENT "User phone number",
    `address` VARCHAR(500) COMMENT "User address",
    `register_time` DATETIME COMMENT "User registration time"
)
UNIQUE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 3"
);

This table schema is the same as the following table schema using the Aggregate model:

Table 12 Data description

| ColumnName | Type | AggregationType | Comment |
| --- | --- | --- | --- |
| user_id | BIGINT | - | User ID |
| username | VARCHAR(50) | - | Username |
| city | VARCHAR(20) | REPLACE | City where the user is located |
| age | SMALLINT | REPLACE | User age |
| sex | TINYINT | REPLACE | User gender |
| phone | LARGEINT | REPLACE | User phone number |
| address | VARCHAR(500) | REPLACE | User address |
| register_time | DATETIME | REPLACE | User registration time |

The CREATE TABLE statement would be as follows:

CREATE TABLE IF NOT EXISTS example_db.expamle_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "User ID",
    `username` VARCHAR(50) NOT NULL COMMENT "Username",
    `city` VARCHAR(20) REPLACE COMMENT "City where the user is located",
    `age` SMALLINT REPLACE COMMENT "User age",
    `sex` TINYINT REPLACE COMMENT "User gender",
    `phone` LARGEINT REPLACE COMMENT "User phone number",
    `address` VARCHAR(500) REPLACE COMMENT "User address",
    `register_time` DATETIME REPLACE COMMENT "User registration time"
)
AGGREGATE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 3"
);

The Unique model is equivalent to an Aggregate model in which every Value column uses the REPLACE aggregation function; the internal implementation and data storage are exactly the same.
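
This equivalence can be pictured as last-write-wins on a keyed map (an illustrative Python sketch, not Doris code; the user data shown is hypothetical):

```python
# Unique model ~ Aggregate model with REPLACE on every non-key column:
# a later row with the same key simply overwrites the earlier one.
table = {}  # (user_id, username) -> non-key columns

def upsert(user_id, username, **cols):
    table[(user_id, username)] = cols  # REPLACE semantics

upsert(10000, "alice", city="A", age=20)
upsert(10000, "alice", city="B", age=21)  # same key: the row is replaced

row = table[(10000, "alice")]
# row == {"city": "B", "age": 21}; exactly one row per unique key remains
```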

Duplicate Model

In some multi-dimensional analysis scenarios, there is no need for primary keys or data aggregation. The Duplicate model can be used to meet such requirements.
Table 13 Data

| ColumnName | Type | SortKey | Comment |
| --- | --- | --- | --- |
| timestamp | DATETIME | Yes | Log time |
| type | INT | Yes | Log type |
| error_code | INT | Yes | Error code |
| error_msg | VARCHAR(1024) | No | Error details |
| op_id | BIGINT | No | Operator ID |
| op_time | DATETIME | No | Operation time |

The CREATE TABLE statement would be as follows:

CREATE TABLE IF NOT EXISTS example_db.expamle_tbl
(
    `timestamp` DATETIME NOT NULL COMMENT "Log time",
    `type` INT NOT NULL COMMENT "Log type",
    `error_code` INT COMMENT "Error code",
    `error_msg` VARCHAR(1024) COMMENT "Error details",
    `op_id` BIGINT COMMENT "Operator ID",
    `op_time` DATETIME COMMENT "Operation time"
)
DUPLICATE KEY(`timestamp`, `type`)
DISTRIBUTED BY HASH(`type`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 3"
);

Different from the Aggregate and Unique models, the Duplicate model stores data exactly as imported and performs no aggregation: even two identical rows are both retained. The DUPLICATE KEY in the CREATE TABLE statement only specifies which columns the data is sorted by. (A more accurate name would be "Sorted Columns"; DUPLICATE KEY was chosen to indicate the data model used.) It is recommended to select the first two to four columns as the DUPLICATE KEY.
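
The behavior can be sketched in Python (illustrative only, not Doris code; the log rows are hypothetical): rows are only sorted by the DUPLICATE KEY columns, never merged.

```python
# Duplicate model: keep every row as imported, including exact duplicates.
# The DUPLICATE KEY columns (timestamp, type) only define the sort order.
rows = [
    ("2024-01-01 10:00:00", 2, 404, "not found"),
    ("2024-01-01 09:00:00", 1, 200, "ok"),
    ("2024-01-01 09:00:00", 1, 200, "ok"),  # identical row: still retained
]
stored = sorted(rows, key=lambda r: (r[0], r[1]))
# len(stored) == 3: nothing is merged; the two identical rows both remain
```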

The Duplicate model is suitable for storing raw data without aggregation requirements or primary key uniqueness constraints. For more usage scenarios, see the part "Limitations of the Aggregate Model".

Limitations of the Aggregate Model

  • Aggregate and Unique models

    The following describes the limitations of the Aggregate model. (The Unique model will also be discussed.)

    The Aggregate model only exposes aggregated data. That is, for any data that has not yet been physically aggregated (for example, data from two different import batches), Doris must ensure that a consistent, fully aggregated view is presented externally.

    Suppose that you have the following table schema:

    Table 14 Data

    | ColumnName | Type | AggregationType | Comment |
    | --- | --- | --- | --- |
    | user_id | LARGEINT | - | User ID |
    | date | DATE | - | Data import date |
    | cost | BIGINT | SUM | Total consumption |

    Assume that there are two batches of data that have been imported into the storage engine as follows:

    batch1

    Table 15 Data

    | user_id | date | cost |
    | --- | --- | --- |
    | 10001 | 2017-11-20 | 50 |
    | 10002 | 2017-11-21 | 39 |

    batch2

    Table 16 Data

    | user_id | date | cost |
    | --- | --- | --- |
    | 10001 | 2017-11-20 | 1 |
    | 10001 | 2017-11-21 | 5 |
    | 10003 | 2017-11-22 | 22 |

    Data about user 10001 in these two import batches has not yet been aggregated. However, users can only query the aggregated data shown in the following table.

    Table 17 Data

    | user_id | date | cost |
    | --- | --- | --- |
    | 10001 | 2017-11-20 | 51 |
    | 10001 | 2017-11-21 | 5 |
    | 10002 | 2017-11-21 | 39 |
    | 10003 | 2017-11-22 | 22 |

    Create a data table.

    CREATE TABLE IF NOT EXISTS example_db.expamle_tb2
    (
        `user_id` LARGEINT NOT NULL COMMENT "User ID",
        `date` DATE NOT NULL COMMENT "Data import date and time",
        `cost` BIGINT SUM DEFAULT "0" COMMENT "Total consumption"
    )
    AGGREGATE KEY(`user_id`, `date`)
    DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
    PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
    );

    Insert the data of Table 15 and Table 16.

    INSERT INTO example_db.expamle_tb2 (user_id,date,cost) VALUES('10001','2017-11-20','50'),('10002','2017-11-21','39'),('10001','2017-11-20','1'),('10001','2017-11-21','5'),('10003','2017-11-22','22');

    To ensure data consistency, Doris adds an aggregation operator in the query engine.

    In addition, when executing an aggregate query on a Value column whose function differs from the column's AggregationType, pay attention to the semantics. For example, in the example above, if you execute the following query:

    mysql> SELECT MIN(cost) FROM example_db.expamle_tb2;
    +-------------+
    | min(`cost`) |
    +-------------+
    |           5 |
    +-------------+
    1 row in set (0.02 sec)

    The result is 5, not 1.
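
    This can be reproduced with a short Python sketch (illustrative only, not Doris code): the query-side MIN runs over the per-key SUM results, so the raw value 1 is no longer visible.

```python
# Raw imported rows: ((user_id, date), cost), as in batch1 and batch2.
raw = [((10001, "2017-11-20"), 50), ((10002, "2017-11-21"), 39),
       ((10001, "2017-11-20"), 1), ((10001, "2017-11-21"), 5),
       ((10003, "2017-11-22"), 22)]

# The cost column is SUM-aggregated per key before any query sees it.
sums = {}
for key, cost in raw:
    sums[key] = sums.get(key, 0) + cost

min_cost = min(sums.values())
# min_cost == 5: MIN runs over {51, 5, 39, 22}, not over the raw costs
```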

    In addition, this consistency guarantee can considerably reduce efficiency for some queries.

    Take the basic COUNT(*) query as an example:

    SELECT COUNT(*) FROM table;

    Other databases can return such results quickly, because the storage engine can maintain a row count at import time, or answer the query by scanning a single column, with very little overhead. In Doris's Aggregate model, however, the overhead of such queries is large. Take the following two batches of data as an example:

    batch1

    Table 18 Data

    | user_id | date | cost |
    | --- | --- | --- |
    | 10001 | 2017-11-20 | 50 |
    | 10002 | 2017-11-21 | 39 |

    batch2

    Table 19 Data

    | user_id | date | cost |
    | --- | --- | --- |
    | 10001 | 2017-11-20 | 1 |
    | 10001 | 2017-11-21 | 5 |
    | 10003 | 2017-11-22 | 22 |

    The final aggregation result is as follows:

    Table 20 Data

    | user_id | date | cost |
    | --- | --- | --- |
    | 10001 | 2017-11-20 | 51 |
    | 10001 | 2017-11-21 | 5 |
    | 10002 | 2017-11-21 | 39 |
    | 10003 | 2017-11-22 | 22 |

    Query the result.

    mysql> SELECT COUNT(date) FROM example_db.expamle_tb2;
    +---------------+
    | count(`date`) |
    +---------------+
    |             4 |
    +---------------+
    1 row in set (0.01 sec)

    The correct result of SELECT COUNT(*) FROM table; is 4. But if Doris only scanned the user_id column and aggregated at query time, the result would be 3 (users 10001, 10002, and 10003); without aggregation, the result would be 5 (five rows across the two batches). Both results are wrong.

    To get the correct result, Doris must read both the user_id and date columns and aggregate them at query time. That is, in a COUNT(*) query, Doris must scan all AGGREGATE KEY columns (here, user_id and date) and aggregate them to obtain the semantically correct result. If there are many Key columns, COUNT(*) queries can involve scanning large amounts of data.
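
    The three candidate answers can be checked with a short Python sketch (illustrative only, not Doris code):

```python
# (user_id, date) pairs of the five raw rows from the two batches above.
raw = [(10001, "2017-11-20"), (10002, "2017-11-21"),
       (10001, "2017-11-20"), (10001, "2017-11-21"),
       (10003, "2017-11-22")]

count_user_only = len({uid for uid, _ in raw})  # scan user_id only: 3, wrong
count_no_agg = len(raw)                         # no aggregation: 5, wrong
count_full_key = len(set(raw))                  # all Key columns: 4, correct
```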

    Therefore, if you need to run frequent COUNT(*) queries, simulate COUNT(*) by adding a column whose value is always 1 and whose aggregation function is SUM. The table schema in the previous example would be modified as follows:

    Table 21 Data description

    | ColumnName | Type | AggregationType | Comment |
    | --- | --- | --- | --- |
    | user_id | BIGINT | - | User ID |
    | date | DATE | - | Data import date |
    | cost | BIGINT | SUM | Total consumption |
    | count | BIGINT | SUM | Used for count queries |

    Add a count column whose imported value is always 1. Then SELECT COUNT(*) FROM table; is equivalent to SELECT SUM(count) FROM table;, and the latter is far more efficient. However, this method requires that users never import rows with identical values in all AGGREGATE KEY columns. Otherwise, SELECT SUM(count) FROM table; returns the number of originally imported rows, not the semantics of SELECT COUNT(*) FROM table;.

    Alternatively, add a count column of value 1 but use the REPLACE aggregation function. Then SELECT SUM(count) FROM table; and SELECT COUNT(*) FROM table; produce the same result, and this method imposes no restriction on importing duplicate rows.
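
    The count-column workaround can be sketched in Python (illustrative only, not Doris code), covering both the SUM variant and the REPLACE variant:

```python
# Each imported row carries count = 1; rows are keyed by (user_id, date).
raw = [((10001, "2017-11-20"), 1), ((10002, "2017-11-21"), 1),
       ((10001, "2017-11-21"), 1), ((10003, "2017-11-22"), 1)]

sum_counts, rep_counts = {}, {}
for key, one in raw:
    sum_counts[key] = sum_counts.get(key, 0) + one  # count column with SUM
    rep_counts[key] = one                           # count column with REPLACE

count_via_sum = sum(sum_counts.values())      # SELECT SUM(count), SUM variant
count_via_replace = sum(rep_counts.values())  # SELECT SUM(count), REPLACE variant
# With no duplicate keys imported, both equal COUNT(*) == 4. If the same
# key were imported twice, the SUM variant would overcount, while the
# REPLACE variant would still match COUNT(*).
```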

  • Duplicate

    The Duplicate model does not have this limitation, because it involves no aggregation semantics: a COUNT(*) query returns the semantically correct result over any column.

Data Model Selection

The Doris data model is classified into three types: AGGREGATE KEY, UNIQUE KEY, and DUPLICATE KEY. Data in all three models is sorted by KEY.

  • AGGREGATE KEY models

    The AGGREGATE KEY model aggregates data in advance, greatly reducing the amount of data scanned and computed, so it is well suited to fixed-pattern reporting queries. It is not suitable for COUNT(*) queries. In addition, because the aggregation function of each Value column is fixed, the semantics must be checked when running aggregate queries that use other functions.

    When the AGGREGATE KEY values are the same, old and new records are aggregated. The aggregation functions currently supported are SUM, MIN, MAX, and REPLACE.

    CREATE TABLE site_visit
    (
        siteid      INT,
        city        SMALLINT,
        username    VARCHAR(32),
        pv          BIGINT SUM DEFAULT '0'
    )
    AGGREGATE KEY(siteid, city, username)
    DISTRIBUTED BY HASH(siteid) BUCKETS 10;
  • UNIQUE KEY models

    The UNIQUE KEY model applies to scenarios that require a unique primary key constraint. When the UNIQUE KEY values are the same, the new record overwrites the old one, enforcing the uniqueness constraint. It is suitable for analytical workloads with update requirements. The UNIQUE KEY model is implemented with the same REPLACE aggregation as the AGGREGATE KEY model, and the two are essentially identical. However, pre-aggregated acceleration such as ROLLUP cannot be used, because the aggregation function is REPLACE rather than SUM.

    CREATE TABLE sales_order
    (
        orderid     BIGINT,
        status      TINYINT,
        username    VARCHAR(32),
        amount      BIGINT DEFAULT '0'
    )
    UNIQUE KEY(orderid)
    DISTRIBUTED BY HASH(orderid) BUCKETS 10;
  • DUPLICATE KEY models

    The DUPLICATE KEY model does not merge identical rows and is suitable for ad-hoc queries on any dimension. Although it cannot benefit from pre-aggregation, it does not have the limitations of the AGGREGATE KEY model, and it supports column pruning and vectorized execution.

    CREATE TABLE session_data
    (
        visitorid   SMALLINT,
        sessionid   BIGINT,
        visittime   DATETIME,
        city        CHAR(20),
        province    CHAR(20),
        ip          VARCHAR(32),
        brower      CHAR(20),
        url         VARCHAR(1024)
    )
    DUPLICATE KEY(visitorid, sessionid)
    DISTRIBUTED BY HASH(sessionid, visitorid) BUCKETS 10;