Help Center/ TaurusDB/ User Guide/ HTAP Analysis (Standard Edition)/ What Is HTAP of Standard Edition
Updated on 2024-11-06 GMT+08:00

What Is HTAP of Standard Edition

Hybrid Transactional and Analytical Processing (HTAP) is a data architecture that handles both online transactional processing (OLTP) and online analytical processing (OLAP) workloads.

It uses the column-based storage engine and Single Instruction Multiple Data (SIMD) for parallel compute. In massive data analysis scenarios, HTAP analysis provided by GaussDB(for MySQL) frees you from having to independently maintain data extraction and synchronization links. It reduces data management costs and provides simple and efficient real-time data analysis.

HTAP of Standard Edition is developed based on the open-source StarRocks.

Product Architecture

HTAP instances are deployed on ECSs and use extreme SSDs or ultra-high I/O disks.

You can enable binlog of your GaussDB(for MySQL) instance to synchronize data and operations to HTAP instances. Synchronized operations include inserting table, deleting tables, and changing table structures. After data is synchronized to an HTAP instance, you can access the HTAP instance through its private IP address for data analysis.

An HTAP instance of the standard edition provides frontend (FE) and backend (BE) nodes. The FE nodes manage metadata, manage client connections, and plan and schedule queries. Each FE node stores and maintains a complete metadata backup in the memory to ensure data consistency between FE nodes. The BE nodes are for data storage and SQL computing.

Figure 1 Product architecture

There are three roles for FE nodes.

Figure 2 FE node roles

  • The fe-leader nodes read and write metadata. The fe-follower and fe-observer nodes can only read metadata and route write requests for metadata to the fe-leader nodes. The fe-leader nodes update the metadata and synchronize the metadata changes to the fe-follower and fe-observer nodes.
  • The fe-follower nodes can only read metadata.
  • The fe-observer nodes synchronize and replay logs from the fe-leader nodes to update metadata. The fe-observer nodes are used to increase query concurrency of a cluster.

Main Features

  • Massively Parallel Processing (MPP) architecture

    Multiple nodes are used to execute queries in parallel.

  • High performance

    It supports vectorized engines and CBO optimizers and excels in queries for large and wide tables and multi-table join operations.

  • Standard SQL

    Query statements comply with the SQL-92 standard.

  • Data compression for storage

    Column-based storage and data compression greatly reduce your storage costs for any given set of conditions.

  • Aggregation of multiple data sources

    Data in multiple GaussDB(for MySQL) databases can be synchronized to a given HTAP instance.

Billing

Standard HTAP instances are in OBT and are free.

Precautions

To improve the stability and security of standard HTAP instances, GaussDB(for MySQL) has certain constraints in place.

  1. When you query data in an HTAP instance, the names of databases, tables, views, users, and roles are case sensitive, but the names of columns and partitions are case insensitive.
  2. Tables in a GaussDB(for MySQL) database can be synchronized only when primary keys are defined for the tables.
  3. Some DDL statements executed on GaussDB(for MySQL) instances cannot be synchronized to HTAP instances, which may cause synchronization failures or data inconsistencies.

    The DDL statements that can be synchronized and cannot be synchronized are as follows:

    • DDL statements that can be synchronized
      Table 1 DDL statements that can be synchronized

      DDL Name

      SQL Example

      Creating a table

      CREATE TABLE tbl_name (c_id int not null, c_d_id integer not null, primary key (c_id));

      Dropping a table

      DROP TABLE tbl_name;

      Renaming a table

      RENAME TABLE tbl_name to new_tbl_name;

      ALTER TABLE tbl_name RENAME TO new_tbl_name;

      Clearing table data

      TRUNCATE TABLE tbl_name;

      Altering table comments

      ALTER TABLE tbl_name COMMENT='test';

      Adding a column (non-primary key column)

      ALTER TABLE tbl_name ADD c_varchar varchar(2000) AFTER c_tinytext;

      Deleting a column (non-primary key column)

      ALTER TABLE tbl_name DROP c_vchar;

      Changing the type and sequence of a column (non-primary key column)

      ALTER TABLE tbl_name CHANGE c_vchar c_vchar varchar(2000) default 'test' AFTER c_tinytext;

      CAUTION:

      The column name and default value cannot be changed.

      ALTER TABLE tbl_name MODIFY c_vchar varchar(2100) default 'test' AFTER c_tinytext;

      CAUTION:

      The default value cannot be changed.

    • DDL statements that cannot be synchronized

      After a database synchronization task, only tables and data can be synchronized. Operations for databases, tablespaces, indexes, foreign keys, functions, stored procedures, triggers, views, partitions (DELETE operations), primary keys (INSERT/DELETE/ALTER operations), transactions, users, roles, permissions, and events cannot be synchronized.

      Table 2 list index-related operations that cannot be synchronized.

      Table 2 Index-related operations that cannot be synchronized

      DDL Name

      SQL Example

      Adding an index

      ALTER TABLE tbl_name ADD INDEX name;

      Renaming an index

      ALTER TABLE tbl_name RENAME INDEX old_index_name TO new_index_name;

      Dropping an index

      DROP INDEX name ON table;

      Adding a full-text index

      CREATE FULLTEXT INDEX name ON table(column);

      Adding a spatial index

      ALTER TABLE geom ADD SPATIAL INDEX(g);

      Modifying the type of an index

      ALTER TABLE tbl_name DROP INDEX i1, ADD INDEX i1(key_part,...) USING BTREE;

      Adding an index constraint

      ALTER TABLE tbl_name ADD CONSTRAINT UNIQUE USING BTREE (column);

      ALTER TABLE tbl_name ADD CONSTRAINT UNIQUE USING HASH(column);

      Table 3 list partitioned table-related operations that cannot be synchronized

      Table 3 Partitioned table-related operations that cannot be synchronized

      DDL Name

      SQL Example

      Analyzing a table partition

      ALTER TABLE {db}.tp ANALYZE PARTITION p0;

      Checking a table partition

      ALTER TABLE {db}.tp CHECK PARTITION p0;

      Optimizing a table partition

      ALTER TABLE {db}.tp OPTIMIZE PARTITION p0;

      Re-building a table partition

      ALTER TABLE {db}.tp REBUILD PARTITION p0;

      Repairing a table partition

      ALTER TABLE {db}.tp REPAIR PARTITION p0;

      Creating a database

      CREATE DATABASE ddl_test_2;

      Modifying a row format

      ALTER TABLE tbl_name ROW_FORMAT = row_format;

      Setting persistent table statistics

      ALTER TABLE tbl_name STATS_PERSISTENT=0, STATS_SAMPLE_PAGES=20,STATS_AUTO_RECALC=1, ALGORITHM=INPLACE, LOCK=NONE;

      Setting a table character set

      ALTER TABLE tbl_name CHARACTER SET = charset_name;

      Converting a table character set

      ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;

      Optimizing a table

      OPTIMIZE TABLE tbl_name;

      Rebuilding a table using the FORCE option

      ALTER TABLE tbl_name FORCE;

      Rebuilding a table without data

      ALTER TABLE tbl_name ENGINE=InnoDB;

      Renaming a tablespace

      ALTER TABLESPACE tablespace_name RENAME TO new_tablespace_name;

      Adding a table partition

      ALTER TABLE {db}.tp ADD PARTITION (PARTITION p3 VALUES LESS THAN (2006) );

      Setting the default character set and verification rules for a table

      ALTER TABLE tbl_name DEFAULT CHARACTER SET = utf8 COLLATE = utf8_general_ci;

      Table creation statements cannot contain CHECK or table options.

      During data synchronization, operations in Table 4 may cause data inconsistency between HTAP instances and GaussDB(for MySQL) instances. You should avoid these operations.

      They do not affect data query and analysis on HTAP instances.

      Table 4 DDL operations that result in data inconsistency

      DDL Name

      SQL Example

      Deleting a primary key

      ALTER TABLE tbl_name DROP PRIMARY KEY;

      Adding a primary key

      ALTER TABLE {db}.t1 ADD PRIMARY KEY (id);

      Adding a primary key and deleting a primary key

      ALTER TABLE tbl_name DROP PRIMARY KEY, ADD PRIMARY KEY (column);

      Setting a primary key to NULL

      ALTER TABLE tbl_name MODIFY COLUMN key_column_name data_type;

      Changing the type of a primary key

      ALTER TABLE tbl_name MODIFY COLUMN key_column_name data_type not null;

      Adding a foreign key

      ALTER TABLE tbl1 ADD CONSTRAINT fk_name FOREIGN KEY index (col1)REFERENCES tbl2(col2) referential_actions;

      Deleting a foreign key

      ALTER TABLE tbl DROP FOREIGN KEY fk_name;

      Adding a column

      NOTE:

      Common columns can be added. If columns contain the following default values, they cannot be added.

      • Double quotation marks (")
      • Functions, character strings, and identifiers that cannot be found in HTAP instances

      ALTER TABLE tbl_name ADD COLUMN column_name column_definition

      c VARCHAR(10) DEFAULT (CONCAT('1', '2'));

      Setting the default value of a column

      NOTE:

      If columns contain the following default values, you cannot reset default values for the columns.

      • Double quotation marks (")
      • Functions, character strings, and identifiers that cannot be found in HTAP instances

      ALTER TABLE tbl_name ALTER COLUMN col SET DEFAULT literal;

      Changing NULL in tables to NOT NULL

      ALTER TABLE tbl_name MODIFY COLUMN column_name data_type NOT NULL;

      Changing the column name and type at the same time

      ALTER TABLE t1 CHANGE b b1 VARCHAR(100);

      Changing the name of a column

      ALTER TABLE t1 RENAME COLUMN a TO b;

      Creating a table without a primary key

      ALTER TABLE t1 ADD COLUMN (c2 INT GENERATED ALWAYS AS (c1 + 1)STORED);

      Adding a STORED derived column

      ALTER TABLE {db}.t1 ADD COLUMN (st2 INT GENERATED ALWAYS AS (c2 + 2)STORED), ALGORITHM=COPY;

      Adding a VIRTUAL derived column

      ALTER TABLE t1 ADD COLUMN (c2 INT GENERATED ALWAYS AS (c1 + 1)VIRTUAL);

      Dropping a table partition

      ALTER TABLE {db}.tp DROP PARTITION p4;

      Discarding a table partition

      ALTER TABLE {db}.tp DISCARD PARTITION p2 TABLESPACE;

      Importing a table partition

      ALTER TABLE {db}.tp IMPORT PARTITION p2 TABLESPACE;

      Truncating a table partition

      ALTER TABLE {db}.tp TRUNCATE PARTITION p2;

      Truncating a partitioned table

      TRUNCATE {db}.tp;

      Coalescing table partitions

      ALTER TABLE {db}.tp_hash COALESCE PARTITION 2;

      Reorganizing table partitions

      ALTER TABLE {db}.tp REORGANIZE PARTITION p0,p1,p2,p3 INTO ( ... );

      Exchanging table partitions

      ALTER TABLE {db}.tp EXCHANGE PARTITION p0 WITH TABLE {db}.tp2;

      Removing a table partition

      ALTER TABLE {db}.tp REMOVE PARTITIONING;

      Using a REPLACE clause

      CREATE OR REPLACE TABLE;

  4. The names of the databases and tables to be synchronized cannot contain Chinese characters.
  5. To improve performance, you can use the following methods to optimize queries:
    • Simplify SQL statements by reducing invalid calculations, deleting unused fields, and avoiding SELECT.
    • Instead of querying all columns, delete those that are unnecessary.
  6. Tables to be synchronized use the OLAP engine and primary key model by default.
  7. After connecting to a standard HTAP instance, run the following command to view the databases synchronized from GaussDB(for MySQL) to the standard HTAP instance and the synchronization status.
    As this command queries the binlog information in GaussDB(for MySQL), it may consume the hourly query quota. You are advised to run this command at most once a minute.
    show sync job;