What Is HTAP of Standard Edition

Hybrid Transactional and Analytical Processing (HTAP) is a data architecture that handles both online transactional processing (OLTP) and online analytical processing (OLAP) workloads.

It uses the column-based storage engine and Single Instruction Multiple Data (SIMD) for parallel compute. In massive data analysis scenarios, HTAP analysis provided by GaussDB(for MySQL) frees you from having to independently maintain data extraction and synchronization links. It reduces data management costs and provides simple and efficient real-time data analysis.

HTAP of Standard Edition is developed based on the open-source StarRocks.

Product Architecture

HTAP instances are deployed on ECSs and use extreme SSDs or ultra-high I/O disks.

You can enable binlog of your GaussDB(for MySQL) instance to synchronize data and operations to HTAP instances. Synchronized operations include inserting table, deleting tables, and changing table structures. After data is synchronized to an HTAP instance, you can access the HTAP instance through its private IP address for data analysis.

An HTAP instance of the standard edition provides frontend (FE) and backend (BE) nodes. The FE nodes manage metadata, manage client connections, and plan and schedule queries. Each FE node stores and maintains a complete metadata backup in the memory to ensure data consistency between FE nodes. The BE nodes are for data storage and SQL computing.

Figure 1 Product architecture

There are three roles for FE nodes.

Figure 2 FE node roles

The fe-leader nodes read and write metadata. The fe-follower and fe-observer nodes can only read metadata and route write requests for metadata to the fe-leader nodes. The fe-leader nodes update the metadata and synchronize the metadata changes to the fe-follower and fe-observer nodes.
The fe-follower nodes can only read metadata.
The fe-observer nodes synchronize and replay logs from the fe-leader nodes to update metadata. The fe-observer nodes are used to increase query concurrency of a cluster.

Main Features

Massively Parallel Processing (MPP) architecture
Multiple nodes are used to execute queries in parallel.
High performance
It supports vectorized engines and CBO optimizers and excels in queries for large and wide tables and multi-table join operations.
Standard SQL
Query statements comply with the SQL-92 standard.
Data compression for storage
Column-based storage and data compression greatly reduce your storage costs for any given set of conditions.
Aggregation of multiple data sources
Data in multiple GaussDB(for MySQL) databases can be synchronized to a given HTAP instance.

Billing

Standard HTAP instances are in OBT and are free.

Precautions

To improve the stability and security of standard HTAP instances, GaussDB(for MySQL) has certain constraints in place.

When you query data in an HTAP instance, the names of databases, tables, views, users, and roles are case sensitive, but the names of columns and partitions are case insensitive.
Tables in a GaussDB(for MySQL) database can be synchronized only when primary keys are defined for the tables.

Some DDL statements executed on GaussDB(for MySQL) instances cannot be synchronized to HTAP instances, which may cause synchronization failures or data inconsistencies.

The DDL statements that can be synchronized and cannot be synchronized are as follows:

DDL statements that can be synchronized

**Table 1** DDL statements that can be synchronized
DDL Name	SQL Example
Creating a table	CREATE TABLE tbl_name (c_id int not null, c_d_id integer not null, primary key (c_id));
Dropping a table	DROP TABLE tbl_name;
Renaming a table	RENAME TABLE tbl_name to new_tbl_name; ALTER TABLE tbl_name RENAME TO new_tbl_name;
Clearing table data	TRUNCATE TABLE tbl_name;
Altering table comments	ALTER TABLE tbl_name COMMENT='test';
Adding a column (non-primary key column)	ALTER TABLE tbl_name ADD c_varchar varchar(2000) AFTER c_tinytext;
Deleting a column (non-primary key column)	ALTER TABLE tbl_name DROP c_vchar;
Changing the type and sequence of a column (non-primary key column)	ALTER TABLE tbl_name CHANGE c_vchar c_vchar varchar(2000) default 'test' AFTER c_tinytext; CAUTION: The column name and default value cannot be changed. ALTER TABLE tbl_name MODIFY c_vchar varchar(2100) default 'test' AFTER c_tinytext; CAUTION: The default value cannot be changed.

DDL statements that cannot be synchronized

After a database synchronization task, only tables and data can be synchronized. Operations for databases, tablespaces, indexes, foreign keys, functions, stored procedures, triggers, views, partitions (DELETE operations), primary keys (INSERT/DELETE/ALTER operations), transactions, users, roles, permissions, and events cannot be synchronized.

Table 2 list index-related operations that cannot be synchronized.

**Table 2** Index-related operations that cannot be synchronized
DDL Name	SQL Example
Adding an index	ALTER TABLE tbl_name ADD INDEX name;
Renaming an index	ALTER TABLE tbl_name RENAME INDEX old_index_name TO new_index_name;
Dropping an index	DROP INDEX name ON table;
Adding a full-text index	CREATE FULLTEXT INDEX name ON table(column);
Adding a spatial index	ALTER TABLE geom ADD SPATIAL INDEX(g);
Modifying the type of an index	ALTER TABLE tbl_name DROP INDEX i1, ADD INDEX i1(key_part,...) USING BTREE;
Adding an index constraint	ALTER TABLE tbl_name ADD CONSTRAINT UNIQUE USING BTREE (column); ALTER TABLE tbl_name ADD CONSTRAINT UNIQUE USING HASH(column);

Table 3 list partitioned table-related operations that cannot be synchronized

**Table 3** Partitioned table-related operations that cannot be synchronized
DDL Name	SQL Example
Analyzing a table partition	ALTER TABLE {db}.tp ANALYZE PARTITION p0;
Checking a table partition	ALTER TABLE {db}.tp CHECK PARTITION p0;
Optimizing a table partition	ALTER TABLE {db}.tp OPTIMIZE PARTITION p0;
Re-building a table partition	ALTER TABLE {db}.tp REBUILD PARTITION p0;
Repairing a table partition	ALTER TABLE {db}.tp REPAIR PARTITION p0;
Creating a database	CREATE DATABASE ddl_test_2;
Modifying a row format	ALTER TABLE tbl_name ROW_FORMAT = row_format;
Setting persistent table statistics	ALTER TABLE tbl_name STATS_PERSISTENT=0, STATS_SAMPLE_PAGES=20,STATS_AUTO_RECALC=1, ALGORITHM=INPLACE, LOCK=NONE;
Setting a table character set	ALTER TABLE tbl_name CHARACTER SET = charset_name;
Converting a table character set	ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
Optimizing a table	OPTIMIZE TABLE tbl_name;
Rebuilding a table using the FORCE option	ALTER TABLE tbl_name FORCE;
Rebuilding a table without data	ALTER TABLE tbl_name ENGINE=InnoDB;
Renaming a tablespace	ALTER TABLESPACE tablespace_name RENAME TO new_tablespace_name;
Adding a table partition	ALTER TABLE {db}.tp ADD PARTITION (PARTITION p3 VALUES LESS THAN (2006) );
Setting the default character set and verification rules for a table	ALTER TABLE tbl_name DEFAULT CHARACTER SET = utf8 COLLATE = utf8_general_ci;

Table creation statements cannot contain CHECK or table options.

During data synchronization, operations in Table 4 may cause data inconsistency between HTAP instances and GaussDB(for MySQL) instances. You should avoid these operations.

They do not affect data query and analysis on HTAP instances.

**Table 4** DDL operations that result in data inconsistency
DDL Name	SQL Example
Deleting a primary key	ALTER TABLE tbl_name DROP PRIMARY KEY;
Adding a primary key	ALTER TABLE {db}.t1 ADD PRIMARY KEY (id);
Adding a primary key and deleting a primary key	ALTER TABLE tbl_name DROP PRIMARY KEY, ADD PRIMARY KEY (column);
Setting a primary key to NULL	ALTER TABLE tbl_name MODIFY COLUMN key_column_name data_type;
Changing the type of a primary key	ALTER TABLE tbl_name MODIFY COLUMN key_column_name data_type not null;
Adding a foreign key	ALTER TABLE tbl1 ADD CONSTRAINT fk_name FOREIGN KEY index (col1)REFERENCES tbl2(col2) referential_actions;
Deleting a foreign key	ALTER TABLE tbl DROP FOREIGN KEY fk_name;
Adding a column NOTE: Common columns can be added. If columns contain the following default values, they cannot be added. Double quotation marks (") Functions, character strings, and identifiers that cannot be found in HTAP instances	ALTER TABLE tbl_name ADD COLUMN column_name column_definition c VARCHAR(10) DEFAULT (CONCAT('1', '2'));
Setting the default value of a column NOTE: If columns contain the following default values, you cannot reset default values for the columns. Double quotation marks (") Functions, character strings, and identifiers that cannot be found in HTAP instances	ALTER TABLE tbl_name ALTER COLUMN col SET DEFAULT literal;
Changing NULL in tables to NOT NULL	ALTER TABLE tbl_name MODIFY COLUMN column_name data_type NOT NULL;
Changing the column name and type at the same time	ALTER TABLE t1 CHANGE b b1 VARCHAR(100);
Changing the name of a column	ALTER TABLE t1 RENAME COLUMN a TO b;
Creating a table without a primary key	ALTER TABLE t1 ADD COLUMN (c2 INT GENERATED ALWAYS AS (c1 + 1)STORED);
Adding a STORED derived column	ALTER TABLE {db}.t1 ADD COLUMN (st2 INT GENERATED ALWAYS AS (c2 + 2)STORED), ALGORITHM=COPY;
Adding a VIRTUAL derived column	ALTER TABLE t1 ADD COLUMN (c2 INT GENERATED ALWAYS AS (c1 + 1)VIRTUAL);
Dropping a table partition	ALTER TABLE {db}.tp DROP PARTITION p4;
Discarding a table partition	ALTER TABLE {db}.tp DISCARD PARTITION p2 TABLESPACE;
Importing a table partition	ALTER TABLE {db}.tp IMPORT PARTITION p2 TABLESPACE;
Truncating a table partition	ALTER TABLE {db}.tp TRUNCATE PARTITION p2;
Truncating a partitioned table	TRUNCATE {db}.tp;
Coalescing table partitions	ALTER TABLE {db}.tp_hash COALESCE PARTITION 2;
Reorganizing table partitions	ALTER TABLE {db}.tp REORGANIZE PARTITION p0,p1,p2,p3 INTO ( ... );
Exchanging table partitions	ALTER TABLE {db}.tp EXCHANGE PARTITION p0 WITH TABLE {db}.tp2;
Removing a table partition	ALTER TABLE {db}.tp REMOVE PARTITIONING;
Using a REPLACE clause	CREATE OR REPLACE TABLE;

The names of the databases and tables to be synchronized cannot contain Chinese characters.
To improve performance, you can use the following methods to optimize queries:
- Simplify SQL statements by reducing invalid calculations, deleting unused fields, and avoiding SELECT.
- Instead of querying all columns, delete those that are unnecessary.
Tables to be synchronized use the OLAP engine and primary key model by default.
After connecting to a standard HTAP instance, run the following command to view the databases synchronized from GaussDB(for MySQL) to the standard HTAP instance and the synchronization status.
As this command queries the binlog information in GaussDB(for MySQL), it may consume the hourly query quota. You are advised to run this command at most once a minute.
```
show sync job;
```