Impact of Partitioned Tables on Import Performance

In GaussDB, compared with a non-partitioned table, a partitioned table has extra overhead of partition routing in a data insertion process.

The data insertion overhead in the partitioned table scenario consists of two parts: (1) Heap-insert: This part addresses the problem how tuples are stored in the corresponding heap table. This operation is common in ordinary tables and partitioned tables. (2) Partition-routing: This part addresses the partition routing problem. That is, tuples need to be inserted into the corresponding partitioned table (partRel).

Therefore, data insertion optimization focuses on the following aspects:

Heap-insert :
- The operator noise floor is optimized.
- Heap data insertion is optimized.
- Index insertion build (with indexes) is optimized.
Partition-routing:
- The logic of the routing search algorithm is optimized.
- The routing noise floor is optimized, including enabling the partRel handle of the partitioned table and adding the logic overhead of function calling.
  
  The performance of partition routing is significantly reflected when a single INSERT statement involving a large amount of data is executed. In the UPDATE scenario, the operation logic is more complex. You need to find the tuple to be updated, perform the DELETE operation to remove the tuple, and then perform the INSERT operation to insert a new tuple. Compared with the scenario where a single INSERT statement is used to insert data, the UPDATE scenario has more steps and more complex processes. Therefore, the partition routing performance in the UPDATE scenario is not as good as that in the scenario where a single INSERT statement is used.

Table 1 shows the routing algorithm logic of different partitioning types.

**Table 1** Routing algorithm logic
Partitioning Type	Routing Algorithm Complexity	Implementation Description
Range partitioning	O(logN)	Implemented based on binary search
Hash partitioning	O(1)	Implemented based on the key-partOid hash table
List partitioning	O(1)	Implemented based on the key-partOid hash table

The core processing logic of partition routing is to calculate the partition to which the imported data tuple belongs based on the partition key of the imported data tuple. This is an extra overhead of a partitioned table compared with a non-partitioned table. The impact of the overhead on the final data import performance depends on the CPU processing capability of the server, table width, and actual disk and memory capacity.

Generally, the following estimation can be roughly performed:

In the x86 server scenario, the import performance of a partitioned table is 10% lower than that of an ordinary table.
In the Arm server scenario, the data import performance decreases by about 20%.

The x86 and Arm servers have different performance in importing partitioned tables because partition routing is a memory computing enhancement scenario. The single-core instruction processing capability of the mainstream x86 CPU is slightly stronger than that of the Arm CPU.

Parent topic: Partitioning Policy

Previous topic: List Partitioning

Next topic: Basic Usage of Partitions