Updated on 2024-12-23 GMT+08:00

Overview

What Is Near Data Processing?

Near Data Processing (NDP) is a compute pushdown solution to improve data query efficiency. For data-intensive queries, operations such as column extraction, aggregation calculation, and condition filtering are pushed down to multiple nodes on a distributed storage layer for parallel execution. This reduces query processing pressure on compute nodes, improves parallel processing capabilities, and saves network traffic.

How It Works

GaussDB(for MySQL) uses an architecture with decoupled storage and compute to reduce network traffic. Based on this architecture, NDP is used to accelerate data queries. Without NDP, all raw data needs to be transmitted from storage nodes to compute nodes for query processing. NDP pushed the most I/O-intensive and CPU-intensive query tasks down to storage nodes. Only the required columns and filtered rows or aggregated results are sent back to compute nodes, greatly reducing network traffic. Additionally, parallel processing across storage nodes reduces the CPU usage of compute nodes and improves the query efficiency.

NDP is integrated with parallel query. Pages are prefetched in batches to realize the entire process in parallel. The query execution efficiency is greatly improved.

Figure 1 How NDP works

Scenarios

NDP is suitable for the following scenarios:

  • Projection

    Column pruning: Only the fields required by a query statement are sent to the compute node.

  • Aggregate

    Typical aggregation operations include COUNT, SUM, AVG, MAX, MIN, and GROUP BY. Only the aggregated results (not all tuples) are sent to the query engine. COUNT (*) is the most common.

  • SELECT - WHERE clause for filtering

    Common condition expressions are COMPARE(>=,<=,<,>,==), BETWEEN, IN, AND/OR, and LIKE.

    A filter expression is executed on the storage nodes. Only the rows that meet the conditions are sent to the compute node.

Application Constraints

  1. InnoDB tables.
  2. Tables with rows in the COMPACT or DYNAMIC format.
  3. Primary keys or B-tree indexes. Hash and full-text indexes are not supported.
  4. SELECT statements among the DML statements. INSERT INTO SELECT statements and SELECT statements that will lock rows (such as SELECT FOR SHARE/UPDATE) are not supported.
  5. Expressions with numeric, log, time, or partial string types (CHAR and VARCHAR). The utf8mb4 and utf8 character sets are supported.
  6. Expression predicates with comparison operators (<,>,=,<=,>=,!=), IN, NOT IN, LIKE, NOT LIKE, BETWEEN AND, and AND/OR.

Parameters

Table 1 Parameter description

Parameter

Level

Description

ndp_mode

Global

NOTE:
  • To enable NDP at the global level, contact technical support.
  • NDP is in the test phase. There are 10 test users in total.

Enables or disables NDP.

Value: off or on

Default value: off