Overview

Updated on 2024-12-23 GMT+08:00


What Is Near Data Processing?

Near Data Processing (NDP) is a compute pushdown solution to improve data query efficiency. For data-intensive queries, operations such as column extraction, aggregation calculation, and condition filtering are pushed down to multiple nodes on a distributed storage layer for parallel execution. This reduces query processing pressure on compute nodes, improves parallel processing capabilities, and saves network traffic.

How It Works

GaussDB(for MySQL) uses an architecture that decouples storage from compute to reduce network traffic, and NDP builds on this architecture to accelerate data queries. Without NDP, all raw data must be transmitted from storage nodes to compute nodes for query processing. NDP pushes the most I/O-intensive and CPU-intensive query tasks down to the storage nodes, so only the required columns and filtered rows, or the aggregated results, are sent back to the compute nodes. This greatly reduces network traffic. In addition, parallel processing across storage nodes reduces CPU usage on the compute nodes and improves query efficiency.
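The difference between shipping raw data and shipping pushed-down results can be sketched with a toy model. This is not GaussDB(for MySQL) code: the `StorageNode` class, the query helpers, and the use of "column values shipped" as a proxy for network traffic are all assumptions made for illustration.

```python
"""Toy model of compute pushdown (hypothetical; not GaussDB(for MySQL) code)."""
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class StorageNode:
    rows: list[Row]

    def scan_all(self) -> list[Row]:
        # No pushdown: every raw row, with every column, leaves the node.
        return list(self.rows)

    def scan_pushdown(self, columns: list[str], where: Predicate) -> list[Row]:
        # Pushdown: filter rows and prune columns before anything is sent back.
        return [{c: r[c] for c in columns} for r in self.rows if where(r)]


def cells(rows: list[Row]) -> int:
    """Number of column values shipped: a rough proxy for network traffic."""
    return sum(len(r) for r in rows)


def query_without_ndp(nodes: list[StorageNode], columns: list[str], where: Predicate):
    shipped = [r for n in nodes for r in n.scan_all()]
    # The compute node does all filtering and projection itself.
    result = [{c: r[c] for c in columns} for r in shipped if where(r)]
    return result, cells(shipped)


def query_with_ndp(nodes: list[StorageNode], columns: list[str], where: Predicate):
    # Each storage node already returns only matching rows and needed columns.
    shipped = [r for n in nodes for r in n.scan_pushdown(columns, where)]
    return shipped, cells(shipped)


if __name__ == "__main__":
    nodes = [
        StorageNode([{"id": i, "price": i * 10, "note": "x"} for i in range(0, 50)]),
        StorageNode([{"id": i, "price": i * 10, "note": "x"} for i in range(50, 100)]),
    ]
    where = lambda r: r["price"] >= 900
    plain, plain_cost = query_without_ndp(nodes, ["id"], where)
    pushed, pushed_cost = query_with_ndp(nodes, ["id"], where)
    assert plain == pushed  # same answer either way
    print(plain_cost, pushed_cost)  # 300 10
```

Both paths return the same rows; only the amount of data crossing the storage/compute boundary changes, which is the effect described above.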

NDP is integrated with parallel query: pages are prefetched in batches so that the entire query process runs in parallel, which greatly improves query execution efficiency.
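Batched prefetching with parallel evaluation can be pictured with the hypothetical sketch below. The page layout, batch size, worker count, and function names are assumptions, not GaussDB(for MySQL) internals; each worker stands in for an independent storage-side task that prefetches a batch of pages and applies the pushed-down filter.

```python
"""Hypothetical sketch of batched, parallel page processing (illustrative only)."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Page = list[int]


def make_batches(pages: list[Page], batch_size: int) -> list[list[Page]]:
    # Group consecutive pages so each task prefetches several pages at once.
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]


def process_batch(batch: list[Page], where: Callable[[int], bool]) -> list[int]:
    # Stand-in for "prefetch this batch, then evaluate the pushed-down filter".
    return [v for page in batch for v in page if where(v)]


def parallel_scan(pages: list[Page], where: Callable[[int], bool],
                  batch_size: int = 4, workers: int = 4) -> list[int]:
    batches = make_batches(pages, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in batch order, so the merged output is deterministic.
        partials = pool.map(lambda b: process_batch(b, where), batches)
        return [v for part in partials for v in part]


if __name__ == "__main__":
    pages = [list(range(p * 10, p * 10 + 10)) for p in range(10)]  # values 0..99
    print(parallel_scan(pages, lambda v: v % 25 == 0))  # [0, 25, 50, 75]
```

Python threads are used here only to show the structure (split into batches, process independently, merge); because of the GIL they mainly overlap I/O waits rather than CPU work, whereas real storage nodes run on separate machines.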

Figure 1 How NDP works

Scenarios

NDP is suitable for the following scenarios:

Projection
Column pruning: Only the fields required by a query statement are sent to the compute node.
Aggregate
Typical aggregation operations include COUNT, SUM, AVG, MAX, MIN, and GROUP BY. Only the aggregated results (not all tuples) are sent to the query engine. COUNT(*) is the most common case.
SELECT - WHERE clause for filtering
Common condition expressions include comparisons (>=, <=, <, >, ==), BETWEEN, IN, AND/OR, and LIKE.

A filter expression is executed on the storage nodes. Only the rows that meet the conditions are sent to the compute node.
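The three scenarios combine naturally in a query such as `SELECT region, COUNT(*), SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM orders WHERE amount > 0 GROUP BY region`. The sketch below shows one common way to split such a query, assuming partial aggregation: each storage node filters, reads only the `region` and `amount` columns, and computes per-group partials; the compute node merges them. The table, columns, and helper names are hypothetical, and this is not presented as GaussDB(for MySQL)'s implementation.

```python
"""Hypothetical sketch of filter + projection + aggregate pushdown."""
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Partial:
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, v: float) -> None:
        self.count += 1
        self.total += v
        self.lo = min(self.lo, v)
        self.hi = max(self.hi, v)

    def merge(self, other: "Partial") -> None:
        self.count += other.count
        self.total += other.total
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)


def storage_node_aggregate(rows: Iterable[dict], key: str, value: str,
                           where: Callable[[dict], bool]) -> dict[object, Partial]:
    # Runs on a storage node: filter, touch only two columns, aggregate per group.
    partials: dict[object, Partial] = {}
    for r in rows:
        if where(r):
            partials.setdefault(r[key], Partial()).add(r[value])
    return partials


def compute_node_merge(node_results: Iterable[dict[object, Partial]]) -> dict:
    # Runs on the compute node: merge the small partials, then finish AVG.
    merged: dict[object, Partial] = {}
    for partials in node_results:
        for k, p in partials.items():
            merged.setdefault(k, Partial()).merge(p)
    # AVG = merged SUM / merged COUNT; averaging per-node averages would be wrong.
    return {k: {"COUNT": p.count, "SUM": p.total, "AVG": p.total / p.count,
                "MIN": p.lo, "MAX": p.hi} for k, p in merged.items()}


if __name__ == "__main__":
    node_a = [{"region": "east", "amount": 10}, {"region": "west", "amount": 4},
              {"region": "east", "amount": -1}]
    node_b = [{"region": "east", "amount": 30}, {"region": "east", "amount": 20}]
    where = lambda r: r["amount"] > 0
    parts = [storage_node_aggregate(n, "region", "amount", where) for n in (node_a, node_b)]
    print(compute_node_merge(parts)["east"])
    # {'COUNT': 3, 'SUM': 60.0, 'AVG': 20.0, 'MIN': 10, 'MAX': 30}
```

Shipping (COUNT, SUM, MIN, MAX) per group instead of per-node averages keeps the merge exact: here the per-node averages for `east` are 10 and 25, and averaging them would give 17.5 instead of the correct 20.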

Application Constraints

NDP can be used only with the following:

InnoDB tables.
Tables with rows in the COMPACT or DYNAMIC format.
Primary keys or B-tree indexes. Hash and full-text indexes are not supported.
SELECT statements. Among DML statements, only SELECT is supported; INSERT INTO SELECT statements and locking reads (such as SELECT ... FOR SHARE or SELECT ... FOR UPDATE) are not supported.
Expressions with numeric, log, time, or some string types (CHAR and VARCHAR). The utf8mb4 and utf8 character sets are supported.
Expression predicates with comparison operators (<, >, =, <=, >=, !=), IN, NOT IN, LIKE, NOT LIKE, BETWEEN AND, and AND/OR.
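The constraints above can be read as a checklist. The hypothetical pre-check below encodes them for illustration only: `QueryInfo` and `ndp_blockers` are invented names, not a GaussDB(for MySQL) API, the data-type rule is deliberately left out, and the database itself decides whether NDP is used.

```python
"""Hypothetical NDP eligibility pre-check mirroring the listed constraints."""
from dataclasses import dataclass, field

ROW_FORMATS = {"COMPACT", "DYNAMIC"}
INDEX_TYPES = {"PRIMARY", "BTREE"}          # hash and full-text are not supported
CHARSETS = {"utf8mb4", "utf8"}
OPERATORS = {"<", ">", "=", "<=", ">=", "!=", "IN", "NOT IN",
             "LIKE", "NOT LIKE", "BETWEEN AND", "AND", "OR"}


@dataclass
class QueryInfo:
    engine: str
    row_format: str
    index_type: str
    statement: str          # e.g. "SELECT" or "INSERT INTO SELECT"
    locking_read: bool      # True for SELECT ... FOR SHARE / FOR UPDATE
    charset: str
    operators: set[str] = field(default_factory=set)


def ndp_blockers(q: QueryInfo) -> list[str]:
    """Return reasons NDP cannot apply; an empty list means no listed rule is broken."""
    reasons = []
    if q.engine.upper() != "INNODB":
        reasons.append("table is not InnoDB")
    if q.row_format.upper() not in ROW_FORMATS:
        reasons.append(f"row format {q.row_format} not supported")
    if q.index_type.upper() not in INDEX_TYPES:
        reasons.append(f"index type {q.index_type} not supported")
    if q.statement.upper() != "SELECT":
        reasons.append(f"statement {q.statement} not supported")
    if q.locking_read:
        reasons.append("locking reads are not supported")
    if q.charset.lower() not in CHARSETS:
        reasons.append(f"character set {q.charset} not supported")
    for op in sorted(q.operators - OPERATORS):
        reasons.append(f"operator {op} not supported")
    return reasons


if __name__ == "__main__":
    ok = QueryInfo("InnoDB", "DYNAMIC", "BTREE", "SELECT", False, "utf8mb4",
                   {">=", "AND", "LIKE"})
    print(ndp_blockers(ok))  # []
    bad = QueryInfo("InnoDB", "DYNAMIC", "FULLTEXT", "SELECT", True, "utf8mb4", {"REGEXP"})
    print(ndp_blockers(bad))
```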

Parameters

**Table 1** Parameter description
Parameter	Level	Description
ndp_mode	Global	Enables or disables NDP. Value: off or on. Default value: off.

NOTE: To enable NDP at the global level, contact technical support. NDP is in the test phase, with 10 test users in total.
