Updated on 2026-01-09 GMT+08:00

Tracking the Query Resource Consumption of an Elasticsearch Cluster

During routine maintenance, O&M engineers may need to identify the queries that consume the most resources or cause performance issues in an Elasticsearch cluster. Typically, this requires calling Elasticsearch APIs to list ongoing tasks and examining hot threads to determine which queries are causing excessive resource usage, such as high CPU consumption. This process can be complex and time-consuming. To improve O&M efficiency, CSS provides a query resource tracker. With this feature, O&M personnel can call an API to obtain the top queries by latency, CPU usage, or memory usage, filter them by time range, and quickly identify problematic queries.

How the Feature Works

The query resource tracker helps identify and optimize top resource-consuming queries, improving system performance and resource utilization. It is a useful tool in big data analytics and log processing scenarios.

Figure 1 How the query resource tracker works

During query execution, the system records the resource consumption of each sub-phase (such as Query, Fetch, and Scroll), tracking metrics such as CPU time and memory usage. The data is aggregated in fixed time windows (5 minutes by default). In each window, the queries with the highest resource consumption are recorded in a dedicated top queries index for analysis. A new top queries index is created each day to store these statistics. It is named top-queries-xxx, where xxx indicates the date.

The top queries are ranked by the metric you specify, which can be CPU usage, memory usage, or latency. By default, the system ranks queries by latency.
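The windowing and ranking described above can be sketched as follows. This is a simplified illustration, not the CSS implementation: the record fields (timestamp_ms, latency, cpu, memory) and the function name top_queries_per_window are hypothetical.

```python
import heapq
from collections import defaultdict

def top_queries_per_window(records, metric="latency",
                           window_ms=5 * 60 * 1000, top_n=10):
    """Group records into fixed windows and keep the top_n by metric in each."""
    windows = defaultdict(list)
    for record in records:
        # Align each record to the start of its fixed time window.
        start = record["timestamp_ms"] - record["timestamp_ms"] % window_ms
        windows[start].append(record)
    return {
        start: heapq.nlargest(top_n, bucket, key=lambda r: r[metric])
        for start, bucket in sorted(windows.items())
    }

# Hypothetical query records; the first three fall into the same 5-minute window.
records = [
    {"id": "a", "timestamp_ms": 0, "latency": 20, "cpu": 500, "memory": 10},
    {"id": "b", "timestamp_ms": 60_000, "latency": 5, "cpu": 900, "memory": 30},
    {"id": "c", "timestamp_ms": 299_999, "latency": 12, "cpu": 100, "memory": 20},
    {"id": "d", "timestamp_ms": 300_000, "latency": 7, "cpu": 300, "memory": 40},
]

by_latency = top_queries_per_window(records, top_n=2)
by_cpu = top_queries_per_window(records, metric="cpu", top_n=2)
print({start: [r["id"] for r in top] for start, top in by_latency.items()})
print({start: [r["id"] for r in top] for start, top in by_cpu.items()})
```

Ranking the same records by a different metric can change which queries make the list, which is why the metric is chosen per request.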

Constraints

  • The query resource tracker caches additional data in memory, which may affect cluster performance.
  • The query resource tracker is enabled by default for Elasticsearch clusters whose cluster version is 7.10.2 and whose image version is 7.10.2_25.9.0_xxx or later.

Modifying Top Queries Monitoring Settings

Run the following command to modify the top queries monitoring settings as needed:

PUT _cluster/settings
{
  "persistent": {
    "search.insights.top_queries.cpu.enabled": true,
    "search.insights.top_queries.cpu.window_size": "10m",
    "search.insights.top_queries.cpu.top_n_size": 20,
    "search.insights.top_queries.exporter.delete_after_days": 8,
    "search.insights.top_queries.group_by": "none"
  }
}

Table 1 Configuration items

| Configuration Item | Type | Description |
|---|---|---|
| search.insights.top_queries.<metric>.enabled | Boolean | Whether to enable top query monitoring by the specified metric. Supported metrics: latency, cpu (CPU usage), or memory (memory usage). Values: true (default) enables top query monitoring; false disables it. |
| search.insights.top_queries.<metric>.window_size | String | Size of the observation window. Monitoring data is aggregated and computed by a fixed window (for example, 5 minutes). Supported metrics: latency, cpu (CPU usage), or memory (memory usage). Value range: 1m (1 minute), 5m (5 minutes), 10m (10 minutes), 30m (30 minutes), or xh (x hours, where x ranges from 1 to 24). Default value: 5m |
| search.insights.top_queries.<metric>.top_n_size | Integer | Number of top N queries monitored in each time window. For example, if this parameter is set to 20, only the top 20 queries are monitored in each time window. Supported metrics: latency, cpu (CPU usage), or memory (memory usage). Value range: 1 to 100. Default value: 10 |
| search.insights.top_queries.exporter.delete_after_days | Integer | Retention period of the top-queries-xxx index. For example, if this parameter is set to 8, the index is retained for eight days. Value range: 1 to 180. Default value: 7. Unit: days |
| search.insights.top_queries.group_by | String | Whether to enable top query grouping. Values: none (default) means no grouping; similarity groups queries by feature similarity, and within each time window only the first query is displayed for each group. For more information, see Introduction to Query Grouping. |
| search.insights.top_queries.grouping.attributes.field_name | Boolean | Whether to use query field names for query grouping. This item takes effect only when search.insights.top_queries.group_by is set to similarity. Values: true (default) groups queries by field name; false ignores query field names. |
| search.insights.top_queries.grouping.attributes.field_type | Boolean | Whether to use query field types for query grouping. This item takes effect only when search.insights.top_queries.group_by is set to similarity. Values: true (default) groups queries by field type; false ignores query field types. |
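
As an illustration of the value ranges in Table 1, the following sketch builds a request body for PUT _cluster/settings and rejects out-of-range values before anything is sent to the cluster. The helper names valid_window_size and build_settings are hypothetical and not part of CSS; only the setting names and ranges come from the table.

```python
import re

METRICS = ("latency", "cpu", "memory")
PREFIX = "search.insights.top_queries"

def valid_window_size(value):
    """Return True for 1m, 5m, 10m, 30m, or 1h through 24h (Table 1)."""
    if value in ("1m", "5m", "10m", "30m"):
        return True
    match = re.fullmatch(r"([1-9]\d?)h", value)
    return match is not None and 1 <= int(match.group(1)) <= 24

def build_settings(metric, enabled=True, window_size="5m", top_n_size=10,
                   delete_after_days=7):
    """Build a PUT _cluster/settings body, rejecting out-of-range values."""
    if metric not in METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    if not valid_window_size(window_size):
        raise ValueError(f"unsupported window_size: {window_size}")
    if not 1 <= top_n_size <= 100:
        raise ValueError("top_n_size must be between 1 and 100")
    if not 1 <= delete_after_days <= 180:
        raise ValueError("delete_after_days must be between 1 and 180")
    return {
        "persistent": {
            f"{PREFIX}.{metric}.enabled": enabled,
            f"{PREFIX}.{metric}.window_size": window_size,
            f"{PREFIX}.{metric}.top_n_size": top_n_size,
            f"{PREFIX}.exporter.delete_after_days": delete_after_days,
        }
    }

# Same values as the example command in this section.
body = build_settings("cpu", window_size="10m", top_n_size=20, delete_after_days=8)
print(body)
```

The defaults in build_settings mirror the default values listed in Table 1, so calling it with only a metric name reproduces the default configuration for that metric.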

Obtaining Top Queries

The following is an example of the command that you can run to obtain top queries by a specified metric and time range:

GET _insights/top_queries?type=cpu&from=2025-12-02T00:00:00.000Z&to=2025-12-02T17:00:00.000Z

Table 2 Request parameters

| Parameter | Description |
|---|---|
| type | The metric by which top queries are ranked. Value range: latency, cpu (CPU usage), or memory (memory usage). Default value: latency |
| from | Start time of the query time range, for example, 2025-12-02T00:00:00.000Z. from and to must be specified together. Default value: null. If both are unspecified, the top queries from the last two windows are retrieved. |
| to | End time of the query time range, for example, 2025-12-02T17:00:00.000Z. from and to must be specified together. Default value: null. If both are unspecified, the top queries from the last two windows are retrieved. |
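
The parameter rules in Table 2 can be captured in a small helper that builds the request path. This is a minimal sketch: the function name top_queries_path is hypothetical, and sending the request (with authentication and the cluster address) is left out.

```python
from urllib.parse import urlencode

def top_queries_path(metric="latency", start=None, end=None):
    """Build the request path; from and to must be given together or not at all."""
    if (start is None) != (end is None):
        raise ValueError("from and to must be specified together")
    params = {"type": metric}
    if start is not None:
        params["from"] = start
        params["to"] = end
    # Keep ":" unescaped so the timestamps stay readable, as in the example command.
    return "_insights/top_queries?" + urlencode(params, safe=":")

path = top_queries_path("cpu", "2025-12-02T00:00:00.000Z", "2025-12-02T17:00:00.000Z")
print(path)
print(top_queries_path())
```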

Example response:

{
  "top_queries": [
    {
      "timestamp": 1764662136273,                            //Time when the query was executed, in epoch milliseconds
      "date": "2025-12-02 07:55:36Z",                        //The same time as a UTC date string
      "id": "4a5b4b1e-b502-4621-a5c1-09b8d9a1b81c",          //Unique ID of the query.
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]", // CPU and memory consumption of the query phase on node FB2ixw4IQCuXzCR83GT5Yg
          "taskId": 1877927,
          "parentTaskId": 111295,
          "nodeId": "FB2ixw4IQCuXzCR83GT5Yg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 3017078,
            "memory_in_bytes": 102360
          }
        },
        {
          "action": "indices:data/read/search[phase/query]", // CPU and memory consumption of the query phase on node FB2ixw4IQCuXzCR83GT5Yg
          "taskId": 1877926,
          "parentTaskId": 111295,
          "nodeId": "FB2ixw4IQCuXzCR83GT5Yg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 5618940,
            "memory_in_bytes": 271680
          }
        },
        {
          "action": "indices:data/read/search[phase/query]", // CPU and memory consumption of the query phase on node 2ICOHICoSS26YeQu5PIrlg
          "taskId": 107710,
          "parentTaskId": 111295,
          "nodeId": "2ICOHICoSS26YeQu5PIrlg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 8703914,
            "memory_in_bytes": 501560
          }
        },
        {
          "action": "indices:data/read/search[phase/fetch/id]", // CPU and memory consumption of the fetch phase on node FB2ixw4IQCuXzCR83GT5Yg
          "taskId": 1877928,
          "parentTaskId": 111295,
          "nodeId": "FB2ixw4IQCuXzCR83GT5Yg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 424055,
            "memory_in_bytes": 59000
          }
        },
        {
          "action": "indices:data/read/search[phase/fetch/id]", // CPU and memory consumption of the fetch phase on node 2ICOHICoSS26YeQu5PIrlg
          "taskId": 107711,
          "parentTaskId": 111295,
          "nodeId": "2ICOHICoSS26YeQu5PIrlg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 1279677,
            "memory_in_bytes": 337504
          }
        },
        {
          "action": "indices:data/read/search",                // CPU and memory consumption of the parent search task on the coordinating node gzsjh_47SjCe6QFs9pKjEg
          "taskId": 111295,
          "parentTaskId": -1,
          "nodeId": "gzsjh_47SjCe6QFs9pKjEg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 297268,
            "memory_in_bytes": 8632
          }
        }
      ],
      "source": {                                              //Specific query statement
        "query": {
          "match": {
            "message": {
              "query": "http",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1
            }
          }
        }
      },
      "indices": [                                           //Queried index
        "log1"
      ],
      "total_shards": 3,                                     //Total number of shards queried
      "phase_latency_map": {                                 //Time consumed during each phase
        "expand": 0,                                         //Time consumed in the expand phase
        "query": 16,                                         //Time consumed in the query phase
        "fetch": 3                                          //Time consumed in the fetch phase
      },
      "labels": {},
      "group_by": "NONE",                                    //Query grouping type
      "node_id": "gzsjh_47SjCe6QFs9pKjEg",                   //ID of the node that received the request
      "search_type": "query_then_fetch",                     //Query type
      "measurements": {                                      //Metrics used
        "memory": {                                         //Memory consumption
          "number": 1280736,
          "count": 1,
          "aggregationType": "NONE"
        },
        "latency": {                                          //Latency
          "number": 20,
          "count": 1,
          "aggregationType": "NONE"
        },
        "cpu": {                                              //CPU consumption
          "number": 19340932,
          "count": 1,
          "aggregationType": "NONE"
        }
      }
    }
  ]
}
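
To see how the per-task figures relate to the measurements block, the following sketch sums task_resource_usages by action, using the values from the example response above. In this sample the totals equal measurements.cpu.number and measurements.memory.number; treat that as an observation about this response rather than a documented guarantee. The function name usage_by_action is hypothetical.

```python
from collections import defaultdict

# (action, cpu_time_in_nanos, memory_in_bytes) copied from the example response.
TASKS = [
    ("indices:data/read/search[phase/query]", 3017078, 102360),
    ("indices:data/read/search[phase/query]", 5618940, 271680),
    ("indices:data/read/search[phase/query]", 8703914, 501560),
    ("indices:data/read/search[phase/fetch/id]", 424055, 59000),
    ("indices:data/read/search[phase/fetch/id]", 1279677, 337504),
    ("indices:data/read/search", 297268, 8632),
]
record = {
    "task_resource_usages": [
        {"action": action,
         "taskResourceUsage": {"cpu_time_in_nanos": cpu, "memory_in_bytes": mem}}
        for action, cpu, mem in TASKS
    ],
    "measurements": {"cpu": {"number": 19340932}, "memory": {"number": 1280736}},
}

def usage_by_action(record):
    """Sum CPU time and memory of all tasks that share the same action."""
    totals = defaultdict(lambda: {"cpu": 0, "memory": 0})
    for task in record["task_resource_usages"]:
        usage = task["taskResourceUsage"]
        totals[task["action"]]["cpu"] += usage["cpu_time_in_nanos"]
        totals[task["action"]]["memory"] += usage["memory_in_bytes"]
    return dict(totals)

by_action = usage_by_action(record)
total_cpu = sum(v["cpu"] for v in by_action.values())
total_memory = sum(v["memory"] for v in by_action.values())
for action, usage in by_action.items():
    print(action, usage)
print(total_cpu, total_memory)
```

Here the query phase accounts for most of the CPU time (17339932 of 19340932 nanoseconds), so it is the first place to look when optimizing this query.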

Obtaining the Resource Consumption of the Query Resource Tracker

Run the following command to obtain the resource consumption of the query resource tracker:

GET _insights/health_stats

Example response:

{
  "QDyhJ8Q6Td2acc3KGQ43bQ" : {
    "ThreadPoolInfo" : {
      "query_insights_executor" : {  //Dedicated thread pool of the plugin, used to execute top query analysis tasks
        "type" : "scaling",
        "core" : 1,
        "max" : 1,
        "keep_alive" : "5m",
        "queue_size" : -1
      }
    },
    "QueryRecordsQueueSize" : 0,  //Number of unprocessed tasks in the query_insights_executor queue
    "TopQueriesHealthStats" : {
      "latency" : {
        "TopQueriesHeapSize" : 0,  //Memory occupied by top queries statistics
        "QueryGroupCount_Total" : 0,  //Number of groups kept in the memory when grouping is enabled
        "QueryGroupCount_MaxHeap" : 0  //Memory used to store groups when grouping is enabled
      },
      "cpu" : {
        "TopQueriesHeapSize" : 0,
        "QueryGroupCount_Total" : 0,
        "QueryGroupCount_MaxHeap" : 0
      },
      "memory" : {
        "TopQueriesHeapSize" : 0,
        "QueryGroupCount_Total" : 0,
        "QueryGroupCount_MaxHeap" : 0
      }
    },
    "FieldTypeCacheStats" : { //Cache statistics. When query grouping is enabled, the field mapping is cached to avoid repeated mapping lookups.
      "size_in_bytes" : 0,
      "entry_count" : 0,
      "evictions" : 0,
      "hit_count" : 0,
      "miss_count" : 0
    }
  }
}
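
A response like this can be condensed into a few per-node indicators. The sketch below is illustrative only: the function name summarize_health is hypothetical, and the non-zero numbers in the sample input are made up to exercise the code (the example response above contains only zeros).

```python
def summarize_health(stats):
    """Reduce a health_stats response to queue size, heap use, and cache hit ratio."""
    summary = {}
    for node_id, node in stats.items():
        cache = node["FieldTypeCacheStats"]
        lookups = cache["hit_count"] + cache["miss_count"]
        summary[node_id] = {
            "queue_size": node["QueryRecordsQueueSize"],
            "heap_by_metric": {
                metric: values["TopQueriesHeapSize"]
                for metric, values in node["TopQueriesHealthStats"].items()
            },
            # No lookups yet means the ratio is undefined, not zero.
            "cache_hit_ratio": cache["hit_count"] / lookups if lookups else None,
        }
    return summary

# Hypothetical values for illustration; only the field names come from the response.
sample = {
    "QDyhJ8Q6Td2acc3KGQ43bQ": {
        "QueryRecordsQueueSize": 3,
        "TopQueriesHealthStats": {
            "latency": {"TopQueriesHeapSize": 10},
            "cpu": {"TopQueriesHeapSize": 20},
            "memory": {"TopQueriesHeapSize": 0},
        },
        "FieldTypeCacheStats": {"hit_count": 30, "miss_count": 10},
    }
}
summary = summarize_health(sample)
print(summary)
```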

Introduction to Query Grouping

When the same query runs repeatedly and consumes excessive resources each time, it can fill the top N list and hide other resource-intensive queries. Query grouping addresses this by grouping queries that share the same pattern, so that only the first query from each group appears in the top N results. To enable query grouping, set search.insights.top_queries.group_by to similarity.

Table 3 Grouping modes

| Grouping Mode | Description |
|---|---|
| Complete query structure (structure + field + type) | Considers the query structure, field names, and field types. |
| Query structure only | Considers the query structure only, while ignoring field names and types. |
| Query structure + field only | Considers the query structure and field names only, while ignoring field types. |
| Query structure + type only | Considers the query structure and field types only, while ignoring field names. |

Use field type matching if the field types in your indexes rarely change. The system caches field types to improve grouping efficiency.
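
The four grouping modes appear to correspond to the four combinations of the field_name and field_type grouping attributes from Table 1. That mapping is an inference from the setting descriptions, not something this page states explicitly, so verify it on a test cluster before relying on it. The helper name grouping_settings and the MODES dictionary are hypothetical.

```python
PREFIX = "search.insights.top_queries"

def grouping_settings(use_field_name=True, use_field_type=True):
    """Build a PUT _cluster/settings body that enables similarity grouping."""
    return {
        "persistent": {
            f"{PREFIX}.group_by": "similarity",
            f"{PREFIX}.grouping.attributes.field_name": use_field_name,
            f"{PREFIX}.grouping.attributes.field_type": use_field_type,
        }
    }

# Assumed mapping from grouping mode to (field_name, field_type).
MODES = {
    "Complete query structure (structure + field + type)": (True, True),
    "Query structure only": (False, False),
    "Query structure + field only": (True, False),
    "Query structure + type only": (False, True),
}

for mode, (name, ftype) in MODES.items():
    print(mode, grouping_settings(name, ftype)["persistent"])
```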

Example

The mapping of an index is as follows:

"mappings": {
  "properties": {
    "field1": {
      "type": "keyword"
    },
    "field2": {
      "type": "text"
    },
    "field3": {
      "type": "text"
    },
    "field4": {
      "type": "long"
    }
  }
}

Perform the following query on the index:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "field1": "example_value"
          }
        }
      ],
      "filter": [
        {
          "match": {
            "field2": "search_text"
          }
        },
        {
          "range": {
            "field4": {
              "gte": 1,
              "lte": 100
            }
          }
        }
      ],
      "should": [
        {
          "regexp": {
            "field3": ".*"
          }
        }
      ]
    }
  }
}

Table 4 provides examples that help you understand how each grouping mode behaves.

Table 4 Behavior of different grouping modes

Grouping Mode

Pattern

Complete query structure (structure + field + type)

bool []
  must:
    term [field1, keyword]
  filter:
    match [field2, text]
    range [field4, long]
  should:
    regexp [field3, text]

Query structure only

bool
  must:
    term
  filter:
    match
    range
  should:
    regexp

Query structure + field only

bool []
  must:
    term [field1]
  filter:
    match [field2]
    range [field4]
  should:
    regexp [field3]

Query structure + type only

bool []
  must:
    term [keyword]
  filter:
    match [text]
    range [long]
  should:
    regexp [text]
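
The patterns in Table 4 can be reproduced with a short sketch that walks the example query and replaces every value with the attributes selected by the grouping mode. This is not the tracker's actual algorithm: the function name pattern is hypothetical, the code handles only bool queries and single-field leaf queries like those in the example, and the output format simply mimics the table.

```python
MAPPING = {"field1": "keyword", "field2": "text", "field3": "text", "field4": "long"}

QUERY = {"bool": {
    "must": [{"term": {"field1": "example_value"}}],
    "filter": [{"match": {"field2": "search_text"}},
               {"range": {"field4": {"gte": 1, "lte": 100}}}],
    "should": [{"regexp": {"field3": ".*"}}],
}}

def pattern(query, mapping, use_name=True, use_type=True, indent=0):
    """Return the pattern lines of a query, ignoring all query values."""
    pad = " " * indent
    (kind, body), = query.items()
    if kind == "bool":
        # Table 4 shows empty brackets on bool whenever an attribute is enabled.
        lines = [pad + "bool" + (" []" if use_name or use_type else "")]
        for clause in ("must", "filter", "should", "must_not"):
            children = body.get(clause, [])
            if children:
                lines.append(f"{pad}  {clause}:")
                for child in children:
                    lines.extend(pattern(child, mapping, use_name, use_type, indent + 4))
        return lines
    # Leaf query: the single key of the body is the field name.
    (field,) = body.keys()
    attrs = ([field] if use_name else []) + ([mapping[field]] if use_type else [])
    return [pad + kind + (f" [{', '.join(attrs)}]" if attrs else "")]

print("\n".join(pattern(QUERY, MAPPING)))
print("\n".join(pattern(QUERY, MAPPING, use_name=False, use_type=False)))
```

Because values are ignored, a second query that differs only in its search terms or range bounds produces the same pattern and would fall into the same group.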