Updated on 2024-11-04 GMT+08:00

Monitoring Clusters Using Cloud Eye

This section describes the metrics that GES reports to Cloud Eye, along with their namespace and dimensions. You can use the APIs provided by Cloud Eye to query the metric data generated for GES.

Namespace

SYS.GES

Monitoring Metrics

Table 1 GES metrics

| Metric ID | Metric | Description | Unit | Value Range | Type | Monitored Object |
|---|---|---|---|---|---|---|
| ges001_vertex_util | Vertex Capacity Usage | Vertex capacity usage of a graph instance. The value is the ratio of used vertices to total vertices. | % | 0–100 | float | GES instance |
| ges002_edge_util | Edge Capacity Usage | Edge capacity usage of a graph instance. The value is the ratio of used edges to total edges. | % | 0–100 | float | GES instance |
| ges003_average_import_rate | Average Import Rate | Average rate of importing vertices or edges to a graph instance | count/s | 0–400000 | float | GES instance |
| ges004_request_count | Request Quantity | Number of requests received by a graph instance | count | ≥ 0 | integer | GES instance |
| ges005_average_response_time | Average Response Time | Average response time of requests received by a graph instance | ms | ≥ 0 | integer | GES instance |
| ges006_min_response_time | Minimum Response Time | Minimum response time of requests received by a graph instance | ms | ≥ 0 | integer | GES instance |
| ges007_max_response_time | Maximum Response Time | Maximum response time of requests received by a graph instance | ms | ≥ 0 | integer | GES instance |
| ges008_read_task_pending_queue_size | Length of the Waiting Queue for Read Tasks | Length of the waiting queue for read requests received by a graph instance, that is, the number of read requests waiting in the queue | count | ≥ 0 | integer | GES instance |
| ges009_read_task_pending_max_time | Maximum Waiting Duration of Read Tasks | Maximum waiting duration of read requests received by a graph instance | ms | ≥ 0 | integer | GES instance |
| ges010_pending_max_time_read_task_type | Type of the Read Task That Waits the Longest | Type of the read request that waits the longest in a graph instance. For the task name that corresponds to each type, see Table 2. | - | ≥ 1 | integer | GES instance |
| ges011_read_task_running_queue_size | Length of the Running Queue for Read Tasks | Length of the running queue for read requests received by a graph instance, that is, the number of running read requests | count | ≥ 0 | integer | GES instance |
| ges012_read_task_running_max_time | Maximum Running Duration of Read Tasks | Maximum running duration of read requests received by a graph instance | ms | ≥ 0 | integer | GES instance |
| ges013_running_max_time_read_task_type | Type of the Read Task That Runs the Longest | Type of the read request that runs the longest in a graph instance. For the task name that corresponds to each type, see Table 2. | - | ≥ 1 | integer | GES instance |
| ges014_write_task_pending_queue_size | Length of the Waiting Queue for Write Tasks | Length of the waiting queue for write requests received by a graph instance, that is, the number of write requests waiting in the queue | count | ≥ 0 | integer | GES instance |
| ges015_write_task_pending_max_time | Maximum Waiting Duration of Write Tasks | Maximum waiting duration of write requests received by a graph instance | ms | ≥ 0 | integer | GES instance |
| ges016_pending_max_time_write_task_type | Type of the Write Task That Waits the Longest | Type of the write request that waits the longest in a graph instance. For the task name that corresponds to each type, see Table 2. | - | ≥ 1 | integer | GES instance |
| ges017_write_task_running_queue_size | Length of the Running Queue for Write Tasks | Length of the running queue for write requests received by a graph instance, that is, the number of running write requests | count | ≥ 0 | integer | GES instance |
| ges018_write_task_running_max_time | Maximum Running Duration of Write Tasks | Maximum running duration of write requests received by a graph instance | ms | ≥ 0 | integer | GES instance |
| ges019_running_max_time_write_task_type | Type of the Write Task That Runs the Longest | Type of the write request that runs the longest in a graph instance. For the task name that corresponds to each type, see Table 2. | - | ≥ 1 | integer | GES instance |
| ges020_computer_resource_usage | Computing Resource Usage | Computing resource usage of each graph instance | % | 0–100 | float | GES instance |
| ges021_memory_usage | Memory Usage | Memory usage of each graph instance | % | 0–100 | float | GES instance |
| ges022_iops | IOPS | Number of I/O requests processed by each graph instance per second | count/s | ≥ 0 | integer | GES instance |
| ges023_bytes_in | Network Input Throughput | Data input to each graph instance per second over the network | byte/s | ≥ 0 | float | GES instance |
| ges024_bytes_out | Network Output Throughput | Data sent to the network per second from each graph instance | byte/s | ≥ 0 | float | GES instance |
| ges025_disk_usage | Disk Usage | Disk usage of each graph instance | % | 0–100 | float | GES instance |
| ges026_disk_total_size | Total Disk Size | Total data disk space of each graph instance | GB | ≥ 0 | float | GES instance |
| ges027_disk_used_size | Disk Space Used | Used data disk space of each graph instance | GB | ≥ 0 | float | GES instance |
| ges028_disk_read_throughput | Disk Read Throughput | Data volume read from the disk in a graph instance per second | byte/s | ≥ 0 | float | GES instance |
| ges029_disk_write_throughput | Disk Write Throughput | Data volume written to the disk in a graph instance per second | byte/s | ≥ 0 | float | GES instance |
| ges030_avg_disk_sec_per_read | Average Time per Disk Read | Average time per disk read for a graph instance | second | ≥ 0 | float | GES instance |
| ges031_avg_disk_sec_per_write | Average Time per Disk Write | Average time per disk write for a graph instance | second | ≥ 0 | float | GES instance |
| ges032_avg_disk_queue_length | Average Disk Queue Length | Average I/O queue length of the disk in a graph instance | count | ≥ 0 | integer | GES instance |
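The value ranges in Table 1 can double as sanity checks on metric samples you retrieve. The following is a minimal sketch under that assumption: the `VALUE_RANGES` dictionary and the `in_documented_range` helper are hypothetical names introduced here for illustration, and only a handful of metrics from Table 1 are included.

```python
# (low, high) bounds copied from Table 1 for a few metrics.
# None means Table 1 gives no upper bound (the range is "≥ 0").
VALUE_RANGES = {
    "ges001_vertex_util": (0, 100),
    "ges002_edge_util": (0, 100),
    "ges003_average_import_rate": (0, 400000),
    "ges004_request_count": (0, None),
    "ges025_disk_usage": (0, 100),
}


def in_documented_range(metric_id, value):
    """Return True if value lies inside the range Table 1 lists for metric_id."""
    low, high = VALUE_RANGES[metric_id]  # raises KeyError for unlisted metrics
    return value >= low and (high is None or value <= high)


print(in_documented_range("ges001_vertex_util", 42.5))
print(in_documented_range("ges025_disk_usage", 120.0))
```

A sample that falls outside its documented range most likely points to a parsing or unit error on the client side and is worth logging before it is used for alerting.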

Dimensions

| Key | Value |
|---|---|
| instance_id | GES instance |
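The namespace, metric IDs, and dimension key listed in this section are the values a Cloud Eye metric query needs. The sketch below only assembles a request URL and sends nothing. The path `/V1.0/{project_id}/metric-data` and the query parameter names are modeled on the Cloud Eye API for querying metric data and should be checked against the Cloud Eye API Reference before use. The endpoint host, project ID, and instance ID are placeholders, and `build_metric_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

NAMESPACE = "SYS.GES"          # namespace given in this section
DIMENSION_KEY = "instance_id"  # dimension key given in this section


def build_metric_query_url(endpoint, project_id, instance_id, metric_name,
                           start_ms, end_ms, period=300, stat="average"):
    """Assemble a metric-data query URL for one GES instance.

    start_ms and end_ms are UNIX timestamps in milliseconds. The period and
    stat defaults are assumptions; confirm allowed values in the API reference.
    """
    if end_ms <= start_ms:
        raise ValueError("end_ms must be later than start_ms")
    params = {
        "namespace": NAMESPACE,
        "metric_name": metric_name,
        "dim.0": f"{DIMENSION_KEY},{instance_id}",
        "from": start_ms,
        "to": end_ms,
        "period": period,
        "filter": stat,
    }
    return f"{endpoint.rstrip('/')}/V1.0/{project_id}/metric-data?{urlencode(params)}"


# Placeholder endpoint, project ID, and instance ID for illustration only.
url = build_metric_query_url(
    "https://ces.example.com", "my-project", "abc123",
    "ges021_memory_usage", 1730678400000, 1730682000000,
)
print(url)
```

The request still needs authentication headers, which are omitted here because they depend on how your account obtains tokens or signs requests.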

Mapping Between Task Types and Names

Table 2 Mapping between task types and names

| Type | Name |
|---|---|
| 100 | Querying vertices |
| 101 | Creating a vertex |
| 102 | Deleting a vertex |
| 103 | Modifying a vertex property |
| 104 | Adding a vertex label |
| 105 | Deleting a vertex label |
| 200 | Querying edges |
| 201 | Creating an edge |
| 202 | Deleting an edge |
| 203 | Modifying an edge property |
| 300 | Querying schema details |
| 301 | Adding a label |
| 302 | Modifying a label |
| 303 | Querying a label |
| 304 | Modifying a property |
| 400 | Querying graph details |
| 401 | Clearing graphs |
| 402 | Incrementally importing graph data online |
| 403 | Creating a graph |
| 405 | Deleting a graph |
| 406 | Exporting graphs |
| 407 | filtered_khop |
| 408 | Querying path details |
| 409 | Incrementally importing graph data offline |
| 500 | Creating a graph backup |
| 501 | Restoring a graph from a backup |
| 601 | Creating an index |
| 602 | Querying indexes |
| 603 | Updating an index |
| 604 | Deleting an index |
| 700 | Running an algorithm |
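The task-type metrics (ges010, ges013, ges016, and ges019) report an integer code, so a lookup table is handy when reading their values. The following sketch transcribes Table 2 into a Python dictionary. The names `TASK_TYPE_NAMES` and `task_name` are hypothetical, and the fallback text for unlisted codes is an assumption made here, not documented GES behavior.

```python
# Task type codes and names transcribed from Table 2.
TASK_TYPE_NAMES = {
    100: "Querying vertices",
    101: "Creating a vertex",
    102: "Deleting a vertex",
    103: "Modifying a vertex property",
    104: "Adding a vertex label",
    105: "Deleting a vertex label",
    200: "Querying edges",
    201: "Creating an edge",
    202: "Deleting an edge",
    203: "Modifying an edge property",
    300: "Querying schema details",
    301: "Adding a label",
    302: "Modifying a label",
    303: "Querying a label",
    304: "Modifying a property",
    400: "Querying graph details",
    401: "Clearing graphs",
    402: "Incrementally importing graph data online",
    403: "Creating a graph",
    405: "Deleting a graph",
    406: "Exporting graphs",
    407: "filtered_khop",
    408: "Querying path details",
    409: "Incrementally importing graph data offline",
    500: "Creating a graph backup",
    501: "Restoring a graph from a backup",
    601: "Creating an index",
    602: "Querying indexes",
    603: "Updating an index",
    604: "Deleting an index",
    700: "Running an algorithm",
}


def task_name(type_code):
    """Translate a task-type metric value into the task name from Table 2."""
    code = int(type_code)  # metric values may arrive as floats such as 402.0
    return TASK_TYPE_NAMES.get(code, f"Unknown task type ({code})")


print(task_name(402))
```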

Viewing Instance Monitoring Information

  1. Log in to the GES management console and choose Graph Management.
  2. In the graph list, locate the row that contains the target graph, choose More, and select View Metric to access the Cloud Eye management console. By default, the graph instance monitoring information is displayed.

    You can select a monitoring metric name and time range to check the performance curve.

Creating an Alarm Rule

By setting alarm rules for GES, you can customize the monitored objects and notification policies so that you learn about the operational status of GES promptly and receive early warnings.

An alarm rule for GES includes parameters such as the rule name, monitored object, metric, alarm threshold, monitoring period, and whether to send notifications.

This part describes how to set an alarm rule for GES.

  1. Log in to the GES management console and choose Graph Management from the navigation pane on the left.
  2. Locate the row containing the target instance, choose More in the Operation column, and select View Metric to access the Cloud Eye management console and check the GES monitoring information.
    Figure 1 Selecting View Metric

    Ensure that the target instance is in the Running state. Otherwise, you cannot create an alarm rule for it.

  3. In the navigation pane on the left of the Cloud Eye management console, choose Alarm Management > Alarm Rules. On the page displayed, click Create Alarm Rule in the upper right corner or in the middle.

  4. On the Create Alarm Rule page, set parameters as prompted.
    1. Set the alarm parameters as prompted.
      Figure 2 Setting parameters
      Table 3 Alarm parameters

      | Parameter | Description | Example Value |
      |---|---|---|
      | Alarm Type | Alarm type the alarm rule applies to. The value can be Metric or Event. | Metric |
      | Cloud Product | Name of the cloud service the alarm rule is created for | Graph Engine Service - Graph Instances |
      | Resource Level | This parameter is available only when Alarm Type is set to Metric. The options are Cloud product (recommended) and Specific dimension. For example, a GES cloud product (the GES VMs a user purchases) can be divided by metric into sub-dimensions such as disks, mount points, and processes. | Cloud product |
      | Monitoring Scope | Resource scope the alarm rule applies to. Select Specific resources and select one or more monitored objects. For GES, select the ID of the cluster instance you have created. Then, set Instance. | Specific resources |
      | Method | There are three options: Associate template, Use existing template, and Configure manually. | Associate template |
      | Template | This parameter is available only when a template option (Associate template or Use existing template) is selected for Method. Select the template to be used. If no alarm template is available, click Create Custom Template to create one that meets your requirements. | - |
      | Alarm Policy | This parameter is available only when Configure manually is selected for Method. Set the policy that triggers an alarm. For example, trigger an alarm if the CPU usage is greater than or equal to 80% for 3 consecutive periods. For details about GES monitoring metrics, see Monitoring Metrics. | - |
      | Alarm Severity | Alarm severity, which can be Critical, Major, Minor, or Informational. | Major |

    2. Configure the alarm notification parameters as prompted.
      Figure 3 Setting alarm notification parameters
      Table 4 Alarm notification parameters

      | Parameter | Description | Example Value |
      |---|---|---|
      | Alarm Notification | Whether to send email, SMS, HTTP, or HTTPS notifications to users when an alarm is triggered. You can enable (recommended) or disable this function. | Enabled |
      | Notification Recipient | The options are Notification policies, Notification group, and Topic subscription. | Topic subscription |
      | Notification Policy | If Notification policies is selected for Notification Recipient, you need to select one or more notification policies. You can specify the notification group, template, window, and other parameters in a notification policy. For how to create a notification policy, see Creating, Modifying, or Deleting a Notification Policy. | - |
      | Notification Object | This parameter is mandatory when Notification Recipient is set to Topic subscription. Name of the topic the alarm notification is to be sent to. If you have enabled Alarm Notification, select a topic. If no suitable topic is available, create one first; creating a topic invokes the SMN service. For details about how to create a topic, see Creating a Topic. | SMN topic |
      | Notification Group | This parameter is mandatory when Notification Recipient is set to Notification group. You can select or create a notification group. After creating a notification group, you need to click Add Notification Recipient in the Operation column of the notification group list to add group members and notification methods. | Notification group name |
      | Notification Template | You can select a system template or create a custom notification template. | System template |
      | Notification Window | Notifications are sent only within the notification window specified in the alarm rule. For example, if Notification Window is set to 00:00–08:00, Cloud Eye sends notifications only within this period. | - |
      | Trigger Condition | Condition for triggering the alarm notification. You can select Generated alarm (when an alarm is generated), Cleared alarm (when an alarm is cleared), or both. | - |

  5. Click Create. After the alarm rule is created, if the metric data reaches the specified threshold, Cloud Eye will immediately inform you that an exception has occurred.
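
The Alarm Policy example in Table 3 (usage greater than or equal to 80% for 3 consecutive periods) can be illustrated with a short sketch. This shows the "consecutive periods" rule as described in the text. It is not Cloud Eye's actual evaluation logic, and `should_alarm` is a hypothetical name.

```python
def should_alarm(samples, threshold=80.0, consecutive=3):
    """Return True if `consecutive` back-to-back samples are all >= threshold.

    samples holds one aggregated metric value per monitoring period,
    ordered from oldest to newest.
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    run = 0  # length of the current streak of breaching samples
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= consecutive:
            return True
    return False


print(should_alarm([50, 81, 85, 90]))      # three breaching periods in a row
print(should_alarm([81, 85, 70, 90, 95]))  # the streak is broken by 70
```

Requiring several consecutive periods keeps a single short spike from triggering a notification, at the cost of a slower reaction to a sustained problem.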

Transferring Data to OBS

Cloud Eye stores raw metric data for only two days. If you subscribe to OBS, you can synchronize the raw data to OBS and retain it for longer.