Monitoring DLI Using Cloud Eye
Description
This section describes metrics reported by DLI to Cloud Eye as well as their namespaces and dimensions. You can use the management console or APIs provided by Cloud Eye to query the metrics of the monitored object and alarms generated for DLI.
Namespace
SYS.DLI
Metric
Metric ID |
Name |
Description |
Value Range |
Unit |
Conversion Rule |
Monitored Object |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|---|---|
queue_cu_num |
Queue CU Usage |
Displays the number of CUs applied for by the user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_job_launching_num |
Number of Jobs Being Submitted |
Displays the number of jobs in the Submitting state in the user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_job_running_num |
Number of Running Jobs |
Displays the number of running jobs in the user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_job_succeed_num |
Number of Finished Jobs |
Displays the number of completed jobs in the user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_job_failed_num |
Failed Jobs |
Displays the number of failed jobs in the user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_job_cancelled_num |
Number of Canceled Jobs |
Displays the number of canceled jobs in the user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_alloc_cu_num |
Allocated CUs (queue) |
Displays the CU allocation for user queues. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_min_cu_num |
Minimum CUs for Queue |
Displays the minimum number of CUs for a user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_max_cu_num |
Maximum CUs for Queue |
Displays the maximum number of CUs for a user queue. |
≥ 0 |
Count |
N/A |
Queues |
5 minutes |
queue_priority |
Queue Priority |
Displays the priority of a user queue. |
1–100 |
N/A |
N/A |
Queues |
5 minutes |
queue_cpu_usage |
Queue CPU Usage |
Displays the CPU usage of user queues. |
0–100 |
% |
N/A |
Queues. This metric applies only to queues in non-elastic resource pools. |
5 minutes |
queue_disk_usage |
Queue Disk Usage |
Displays the disk usage of user queues. |
0–100 |
% |
N/A |
Queues. This metric applies only to queues in non-elastic resource pools. |
5 minutes |
queue_disk_used |
Max Disk Usage |
Displays the maximum disk usage of user queues. |
0–100 |
% |
N/A |
Queues. This metric applies only to queues in non-elastic resource pools. |
5 minutes |
queue_mem_usage |
Queue Memory Usage |
Displays the memory usage of user queues. |
0–100 |
% |
N/A |
Queues. This metric applies only to queues in non-elastic resource pools. |
5 minutes |
queue_mem_used |
Used Memory |
Displays the amount of memory used by user queues. |
≥ 0 |
MB |
N/A |
Queues. This metric applies only to queues in non-elastic resource pools. |
5 minutes |
queue_job_launching_max_duration |
Longest Job Submission |
Displays the duration of the longest job submission that is still in progress at the sampling time (covers SQL, Flink, and Spark jobs). |
≥ 0 |
Seconds |
N/A |
Queues |
5 minutes. Note: This metric is an instantaneous sample (not continuous sampling). It records the longest job submission still in progress at the moment of sampling, that is, among jobs in the Submitting or Starting state. It is not a statistic over all jobs, and it excludes historical and completed jobs. Use it only to monitor queue status. |
queue_sql_job_running_max_duration |
Longest SQL Job |
Displays the duration of the longest-running SQL job that is still in progress at the sampling time. |
≥ 0 |
Seconds |
N/A |
Queues |
5 minutes. Note: This metric is an instantaneous sample (not continuous sampling). It records the longest-running SQL job still in progress at the moment of sampling, that is, among jobs in the Running state. It is not a statistic over all jobs, and it excludes historical and completed jobs. Use it only to monitor queue status. |
queue_spark_job_running_max_duration |
Longest Spark Job |
Displays the duration of the longest-running Spark job that is still in progress at the sampling time. |
≥ 0 |
Seconds |
N/A |
Queues |
5 minutes. Note: This metric is an instantaneous sample (not continuous sampling). It records the longest-running Spark job still in progress at the moment of sampling, that is, among jobs in the Running state. It is not a statistic over all jobs, and it excludes historical and completed jobs. Use it only to monitor queue status. |
flink_read_records_per_second |
Flink Job Data Read Rate |
Displays the data input rate of a Flink job for monitoring and debugging. |
≥ 0 |
record/s |
N/A |
Flink jobs |
10 seconds |
flink_write_records_per_second |
Flink Job Data Write Rate |
Displays the data output rate of a Flink job for monitoring and debugging. |
≥ 0 |
record/s |
N/A |
Flink jobs |
10 seconds |
flink_read_records_total |
Flink Job Total Data Read |
Displays the total number of data inputs of a Flink job for monitoring and debugging. |
≥ 0 |
record/s |
N/A |
Flink jobs |
10 seconds |
flink_write_records_total |
Flink Job Total Data Write |
Displays the total number of output data records of a Flink job for monitoring and debugging. |
≥ 0 |
record/s |
N/A |
Flink jobs |
10 seconds |
flink_read_bytes_per_second |
Flink Job Byte Read Rate |
Displays the number of input bytes per second of a Flink job. |
≥ 0 |
byte/s |
1024 (IEC) |
Flink jobs |
10 seconds |
flink_write_bytes_per_second |
Flink Job Byte Write Rate |
Displays the number of output bytes per second of a Flink job. |
≥ 0 |
byte/s |
1024 (IEC) |
Flink jobs |
10 seconds |
flink_read_bytes_total |
Flink Job Total Read Byte |
Displays the total number of input bytes of a Flink job. |
≥ 0 |
byte/s |
1024 (IEC) |
Flink jobs |
10 seconds |
flink_write_bytes_total |
Flink Job Total Write Byte |
Displays the total number of output bytes of a Flink job. |
≥ 0 |
byte/s |
1024 (IEC) |
Flink jobs |
10 seconds |
flink_cpu_usage |
Flink Job CPU Usage |
Displays the CPU usage of Flink jobs. |
0–100 |
% |
N/A |
Flink jobs |
10 seconds |
flink_mem_usage |
Flink Job Memory Usage |
Displays the memory usage of Flink jobs. |
0–100 |
% |
N/A |
Flink jobs |
10 seconds |
flink_max_op_latency |
Flink Job Max Operator Latency |
Displays the maximum operator latency of a Flink job. |
≥ 0 |
ms |
N/A |
Flink jobs |
10 seconds |
flink_max_op_backpressure_level |
Flink Job Maximum Operator Backpressure |
Displays the maximum operator backpressure value of a Flink job. A larger value indicates more severe backpressure: 0 (OK), 50 (low), 100 (high). |
0–100 |
N/A |
N/A |
Flink jobs |
10 seconds |
elastic_resource_pool_cpu_usage |
CPU Usage of Elastic Resource Pool |
Displays the CPU usage of elastic resource pools. |
0–100 |
% |
N/A |
Elastic resource pools |
5 minutes |
elastic_resource_pool_mem_usage |
Memory Usage of Elastic Resource Pool |
Displays the memory usage of elastic resource pools. |
0–100 |
% |
N/A |
Elastic resource pools |
5 minutes |
elastic_resource_pool_disk_usage |
Disk Usage of Elastic Resource Pool |
Displays the disk usage of elastic resource pools. |
0–100 |
% |
N/A |
Elastic resource pools |
5 minutes |
elastic_resource_pool_disk_max_usage |
Maximum Disk Usage of Elastic Resource Pool |
Displays the maximum disk usage of elastic resource pools. |
0–100 |
% |
N/A |
Elastic resource pools |
5 minutes |
elastic_resource_pool_cu_num |
CU Usage of Elastic Resource Pool |
Displays the CU usage of elastic resource pools. |
≥ 0 |
Count |
N/A |
Elastic resource pools |
5 minutes |
elastic_resource_pool_alloc_cu_num |
Allocated CUs of Elastic Resource Pool |
Displays the CU allocation of elastic resource pools. |
≥ 0 |
Count |
N/A |
Elastic resource pools |
5 minutes |
elastic_resource_pool_min_cu_num |
Minimum CUs of Elastic Resource Pool |
Displays the minimum number of CUs of elastic resource pools. |
≥ 0 |
Count |
N/A |
Elastic resource pools |
5 minutes |
elastic_resource_pool_max_cu_num |
Maximum CUs of Elastic Resource Pool |
Displays the maximum number of CUs of elastic resource pools. |
≥ 0 |
Count |
N/A |
Elastic resource pools |
5 minutes |
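As a rough illustration of how raw values from the table above might be interpreted on the client side, the following Python sketch maps a flink_max_op_backpressure_level reading to the labels listed for that metric and scales a byte/s reading using the 1024 (IEC) conversion rule. The function names, the "unknown" fallback for values other than 0, 50, and 100, and the KiB/MiB/GiB unit labels are assumptions made for this example. They are not part of DLI or Cloud Eye.

```python
# Hypothetical helpers for reading DLI metric values; not part of any SDK.

# Labels documented for flink_max_op_backpressure_level.
BACKPRESSURE_LABELS = {0: "OK", 50: "low", 100: "high"}


def backpressure_label(value):
    """Return the documented label for a backpressure value.

    Values other than 0, 50, and 100 are not described in the metric
    table, so this sketch reports them as "unknown" (an assumption).
    """
    return BACKPRESSURE_LABELS.get(value, "unknown")


def format_byte_rate(bytes_per_second):
    """Scale a byte/s value by factors of 1024 (IEC conversion rule)."""
    units = ["byte/s", "KiB/s", "MiB/s", "GiB/s", "TiB/s"]
    value = float(bytes_per_second)
    for unit in units:
        # Stop once the value fits the current unit or no larger unit is left.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


if __name__ == "__main__":
    print(backpressure_label(50))
    print(format_byte_rate(1536))
```

Both helpers operate only on numbers that have already been retrieved. They do not call Cloud Eye.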
Dimension
Key |
Value |
---|---|
queue_id |
Queue |
flink_job_id |
Flink job |
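The namespace, metric IDs, and dimension keys above are the values you supply when querying metrics through the Cloud Eye API. The following Python sketch only assembles a request path and query string. It sends nothing over the network. The path `/V1.0/{project_id}/metric-data` and the parameter names (`namespace`, `metric_name`, `dim.0`, `from`, `to`, `period`, `filter`) reflect one reading of the Cloud Eye metric data query API and should be verified against the Cloud Eye API reference. The function name, the project ID, and the queue ID are hypothetical placeholders.

```python
from urllib.parse import urlencode

NAMESPACE = "SYS.DLI"
# Dimension keys listed in the Dimension table.
DIMENSION_KEYS = {"queue_id", "flink_job_id"}


def build_metric_query(project_id, metric_name, dim_key, dim_value,
                       start_ms, end_ms, period=300, stat="average"):
    """Build the path and query string for a metric data request.

    Assumptions: the path layout and parameter names follow the Cloud Eye
    metric data query API; start_ms and end_ms are timestamps in
    milliseconds; period=300 matches the 5-minute monitoring period.
    """
    if dim_key not in DIMENSION_KEYS:
        raise ValueError(f"unsupported dimension key: {dim_key}")
    params = [
        ("namespace", NAMESPACE),
        ("metric_name", metric_name),
        ("dim.0", f"{dim_key},{dim_value}"),  # dimension passed as key,value
        ("from", start_ms),
        ("to", end_ms),
        ("period", period),
        ("filter", stat),
    ]
    # Keep the comma in the dimension value unescaped.
    return f"/V1.0/{project_id}/metric-data?" + urlencode(params, safe=",")


if __name__ == "__main__":
    # Placeholder project and queue IDs, for illustration only.
    print(build_metric_query("my-project-id", "queue_cu_num",
                             "queue_id", "my-queue-id",
                             1700000000000, 1700003600000))
```

The returned string would be appended to the Cloud Eye endpoint for your region and sent as an authenticated GET request. Authentication is out of scope for this sketch.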
Viewing DLI Monitoring Metrics on Cloud Eye
- Search for Cloud Eye on the management console.
- In the navigation pane on the left of the Cloud Eye console, click Cloud Service Monitoring > Data Lake Insight.
- Select a queue to view its monitoring metrics.