Monitoring DLI Using Cloud Eye

Description

This section describes the metrics that DLI reports to Cloud Eye, along with their namespace and dimensions. You can use the Cloud Eye management console or Cloud Eye APIs to query the metrics of monitored objects and the alarms generated for DLI.

Namespace

SYS.DLI

Metric

**Table 1** DLI metrics
Metric ID	Name	Description	Value Range	Unit	Conversion Rule	Monitored Object	Monitoring Period (Raw Data)
queue_cu_num	Queue CU Usage	Displays the number of CUs applied for by the user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_job_launching_num	Number of Jobs Being Submitted	Displays the number of jobs in the Submitting state in the user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_job_running_num	Number of Running Jobs	Displays the number of running jobs in the user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_job_succeed_num	Number of Finished Jobs	Displays the number of completed jobs in the user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_job_failed_num	Failed Jobs	Displays the number of failed jobs in the user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_job_cancelled_num	Number of Canceled Jobs	Displays the number of canceled jobs in the user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_alloc_cu_num	Allocated CUs (queue)	Displays the number of CUs allocated to the user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_min_cu_num	Minimum CUs for Queue	Displays the minimum number of CUs for a user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_max_cu_num	Maximum CUs for Queue	Displays the maximum number of CUs for a user queue.	≥ 0	Count	N/A	Queues	5 minutes
queue_priority	Queue Priority	Displays the priority of a user queue.	1–100	N/A	N/A	Queues	5 minutes
queue_cpu_usage	Queue CPU Usage	Displays the CPU usage of user queues.	0–100	%	N/A	Queues (applies only to queues in non-elastic resource pools)	5 minutes
queue_disk_usage	Queue Disk Usage	Displays the disk usage of user queues.	0–100	%	N/A	Queues (applies only to queues in non-elastic resource pools)	5 minutes
queue_disk_used	Max Disk Usage	Displays the maximum disk usage of user queues.	0–100	%	N/A	Queues (applies only to queues in non-elastic resource pools)	5 minutes
queue_mem_usage	Queue Memory Usage	Displays the memory usage of user queues.	0–100	%	N/A	Queues (applies only to queues in non-elastic resource pools)	5 minutes
queue_mem_used	Used Memory	Displays the amount of memory used by user queues.	≥ 0	MB	N/A	Queues (applies only to queues in non-elastic resource pools)	5 minutes
queue_job_launching_max_duration	Longest Job Submission	Duration of the longest job submission (SQL, Flink, or Spark) still in progress at the sampling time. This is an instantaneous (non-continuous) sampling metric: it considers only jobs in the Submitting or Starting state at the moment of sampling, excludes historical and completed jobs, and is not a statistic over all jobs. It is intended only for monitoring queue status.	≥ 0	Seconds	N/A	Queues	5 minutes
queue_sql_job_running_max_duration	Longest SQL Job	Duration of the longest-running SQL job still in progress at the sampling time. This is an instantaneous (non-continuous) sampling metric: it considers only SQL jobs in the Running state at the moment of sampling, excludes historical and completed jobs, and is not a statistic over all jobs. It is intended only for monitoring queue status.	≥ 0	Seconds	N/A	Queues	5 minutes
queue_spark_job_running_max_duration	Longest Spark Job	Duration of the longest-running Spark job still in progress at the sampling time. This is an instantaneous (non-continuous) sampling metric: it considers only Spark jobs in the Running state at the moment of sampling, excludes historical and completed jobs, and is not a statistic over all jobs. It is intended only for monitoring queue status.	≥ 0	Seconds	N/A	Queues	5 minutes
flink_read_records_per_second	Flink Job Data Read Rate	Displays the data input rate of a Flink job for monitoring and debugging.	≥ 0	record/s	N/A	Flink jobs	10 seconds
flink_write_records_per_second	Flink Job Data Write Rate	Displays the data output rate of a Flink job for monitoring and debugging.	≥ 0	record/s	N/A	Flink jobs	10 seconds
flink_read_records_total	Flink Job Total Data Read	Displays the total number of input data records of a Flink job for monitoring and debugging.	≥ 0	record	N/A	Flink jobs	10 seconds
flink_write_records_total	Flink Job Total Data Write	Displays the total number of output data records of a Flink job for monitoring and debugging.	≥ 0	record	N/A	Flink jobs	10 seconds
flink_read_bytes_per_second	Flink Job Byte Read Rate	Displays the number of input bytes per second of a Flink job.	≥ 0	byte/s	1024 (IEC)	Flink jobs	10 seconds
flink_write_bytes_per_second	Flink Job Byte Write Rate	Displays the number of output bytes per second of a Flink job.	≥ 0	byte/s	1024 (IEC)	Flink jobs	10 seconds
flink_read_bytes_total	Flink Job Total Read Byte	Displays the total number of input bytes of a Flink job.	≥ 0	byte	1024 (IEC)	Flink jobs	10 seconds
flink_write_bytes_total	Flink Job Total Write Byte	Displays the total number of output bytes of a Flink job.	≥ 0	byte	1024 (IEC)	Flink jobs	10 seconds
flink_cpu_usage	Flink Job CPU Usage	Displays the CPU usage of Flink jobs.	0–100	%	N/A	Flink jobs	10 seconds
flink_mem_usage	Flink Job Memory Usage	Displays the memory usage of Flink jobs.	0–100	%	N/A	Flink jobs	10 seconds
flink_max_op_latency	Flink Job Max Operator Latency	Displays the maximum operator latency of a Flink job.	≥ 0	ms	N/A	Flink jobs	10 seconds
flink_max_op_backpressure_level	Flink Job Maximum Operator Backpressure	Displays the maximum operator backpressure level of a Flink job. A larger value indicates more severe backpressure. Possible values: 0 (OK), 50 (low), 100 (high).	0–100	N/A	N/A	Flink jobs	10 seconds
elastic_resource_pool_cpu_usage	CPU Usage of Elastic Resource Pool	Displays the CPU usage of elastic resource pools.	0–100	%	N/A	Elastic resource pools	5 minutes
elastic_resource_pool_mem_usage	Memory Usage of Elastic Resource Pool	Displays the memory usage of elastic resource pools.	0–100	%	N/A	Elastic resource pools	5 minutes
elastic_resource_pool_disk_usage	Disk Usage of Elastic Resource Pool	Displays the disk usage of elastic resource pools.	0–100	%	N/A	Elastic resource pools	5 minutes
elastic_resource_pool_disk_max_usage	Maximum Disk Usage of Elastic Resource Pool	Displays the maximum disk usage of elastic resource pools.	0–100	%	N/A	Elastic resource pools	5 minutes
elastic_resource_pool_cu_num	CU Usage of Elastic Resource Pool	Displays the CU usage of elastic resource pools.	≥ 0	Count	N/A	Elastic resource pools	5 minutes
elastic_resource_pool_alloc_cu_num	Allocated CUs of Elastic Resource Pool	Displays the number of CUs allocated to elastic resource pools.	≥ 0	Count	N/A	Elastic resource pools	5 minutes
elastic_resource_pool_min_cu_num	Minimum CUs of Elastic Resource Pool	Displays the minimum number of CUs of elastic resource pools.	≥ 0	Count	N/A	Elastic resource pools	5 minutes
elastic_resource_pool_max_cu_num	Maximum CUs of Elastic Resource Pool	Displays the maximum number of CUs of elastic resource pools.	≥ 0	Count	N/A	Elastic resource pools	5 minutes
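The byte metrics in Table 1 use the 1024 (IEC) conversion rule. Assuming this means values are scaled by powers of 1024 and labeled with IEC binary prefixes (1 KiB = 1024 bytes), the following minimal sketch shows how a raw sample could be converted for display. The function name format_iec is hypothetical and not part of any DLI or Cloud Eye API.

```python
# Hypothetical helper: scale a raw byte value by powers of 1024 and attach an
# IEC binary prefix, assuming that is what the "1024 (IEC)" rule denotes.
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def format_iec(value: float, suffix: str = "/s") -> str:
    """Return value as a string with an IEC unit, e.g. 1536 -> '1.50 KiB/s'."""
    if value < 0:
        raise ValueError("byte metrics are non-negative")
    unit_index = 0
    # Divide by 1024 until the value fits the current unit or units run out.
    while value >= 1024 and unit_index < len(IEC_UNITS) - 1:
        value /= 1024
        unit_index += 1
    return f"{value:.2f} {IEC_UNITS[unit_index]}{suffix}"


print(format_iec(1536))                   # 1.50 KiB/s  (flink_read_bytes_per_second)
print(format_iec(5 * 1024 ** 3, suffix="")) # 5.00 GiB    (flink_read_bytes_total)
```

Pass suffix="" for the *_total metrics, which count bytes rather than a rate.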
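The queue_job_launching_max_duration metric is an instantaneous sample rather than a statistic over all jobs. The sketch below illustrates those semantics only; it is not DLI's implementation. The JobSnapshot type, its fields, the upper-case state names, and the choice to return 0 when no job is launching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# States counted by queue_job_launching_max_duration according to Table 1.
# The string values are hypothetical.
LAUNCHING_STATES = {"SUBMITTING", "STARTING"}


@dataclass
class JobSnapshot:
    """Hypothetical view of one job as seen at the sampling instant."""
    job_id: str
    state: str
    submit_time: float  # epoch seconds


def launching_max_duration(jobs: Iterable[JobSnapshot], sample_time: float) -> float:
    """Longest elapsed time among jobs still Submitting/Starting at sample_time.

    Running, finished, failed, or canceled jobs are ignored, so the result
    reflects only the queue state at this instant, not job history.
    Returns 0.0 when no job is launching (an assumption).
    """
    durations = [
        sample_time - job.submit_time
        for job in jobs
        if job.state in LAUNCHING_STATES
    ]
    return max(durations, default=0.0)


jobs = [
    JobSnapshot("a", "SUBMITTING", 1000.0),
    JobSnapshot("b", "STARTING", 1060.0),
    JobSnapshot("c", "FINISHED", 900.0),   # completed: not counted
]
print(launching_max_duration(jobs, sample_time=1120.0))  # 120.0
```

Because each sample looks only at jobs in progress at that moment, a long submission that completes between two 5-minute samples may never appear in the metric.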

Dimension

**Table 2** Dimension
Key	Value
queue_id	Queue
flink_job_id	Flink job
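To query a metric through the Cloud Eye API, a request combines the SYS.DLI namespace, a metric ID from Table 1, and one dimension from Table 2. The sketch below only builds the request URL. It assumes the Cloud Eye V1.0 endpoint for querying monitoring data of a single metric (path /V1.0/{project_id}/metric-data with namespace, metric_name, dim.0, from, to, period, and filter query parameters). The host name, project ID, and the default period and filter values are placeholders; verify them against the Cloud Eye API reference. Sending the request and authentication are not shown.

```python
from urllib.parse import urlencode

# Dimension keys documented in Table 2.
DLI_DIMENSIONS = {"queue_id", "flink_job_id"}


def build_metric_data_url(endpoint: str, project_id: str, metric_name: str,
                          dim_key: str, dim_value: str, start_ms: int, end_ms: int,
                          period: int = 300, stat: str = "average") -> str:
    """Build a Cloud Eye metric-data query URL for one DLI metric (sketch).

    endpoint and project_id are placeholders; period and stat defaults are
    assumptions, not values confirmed by this section.
    """
    if dim_key not in DLI_DIMENSIONS:
        raise ValueError(f"unknown DLI dimension: {dim_key}")
    if start_ms >= end_ms:
        raise ValueError("start_ms must be earlier than end_ms")
    query = {
        "namespace": "SYS.DLI",             # namespace from this section
        "metric_name": metric_name,         # a metric ID from Table 1
        "dim.0": f"{dim_key},{dim_value}",  # dimension written as "key,value"
        "from": start_ms,                   # start time, epoch milliseconds
        "to": end_ms,                       # end time, epoch milliseconds
        "period": period,                   # aggregation period in seconds (assumed)
        "filter": stat,                     # aggregation method (assumed)
    }
    # safe="," keeps the comma in "key,value" readable instead of %2C.
    return f"https://{endpoint}/V1.0/{project_id}/metric-data?{urlencode(query, safe=',')}"


url = build_metric_data_url(
    "ces.example-region.example.com", "example-project-id",
    "queue_cpu_usage", "queue_id", "example-queue",
    1700000000000, 1700003600000,
)
print(url)
```

Use dim_key="flink_job_id" with a Flink metric such as flink_cpu_usage to query a Flink job instead of a queue.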

Viewing DLI Monitoring Metrics on Cloud Eye

1. Log in to the management console and search for Cloud Eye.
2. In the navigation pane on the left of the Cloud Eye console, choose Cloud Service Monitoring > Data Lake Insight.
3. Select a queue to view its monitoring metrics.
