Updated on 2025-11-17 GMT+08:00

Configuring Alarms for Real-Time Data Migration Jobs

Real-time data migration depends on the timeliness and stability of real-time migration jobs in DataArts Migration. Performance bottlenecks at the data source or changes in service volume can increase job latency or cause job exceptions. Common job exceptions and their causes are as follows:

Abnormal job termination: This may be caused by a restart of the data source cluster, dirty data, or other service changes that cause the job to fail.
Continuous job latency increase: This may be caused by a surge in source data volume, performance bottlenecks at the source or destination, or insufficient resources for the real-time migration job.
High job backpressure: This may be caused by poor write performance at the destination or a large amount of data at the source.

To track job status continuously and to optimize or recover abnormal jobs promptly, you are advised to configure monitoring metric alarms and job latency/exception alarms so that you have enough time for troubleshooting. The measures are as follows:

Configuring job failure alarms: O&M personnel are notified of job failures promptly, which reduces the risk of fault escalation.
Configuring alarms for abnormal job monitoring metrics: This helps you detect potential job risks early and adjust jobs or services in time to avoid long latency.

Prerequisites

You have purchased a migration resource group and run a real-time data migration job. For details, see Configuring a Real-Time Migration Job.
The job is running and metrics are reported properly. For details about how to view job monitoring metrics, see Viewing Monitoring Metrics.

Configuring Job Failure Alarms

You can enable the running exception/failure alarm for one or all real-time data migration jobs on the Manage Notification page. This section describes how to configure failure alarms for multiple jobs.

Log in to the DataArts Studio console by following the instructions in Accessing the DataArts Studio Instance Console.
On the DataArts Studio console, locate a workspace and click DataArts Factory.
In the navigation pane on the DataArts Factory page, choose Monitoring > Manage Notification.
On the Manage Notification page, click Configure Notification.
Figure 1 Configuring a notification
Set parameters.
Figure 2 Setting notification parameters
- Notification Scope: Select Multiple jobs.
- Notification Type: Select Abnormal.
- Notification Mode: Select By topic.
  You can receive job failure alarms by SMS message, email, or voice notification through SMN topics.
  
  For details about how to create and subscribe to an SMN topic, see Creating a Topic and Adding a Subscription to a Topic.
- Set other parameters as required.
Click OK. You can view the created rule on the Manage Notification page.
Figure 3 Created rule

When the job status changes from Running to Abnormal or Failed, the alarm is sent to the endpoints (SMS message, email, or voice notification) subscribed to the specified SMN topic.
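The trigger condition described above can be pictured as a simple state-transition check. The following is a minimal sketch for illustration only, not the service's implementation: the function name `should_send_failure_alarm` and the constant `ALARM_STATUSES` are hypothetical, and only the status names (Running, Abnormal, Failed) come from this section.

```python
# Hypothetical illustration of the trigger condition; not a DataArts Studio API.
ALARM_STATUSES = {"Abnormal", "Failed"}  # status names taken from this section


def should_send_failure_alarm(previous_status: str, current_status: str) -> bool:
    """Return True when a job moves from Running to Abnormal or Failed."""
    return previous_status == "Running" and current_status in ALARM_STATUSES


# A running job that fails matches the condition.
fails = should_send_failure_alarm("Running", "Failed")
# A job that stays in Running does not match.
stays_running = should_send_failure_alarm("Running", "Running")
```

Under this reading, the check depends on the transition rather than on the current status alone, so a job that was already Failed would not match again in this sketch.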

Configuring Alarms for Abnormal Job Monitoring Metrics

Monitoring metrics of real-time data migration jobs are automatically reported to Cloud Eye. For details about how to view the metrics, see Configuring a Real-Time Migration Job.

You can configure monitoring metric alarms for real-time data migration jobs on the Cloud Eye console by referring to this section. For details about Cloud Eye functions, see Cloud Eye Alarms.

Create a notification recipient and then a notification group on the Cloud Eye console.
1. Create a notification recipient.
  On the Cloud Eye console, choose Alarm Management > Alarm Notifications. On the displayed page, click the Recipients tab and click Create. On the displayed page, enter a recipient name and select a protocol.
  
  Figure 4 Creating a notification recipient
  
  The created recipient is displayed on the Recipients page.
2. Create a notification group and bind it to the notification recipient.
  Click the Notification Groups tab and then Create. On the displayed page, enter a group name and select notification recipients.
  
  Figure 5 Creating a notification group
  
  The created notification group is displayed on the Notification Groups page.

Create an alarm rule.

On the Cloud Eye console, choose Alarm Management > Alarm Rules. In the upper right corner, click Create Alarm Rule and set parameters as required.

Name: Set this parameter as required.
Alarm Type: Select Metric.
Cloud Product: Select DataArts Studio - DataArts Studio Jobs.
Resource Level: Select Cloud product.
Monitoring Scope: Select Specific resources. (The available options are All resources and Specific resources.)

Method: Select Associate template or Configure manually. The following table lists the common alarm rules recommended for real-time data migration jobs.

**Table 1** Alarm triggering rules
Category	Metric Name	Alarm Level	Trigger Condition and Frequency
Abnormal job termination	Source Database WAL Extract Lag	Minor	If the original value increases by 0% for 180 consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every three hours.
Abnormal job termination	Job memory usage	Major	If the original value decreases by 0% for 90 consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour.
Continuous job latency increase	Source Database WAL Extract Lag	Critical	If the original value is greater than or equal to 86,400,000 ms (24 hours) for three consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour.
High job backpressure	Job memory usage in five minutes	Major	If the average value is greater than 90% for four consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour.
High job backpressure	Job CPU usage in 5 minutes	Major	If the average value is greater than 90% for four consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour.
High job backpressure	Job max operator backpressure in 1 hour	Minor	If the minimum value is greater than 100% for two consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every three hours.
Unstable network	Job retries	Major	If the original value increases by 1% in one period, an alarm is triggered. Frequency: The alarm is triggered only once.

NOTE: For the Unstable network category, you are advised to configure alarm rules separately for automatic job retries caused by network jitter and data source cluster pressure.
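To make the "for N consecutive periods" wording in the table concrete, the following minimal sketch evaluates a threshold rule over a list of metric samples. It is an illustration under stated assumptions, not how Cloud Eye evaluates rules: the function `breaches_for_consecutive_periods`, its parameters, and the sample values are all hypothetical. Only the thresholds (86,400,000 ms, 90%) and period counts come from the table.

```python
from typing import Sequence

MS_PER_HOUR = 60 * 60 * 1000


def breaches_for_consecutive_periods(
    samples: Sequence[float],
    threshold: float,
    periods: int,
    inclusive: bool = True,
) -> bool:
    """Hypothetical helper: True if the latest `periods` samples all breach the threshold.

    inclusive=True models "greater than or equal to"; False models "greater than".
    """
    if periods <= 0 or len(samples) < periods:
        return False
    recent = samples[-periods:]
    if inclusive:
        return all(value >= threshold for value in recent)
    return all(value > threshold for value in recent)


# The WAL extract lag threshold from the table, converted to hours.
lag_threshold_ms = 86_400_000
lag_threshold_hours = lag_threshold_ms / MS_PER_HOUR

# Made-up lag samples (ms), one per period: the last three are all >= the threshold.
lag_samples_ms = [3_600_000, 86_400_000, 90_000_000, 95_000_000]
lag_alarm = breaches_for_consecutive_periods(lag_samples_ms, lag_threshold_ms, periods=3)

# Made-up CPU usage averages (%): the third sample breaks the run of four.
cpu_samples = [95.0, 96.0, 88.0, 97.0]
cpu_alarm = breaches_for_consecutive_periods(cpu_samples, 90.0, periods=4, inclusive=False)
```

In this sketch a single sample below the threshold resets the run, which is why requiring more consecutive periods makes a rule less sensitive to short spikes.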

Figure 6 Configuration of common alarm rules

Figure 7 Configuring an alarm rule for job retries

Notified By: Select Notification policies.
Set other parameters as required.
Click Create to create an alarm rule. The created alarm rule is displayed on the Alarm Rules page.

Figure 8 Created rule

When a monitoring metric meets an alarm policy configured in the alarm rule, the system automatically sends an alarm to the notification group.