Updated on 2025-11-17 GMT+08:00

Configuring Alarms for Real-Time Data Migration Jobs

Real-time data migration depends on the timeliness and stability of real-time migration jobs in DataArts Migration. Performance bottlenecks at the data source or changes in service volume can increase job latency or cause job exceptions. Common job exceptions and their causes are as follows:

  • Abnormal job termination: This may be caused by a restart of the data source cluster, dirty data, or other service changes that make the job fail.
  • Continuous job latency increase: This may be caused by a surge in source data volume, performance bottlenecks at the source or destination, or insufficient resources for the real-time migration job.
  • High job backpressure: This may be caused by poor write performance at the destination or a large amount of data at the source.

To track job status continuously and to optimize or recover abnormal jobs promptly, you are advised to configure job failure alarms and alarms for monitoring metrics such as job latency, so that you have enough time for troubleshooting. The specific measures are as follows:

Prerequisites

Configuring Job Failure Alarms

You can enable the running exception/failure alarm for one or all real-time data migration jobs on the Manage Notification page. This section describes how to configure failure alarms for multiple jobs.

  1. Log in to the DataArts Studio console by following the instructions in Accessing the DataArts Studio Instance Console.
  2. On the DataArts Studio console, locate a workspace and click DataArts Factory.
  3. In the navigation pane on the DataArts Factory page, choose Monitoring > Manage Notification.
  4. On the Manage Notification page, click Configure Notification.
    Figure 1 Configuring a notification
  5. Set parameters.
    Figure 2 Setting notification parameters
    • Notification Scope: Select Multiple jobs.
    • Notification Type: Select Abnormal.
    • Notification Mode: Select By topic.

      You can receive job failure alarms by SMS message, email, or voice notification through SMN topics.

      For details about how to create and subscribe to an SMN topic, see Creating a Topic and Adding a Subscription to a Topic.

    • Set other parameters as required.
  6. Click OK. You can view the created rule on the Manage Notification page.
    Figure 3 Created rule

    When the job status changes from Running to Abnormal or Failed, an alarm is sent to the subscribers of the specified SMN topic.
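
The trigger condition described above can be pictured with a minimal Python sketch. It only illustrates the state transition named in the text. It is not the DataArts Factory implementation, and the function names and the sample status history are hypothetical.

```python
# States that the notification rule treats as alarm conditions (taken from the text above).
ALARM_STATES = {"Abnormal", "Failed"}


def should_notify(previous: str, current: str) -> bool:
    """Return True when a job leaves Running and enters an alarm state."""
    return previous == "Running" and current in ALARM_STATES


def alarm_points(history: list[str]) -> list[int]:
    """Return the indexes in a status history at which an alarm would be sent."""
    return [
        i
        for i in range(1, len(history))
        if should_notify(history[i - 1], history[i])
    ]


# Hypothetical status history for one job, oldest first.
history = ["Running", "Running", "Failed", "Running", "Abnormal", "Abnormal"]
print(alarm_points(history))  # [2, 4]
```

In this sketch, a job that stays in Abnormal or Failed does not produce repeated alarms. Only the change away from Running counts, which matches the wording of the rule.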

Configuring Alarms for Abnormal Job Monitoring Metrics

Monitoring metrics of real-time data migration jobs are automatically reported to Cloud Eye. For details about how to view the metrics, see Configuring a Real-Time Migration Job.

You can configure monitoring metric alarms for real-time data migration jobs on the Cloud Eye console by referring to this section. For details about Cloud Eye functions, see Cloud Eye Alarms.

  1. Create a notification recipient and then a notification group on the Cloud Eye console.

    1. Create a notification recipient.

      On the Cloud Eye console, choose Alarm Management > Alarm Notifications. On the displayed page, click the Recipients tab and click Create. On the displayed page, enter a recipient name and select a protocol.

      Figure 4 Creating a notification recipient

      The created recipient is displayed on the Recipients page.

    2. Create a notification group and bind it to the notification recipient.

      Click the Notification Groups tab and then Create. On the displayed page, enter a group name and select notification recipients.

      Figure 5 Creating a notification group

      The created notification group is displayed on the Notification Groups page.

  2. Create an alarm rule.

    On the Cloud Eye console, choose Alarm Management > Alarm Rules. In the upper right corner, click Create Alarm Rule and set parameters as required.

    • Name: Set this parameter as required.
    • Alarm Type: Select Metric.
    • Cloud Product: Select DataArts Studio - DataArts Studio Jobs.
    • Resource Level: Select Cloud product.
    • Monitoring Scope: Specifies the resources to be monitored. The options are All resources and Specific resources. Select Specific resources.
    • Method: Select Associate template or Configure manually. The following table lists the common alarm rules recommended for real-time data migration jobs.
      Table 1 Alarm triggering rules

      | Category | Name | Level | Description |
      |---|---|---|---|
      | Abnormal job termination | Source Database WAL Extract Lag | Minor | If the original value increases by 0% for 180 consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every three hours. |
      | Abnormal job termination | Job memory usage | Major | If the original value decreases by 0% for 90 consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour. |
      | Continuous job latency increase | Source Database WAL Extract Lag | Critical | If the original value is greater than or equal to 86,400,000 ms (24 hours) for three consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour. |
      | High job backpressure | Job memory usage in five minutes | Major | If the average value is greater than 90% for four consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour. |
      | High job backpressure | Job CPU usage in 5 minutes | Major | If the average value is greater than 90% for four consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every hour. |
      | High job backpressure | Job max operator backpressure in 1 hour | Minor | If the minimum value is greater than 100% for two consecutive periods, an alarm is triggered. Frequency: An alarm is triggered every three hours. |
      | Unstable network | Job retries | Major | If the original value increases by 1% in one period, an alarm is triggered. Frequency: The alarm is triggered only once. |

      NOTE:

      For the Unstable network category, you are advised to configure alarm rules separately for automatic job retries caused by network jitter and data source cluster pressure.

      Figure 6 Configuration of common alarm rules
      Figure 7 Configuring an alarm rule for job retries
    • Notified By: Select Notification policies.
    • Set other parameters as required.

      Click Create to create an alarm rule. The created alarm rule is displayed on the Alarm Rules page.

      Figure 8 Created rule

      When a monitoring metric meets an alarm policy configured in the alarm rule, the system automatically sends an alarm to the notification group.
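
To make the "for N consecutive periods" wording in Table 1 concrete, the following Python sketch shows one way to read such a threshold rule. It is an illustration under stated assumptions, not Cloud Eye's evaluation logic: the function name, the sample values, and the choice to check only the most recent periods are all hypothetical.

```python
import operator
from typing import Callable, Sequence

# 24 hours expressed in milliseconds: the latency threshold used in Table 1.
DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000


def breaches_consecutively(
    samples: Sequence[float],
    threshold: float,
    periods: int,
    compare: Callable[[float, float], bool] = operator.ge,
) -> bool:
    """Return True if the most recent `periods` samples all satisfy the comparison."""
    if periods <= 0 or len(samples) < periods:
        return False
    return all(compare(value, threshold) for value in samples[-periods:])


# Hypothetical WAL extract lag samples in milliseconds, one per period, oldest first.
lag_ms = [3_600_000, 86_400_000, 90_000_000, 95_000_000]
# Rule: original value >= 86,400,000 ms for three consecutive periods.
latency_alarm = breaches_consecutively(lag_ms, DAY_MS, periods=3)

# Hypothetical CPU usage samples in percent.
cpu_pct = [95.0, 92.0, 88.0, 97.0]
# Rule: average value > 90% for four consecutive periods.
cpu_alarm = breaches_consecutively(cpu_pct, 90.0, periods=4, compare=operator.gt)

print(latency_alarm, cpu_alarm)  # True False
```

In this example the latency rule fires because the last three lag samples are all at or above 24 hours, while the CPU rule does not fire because one of the last four samples (88%) is below the threshold. Requiring several consecutive periods is what keeps a single spike from raising an alarm.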