Updated on 2024-10-29 GMT+08:00

Notebook Cache Directory Alarm Reporting

When creating a notebook instance, you can select CPU, GPU, or Ascend resources based on the service data volume. If you select GPU or Ascend resources, ModelArts mounts hard disks to the cache directory. You can use this directory to store temporary files.

By default, no capacity alarm is generated for the cache directory of a notebook instance. If the cache directory usage exceeds the capacity limit, the notebook instance restarts. The restart resets multiple configurations, which can cause data loss and loss of the environment. To avoid this, you are advised to enable monitoring and alarms for the cache directory usage and report the data to AOM.
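
As a complement to the AOM alarm, the cache usage can also be checked from inside the notebook. The following is a minimal sketch and not part of ModelArts. The /cache path and the 80% threshold are assumptions, so replace them with the actual mount point and limit of your instance. The check reports the usage of the file system that holds the path, which matches the cache directory only if a dedicated disk is mounted there.

```python
import shutil


def cache_usage_ratio(path):
    """Return the used fraction (0.0 to 1.0) of the file system that holds path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def is_over_threshold(path, threshold=0.8):
    """Return True if the used fraction is at or above threshold."""
    return cache_usage_ratio(path) >= threshold


if __name__ == "__main__":
    CACHE_DIR = "/cache"  # assumed mount point; adjust to your instance
    ratio = cache_usage_ratio(CACHE_DIR)
    print(f"{CACHE_DIR}: {ratio:.1%} used")
    if is_over_threshold(CACHE_DIR):
        print("Warning: cache directory usage is high; clean up temporary files.")
```

A local check like this only helps while you are working in the notebook. The AOM alarm described below remains the way to be notified when you are not.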

Configuration Process

  1. Enter the basic alarm information.
  2. Set an alarm rule.
    1. Configure monitoring metrics.
    2. Set alarm triggering conditions.
  3. Configure alarm notifications.
    1. Create a topic, configure the topic policy, and subscribe to the topic.
    2. Create an alarm action rule.
    3. Select the created action rule.

Configuring Alarm Settings

  1. Log in to the AOM console.
  2. Choose Alarm Center > Alarm Rules and click Create Alarm Rule.
  3. Enter the basic alarm information.

  4. Set an alarm rule.

    Rule Type: Select Threshold alarm.

    Monitored Object: Choose the Select resource objects option, and then click Select Resource Object. A dialog box is displayed.

    • Add By: Select Dimension.
    • Metric Name: Click Custom Metrics and select the cache metrics to be monitored, for example, ma_container_notebook_cache_dir_size_bytes (total size of the cache directory) or ma_container_notebook_cache_dir_util (usage of the cache directory).
    • Dimension: Select a metric dimension, for example, service_id:xxx, and click Confirm.

    After setting the monitored object, set Statistic and Statistical Period.

    Alarm Condition: Set this parameter based on your needs.

    Figure 1 Select Monitored Object
    Figure 2 Configuring statistics method
    Figure 3 Configuring alarm conditions

  5. Configure alarm notifications and click Create Now.

    Alarm Mode: Select Direct Alarm Reporting.

    Action Rule: Enable it and select the created action rule. If the existing alarm action rules cannot meet your requirements, click Create Rule to create an action rule. For details, see Creating an Alarm Action Rule.

    Notification: Enable it.

    Figure 4 Configuring alarm notifications

    To configure alarm notification rules, create a topic in SMN and then create an alarm action rule that uses the topic.

    • Creating a Topic
      1. Go to the SMN console. In the navigation pane, choose Topic Management > Topics.
      2. Click Create Topic. Enter a topic name, select an enterprise project, and click OK.
      3. Locate the target topic and choose More > Configure Topic Policy in the Operation column.

        Select APM to allow AOM alarms to trigger SMN.

        Figure 5 Configure Topic Policy
      4. Click Add Subscription in the Operation column of the topic.

        Select a protocol, such as email or SMS, and enter the endpoints, such as email addresses or mobile numbers. Click OK.

        A record is displayed in the subscription list in the Unconfirmed state.

        After receiving the email, confirm the subscription. The subscription then changes to the Confirmed state.

        Once the subscription is confirmed, a notification is sent to the endpoint whenever the alarm conditions are met.

    • Creating an Alarm Action Rule

      An action rule specifies how AOM notifies you when an alarm is triggered. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template.

      Enter the action rule name, select the action rule type, select the topic created in the previous step, select a message template, and click Confirm.

      Figure 6 Create Alarm Action Rule

In the Alarm Notification area of the Create Alarm Rule page, set Action Rule to the newly created alarm action rule and click Create Now.

After the configuration is complete, you will receive an email notification once the alarm conditions are met.
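
To make the Statistic, Statistical Period, and Alarm Condition settings more concrete, the following sketch shows one plausible way to think about a threshold rule: samples of a metric such as ma_container_notebook_cache_dir_util are aggregated per statistical period, and the alarm fires when the aggregate reaches the threshold for a number of consecutive periods. This illustrates the concept only and is not AOM's implementation. The function name, the choice of the average as the statistic, the assumption that the metric is a percentage, and all sample values are hypothetical.

```python
from statistics import mean


def should_alarm(samples, period_size, threshold, consecutive_periods):
    """Return True if the per-period average is >= threshold for the most
    recent `consecutive_periods` complete periods.

    samples: metric values ordered oldest to newest
    period_size: number of samples in one statistical period
    """
    if period_size < 1 or consecutive_periods < 1:
        raise ValueError("period_size and consecutive_periods must be >= 1")

    # Use only complete periods, aligned to the newest sample.
    n_periods = len(samples) // period_size
    if n_periods < consecutive_periods:
        return False
    start = len(samples) - n_periods * period_size

    averages = [
        mean(samples[i:i + period_size])
        for i in range(start, len(samples), period_size)
    ]
    return all(avg >= threshold for avg in averages[-consecutive_periods:])


if __name__ == "__main__":
    # Hypothetical utilization samples (percent), two samples per period.
    utilization = [50, 60, 85, 90, 92, 95]
    print(should_alarm(utilization, period_size=2, threshold=80,
                       consecutive_periods=2))
```

Requiring several consecutive periods reduces the chance that a short spike of temporary files triggers a notification, while a single-period rule reacts faster. Choose the values based on how quickly your workloads fill the cache directory.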