Updated on 2024-10-29 GMT+08:00

Viewing Notebook Events

Instance status changes and key operations, such as creating, starting, or stopping an instance and changing the instance flavor, are recorded as events in the backend. You can view these events on the Event tab of the notebook instance details page to monitor the instance status. You can refresh the events manually from the right side of the Event tab, or set them to refresh automatically every 30 seconds, 1 minute, or 5 minutes.

Figure 1 Viewing notebook instance events and configuring automatic refresh
Table 1 Events during instance creation

| Event | Description | Severity |
| --- | --- | --- |
| Scheduled | The instance has been scheduled. | Warning |
| PullingImage | The image is being pulled. | Warning |
| PulledImage | The image has been pulled. | Warning |
| NotebookHealthy | The instance is running and healthy. | Major |
| CreateNotebookFailed | Creating an instance failed. | Critical |
| PullImageFailed | Pulling the image failed. | Critical |
| FailedCreate | Failed to create notebook container. Please contact SRE to check node {node_name} | Critical |
| CreateContainerError | Failed to create container. Please contact SRE to check node {node_name} | Critical |
| FailedAttachVolume | Failed to attach volume. Please contact SRE to check node {node_name} | Major |
| MountVolumeFailed | Mount volume failed; Check whether the DEW secret is correct if the instance cannot change to running in five minutes | Critical |
| MountVolumeFailed | Mount volume failed; Check if vpc of sfs-turbo is interconnected if the instance cannot change to running in five minutes | Critical |
| MountVolumeFailed | Mount volume failed; Please contact SRE to check node {node_name} if the instance cannot change to running in five minutes | Critical |

Table 2 Events during instance startup

| Event | Description | Severity |
| --- | --- | --- |
| EmptyDirExceeded | Usage of empty-dir volume exceeds its limit. A new container will be scheduled and created automatically soon. | Critical |
| NodeResourcePressure | Insufficient node resources. A new container will be scheduled and created automatically soon. | Critical |
| EphemeralStorageExceeded | Local ephemeral storage exceeds its limit. A new container will be scheduled and created automatically soon. | Critical |
| FailedToStartContainer | Failed to start container. Please contact SRE to check node {node_name} | Critical |
| Scheduled | The instance has been scheduled. | Warning |
| PullingImage | The image is being pulled. | Warning |
| PulledImage | The image has been pulled. | Warning |
| NotebookHealthy | The instance is running and healthy. | Major |
| RunHookScript | Running a custom script | Warning |
| StartNotebookFailed | Starting the instance failed. | Critical |
| PullImageFailed | Pulling the image failed. | Critical |
| CreateKernelFailed | Creating a Jupyter kernel failed because the conda command is unavailable. (The conda environments are not being detected and added as Jupyter kernels. Ensure that {conda_env} is available and the command {conda_cmd} env list can be run properly.) | Major |
| CreateKernelFailed | Creating a Jupyter kernel failed due to permission issues. (Kernels are not showing up in Jupyter Notebook due to permission issues. Ensure that the uid {ma_uid} has write permissions on {conda_path}.) | Major |
| ConfigurationError | Configuring the ModelArts SDK and CLI paths in the conda environment failed due to unavailable conda command. (The ModelArts SDK and CLI are unavailable in the conda environments due to conda environment issues. Ensure that {conda_env} is available and the command {conda_cmd} env list can be run properly.) | Major |
| ConfigurationError | Configuring the ModelArts SDK and CLI paths in the conda environment failed due to permission issues. (The ModelArts SDK and CLI are unavailable in the conda environments due to conda environment issues. Ensure that the uid {ma_uid} has write permissions on {conda_path}.) | Major |
| FailedToPullImageReason | Failed to pull image. Please make sure the image exists in SWR repo, otherwise contact SRE to check node {node_name} | Major |
| FailedToPullImageReason | Failed to pull image. Please contact SRE to check node {node_name} | |

NOTE:

{node_name} is a variable that stands for the node name. The node name is generally in the format of an IP address, for example, 192.168.1.1.
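
The event descriptions above are message templates: placeholders such as {node_name} are filled in when the event is reported. As a rough illustration only, the substitution can be sketched with Python's built-in str.format. The template text is copied from the table, the sample address is the one used in the note, and the render_event_message helper is hypothetical (it is not part of any ModelArts SDK, and how the service actually renders these messages is not described here).

```python
# Illustrative sketch: fill the {node_name} placeholder in an event message.
# render_event_message is a hypothetical helper, not a ModelArts API.
TEMPLATE = "Failed to start container. Please contact SRE to check node {node_name}"


def render_event_message(template: str, **values: str) -> str:
    """Return the template with each {placeholder} replaced by its value."""
    return template.format(**values)


message = render_event_message(TEMPLATE, node_name="192.168.1.1")
print(message)
```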

Table 3 Events during instance stopping

| Event | Description | Severity |
| --- | --- | --- |
| StopNotebook | The instance has been stopped. | Major |
| StopNotebookResourceIdle | The notebook instance will automatically stop or has automatically stopped because resources are idle. | Major |

Table 4 Events during instance update

| Event | Description | Severity |
| --- | --- | --- |
| UpdateName | Updating the instance name | Warning |
| UpdateDescription | Updating the instance description | Warning |
| UpdateFlavor | Updating the instance flavor | Major |
| UpdateImage | Updating the instance image | Major |
| UpdateStorageSize | The instance storage size is being updated. (User %s is updating storage size from %s GB to %s GB.) | Major |
| UpdateStorageSize | The instance storage size has been updated. (User %s updated the storage size.) | Major |
| UpdateKeyPair | Configured the instance key pair. (User %s updated the instance key pair to {%s}.) | Major |
| UpdateKeyPair | Updating the instance key pair (User %s updated the instance key pair from %s to %s.) | Major |
| UpdateWhitelist | Updating the instance access whitelist | Major |
| UpdateHook | Updating a custom script | Major |
| UpdateStorageSizeFailed | Updating the storage size failed because the resources are sold out. (EVS disks are sold out.) | Critical |
| UpdateStorageSizeFailed | Updating the storage size failed due to an internal error. (Updating the EVS disk size failed. The O&M personnel are handling the fault.) | Critical |

Table 5 Events during image saving

| Event | Description | Severity |
| --- | --- | --- |
| SaveImage | The image has been saved. | Major |
| SavedImageFailed | Saving the image failed due to processes in D status. (There are processes in 'D' status. Check process status using 'ps -aux' and kill all the processes in 'D' status.) | Critical |
| SavedImageFailed | Saving the image failed because the image is too large. (The container size (%dG) is greater than the threshold (%dG).) | Critical |
| SavedImageFailed | Saving the image failed due to the limit on the number of layers. (There are too many layers in your image.) | Critical |
| SavedImageFailed | Saving the image failed due to task timeout. (The O&M personnel are handling the fault.) | Critical |
| SavedImageFailed | Saving the image failed due to SWR service issues. | Critical |
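
The SavedImageFailed message about processes in 'D' status asks you to inspect the output of 'ps -aux'. A minimal sketch of that check is shown below, assuming the usual 11-column Linux layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND). The function name and the sample output are made up for illustration; the sketch only lists matching processes and does not terminate them.

```python
def find_d_state_processes(ps_output: str) -> list[tuple[int, str]]:
    """Return (pid, command) for each process whose STAT column starts with 'D'."""
    matches = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        # Split into at most 11 fields so a COMMAND containing spaces stays intact.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # skip malformed or truncated lines
        pid, stat, command = fields[1], fields[7], fields[10].strip()
        if stat.startswith("D"):
            matches.append((int(pid), command))
    return matches


# Made-up sample output, used only to demonstrate the parsing.
SAMPLE_PS_OUTPUT = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user1        1  0.0  0.1  10000  2000 ?        Ss   10:00   0:00 /bin/bash /start.sh
user1       42  0.0  0.5  90000  8000 ?        D    10:05   0:03 python train.py --epochs 10
user1       57  1.2  0.3  50000  6000 ?        Sl   10:06   0:01 jupyter-lab
"""

blocked = find_d_state_processes(SAMPLE_PS_OUTPUT)
print(blocked)
```

To run the check against a live instance, the same function could be fed the text returned by subprocess.run(["ps", "-aux"], capture_output=True, text=True).stdout. Processes in 'D' status are in uninterruptible sleep, typically waiting on I/O, so they may not exit until the pending operation completes.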

Table 6 Events during instance running

| Event | Description | Severity |
| --- | --- | --- |
| NotebookUnhealthy | The instance is unhealthy. | Critical |
| OutOfMemory | The instance is out of memory. | Critical |
| JupyterProcessKilled | The Jupyter process has been stopped. | Critical |
| CacheVolumeExceedQuota | The /cache file size has exceeded the upper limit. | Critical |
| NotebookHealthy | The instance has been restored to the healthy state. | Major |
| EVSSoldOut | EVS disks are sold out. | Critical |

Table 7 Events for dynamic OBS mounting

| Event | Description | Severity |
| --- | --- | --- |
| DynamicMountStorage | The OBS storage is mounted. | Major |
| DynamicUnmountStorage | The OBS storage is unmounted. | Major |

Table 8 Events triggered on the user side

| Event | Description | Severity |
| --- | --- | --- |
| RefreshCredentialsFailed | Authentication failed. | Critical |
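
Every event in the tables above carries one of three severities (Warning, Major, or Critical), so a list of events can be triaged by severity. The sketch below assumes the events have already been collected as name and severity pairs, for example copied from the Event tab. The Event record, the at_least helper, and the ranking Warning < Major < Critical are illustrative assumptions; this page does not define an ordering or an API for retrieving events.

```python
from dataclasses import dataclass

# Assumed ranking for illustration; the tables list severities but do not rank them.
SEVERITY_RANK = {"Warning": 0, "Major": 1, "Critical": 2}


@dataclass(frozen=True)
class Event:
    """Hypothetical record holding one row from the event tables."""
    name: str
    severity: str


def at_least(events: list[Event], minimum: str) -> list[Event]:
    """Return the events whose severity is at or above the given minimum."""
    threshold = SEVERITY_RANK[minimum]
    return [e for e in events if SEVERITY_RANK[e.severity] >= threshold]


# Event names and severities taken from the creation table.
events = [
    Event("Scheduled", "Warning"),
    Event("PullingImage", "Warning"),
    Event("PullImageFailed", "Critical"),
    Event("NotebookHealthy", "Major"),
]

critical = at_least(events, "Critical")
print([e.name for e in critical])
```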