Viewing Notebook Events
Instance status changes and key operations, such as creating, starting, or stopping an instance and changing the instance flavor, are recorded in the backend. You can view these events on the notebook instance details page to monitor the instance status. You can refresh the event list manually on the right of the Event tab, or set it to refresh automatically every 30 seconds, 1 minute, or 5 minutes.
Viewing Events of a Notebook Instance
To view the event details of a notebook, click the notebook name. On the displayed notebook details page, click the Event tab.
Notebook Instance Events
| Event | Description | Severity | Solution |
|---|---|---|---|
| Scheduled | The instance has been scheduled. | Warning | Normal event, no action required. |
| PullingImage | The image is being pulled. | Warning | Normal event, no action required. |
| PulledImage | The image has been pulled. | Warning | Normal event, no action required. |
| NotebookHealthy | The instance is running and healthy. | Major | Normal event, no action required. |
| CreateNotebookFailed | Creating an instance failed. | Critical | Internal service error. Submit a service ticket to contact O&M engineers. |
| PullImageFailed | Pulling the image failed. | Critical | Check whether the image selected during instance creation exists. If the image does not exist, select another image to create an instance. If the image exists, submit a service ticket to contact O&M engineers. |
| FailedCreate | Failed to create notebook container. Please contact SRE to check node {node_name} | Critical | Internal service error. Submit a service ticket to contact O&M engineers. |
| CreateContainerError | Failed to create container. Please contact SRE to check node {node_name} | Critical | Internal service error. Submit a service ticket to contact O&M engineers. |
| FailedAttachVolume | Failed to attach volume. Please contact SRE to check node {node_name} | Major | Internal service error. Submit a service ticket to contact O&M engineers. |
| MountVolumeFailed | Mount volume failed; Check whether the DEW secret is correct if the instance cannot change to running in five minutes | Critical | Wait 5 to 10 minutes and check whether the instance status changes to Running. If it does, no action is required. If the status remains unchanged, check whether the authentication information selected for OBS is correct. |
| MountVolumeFailed | Mount volume failed; Check if vpc of sfs-turbo is interconnected if the instance cannot change to running in five minutes | Critical | Wait 5 to 10 minutes and check whether the instance status changes to Running. If it does, no action is required. If the status remains unchanged, verify that SFS has been properly connected to the VPC of your dedicated resource pool. For details, see Configuring the Dedicated Resource Pool to Access the Internet. |
| MountVolumeFailed | Mount volume failed; Please contact SRE to check node {node_name} if the instance cannot change to running in five minutes | Critical | Wait 5 to 10 minutes and check whether the instance status changes to Running. If it does, no action is required. If the status remains unchanged, submit a service ticket to contact O&M engineers. |
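
The "wait 5 to 10 minutes and check whether the instance status changes to Running" step in the MountVolumeFailed rows can be expressed as a simple polling loop. The sketch below is illustrative only: `wait_until_running` and the `get_status` callable are hypothetical names that are not part of any service SDK, and how the status is actually obtained is left to the caller.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],      # hypothetical: returns the current instance status
    timeout_s: float = 600.0,           # upper end of the 5 to 10 minute window
    interval_s: float = 30.0,           # pause between two status checks
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once get_status() reports "Running", False on timeout."""
    waited = 0.0
    while True:
        if get_status() == "Running":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

If the function returns `False`, follow the Solution column for the specific MountVolumeFailed message. The `sleep` parameter exists only so the loop can be exercised without real delays.
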
| Event | Description | Severity | Solution |
|---|---|---|---|
| StopNotebook | The instance has been stopped. | Major | Normal event, no action required. |
| StopNotebookResourceIdle | The notebook instance will automatically stop or has automatically stopped because resources are idle. | Major | Normal event, no action required. |
| Event | Description | Severity | Solution |
|---|---|---|---|
| UpdateName | Updating the instance name | Warning | Normal event, no action required. |
| UpdateDescription | Updating the instance description | Warning | Normal event, no action required. |
| UpdateFlavor | Updating the instance flavor | Major | Normal event, no action required. |
| UpdateImage | Updating the instance image | Major | Normal event, no action required. |
| UpdateStorageSize | The instance storage size is being updated. (User %s is updating storage size from %s GB to %s GB.) | Major | Normal event, no action required. |
| UpdateStorageSize | The instance storage size has been updated. (User %s updated the storage size.) | Major | Normal event, no action required. |
| UpdateKeyPair | Configured the instance key pair. (User %s updated the instance key pair to {%s}.) | Major | Normal event, no action required. |
| UpdateKeyPair | Updating the instance key pair (User %s updated the instance key pair from %s to %s.) | Major | Normal event, no action required. |
| UpdateHook | Updating a custom script | Major | Normal event, no action required. |
| UpdateStorageSizeFailed | Updating the storage size failed because the resources are sold out. (EVS disks are sold out.) | Critical | Go to the details page of the instance to be scaled out, click the Storage tab, and add dynamic storage or expand the storage capacity. |
| UpdateStorageSizeFailed | Updating the storage size failed due to an internal error. (Updating the EVS disk size failed. The O&M engineers are handling the fault.) | Critical | Internal service error. Submit a service ticket to contact O&M engineers. |
| Event | Description | Severity | Solution |
|---|---|---|---|
| SaveImage | The image has been saved. | Major | Normal event, no action required. |
| SavedImageFailed | Saving the image failed due to processes in D status. (There are processes in 'D' status. Check process status using 'ps -aux' and kill all the processes in 'D' status.) | Critical | Run ps -aux to query all processes in the D state, run kill -9 <PID> to stop all processes in the D state, and save the image again. |
| SavedImageFailed | Saving the image failed because the image is too large. (The container size (%dG) is greater than the threshold (%dG).) | Critical | Delete unnecessary directories and files except those in the /home/ma-user/work/ directory of the instance. Reduce the container image size to the threshold specified in the event description and try again. |
| SavedImageFailed | Saving the image failed due to the limit on the number of layers. (There are too many layers in your image.) | Critical | The number of image layers used for starting the instance exceeds 125. Create an instance startup image. During the creation, you can reduce the number of image layers by combining commands and building the image by phase. |
| SavedImageFailed | Saving the image failed due to task timeout. (The O&M engineers are handling the fault.) | Critical | The task timed out due to a network or dependent service exception. Submit a service ticket to contact O&M engineers. |
| SavedImageFailed | Saving the image failed due to SWR service issues. | Critical | SWR service error. Submit a service ticket to contact O&M engineers. |
| CheckImageSize | The notebook container image size is {image_size}G. {image_size} indicates the image size, which is a variable. | Warning | Normal event, no action required. |
| CheckImageLayer | The number of original notebook image layers is {layer_number}. {layer_number} indicates the number of image layers, which is a variable. | Warning | Normal event, no action required. |
| ContainerCommitStarted | Start to commit notebook container. | Warning | Normal event, no action required. |
| ContainerCommitSuccess | Notebook container commit successfully. | Warning | Normal event, no action required. |
| ImagePushStarted | Start to push notebook image. | Warning | Normal event, no action required. |
| ImagePushSuccess | Notebook image push successfully. | Warning | Normal event, no action required. |
| ContainerCommitFailed | Failed to commit notebook container. Please contact SRE to check node {node_name}. {node_name} indicates the node name, which is a variable and is generally in the format of an IP address, for example, 192.168.225.161. | Warning | Node error or internal service error. Submit a service ticket to contact O&M engineers. |
| ImagePushFailed | Failed to push Notebook image. Please contact SRE to check node {node_name}. | Warning | Failed to push the image. Try again. If the fault persists, submit a service ticket to contact O&M engineers. |
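
For the SavedImageFailed event caused by processes in the D state, the check described in the Solution column can be scripted from inside the instance. The sketch below is one possible approach, not the service's own tooling: the function names are hypothetical, and it assumes a `ps` that accepts `-eo pid,stat,args` (as the procps version on typical Linux images does).

```python
import subprocess
from typing import List, Tuple


def find_d_state(ps_output: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for each process whose STAT column starts with 'D'."""
    blocked = []
    for line in ps_output.splitlines()[1:]:      # skip the header row
        fields = line.strip().split(None, 2)     # pid, stat, full command line
        if len(fields) == 3 and fields[1].startswith("D"):
            blocked.append((int(fields[0]), fields[2]))
    return blocked


def list_d_state_processes() -> List[Tuple[int, str]]:
    """Run ps on the current machine and return its D-state processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,args"],
        capture_output=True, text=True, check=True,
    )
    return find_d_state(result.stdout)
```

Calling `list_d_state_processes()` in a notebook cell or terminal session returns the PIDs to pass to `kill -9 <PID>` before saving the image again.
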
| Event | Description | Severity | Solution |
|---|---|---|---|
| NotebookUnhealthy | The instance is unhealthy. | Critical | This event may be triggered when a debugging task is started in the instance, for example, when the task occupies too many CPU, memory, or I/O resources. It can be cleared automatically after the instance load decreases. Wait a while and refresh the page. If a NotebookHealthy event appears, the instance status is normal and no action is required. If the fault persists for a long time, submit a service ticket to contact O&M engineers for assistance. |
| OutOfMemory | The instance is evicted because the memory usage exceeds the upper limit. | Critical | When an instance process uses more memory than the instance specifications provide, Kubernetes triggers this event and the instance is restarted. After the restart, the instance status changes to Normal. To prevent recurrence, avoid running tasks whose memory usage exceeds the instance specifications. |
| JupyterProcessKilled | The Jupyter process stops abnormally. | Critical | This event may be triggered if the Jupyter process is stopped by mistake or an unknown error occurs in the instance container. The instance will automatically restart. After the restart, the instance status changes to Normal. |
| CacheVolumeExceedQuota | The /cache file size has exceeded the upper limit. | Critical | This event is triggered when the /cache directory file size exceeds the maximum limit allowed by the instance specifications. The instance will automatically restart. After the restart, the instance status changes to Normal. In future use, pay attention to the size of the /cache directory. For details about the mapping between the space allocated to the directory and the instance specifications, see What Are the Sizes of the /cache Directories for Resources with Varying Specifications on ModelArts Notebook Instances? |
| NotebookHealthy | The instance recovers from an abnormal state to a normal state. | Major | Normal event, no action required. |
| EVSSoldOut | EVS disks are sold out. | Critical | This event may be triggered when you create a notebook instance and select EVS as the storage type, but EVS disks are sold out. In this case, use OBS or PFS storage instead. If you still want to use EVS, submit a service ticket to contact O&M engineers for capacity expansion. |
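
To keep an eye on the /cache directory before a CacheVolumeExceedQuota event occurs, you can periodically total the size of the files it holds. The following is a minimal sketch: the helper names are hypothetical, and the limit passed to `usage_ratio` is a value you must supply yourself, because the actual limit depends on the instance specifications.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path, skipping symbolic links."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                pass  # file was removed or is unreadable; ignore it
    return total


def usage_ratio(path: str, limit_gb: float) -> float:
    """Return used size divided by limit_gb (an assumed, user-supplied limit)."""
    return dir_size_bytes(path) / (limit_gb * 1024 ** 3)
```

For example, `usage_ratio("/cache", limit_gb)` returns a value close to 1.0 when the directory is nearly full, which is a signal to clean up files before the instance restarts.
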
| Event | Description | Severity | Solution |
|---|---|---|---|
| DynamicMountStorage | The OBS storage is mounted. | Major | Normal event, no action required. |
| DynamicUnmountStorage | The OBS storage is unmounted. | Major | Normal event, no action required. |
| Event | Description | Severity | Solution |
|---|---|---|---|
| RefreshCredentialsFailed | Authentication failed. | Critical | Normal event, no action required. |