Updated on 2024-08-15 GMT+08:00

Why Does My Linux ECS Restart Unexpectedly?

Symptom

A Linux ECS restarts unexpectedly and the following error is displayed:

Kernel panic - not syncing: NMI: Not continuing

The following information is printed in the kernel log:

[645683.754132] Uhhuh. NMI received for unknown reason 20 on CPU 1.
[645683.754133] Do you have a strange power saving mode enabled?
[645683.754133] Kernel panic - not syncing: NMI: Not continuing
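To confirm that a restart matches this symptom, the kernel log can be searched for the panic message. The following is a minimal sketch rather than an official tool: the helper name has_nmi_panic is hypothetical, and where the previous boot's kernel log is stored (journal, /var/log/messages, or elsewhere) depends on the distribution and on whether the log was persisted before the panic.

```shell
#!/bin/sh
# has_nmi_panic (hypothetical helper): reads kernel log text on stdin and
# succeeds (exit status 0) if it contains the panic message shown above.
has_nmi_panic() {
    grep -q 'Kernel panic - not syncing: NMI: Not continuing'
}

# Example usage (commented out; the log source is an assumption and varies):
#   journalctl -k -b -1 | has_nmi_panic && echo "NMI panic found in previous boot"
#   has_nmi_panic < /var/log/messages && echo "NMI panic found"
```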

Possible Causes

When the kernel parameter kernel.unknown_nmi_panic is set to 1 on a Linux ECS, the kernel panics as soon as it receives a non-maskable interrupt (NMI) whose source it cannot identify, and the ECS then restarts automatically.

This parameter is generally set to 1 so that an unexplained NMI triggers a kernel panic instead of being ignored. However, certain CPU models may generate such NMIs during normal service operation, which causes the ECS to restart unexpectedly.

Solution

  1. Remotely log in to the ECS.
  2. Run the following command to check the value of the ECS kernel parameter kernel.unknown_nmi_panic:

    sysctl -n kernel.unknown_nmi_panic

    If the command returns 1, the unexpected restart is caused by this parameter setting.

    Figure 1 Command output
  3. Run the following command to open the /etc/sysctl.conf file and check the kernel.unknown_nmi_panic setting:

    vim /etc/sysctl.conf

    Check whether kernel.unknown_nmi_panic=1 exists. Press i to enter insert mode, then edit the file as follows:

    • If kernel.unknown_nmi_panic=1 exists, change it to kernel.unknown_nmi_panic=0.
    • If kernel.unknown_nmi_panic=1 does not exist, add the line kernel.unknown_nmi_panic=0.
    Figure 2 Viewing the /etc/sysctl.conf file
  4. Press Esc, enter :wq, and press Enter to save the settings and exit.
  5. Run the following command to make the configuration take effect:

    sysctl -p

    Figure 3 Making configuration take effect

    The configuration takes effect immediately. You do not need to restart the ECS.
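The manual edit in vim can also be done non-interactively, which may be convenient when several ECSs need the same change. The following is a minimal sketch under stated assumptions, not an official procedure: the function name disable_unknown_nmi_panic is hypothetical, GNU sed is assumed for the -i option, and the file is assumed to end with a newline.

```shell
#!/bin/sh
# disable_unknown_nmi_panic (hypothetical helper): makes sure the sysctl
# configuration file given as $1 sets kernel.unknown_nmi_panic to 0.
disable_unknown_nmi_panic() {
    conf="$1"
    if grep -q '^[[:space:]]*kernel\.unknown_nmi_panic[[:space:]]*=' "$conf"; then
        # An assignment already exists: rewrite it to 0.
        # GNU sed keeps a copy of the original file as "$conf.bak".
        sed -i.bak 's/^[[:space:]]*kernel\.unknown_nmi_panic[[:space:]]*=.*/kernel.unknown_nmi_panic=0/' "$conf"
    else
        # No assignment exists: append one.
        printf '%s\n' 'kernel.unknown_nmi_panic=0' >> "$conf"
    fi
}

# Example usage (commented out; requires root on the ECS):
#   disable_unknown_nmi_panic /etc/sysctl.conf && sysctl -p
```

The function only edits the file. Running sysctl -p afterwards is still required to load the new value, as in the manual procedure.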

Verification

  1. Run the following command to check whether the value of kernel.panic_on_unrecovered_nmi, a related parameter that can also make the kernel panic on an NMI, is 0:

    cat /proc/sys/kernel/panic_on_unrecovered_nmi

    Figure 4 Command output (1)
  2. Run the sysctl -n kernel.unknown_nmi_panic command to check whether the value of kernel.unknown_nmi_panic is 0.
    Figure 5 Command output (2)

    If both commands return 0, the modification is successful.
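Both verification checks can be combined into one script. The following is a minimal sketch: the function name nmi_panic_disabled is hypothetical, and it assumes both parameters are exposed as files under /proc/sys/kernel, as in the cat command above.

```shell
#!/bin/sh
# nmi_panic_disabled (hypothetical helper): succeeds only if both parameters
# read 0. The optional argument is the directory that holds the parameter
# files; it defaults to /proc/sys/kernel.
nmi_panic_disabled() {
    dir="${1:-/proc/sys/kernel}"
    for name in unknown_nmi_panic panic_on_unrecovered_nmi; do
        value=$(cat "$dir/$name") || return 1
        if [ "$value" != "0" ]; then
            echo "$name is $value, expected 0"
            return 1
        fi
        echo "$name is 0"
    done
}

# Example usage (commented out):
#   nmi_panic_disabled && echo "Modification successful"
```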