Help Center/ Elastic Cloud Server/ Troubleshooting/ Linux ECS Issues/ What Can I Do If the ECS Startup or Remote Login Fails Due to Incorrect System Configurations?
Updated on 2024-08-15 GMT+08:00

What Can I Do If the ECS Startup or Remote Login Fails Due to Incorrect System Configurations?

Symptom

Linux ECSs cannot be started or logged in to due to incorrect configurations. When you cannot log in to a Linux ECS using SSH or VNC, you can use a common method to modify the incorrect configurations of the system disk to restore the ECS or system. This section describes how to detach the system disk from a faulty Linux ECS and attach this disk to another Linux ECS as a data disk to rectify the fault.

Possible Cause

Table 1 lists the incorrect system configurations that may cause startup or access failures.

Table 1 Common incorrect system configurations

Type

Typical Problem

Incorrect configuration

The /etc/fstab file is lost or incorrectly configured.

The SELinux is incorrectly configured.

The /etc/security/limits.conf is incorrectly configured.

The configuration format of /etc/passwd is incorrect.

The configuration format of /etc/shadow is incorrect.

The configuration format of /etc/ssh/sshd_config is incorrect.

Unavailable file or directory

The /etc/ssh directory is deleted by mistake.

The /etc/security directory is deleted by mistake.

The /etc/passwd file is deleted by mistake.

The /etc/shadow file is deleted by mistake.

The /etc/ssh/sshd_config file is deleted by mistake.

Incorrect file permission

The permission for SSH private keys are excessive.

The permission for SSH public keys are excessive.

Incorrect kernel parameter configuration

The value of vm.nr_hugepages is set too large.

Procedure

  1. Prepare a normal ECS.

    Prepare an ECS that can be accessed. Its OS must be the same as that of the faulty ECS.

  2. Create a snapshot.

    Before performing operations on the system disk of a faulty ECS, you are advised to create a snapshot for the system disk of this ECS to prevent data loss. For details, see Creating an EVS Snapshot.

    1. Log in to the management console.
    2. Click and choose Compute > Elastic Cloud Server.
    3. On the ECS list page, locate a faulty ECS and click its name.
    4. On the displayed page, click the Disks tab.
    5. On the Disks tab, click Create Snapshot.
      Figure 1 Create Snapshot
    6. In the left navigation pane of the ECS console, choose Elastic Volume Service > Snapshots to check the snapshot status. If the Status is Available, the snapshot is created.
      Figure 2 Checking the snapshot status

  3. Detach the system disk of a faulty ECS.

    After the snapshot is created, stop the faulty ECS and detach the system disk from this ECS. For details, see Detaching a System Disk.

  4. Attach the system disk of the faulty ECS as a data disk of a normal ECS.

    The procedure is as follows:

    1. Log in to the management console.
    2. Click and choose Compute > Elastic Cloud Server.
    3. On the ECS list page, locate a normal ECS and click its name.
    4. On the displayed page, click the Disks tab.
    5. On the Disks tab, click Attach Disk.
      Figure 3 Attach Disk
    6. In the displayed dialog box, select the system disk that has been detached from the faulty ECS in step 3, set Disk Attribute to Data disk, and click OK.
      Figure 4 Attaching a data disk
    7. Wait for a while. In the left navigation pane of the ECS console, choose Elastic Volume Service > Disks to check the disk status. When the disk status changes to In-use, the disk is attached to a normal ECS.
    8. After that, log in to the ECS using SSH or VNC and run the fdisk -l command. The command output shows that a new disk device /dev/vdb and new partition /dev/vdb1 are added to this ECS.
      Figure 5 Checking a new disk device
    9. Run the following command to attach the system disk of a faulty ECS to a normal ECS as a data disk:

      mount <Data disk partition> <Mount point>

      • Data disk partition: partition queried in step 4.h, for example, /dev/vdb1
      • Mount point: local directory of a normal ECS, for example, /mnt

      For example, if the data disk partition is /dev/vdb1 and the mount point is /mnt, run the following command:

      mount /dev/vdb1 /mnt

  5. Restore a data disk.

    You can restore the data disk of a normal ECS in the following four scenarios:

    Scenario 1: incorrect file configuration

    For example, to modify the incorrect configuration of the /etc/fstab file of a faulty ECS, perform the following steps:

    1. Run the following command to modify the /mnt/etc/fstab file on the data disk of a normal ECS:

      vim /mnt/etc/fstab

    2. After that, press Esc to exit insert mode and enter :wq to save the modification and exit.

      This example describes how to restore the /etc/fstab file of a faulty ECS. To modify the configuration of other files, change the directory of the corresponding file.

      For example, the /etc/ssh/sshd_config file corresponds to the /mnt/etc/ssh/sshd_config directory on the data disk.

    Scenario 2: unavailable files or directories

    For example, to restore the lost /etc/security directory of a faulty ECS, perform the following steps:

    Run the following command to restore the lost /mnt/etc/security directory on the data disk of a normal ECS:

    cp -rfa /etc/security /mnt/etc/

    In this example, the lost /etc/security directory of a faulty ECS is restored. For other unavailable files or directories, change the directory of the corresponding file.

    For example, if the /etc/ssh/sshd_config file is lost, run the cp -rfa /etc/ssh/sshd_config /mnt/etc/ssh command to restore it.

    Scenario 3: incorrect file permission

    For example, to modify the incorrect permission of the /etc/ssh/ssh_host_ecdsa_key file of a faulty ECS, perform the following steps:

    Run the following command to modify the permission of the /mnt/etc/ssh/ssh_host_ecdsa_key file on the data disk of a normal ECS:

    chmod 600 /etc/ssh/ssh_host_ecdsa_key

    In this example, the permission of the /etc/ssh/ssh_host_ecdsa_key file of a faulty ECS is modified. To modify the permission of other files, change the directory of the corresponding file.

    For example, the /etc/ssh/ssh_host_ed25519_key file corresponds to the /mnt/etc/ssh/ssh_host_ed25519_key directory on the data disk.

    Scenario 4: kernel parameter optimization

    For example, to configure the vm.nr_hugepages parameter, perform the following steps:

    1. Run the following command to optimize the kernel parameters in the /mnt/etc/sysctl.conf file on the data disk of a normal ECS:

      vim /mnt/etc/sysctl.conf

    2. Press /, enter nr_hugepages, and press Enter to locate the configuration item.
    3. Press i to edit the configuration item and set it to a proper value. If the configuration item is not found, add the following configuration item:
      vm.nr_hugepages=4

      The value must be vm.nr_hugepages * hugepagesize < memory_total. The specific value should be calculated as needed.

    4. Press Esc to exit insert mode and enter :wq to save the modification and exit.
    5. Run the following command for the configuration to take effect:

      sysctl -p

  6. Restore the system disk of a faulty ECS.

    After the preceding steps are performed, the configuration on the system disk of the faulty ECS has been modified. You can detach the data disk (system disk of the faulty ECS) from a normal ECS and attach it back to the faulty ECS. The procedure is as follows:

    1. Detach the data disk (system disk of the faulty ECS) from a normal ECS. For details, see Detaching a Data Disk.

    2. Attach the system disk back to the faulty ECS.