Updated on 2024-11-06 GMT+08:00

Playbook Overview

Background

In a malware attack, an attacker distributes malware (such as viruses, worms, Trojans, and ransomware) to users through emails, remote downloads, or malicious advertisements, and then runs the malicious programs on target hosts. This lets the attacker control remote hosts, compromise network systems, steal sensitive information, or carry out other malicious activities. Such attacks pose a serious threat to computer systems, networks, and personal devices, and can lead to data leakage, system outages, privacy breaches, financial loss, and other security risks.

Preventing such attacks and reducing their risk is critical. An effective solution must identify malicious programs such as backdoors, Trojans, cryptomining software, worms, and viruses, and detect unknown malicious programs and virus variants on hosts using program feature and behavior detection, AI image fingerprint algorithms, and cloud-based antivirus scanning. It should also detect ransomware embedded in web pages, software, emails, and storage media.

The following describes how this playbook isolates and kills malware and ransomware.

Response Solutions

This built-in playbook automatically isolates and kills malware detected on servers protected by Host Security Service (HSS).

The HSS file isolation and killing playbook is associated with the HSS file isolation and killing workflow. When a malware or ransomware alert is generated, the system checks the HSS edition used by the attacked asset. If the asset uses the professional edition or later and automatic isolation and killing is not enabled, the isolation and killing conditions are met. After the isolation and killing action is manually approved, the playbook handles the alert. If the malware is isolated successfully, the alert is closed. If isolation fails, the playbook adds a comment to the alert indicating that manual handling is required.
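The workflow described above is essentially a small decision procedure. The following Python sketch models it under stated assumptions: the `Alert` fields, the edition ranking in `EDITION_RANK`, and the `approve`/`isolate` callbacks are hypothetical stand-ins, not SecMaster or HSS APIs. The behavior when approval is denied (the alert simply stays open) is also an assumption, because the workflow description does not specify it.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed ordering of HSS editions (hypothetical); confirm against the HSS documentation.
EDITION_RANK = {"basic": 0, "professional": 1, "enterprise": 2, "premium": 3}


@dataclass
class Alert:
    # Hypothetical alert model; field names are illustrative, not the SecMaster schema.
    alert_type: str               # e.g. "malware" or "ransomware"
    hss_edition: str              # HSS edition protecting the attacked asset
    auto_isolation_enabled: bool  # whether HSS already isolates and kills automatically
    status: str = "open"
    comments: List[str] = field(default_factory=list)


def conditions_met(alert: Alert) -> bool:
    """Return True if the playbook should handle this alert."""
    if alert.alert_type not in ("malware", "ransomware"):
        return False
    # Unknown editions rank below everything, so they never qualify.
    edition_ok = EDITION_RANK.get(alert.hss_edition, -1) >= EDITION_RANK["professional"]
    # If automatic isolation and killing is already enabled, the playbook does not step in.
    return edition_ok and not alert.auto_isolation_enabled


def run_playbook(alert: Alert,
                 approve: Callable[[Alert], bool],
                 isolate: Callable[[Alert], bool]) -> str:
    """Check conditions, wait for manual approval, isolate, then close the alert or add a comment."""
    if not conditions_met(alert):
        return "skipped"
    if not approve(alert):  # manual approval step (assumed: alert stays open if denied)
        return "not approved"
    if isolate(alert):      # stand-in for the HSS isolation and killing action
        alert.status = "closed"
        return "closed"
    alert.comments.append("Automatic isolation failed; manual handling is required.")
    return "manual action required"


if __name__ == "__main__":
    a = Alert("ransomware", "premium", auto_isolation_enabled=False)
    print(run_playbook(a, approve=lambda _: True, isolate=lambda _: False))
```

Passing approval and isolation as callbacks keeps the decision logic separate from whatever approval channel and isolation mechanism are actually in use, which makes each branch easy to exercise in isolation.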

Incident Response

  1. Obtain, store, and record evidence.

    1. Based on your cloud environment, configure HSS to detect security threats such as malware and ransomware through antivirus scanning and HIPS detection.
    2. Log in to the ECS over SSH and check the instance status and monitoring data for anomalies. You can also review attack information or ransomware indicators received through other channels to discover potential threats.
    3. Once an attack is confirmed as an incident, evaluate its scope, including the attacked machines, affected services, and affected data.
    4. Use SecMaster to convert alerts into incidents, and continue to monitor and record incident details.
    5. Trace the related logs: review them through security analysis and record them in the incident management module so that subsequent operations can be traced.

  2. Contain incidents.

    1. Determine the attack type, affected servers, and service processes based on alerts and logs.
    2. Use the HSS file isolation and killing playbook to isolate and kill compromised processes and software, reducing further security risk.
    3. Determine the scope of infection, and promptly investigate and handle any risk of further infection.
    4. You can also use other playbooks and workflows, such as host isolation, to control risk. For example, security group access control policies can isolate infected machines and keep the risk from spreading further.

  3. Eradicate incidents.

    1. Evaluate whether the affected servers need to be hardened and restored. If a server has been compromised, harden and restore it based on the source tracing results. If the attack resulted from leaked security credentials, delete any unauthorized IAM users, roles, and policies, and revoke the leaked credentials to improve host security.
    2. Check the affected hosts for vulnerabilities and for outdated or unpatched software, which could allow more hosts to be compromised. Fix the vulnerabilities on the affected hosts on the Vulnerabilities page. Also check for risky configurations on the Baseline Inspection page and rectify them promptly.
    3. Evaluate the impact scope. If other hosts have been affected, handle all affected hosts.

  4. Recover from the incident.

    1. Determine the restore point to use for each restoration from backup.
    2. Review the backup policy and the lifecycle policy applied to each resource to determine whether all objects and files can be restored.
    3. Use forensic methods to confirm that the backup data is safe before restoring it, and then restore the data from the backup or roll the ECS instance back to an earlier snapshot.
    4. If you have decrypted the data with an open-source decryption tool, copy the data off the instance and analyze it to confirm that it is safe. Then terminate or isolate the instance, create a new instance, and restore the data to the new instance.
    5. If neither restoring from backup nor decrypting the data is feasible, consider rebuilding in a new environment.

  5. Perform post-incident activities.

    1. Analyze alert details across the entire handling process and continuously tune the detection model to improve alert accuracy. If an alert is triggered by normal service activity and poses no risk, filter it out with the model.
    2. Tracing alerts helps you understand the full course of an incident, so that you can continuously optimize asset protection policies, reduce resource risk, and shrink the attack surface.
    3. Optimize the automated playbook process for your actual service scenarios. For example, once analysis has confirmed that alerts are accurate, you can replace the manual review policy with an automatic handling policy to improve efficiency and handle risks faster.
    4. Analyze risks across all similar malware and ransomware attack vectors so that you can control them before incidents occur.
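The first three steps of the "Recover from incident" stage amount to choosing a restore point: it must predate the compromise, still be retained under the lifecycle policy, and have passed forensic checks. The following Python sketch illustrates that selection under stated assumptions: `RestorePoint`, `select_restore_point`, and their fields are hypothetical and do not correspond to any backup service API. Because a compromise may begin before its earliest detected indicator, it is prudent to pass a conservative (earlier) `first_compromise` time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class RestorePoint:
    # Hypothetical model of a backup or snapshot; not a real backup service object.
    point_id: str
    created_at: datetime
    verified_clean: bool  # outcome of the forensic check before restoration


def select_restore_point(points: List[RestorePoint],
                         first_compromise: datetime,
                         retention: timedelta,
                         now: datetime) -> Optional[RestorePoint]:
    """Return the newest usable restore point, or None if none qualifies."""
    candidates = [
        p for p in points
        if p.created_at < first_compromise   # taken before the earliest known compromise
        and now - p.created_at <= retention  # not yet expired under the lifecycle policy
        and p.verified_clean                 # confirmed safe by forensic analysis
    ]
    # The newest qualifying point minimizes data loss.
    return max(candidates, key=lambda p: p.created_at, default=None)


if __name__ == "__main__":
    now = datetime(2024, 11, 6, 12, 0)
    points = [
        RestorePoint("snap-1", datetime(2024, 11, 1), True),
        RestorePoint("snap-2", datetime(2024, 11, 4), True),
        RestorePoint("snap-3", datetime(2024, 11, 5, 18, 0), True),  # taken after the compromise
    ]
    chosen = select_restore_point(points, datetime(2024, 11, 5), timedelta(days=30), now)
    print(chosen.point_id if chosen else "no usable restore point")
```

A `None` result corresponds to the last recovery step: if no backup can be used and decryption is not possible, evaluate rebuilding in a new environment.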