Updated on 2024-12-28 GMT+08:00

Ransomware Incident Response Solution

Incident Type: Ransomware Attacks

Ransomware is a type of malware designed to deny a user or organization access to files on their computers, so ransomware attacks are classified as denial-of-access attacks. Ransomware uses technical means to restrict victims from accessing their own systems or the data in them, such as documents, emails, databases, and source code. To have the restrictions removed, victims are asked to pay a ransom to the attackers.

Once a ransomware attack succeeds, it is hard to interrupt the attack or mitigate the damage, so it is important to take preventive measures to reduce the likelihood of such attacks.

This document describes a series of measures and strategies designed to effectively control and respond to ransomware attacks.

Incident Response Solution: Ransomware Host Isolation Playbook

The Ransomware host isolation playbook preconfigured in SecMaster automatically isolates compromised hosts. When a ransomware alert is generated, the playbook adds the affected hosts to a security group and blocks all inbound and outbound traffic to isolate them. This step is crucial to ransomware protection.

Figure 1 Host isolation - Malware
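
The isolation logic described above can be sketched in a few lines. This is only a minimal model, not the playbook's actual implementation: the `SecurityGroup` and `Host` classes and the `isolate_host` function are hypothetical, and the sketch assumes that isolation means the host ends up only in a group that has no allow rules.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    """Hypothetical, simplified model of a security group."""
    name: str
    inbound_rules: list = field(default_factory=list)
    outbound_rules: list = field(default_factory=list)


@dataclass
class Host:
    """Hypothetical, simplified model of an ECS instance."""
    host_id: str
    security_groups: list = field(default_factory=list)


def isolate_host(host: Host, isolation_group: SecurityGroup) -> list:
    """Leave the host in the isolation group only.

    Returns the names of the groups the host belonged to before, so the
    change can be recorded in the service ticket and reverted later.
    """
    if isolation_group.inbound_rules or isolation_group.outbound_rules:
        raise ValueError("isolation group must not allow any traffic")
    previous = [group.name for group in host.security_groups]
    host.security_groups = [isolation_group]
    return previous


# Example: a ransomware alert names one affected host.
web_group = SecurityGroup("web", inbound_rules=["tcp/443 from 0.0.0.0/0"])
quarantine = SecurityGroup("quarantine")  # no rules: nothing is allowed
host = Host("ecs-001", security_groups=[web_group])
previous_groups = isolate_host(host, quarantine)
```

Returning the previous group names is a design choice of this sketch: it keeps the evidence trail intact and makes it possible to restore the original network configuration once the host has been cleaned.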

Incident Response Process

  1. Obtain, store, and record evidence.

    Based on your cloud environment configurations, you can identify potential ransomware incidents from many sources, including:
    1. An IT employee reported that an ECS could not be accessed over SSH or other similar methods.

      In this case, new ECSs can still be created and Cloud Eye reports no alerts, but the affected ECS is inaccessible.

    2. A service ticket was automatically generated in your service ticket system because of abnormal metrics or logs of an ECS instance.
    3. A network fault error was reported on the ECS console or by Cloud Eye for an ECS.
    4. You received a ransom demand from attackers through other communication channels (such as emails).
    5. Cloud security services or other security tools detected that an ECS instance was attacked.
    6. Your cloud or other third-party monitoring systems generated alerts or reported abnormal metrics.
    7. If an event is identified as a security incident, it is critical to assess its impact scope, including the number of affected resources and the sensitivity of the data involved.
    8. Check whether there are any known events that may cause service interruptions or affect instance metrics. For example, network metric values in Cloud Eye may increase because of an ongoing event.
    9. Use Cloud Eye or other application performance monitoring tools to compare the recorded performance baseline metrics of the application with the current abnormal metrics to check whether there is any abnormal behavior.
    10. Identify the classifications of data stored in ECS instances, OBS buckets, or other storage media.
    11. Ensure that a service ticket has been created for an incident. If no tickets are generated automatically, manually create one.
    12. If there is a service ticket already, locate the alert or metric associated with the issue.

      For an automatically generated service ticket, identify what triggered it. For a manually created service ticket, record the alert or notification that prompted it. Then identify the events or other causes of the service interruption. If no events or causes can be found, record the actual attack vector.

    13. Check logs to identify when the ransomware attack occurred.

      You can use Cloud Trace Service (CTS) to collect, store, and query operation records of all cloud resources.

    14. Identify and record the impact on and experience of end users.

      If users are affected, record step by step in the service ticket how the attack occurred, to help identify the attack vector and prepare mitigation measures.

    15. Identify the roles involved in the incident based on the incident response plan of your organization. Notify relevant roles, including legal personnel, technical teams, and developers, and ensure that they are added to the service ticket and war rooms for continuous responses.
    16. Ensure that the legal counsel of your organization is aware of and involved in the internal response to and external communication about the incident, and add colleagues responsible for public or external communication to the service ticket so that they can fulfill their communication responsibilities in a timely manner. If local or federal regulations require the reporting of such incidents, notify the authorities concerned and seek guidance from your legal counsel or law enforcement on the collection of evidence and the preservation of the chain of custody. Reporting such incidents to open databases, government agencies, or non-governmental organizations may also help advance the response, even if it is not required by laws or regulations.
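
The baseline comparison in the evidence-gathering steps above can be approximated with a simple deviation check. The following is a minimal sketch, assuming the metric samples have already been exported from your monitoring tool as plain numbers. The function name `find_anomalies`, the three-standard-deviation threshold, and the sample values are all illustrative assumptions, not part of Cloud Eye.

```python
from statistics import mean, stdev


def find_anomalies(baseline, current, threshold=3.0):
    """Return (index, value, z_score) for each current sample that deviates
    from the baseline mean by more than `threshold` standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Flat baseline: any different value counts as a deviation.
        return [(i, v, float("inf")) for i, v in enumerate(current) if v != mu]
    anomalies = []
    for i, value in enumerate(current):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, value, z))
    return anomalies


# Made-up disk write operations per minute: normal days vs. the incident window.
baseline_write_ops = [120, 131, 118, 125, 122, 129, 124]
current_write_ops = [127, 133, 940, 1210]
suspicious = find_anomalies(baseline_write_ops, current_write_ops)
```

A sudden jump in disk writes is only one possible signal. Record any flagged samples, together with their timestamps, in the service ticket rather than treating them as proof of encryption activity.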

  2. Contain incidents.

    Early detection of abnormal user behavior or network activities is the key to reducing the impact of ransomware incidents. Follow the procedure below to contain an incident, working with the legal and compliance team of your organization to take any necessary response measures and continue the incident response process.

    1. Determine the type of ransomware involved in the attack. Common ransomware types are as follows:
      • Encryption ransomware: This type of ransomware encrypts files and objects for ransom.
      • Locker ransomware: This type of ransomware locks access to a specific device.
      • Other types: Emerging types or types that are not recorded yet.
    2. For workloads affected by the attack, modify security groups, OBS bucket policies, or related identity and access management policies to isolate them from networks or the Internet. This minimizes the chance of the attack spreading and of attackers accessing these resources.

      Note that sometimes modifying a security group may not achieve the expected effect due to connection tracking.

    3. Evaluate whether the ECS instance needs to be restored. If the instance belongs to an auto scaling group, remove it from the group. If the incident is related to an OS vulnerability, update the OS and ensure that the vulnerability has been fixed.
    4. Check operation logs on CTS to see if there are unauthorized operations, such as creating unauthorized IAM users, policies, roles, or temporary credentials. If there are any, delete the unauthorized IAM users, roles, and policies, and revoke all temporary credentials.
    5. If the attack vector is related to unpatched software, missing OS updates, or outdated anti-malware or antivirus tools, ensure that all ECS instances run the latest OS versions, that all software packages and patches are up to date, and that the virus signatures and definition files on all ECSs are current. For a mutable architecture, patch it immediately. For an immutable architecture, redeploy it.
    6. Based on the updates in step 5, delete all remaining resources identified as at risk of infection. These resources may have been exposed to the same vector through which the ransomware was delivered, such as email or infected websites. For resources managed through auto scaling, focus on identifying the attack vector and take measures to prevent other resources from being infected through it.
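
The audit-log check for unauthorized identity operations in the containment steps above can be sketched as a simple filter. This example assumes the operation records have already been exported as a list of dictionaries. The field names (`time`, `operator`, `operation`), the operation names, and the sample records are hypothetical and do not reflect the actual CTS record format.

```python
# Hypothetical operation names; real audit logs use service-specific names.
SENSITIVE_OPERATIONS = {
    "createUser",
    "createRole",
    "createPolicy",
    "createTemporaryCredential",
}


def find_unauthorized_operations(records, approved_operators):
    """Return sensitive identity operations performed by unapproved operators."""
    return [
        record
        for record in records
        if record["operation"] in SENSITIVE_OPERATIONS
        and record["operator"] not in approved_operators
    ]


# Made-up sample records for illustration only.
exported_records = [
    {"time": "2024-12-21T14:02:11Z", "operator": "admin-ops", "operation": "createUser"},
    {"time": "2024-12-21T14:31:45Z", "operator": "svc-backup", "operation": "createRole"},
    {"time": "2024-12-21T14:33:02Z", "operator": "svc-backup", "operation": "listBuckets"},
]
findings = find_unauthorized_operations(exported_records, approved_operators={"admin-ops"})
```

Each finding is a candidate for review, not an automatic verdict: confirm with the owning team before deleting users, roles, or policies or revoking credentials.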

  3. Eradicate incidents.

    1. Assess whether the impact of an incident is limited to a specific part of the environment. If the data affected by the ransomware can be restored from a backup or snapshot, restore it from the backup or snapshot.

      Investigating incidents in an isolated environment for root cause analysis is recommended as it is helpful in implementing controls to prevent similar incidents in the future.

    2. Use the latest antivirus or anti-malware software to eliminate ransomware.

      Exercise caution when performing this operation because it may alert attackers. It is recommended that you view locked or encrypted objects in an isolated forensic environment, for example, one created by removing network access from the infected ECS instances.

    3. Delete all malware identified during forensic analysis and identify indicators of compromise (IoCs).
    4. If the ransomware strain has been identified, check whether there are available third-party decryption tools or other online resources that may help decrypt data.
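
One common way to locate identified malware on other volumes is to compare file hashes against hashes collected during forensic analysis. The following is a minimal sketch of that idea using only the Python standard library. The function names and the idea of a `known_bad_hashes` set are assumptions for illustration, and the scan should be run in the isolated forensic environment, not on production hosts.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_malware(root: Path, known_bad_hashes: set) -> list:
    """Return files under `root` whose SHA-256 hash is in `known_bad_hashes`."""
    matches = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and sha256_of(path) in known_bad_hashes:
            matches.append(path)
    return matches
```

Hash matching only finds exact copies of known samples. A file that differs by a single byte will not match, so this check complements antivirus scanning rather than replacing it.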

  4. Recover from incidents.

    1. Determine the restoration point for every restoration operation to be performed from backups.
    2. Check the backup policy to see whether all objects and files can be restored. This depends on the lifecycle policy applied to the resources.
    3. Use forensic methods to confirm that the data is secure before the restoration, and then restore the data from the backup or roll the ECS instance back to an earlier snapshot.
    4. If you have successfully restored data using any open-source decryption tool, delete the data from the instance and perform necessary analysis to confirm that the data is secure. Then, restore the instance. Alternatively, you can terminate or isolate the instance, create an instance, and restore the data to the new instance.
    5. If neither restoring data from backups nor decrypting data is feasible, evaluate the possibility of restarting in a new environment.
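
Choosing a restoration point usually comes down to picking the most recent backup taken before the estimated time of compromise. The sketch below illustrates that selection. The function name `choose_restore_point` and the timestamps are made up for illustration, and the selected backup still has to be verified forensically before it is restored.

```python
from datetime import datetime


def choose_restore_point(backup_times, compromise_time):
    """Return the most recent backup taken strictly before the compromise,
    or None if every backup was taken at or after it."""
    candidates = [taken for taken in backup_times if taken < compromise_time]
    return max(candidates) if candidates else None


# Made-up nightly backups and an estimated compromise time from log analysis.
backups = [
    datetime(2024, 12, 20, 2, 0),
    datetime(2024, 12, 21, 2, 0),
    datetime(2024, 12, 22, 2, 0),
]
estimated_compromise = datetime(2024, 12, 21, 14, 30)
restore_point = choose_restore_point(backups, estimated_compromise)
```

If the compromise time is uncertain, it is safer to use the earliest plausible time, at the cost of losing more recent data.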

  5. Perform post-incident activities.

    1. Document and apply lessons learned from simulations and real incidents to subsequent processes and procedures. This helps you better understand how the incident occurred in terms of system configuration and processes (for example, where weaknesses exist, where automation may fail, and where visibility is lacking) and how to enhance your overall security posture.
    2. If you have identified the initial attack vector or entry point, determine the best way to reduce similar risks.

      For example, if malware initially gained access through an unpatched public-facing ECS instance and you have applied the missing patch to all current instances, consider how to improve the patch management process so that patches are tested and applied more quickly and consistently to prevent similar problems in the future.

    3. If you have developed technical measures to address a particular threat, assess whether these steps can be performed automatically when the relevant threat is detected. Automated responses can help mitigate threats more quickly, thereby minimizing the scope and severity of the impact.
    4. Collect lessons learned from all roles in the response process and update your incident response plan, disaster recovery plan, and this response plan as needed. New technical capabilities and personnel skills should be considered and funded as well to fill the gaps identified.