Updated on 2024-11-21 GMT+08:00

Ransomware Incident Response Solution

Incident Type: Ransomware Attacks

Ransomware is a type of malware designed to deny a user or organization access to files on their computer, so ransomware attacks are classified as denial-of-access (DoA) attacks. Ransomware uses technical means to restrict victims from accessing their own systems or the data in them, such as documents, emails, databases, and source code. To lift the restrictions, victims are told to pay a ransom to the attackers.

Once a ransomware attack succeeds, it is hard to interrupt the attack or mitigate the damage. Therefore, it is important to take preventive measures to reduce the likelihood of such attacks.

This document describes a series of steps and strategies designed to effectively manage and respond to ransomware attacks.

Incident Response Solution: Ransomware Host Isolation Playbook

The Ransomware host isolation playbook preconfigured in SecMaster automatically isolates compromised hosts. When a ransomware alarm is triggered, the playbook adds the affected host to a security group that blocks all inbound and outbound traffic, which isolates the host. This isolation is critical to preventing ransomware from spreading across the network.

Figure 1 Host isolation - Malware
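
The isolation idea described above can be illustrated with a minimal Python sketch. It is not the SecMaster playbook or any Huawei Cloud API: `SecurityGroup`, `Host`, and `isolate_host` are hypothetical names, and the sketch assumes that isolation means the host ends up only in a group that has no allow rules at all.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    """Hypothetical, simplified model of a security group (not a real API type)."""
    name: str
    inbound_rules: list = field(default_factory=list)
    outbound_rules: list = field(default_factory=list)

    def allows_any_traffic(self) -> bool:
        # A group with no allow rules permits no traffic in this simplified model.
        return bool(self.inbound_rules or self.outbound_rules)


@dataclass
class Host:
    """Hypothetical model of a host and the security groups attached to it."""
    host_id: str
    security_groups: list


def isolate_host(host: Host, isolation_group: SecurityGroup) -> list:
    """Replace every group on the host with the isolation group.

    Returns the previously attached groups so they can be recorded in the
    incident ticket and restored later if the host is cleared.
    """
    if isolation_group.allows_any_traffic():
        raise ValueError("isolation group must not contain any allow rules")
    previous = list(host.security_groups)
    host.security_groups = [isolation_group]
    return previous
```

Returning the previous groups keeps a record of the pre-isolation state, which is useful both as evidence and for a later rollback.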

Incident Response Process

  1. Obtain, store, and record evidence.

    Based on your cloud environment configurations, you can identify potential ransomware according to any of the following symptoms:
    1. An IT employee reported that an ECS could not be accessed over SSH or other similar methods.

      New ECSs can be created and no alerts are reported by Cloud Eye, but the affected ECS is inaccessible.

    2. Abnormal metrics or logs of ECS instances triggered a ticket in your service ticket system.
    3. A network fault error was reported on the ECS console or by Cloud Eye for the ECS.
    4. You received ransom demands from attackers through other communication channels (such as emails).
    5. Cloud security services or other security tools detected that the ECS instance was attacked.
    6. Your cloud or other third-party monitoring systems generated alerts or reported abnormal metrics.
    7. When an incident is identified as a security incident, it is critical to assess its impact scope, including the number of affected resources and the sensitivity of the data involved.
    8. Check whether there are any known events that may cause service interruption or affect instance metrics. For example, network metrics in Cloud Eye may increase due to an ongoing event.
    9. Use Cloud Eye or other application performance monitoring tools to compare the recorded performance baseline metrics of the application with the current abnormal metrics to determine whether abnormal behavior exists.
    10. Determine the classification level of data stored in ECS instances, OBS buckets, or other storage.
    11. Ensure that a ticket has been created for the incident. If no ticket was generated automatically, create one manually.
    12. If there is a ticket already, determine the alert or metric associated with the problem.

      If the ticket was automatically generated by an alarm or metric, identify that alarm or metric as the reason. If the ticket was not automatically generated, record the alarm or notification that brought the problem to light. Check whether the service interruption is caused by a known event or by other causes. If it cannot be attributed to a known event, record the actual attack medium.

    13. Use log search to determine when a ransomware attack occurred.

      You can use Cloud Trace Service (CTS) to collect, store, and query operation records of all cloud resources.

    14. Determine and document the impact on end users and their experience.

      If users are affected, record the detailed steps that led to the attack event in the service ticket to help identify the attack medium and develop appropriate mitigation policies.

    15. Determine the roles involved in the incident based on the incident response plan of your enterprise or organization. Notify related roles, including legal affairs personnel, technical teams, and developers, and ensure that they are added to the work order and WarRoom to continuously respond to the incident.
    16. Ensure that your organization's legal counsel is aware of and involved in the internal response and external communication of the incident, and add colleagues responsible for public or external communication to the work order so that they can fulfill their communication responsibilities in a timely manner. If local or federal regulations require such incidents to be reported, notify the relevant authorities and seek guidance from your legal counsel or law enforcement on collecting evidence and preserving the chain of custody. Even where regulations do not require it, reporting such incidents to open databases, government agencies, or non-governmental organizations may help advance the response to them.
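
The comparison of recorded baseline metrics with current values (sub-step 9 above) can be sketched as follows. This is a minimal illustration, not a Cloud Eye feature: it assumes the metrics have already been exported into plain Python dictionaries, the metric names are hypothetical, and the z-score test with a threshold of 3 is just one simple way to flag a deviation.

```python
from statistics import mean, stdev


def find_anomalies(baseline: dict, current: dict, threshold: float = 3.0) -> dict:
    """Flag metrics whose current value deviates strongly from the baseline.

    baseline: metric name -> list of historical samples (assumed export format)
    current:  metric name -> latest observed value
    Returns metric name -> z-score for every metric above the threshold.
    """
    anomalies = {}
    for name, samples in baseline.items():
        # Skip metrics with no current reading or too little history.
        if name not in current or len(samples) < 2:
            continue
        mu = mean(samples)
        sigma = stdev(samples)
        deviation = abs(current[name] - mu)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is notable.
            if deviation > 0:
                anomalies[name] = float("inf")
            continue
        z_score = deviation / sigma
        if z_score > threshold:
            anomalies[name] = round(z_score, 2)
    return anomalies


# Hypothetical usage: a sudden burst of disk writes stands out, CPU does not.
baseline = {
    "disk_write_bytes": [100, 110, 90, 105, 95],
    "cpu_util": [20, 22, 21, 19, 20],
}
current = {"disk_write_bytes": 900, "cpu_util": 21}
flagged = find_anomalies(baseline, current)
```

A flagged metric is only a prompt for investigation. It should still be checked against known events (sub-step 8) before the incident is treated as an attack.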

  2. Contain incidents.

    Early detection of abnormal user behavior or network activities is the key to reducing the impact of ransomware incidents. Perform the following steps to contain the incident. Where a step applies, work with the legal and compliance team of your enterprise or organization to take any necessary response measures, and then continue the incident response process.

    1. Determine the type of ransomware involved in the attack event. Common ransomware types are as follows:
      • Crypto ransomware: encrypts files and objects.
      • Locker ransomware: locks access to the device.
      • Other types: new or previously undocumented types.
    2. For workloads affected by the attack, modify security groups, OBS bucket policies, or related identity and access management policies to isolate them from the network or the Internet. This minimizes the chance that the attack spreads or that attackers can access these resources.

      Note that modifying a security group may not have the expected effect right away: because of connection tracking, connections that are already established may remain open.

    3. Evaluate whether the ECS instance needs to be restored. If the instance belongs to an auto scaling group, remove the instance from the group. In addition, if the event is related to a vulnerability in the host operating system, update the system and ensure that the vulnerability has been fixed.
    4. View operation logs on CTS to check whether there are unauthorized operations, such as creating unauthorized IAM users, policies, roles, or temporary security credentials. If yes, delete any unauthorized IAM users, roles, and policies, and revoke all temporary credentials.
    5. If the attack medium is unpatched software, missing OS updates, or outdated anti-malware or antivirus tools, ensure that all ECS instances run the latest OS version, that all software packages and patches are up to date, and that the virus signatures and definition files on all ECSs are the latest. For a mutable architecture, patch the instances immediately. For an immutable architecture, redeploy them.
    6. Following the updates in step 5, delete all remaining resources identified as at risk of infection (resources that may have accessed the same medium through which the ransomware was downloaded, whether by email, by visiting infected websites, or by other means). For resources managed through auto scaling, focus on identifying the attack medium and take measures to prevent other resources from being infected through it.
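
The review of operation logs for unauthorized IAM activity (sub-step 4 above) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the records are assumed to have been exported from your audit log into dictionaries with `time`, `user`, and `operation` keys, and the operation names below are placeholders, not actual CTS trace names.

```python
# Placeholder operation names; replace them with the names your audit log uses.
WATCHED_OPERATIONS = frozenset({
    "createUser",
    "createRole",
    "createPolicy",
    "createTemporaryCredential",
})


def find_unauthorized_changes(records, approved_operators,
                              watched_operations=WATCHED_OPERATIONS):
    """Return identity-related operations performed by non-approved operators.

    records: iterable of dicts with "time" (ISO 8601 string in a single,
             consistent format), "user", and "operation" keys (assumed schema)
    approved_operators: set of user names allowed to make identity changes
    """
    findings = [
        record for record in records
        if record["operation"] in watched_operations
        and record["user"] not in approved_operators
    ]
    # Same-format ISO 8601 timestamps sort chronologically as plain strings.
    return sorted(findings, key=lambda record: record["time"])
```

Sorting the findings by time gives a rough timeline of what the attacker created, which helps decide what to delete or revoke first.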

  3. Eradicate incidents.

    1. Assess whether the impact of the incident is limited to a specific part of the environment. If the data affected by the ransomware can be restored from a backup or snapshot, restore it by following the backup or snapshot restoration procedure.

      Note that investigating incidents in an isolated environment for root cause analysis is of great value in implementing controls to prevent similar incidents in the future.

    2. Consider using the latest antivirus or anti-malware to eliminate ransomware.

      Exercise caution when performing this operation because it may alert attackers. It is recommended that you examine locked or encrypted objects in an isolated forensic environment, for example, after removing network access from the infected ECS instances.

    3. Remove all malware identified during forensic analysis and identify indicators of compromise (IoCs).
    4. If the ransomware strain has been identified, check whether there are available third-party decryption tools or other online resources that may help decrypt data.
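
One common kind of intrusion indicator produced by forensic analysis is the hash of a known malicious file. The following minimal sketch shows how such hashes could be used to find remaining copies in a directory tree. It uses only the Python standard library; the function names and the idea of keeping the hashes in a plain set are assumptions for illustration, and the scan should be run inside the isolated forensic environment.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_known_bad(root: Path, known_bad_hashes: set) -> list:
    """Return all files under `root` whose SHA-256 is in `known_bad_hashes`."""
    matches = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and sha256_of(path) in known_bad_hashes:
            matches.append(path)
    return matches
```

Hash matching only finds exact copies. A modified or repacked sample has a different hash, so this complements, rather than replaces, an up-to-date anti-malware scan.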

  4. Recover from incidents.

    1. Determine the restoration point for each restoration operation to be performed from a backup.
    2. View the backup policy to determine whether all objects and files can be restored, depending on the lifecycle policy applied to the resource.
    3. Use forensic methods to confirm that the data is secure before the restoration, and then restore the data from the backup or roll back to an earlier snapshot of the ECS instance.
    4. If you have successfully recovered data using an open-source decryption tool, move the data off the instance and perform the necessary analysis to confirm that the data is secure. Then recover the instance: terminate or isolate the infected instance, create a new instance, and restore the data to the new instance.
    5. If neither restoring from a backup nor decrypting the data is feasible, evaluate the possibility of starting over in a new environment.
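
Choosing a restoration point (sub-step 1 above) usually starts from the attack time established during evidence collection. The following minimal sketch picks the most recent backup taken before that time. The function name and the optional safety margin are assumptions for illustration, and the selected backup is only a candidate: it still has to be confirmed as secure by forensic methods, as sub-step 3 requires.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def pick_restore_point(backup_times: Iterable[datetime],
                       attack_time: datetime,
                       safety_margin: timedelta = timedelta(0)) -> Optional[datetime]:
    """Return the latest backup taken strictly before the attack.

    safety_margin moves the cutoff earlier, for cases where the intrusion
    may have started some time before the first visible symptom.
    Returns None when no backup is old enough.
    """
    cutoff = attack_time - safety_margin
    candidates = [taken_at for taken_at in backup_times if taken_at < cutoff]
    return max(candidates) if candidates else None
```

A `None` result corresponds to the case in sub-step 5, where restoring from a backup is not feasible.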

  5. Perform post-incident activities.

    1. Document the lessons learned from simulations and live incidents and apply them to subsequent processes and procedures. This helps your organization better understand how system configuration and processes allowed the incident to occur (for example, where weaknesses exist, where automation may fail, and where visibility is lacking) and how to enhance its overall security posture.
    2. If you have identified the initial attack medium or entry point, determine the best way to reduce the risk of recurrence.

      For example, if the malware initially gained access through an unpatched public-facing ECS instance and you have applied the missing patch to all current instances, consider how to improve the patch management process so that patches are tested and applied more quickly and consistently.

    3. If you have developed technical steps to address a particular threat, assess whether these steps can be performed automatically when the threat is detected. Automated processing helps mitigate threats more quickly, minimizing the scope and severity of the impact.
    4. Collect lessons learned from all roles in the response process and update your incident response plan, disaster recovery plan, and this response plan as needed. New technical capabilities and personnel skills should be considered and funded as well to fill the gaps identified.