Updated on 2024-09-06 GMT+08:00

Playbook Overview

Background

An attack link, also called an attack chain, is a key concept in network security. It refers to the series of steps and paths an attacker takes through a target network or system to achieve an objective. By following these steps, the attacker can penetrate the target system step by step and eventually reach that objective.

Attack links pose a serious threat to the target network or system. Once an attacker builds an attack link and breaks through the defenses of the target system, the attacker can perform malicious operations on it, such as stealing sensitive information, damaging system data, and paralyzing system services. The damage can cause economic losses and may also have a serious impact on national security and social stability.

Response Solution

If a domain name is attacked, the attacker usually tries to move on to the backend servers. This playbook analyzes attack links and generates alerts. When it discovers that attacks are approaching servers, it notifies operations personnel.

The Attack link analysis alert notification playbook has been matched with the Attack link analysis alert notification workflow. This workflow uses Simple Message Notification (SMN) to send notifications, so you need to create a notification topic in SMN and subscribe to it.

The Attack link analysis alert notification workflow uses asset associations to query the website assets associated with the affected assets flagged in HSS alerts. By default, a maximum of three website assets can be queried.

  • If there are associated website assets, the workflow queries the WAF alerts generated for each website asset in the last 3 hours. A maximum of three alerts can be queried. The alert types include XSS, SQL injection, command injection, local file inclusion, remote file inclusion, web shell, and vulnerability exploits.
  • If WAF has generated an alert, the workflow associates the WAF alert with the corresponding HSS alert and sends a notification through SMN to the email address you specified.
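
The correlation logic described above can be sketched in Python. This is only an illustration, not the workflow's actual implementation: the `HssAlert` and `WafAlert` classes, the `asset_links` mapping, the alert type labels, and all function names are hypothetical, and no SecMaster, HSS, WAF, or SMN API is called. The sketch also assumes that the three-alert limit applies to each website asset and that the newest alerts are kept first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Limits taken from the workflow description.
MAX_WEBSITE_ASSETS = 3
MAX_WAF_ALERTS = 3              # assumption: applied per website asset
LOOKBACK = timedelta(hours=3)

# Hypothetical labels for the WAF alert types listed in the description.
WAF_ALERT_TYPES = frozenset({
    "xss", "sql_injection", "command_injection", "local_file_inclusion",
    "remote_file_inclusion", "web_shell", "vulnerability_exploit",
})


@dataclass(frozen=True)
class HssAlert:
    alert_id: str
    affected_asset: str


@dataclass(frozen=True)
class WafAlert:
    website: str
    alert_type: str
    created_at: datetime


def recent_waf_alerts(website: str, waf_alerts: List[WafAlert],
                      now: datetime) -> List[WafAlert]:
    """Return up to MAX_WAF_ALERTS matching alerts from the lookback window."""
    window_start = now - LOOKBACK
    hits = [a for a in waf_alerts
            if a.website == website
            and a.alert_type in WAF_ALERT_TYPES
            and window_start <= a.created_at <= now]
    hits.sort(key=lambda a: a.created_at, reverse=True)  # newest first
    return hits[:MAX_WAF_ALERTS]


def correlate(hss_alert: HssAlert, asset_links: Dict[str, List[str]],
              waf_alerts: List[WafAlert], now: datetime) -> List[WafAlert]:
    """Collect WAF alerts for the websites linked to the affected asset."""
    websites = asset_links.get(hss_alert.affected_asset, [])[:MAX_WEBSITE_ASSETS]
    related: List[WafAlert] = []
    for website in websites:
        related.extend(recent_waf_alerts(website, waf_alerts, now))
    return related


def build_notification(hss_alert: HssAlert,
                       related: List[WafAlert]) -> Optional[str]:
    """Build the message text, or return None when nothing should be sent."""
    if not related:
        return None
    lines = [f"HSS alert {hss_alert.alert_id} on {hss_alert.affected_asset} "
             f"is linked to {len(related)} WAF alert(s):"]
    lines += [f"- {a.alert_type} on {a.website} at {a.created_at.isoformat()}"
              for a in related]
    return "\n".join(lines)
```

In this sketch, `build_notification` returns `None` when no WAF alert matches, which mirrors the rule that a notification is sent only when WAF has generated an alert. Sending the resulting text through an SMN topic is left out on purpose.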

Incident Response

  1. Obtain, store, and record evidence.

    1. Based on the configurations of your Huawei Cloud environment, use HSS and WAF to detect attacks and generate alerts.
    2. Access affected ECSs over SSH and check the instance status and monitoring information to see if there are any exceptions. Alerts you receive from other channels are also supported.
    3. Once an attack is confirmed as an incident, assess the affected scope, attacked machines, affected services, and affected data.
    4. Use SecMaster to convert alerts to incidents and continue to monitor and record incident details. For details, see Converting Alerts to Incidents.
    5. Trace log information as well. Review all related logs through the security analysis capability, then record and archive them in the event management module for subsequent tracing.

  2. Contain incidents.

    1. Determine the attack type, affected hosts, and service processes based on alerts and logs.
    2. Use isolation, process-killing, and policy-blocking scripts to kill the processes and isolate the software involved, reducing further impact.
    3. Check the infection scope. If there is a risk of further infection, handle it in a timely manner.
    4. You can also use other playbooks for risk control, such as host isolation. For example, use security group policies to cut off access to infected machines and contain the risk of spread over the network.

  3. Eradicate incidents.

    1. Evaluate whether the affected hosts need to be hardened and restored. If a host has been damaged, harden and restore it based on the source tracing result. If the attack was caused by leaked security credentials, delete any unauthorized IAM users, roles, and policies, and revoke the leaked credentials.
    2. Check infected machines for vulnerabilities, outdated software, and missing patches, because these can lead to more machines being compromised. Use the vulnerability management function to check and fix vulnerabilities on the affected machines. Also check for risky configurations: use the baseline check function to check host configurations and rectify risky settings in a timely manner.
    3. Evaluate the impact scope. If other hosts have been affected, handle all affected hosts.

  4. Recover from incidents.

    1. Determine the restoration point for each restoration operation performed from backups.
    2. View the backup policy to determine whether all objects and files can be restored, depending on the lifecycle policy applied to the resource.
    3. Use forensic methods to confirm that the data is secure before restoration, and then restore the data from the backup or roll back to an earlier snapshot of the ECS instance.
    4. If you have successfully recovered data using an open-source decryption tool, remove that data from the instance and perform the necessary analysis to confirm that it is secure. Then recover the instance: terminate or isolate the affected instance, create a new instance, and restore the data to the new instance.
    5. If restoring or decrypting data from a backup is not feasible, evaluate the possibility of restarting in a new environment.

  5. Perform post-incident activities.

    1. Analyze alert details across the entire alert handling process, and keep tuning the model to improve its alert accuracy. If an alert is determined to be related to normal service activity and poses no risk, filter such alerts out directly by using a model.
    2. Trace alerts to better understand the full course of an incident, then continuously optimize asset protection policies to reduce resource risks and shrink the attack surface.
    3. Optimize the automated playbook process based on the actual service scenario. For example, once analysis shows that alert accuracy is high enough, you can replace manual review with automatic handling to improve processing efficiency and handle risks faster.
    4. Based on risk analysis and attack link alert analysis, apply risk controls before incidents occur.