Alert Management

Threat Alerts

In general, a threat alert refers to a threat that, for natural, human, software, or hardware reasons, harms information systems or has a negative effect on society. In SecMaster, threat alerts are security incidents that threaten asset security and are detected through big data technology.

Incidents

An incident is a broad concept. It can include but is not limited to alerts. It can be a part of normal system operations, exceptions, or errors. In the O&M and security fields, an incident usually refers to a problem or fault that has occurred and needs to be focused on, investigated, and handled. An incident may be triggered by one or more alerts or other factors, such as user operations and system logs.

An incident is usually used to record and report historical activities in a system for analysis and audits.

Alerts

An alert is a notification of abnormal signals in O&M. It is usually generated automatically by a monitoring system or security device when it detects an exception in a system or network. These exceptions may include system faults, security threats, or performance bottlenecks. For example, when the CPU usage of a server exceeds 90%, the system may generate an alert.
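
The CPU usage example above can be sketched as a simple threshold check. This is only an illustration, not how SecMaster or any specific monitoring system is implemented; the `Alert` fields and the `check_cpu` function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the example in the text (90% CPU usage).
CPU_THRESHOLD_PERCENT = 90.0


@dataclass
class Alert:
    """Hypothetical alert record used only for this illustration."""
    source: str      # where the exception was observed
    alert_type: str  # kind of exception, for example "performance"
    message: str     # human-readable description


def check_cpu(server: str, cpu_percent: float) -> Optional[Alert]:
    """Return an Alert when CPU usage exceeds the threshold, else None."""
    if cpu_percent > CPU_THRESHOLD_PERCENT:
        return Alert(
            source=server,
            alert_type="performance",
            message=(
                f"CPU usage {cpu_percent:.1f}% exceeds "
                f"{CPU_THRESHOLD_PERCENT:.0f}%"
            ),
        )
    return None


if __name__ == "__main__":
    print(check_cpu("server-01", 95.2))  # an Alert is generated
    print(check_cpu("server-01", 42.0))  # None: usage is within the limit
```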

Generally, an alert can clearly indicate the location, type, and impact of an exception. In addition, alerts can be classified by severity, such as critical, major, and minor, so that O&M personnel can determine which alerts need to be handled first based on their severity.
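
As a rough sketch of severity-based prioritization, alerts can be ordered so that the most severe ones come first. The `Severity` enum and the `prioritize` function below are hypothetical and only mirror the critical, major, and minor levels named above; they do not represent a SecMaster API.

```python
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    """Severity levels from the text; a higher value means more urgent."""
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def prioritize(alerts: List[Tuple[str, Severity]]) -> List[Tuple[str, Severity]]:
    """Return alerts ordered from most to least severe.

    sorted() is stable, so alerts with the same severity keep
    their original (arrival) order.
    """
    return sorted(alerts, key=lambda alert: alert[1], reverse=True)


if __name__ == "__main__":
    queue = [
        ("disk-usage-warning", Severity.MINOR),
        ("malicious-ip-attack", Severity.CRITICAL),
        ("cpu-usage-high", Severity.MAJOR),
        ("asset-compromised", Severity.CRITICAL),
    ]
    for name, severity in prioritize(queue):
        print(severity.name, name)
```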

The purpose of an alert is to notify related personnel in a timely manner so that they can respond quickly and take measures to fix the problem.

When SecMaster detects an exception in cloud resources (for example, a malicious IP address attacks an asset or an asset has been compromised), it generates an alert and displays the threat information on the Alerts page in SecMaster.

Relationships Between Alerts and Incidents

This part describes the meanings of and differences between alerts and incidents, the reasons for converting alerts into incidents, and the reasons for associating alerts with incidents.

Meanings and Differences Between Alerts and Incidents

**Table 1** Meanings and differences between alerts and incidents
Type	Alerts	Incidents
Definition	An alert is a notification of abnormal signals in O&M. It is usually generated automatically by a monitoring system or security device when it detects an exception in a system or network. These exceptions may include system faults, security threats, or performance bottlenecks. For example, when the CPU usage of a server exceeds 90%, the system may generate an alert. Generally, an alert can clearly indicate the location, type, and impact of an exception. In addition, alerts can be classified by severity, such as critical, major, and minor, so that O&M personnel can determine which alerts need to be handled first. The purpose of an alert is to notify related personnel in a timely manner so that they can respond quickly and take measures to fix the problem.	An incident is a broad concept that may include, but is not limited to, alerts. An incident can be a part of the normal operation of the system, an exception, or an error. In the O&M and security fields, an incident usually refers to a problem or fault that has occurred and needs to be focused on, investigated, and handled. An incident may be triggered by one or more alerts or by other factors, such as user operations and system logs. An incident is usually used to record and report historical activities in a system for analysis and audits.
Handling process	The alert handling process includes receiving, confirming, analyzing, responding to, and closing alerts. When the monitoring system generates an alert, O&M personnel first confirm that the alert is a true positive. They then analyze the cause and impact scope of the alert, take measures to rectify the fault, and close the alert.	The incident handling process is more complex and comprehensive. In addition to each phase of the alert handling process, incident handling also involves incident investigation, impact assessment, risk analysis, emergency plan formulation, emergency response execution, and a post-incident summary. The objective of incident handling is to completely solve problems, prevent similar incidents in the future, and reduce the impact of incidents on services.
Importance and urgency	Generally, alerts need to be evaluated and responded to immediately. The importance of each alert varies depending on its type, severity, and impact scope. Some alerts may be simple reminders or warnings, while others may indicate that the system has been severely attacked or faces major fault risks.	Some incidents only need to be recorded, analyzed, and handled and do not require an immediate response. However, an incident is usually more important and urgent than an alert. Because an incident has already occurred and has had an actual impact, immediate measures need to be taken to control the risk and solve the problem. If an incident is not handled in a timely manner, it may cause significant economic loss or reputation damage to the organization.
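
The alert handling phases in Table 1 (receiving, confirming, analyzing, responding, and closing) can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the `Stage` names and the `advance` function are hypothetical, and allowing a received alert to be closed directly when it turns out to be a false positive is an assumption that the table does not spell out.

```python
from enum import Enum


class Stage(Enum):
    """Alert handling phases, named after the process in Table 1."""
    RECEIVED = "received"
    CONFIRMED = "confirmed"
    ANALYZED = "analyzed"
    RESPONDED = "responded"
    CLOSED = "closed"


# Allowed transitions. Closing straight from RECEIVED models a false
# positive; this shortcut is an assumption made for this sketch.
ALLOWED = {
    Stage.RECEIVED: {Stage.CONFIRMED, Stage.CLOSED},
    Stage.CONFIRMED: {Stage.ANALYZED},
    Stage.ANALYZED: {Stage.RESPONDED},
    Stage.RESPONDED: {Stage.CLOSED},
    Stage.CLOSED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move an alert to the next stage, rejecting skipped phases."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    stage = Stage.RECEIVED
    for nxt in (Stage.CONFIRMED, Stage.ANALYZED, Stage.RESPONDED, Stage.CLOSED):
        stage = advance(stage, nxt)
        print(stage.value)
```

An incident workflow would add further stages (investigation, impact assessment, post-incident summary, and so on) to the same kind of transition table.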

Causes for Converting Alerts into Incidents or Associating Alerts with Incidents

An alert is a notification generated when a system or service becomes abnormal or a potential fault occurs. These exceptions may directly affect service availability, so alerts must be handled in a timely manner. When an alert is generated, you need to take corresponding measures to rectify the fault. Otherwise, the exceptions or faults may cause services to become abnormal.

An incident can also be a notification generated while the system or service is running properly. It may involve important status changes without causing service exceptions. Such incidents do not need to be handled. They are mainly used to analyze and locate problems.

**Table 2** Causes for converting alerts into incidents or associating alerts with incidents
Type	Description
Causes for converting alerts into incidents	When the severity of an alert reaches a certain level, an alert occurs repeatedly, or its impact scope is wide, the alert is no longer just a signal that requires attention. It indicates a persistent problem in the system or network. In this case, the alert has evolved into an incident that needs to be handled immediately, so it should be converted into an incident to investigate the root cause and take necessary measures. Generally, an alert is converted into an incident for the following reasons: Information aggregation and classification: An alert is usually an immediate response to the violation of a specific condition or threshold. The number of alerts grows over time, and handling each one independently causes confusion and wastes time and human resources. Aggregating alerts into incidents helps related personnel classify them by type, source, and impact and handle them more effectively. Simplified working processes: When alerts are converted into incidents, they are filtered, deduplicated, and aggregated, so similar alerts that are triggered repeatedly can be merged into one incident. This reduces the alert handling workload, makes the handling process clearer, and makes tracing and recording easier. Higher problem-solving efficiency: Because an incident contains more context than an alert, related personnel can identify the root cause more easily, locate issues quickly, and take effective measures. Historical data review and trend analysis: An incident usually records the entire process of how an issue occurred, evolved, and was resolved. Converting alerts into incidents therefore provides historical data that helps prevent similar issues and optimize the system. By analyzing incident trends, O&M personnel can discover potential weak points in the system and take measures in advance. Enhanced cross-department collaboration: In a large organization, different departments may need to participate in handling a problem. After an alert is converted into an incident, related information can be shared among departments more easily, which promotes collaboration and improves problem-solving efficiency. In summary, converting alerts into incidents helps simplify working processes, improve problem-solving efficiency, and facilitate historical review and trend analysis.
Causes for associating alerts with incidents	As an important part of monitoring and fault management, associating alerts with incidents involves combining multiple independent but possibly correlated incidents or alerts to better understand the root cause and scope of a problem, which facilitates troubleshooting and response. Generally, an alert is associated with an incident for the following reasons: Dependencies: In a complex system, components depend on each other in complex ways. When a component becomes faulty, the components that depend on it may be affected, causing a series of alerts. For example, in a microservice architecture, the crash of one service may cause problems in other services that use it. Resource sharing: When multiple systems or services share the same resource (such as a server, database, or network device), a problem with that resource may cause multiple systems or services to generate alerts at the same time. For example, performance deterioration of a shared database server may trigger performance alerts for multiple applications that depend on the database. Chain reactions: In some cases, an initial failure may trigger a chain reaction that affects more components or systems. Such a chain reaction may be caused by improper system design, an incomplete error handling mechanism, or resource limitations (such as performance deterioration caused by a memory leak). Configuration errors: Incorrect or inconsistent configurations may cause abnormal system behavior and trigger multiple seemingly unrelated alerts. For example, an incorrect routing configuration may send traffic to unstable servers, causing multiple performance-related alerts. Software defects: Software defects, such as bugs, may cause programs to behave abnormally under specific conditions and trigger alerts. If these defects affect multiple components or systems, multiple associated alerts may be generated. External factors: External factors, such as natural disasters (for example, earthquakes and floods), network attacks, and infrastructure faults (for example, power outages and network interruptions), may also cause problems in multiple systems or components at the same time and trigger a large number of alerts.
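
The aggregation described in Table 2 can be sketched as follows: alerts of the same type on the same resource are grouped, and a group becomes an incident when it contains a high-severity alert or the alert keeps recurring. This is only an illustration, not SecMaster's actual conversion logic. The `Alert` and `Incident` classes, the grouping key, and both limits are assumptions made for this sketch, and the "wide impact scope" criterion is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List

SEVERITY_LIMIT = 3  # assumed: 1 = minor, 2 = major, 3 = critical
REPEAT_LIMIT = 3    # assumed: this many similar alerts count as recurring


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record for this illustration."""
    alert_id: str
    alert_type: str
    resource: str
    severity: int


@dataclass
class Incident:
    """Hypothetical incident that aggregates related alerts."""
    alert_type: str
    resource: str
    alert_ids: List[str] = field(default_factory=list)


def aggregate(alerts: List[Alert]) -> List[Incident]:
    """Group similar alerts and convert qualifying groups into incidents."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts of the same type on the same resource are treated as similar.
        groups[(alert.alert_type, alert.resource)].append(alert)

    incidents = []
    for (alert_type, resource), group in groups.items():
        worst = max(a.severity for a in group)
        if worst >= SEVERITY_LIMIT or len(group) >= REPEAT_LIMIT:
            ids = [a.alert_id for a in group]
            incidents.append(Incident(alert_type, resource, ids))
    return incidents


if __name__ == "__main__":
    sample = [
        Alert("a1", "performance", "db-01", 1),
        Alert("a2", "performance", "db-01", 1),
        Alert("a3", "attack", "web-01", 3),
        Alert("a4", "performance", "db-01", 2),
        Alert("a5", "performance", "web-02", 1),
    ]
    for incident in aggregate(sample):
        print(incident.alert_type, incident.resource, incident.alert_ids)
```

Grouping by shared resource also reflects the resource sharing cause in Table 2: several alerts that trace back to one database server end up associated with a single incident.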

Related Operations

Incident management
- Viewing incidents: For details, see Viewing Incidents.
- Adding or editing an incident: For details, see Adding or Editing an Incident.
- Importing or exporting incidents: For details, see Importing or Exporting Incidents.
- Closing or deleting an incident: For details, see Closing or Deleting an Incident.
Alert management
- Viewing alert details: For details, see Viewing Alert Details.
- Handling common alerts: For details, see Suggestions on Handling Common Alerts.
- Converting an alert into an incident or associating an alert with an incident: For details, see Converting an Alert into an Incident or Associating an Alert with an Incident.
- One-click blocking or unblocking: For details, see One-Click Blocking or Unblocking.
- Closing or deleting an alert: For details, see Closing and Deleting an Alert.
- Adding and editing an alert: For details, see Adding and Editing an Alert.
- Importing and exporting alerts: For details, see Importing and Exporting Alerts.