Threat Alerts
In general, threat alerts refer to threats that, due to natural, human, software, or hardware reasons, are detrimental to information systems or cause negative effects on the society. In SecMaster, threat alerts are detected security incidents that threaten asset security through big data technology.
Incidents
An incident is a broad concept. It can include but is not limited to alerts. It can be a part of normal system operations, exceptions, or errors. In the O&M and security fields, an incident usually refers to a problem or fault that has occurred and needs to be focused on, investigated, and handled. An incident may be triggered by one or more alerts or other factors, such as user operations and system logs.
An incident is usually used to record and report historical activities in a system for analysis and audits.
Alerts
An alert is a notification of abnormal signals in O&M. It is usually automatically generated by a monitoring system or security device when detecting an exception in the system or networks. For example, when the CPU usage of a server exceeds 90%, the system may generate an alert. These exceptions may include system faults, security threats, or performance bottlenecks.
Generally, an alert can clearly indicate the location, type, and impact of an exception. In addition, alerts can be classified by severity, such as critical, major, and minor, so that O&M personnel can determine which alerts need to be handled first based on their severity.
The purpose of an alert is to notify related personnel in a timely manner so that they can make a quick response and take measures to fix the problem.
When SecMaster detects an exception (for example, a malicious IP address attacks an asset or an asset has been hacked into) in cloud resources, it generates an alert and displays the threat information on the Alerts page in SecMaster.
Relationships Between Alerts and Incidents
This part describes the meanings and differences between alerts and incidents, reasons for converting alerts into incidents, and reasons for associating alerts with incidents.
- Meanings and Differences Between Alerts and Incidents
Table 1 Meanings and differences between alerts and incidents
Type |
Description |
Definition |
- Alerts
An alert is a notification of abnormal signals in O&M. It is usually automatically generated by a monitoring system or security device when detecting an exception in the system or networks. For example, when the CPU usage of a server exceeds 90%, the system may generate an alert. These exceptions may include system faults, security threats, or performance bottlenecks.
Generally, an alert can clearly indicate the location, type, and impact of an exception. In addition, alerts can be classified by severity, such as critical, major, and minor, so that O&M personnel can determine which alerts need to be handled first based on their severity.
The purpose of an alert is to notify related personnel in a timely manner so that they can make a quick response and take measures to fix the problem.
- Incidents
An incident is a broad concept, and may include, but is not limited to, an alert. An incident can be a part of the normal operation of the system, an exception, or an error. In the O&M and security fields, an incident usually refers to a problem or fault that has occurred and needs to be focused on, investigated, and handled. An incident may be triggered by one or more alerts or other factors, such as user operations and system logs.
An incident is usually used to record and report historical activities in a system for analysis and audits.
|
Handling process |
- Alerts
The alert handling process includes receiving, confirming, analyzing, responding to, and closing alerts. When the monitoring system generates an alert, O&M personnel need to confirm that the alert is a positive one. Then, they need to analyze the alert causes and impact scope, take measures to rectify the fault, and close the alert.
- Incidents
The event handling process is more complex and comprehensive. In addition to each phase in the alert handling process, incident handling also involves incident investigation, impact assessment, risk analysis, emergency plan formulation, emergency response execution, and post-event summary. The objective of incident handling is to completely solve problems, prevent similar incidents in the future, and reduce the impact of incidents on services.
|
Importance and urgency |
- Alerts
Generally, alerts need to be evaluated and responded immediately.
The severity and importance of each alert vary depending on the alert type, severity, and impact scope. Some alerts may be simple reminders or warnings, while others may indicate that the system has been severely attacked or faces major fault risks.
- Incidents
In some cases, incidents may need to be recorded, analyzed, and handled, but do not require immediate responses.
An incident is usually of higher importance and urgency than an alert. Because an incident has occurred and has had an actual impact, immediate measures need to be taken to control the risk and solve the problem. If an incident is not handled in a timely manner, it may cause significant economic loss or reputation damage to the organization.
|
- Causes for converting alerts into incidents or associating alerts with incidents
An alert is a notification generated when a system or service becomes abnormal or a potential fault occurs. These exceptions may directly affect service availability. So alerts must be handled in a timely manner to prevent service exceptions. When an alert is generated, you need to take corresponding measures to rectify the fault. Otherwise, services may be abnormal due to these exceptions or faults.
An incident is a notification generated when the system or service is running properly. An event may involve some important status changes, but may not cause service exceptions. So incidents do not need to be handled. They are mainly used to analyze and locate problems.
Table 2 Causes for converting alerts into incidents or associating alerts with incidents
Type |
Description |
Alert-to-Incident reasons |
When the severity of an alert reaches a certain level, an alert appears continuously, or the impact scope is wide, the alert may not only be a signal that requires attention. It also indicates that a continuous problem exists in the system or network. In this case, the alert has evolved into an incident that needs to be handled immediately. So, we need to convert such alerts into incidents to further investigate the root causes and take necessary measures. Generally, an alert will be converted to an incident out of the following causes:
- Information aggregation and classification
An alert is usually an instant response to a violation against a specific condition or threshold. The number of alerts is increasing over time. If they are handled independently, it would cause chaos and waste time and human resources. Aggregating these alerts into incidents helps related personnel classify alerts by alert type, source, and impact so that they can handle them more effectively.
- Simplified working processes
During the process to convert alerts into incidents, alerts are filtered, deduplicated, and aggregated. So that multiple similar alerts that may be triggered are integrated into a more representative incident. In this way, the workload of handling alerts is reduced; the handling process is clearer; and the tracing and recording become easier.
- Higher problem-solving efficiency
As an incident has much more context details than an alert, related personnel can easily identify the root cause. This helps quickly locate issues and take effective measures.
- Historical data review and trend analysis
An incident usually records the entire process of how an issue occurred, evolved, and is resolved. So converting alerts into incidents provides helpful historical data for prevention of similar issues and system optimization. By analyzing the trend of an incident, O&M personnel can discover potential weak points in the system and take measures in advance.
- Cross-department collaboration enhanced
In a large organization, different departments may need to participate in the handling of problems. After an alert is converted into an incident, related information can be shared among departments more easily, which promotes cross-department collaboration and improves problem solving efficiency.
In a word, converting alerts to incidents helps simplify working processes, improve problem solving efficiency, and facilitate historical review and trend analysis. |
Causes for associating alerts with incidents |
As an important part of monitoring and fault management, associating alerts with incidents involve combining multiple independent but possibly correlated incidents or alerts to better understand the root cause and scope of a problem, facilitating troubleshooting and response. Generally, an alert will be associated with an incident out of the following causes:
- Dependencies
In a complex system, there are complex dependencies between components. When a component becomes faulty, other components that depend on the component may be affected, causing a series of alerts. For example, in the microservice architecture, the crash of a service may cause problems in other services that use the service.
- Resource sharing
When multiple systems or services share the same resource (such as a server, database, or network device), the problem of the resource may cause multiple systems or services to generate alerts at the same time. For example, a performance deterioration of a shared database server may trigger performance alerts for multiple applications that depend on the database.
- Chain reactions
In some cases, an initial failure may trigger a series of chain reactions, affecting more components or systems. This chain reaction may be caused by improper system design, incomplete error handling mechanism, or resource limitations (such as performance deterioration caused by memory leakage).
- Configuration errors
Incorrect or inconsistent configurations may cause system behavior exceptions, triggering multiple seemingly irrelevant alerts. For example, incorrect routing configurations may cause traffic to be incorrectly routed to unstable servers, causing multiple performance-related alerts.
- Software defects
Software defects, such as bugs, may cause programs to be abnormal in specific conditions and trigger alerts. If these defects affect multiple components or systems, multiple associated alerts may be generated.
- External factors
External factors, such as natural disasters (such as earthquakes and floods), network attacks, and infrastructure faults (such as power outages and network interruptions), may also cause problems in multiple systems or components at the same time and trigger a large number of alerts.
|
Related Operations
- Incident management
- Alert management