Product Concepts

IDC

Internet data center (IDC): a professional physical facility that provides infrastructure services for centralized data storage, processing, and transmission.

Patch Baselines

A collection of preset patch management rules, including the OS type, patch category, and compliance level. Generally, patches are scanned and installed on instances based on the patch baseline.

Alarm Conversion Rules

Rules that convert raw alarm information ingested into COC to incidents or aggregated alarms based on a variety of trigger types and conditions. These rules aggregate alarms and reduce noise.
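The idea of aggregation can be illustrated with a minimal sketch. This is not COC's rule engine or API: the alarm fields (`resource`, `name`, `timestamp`), the function `aggregate_alarms`, and the five-minute window are all hypothetical, chosen only to show how several raw alarms can collapse into one aggregated alarm.

```python
from collections import defaultdict


def aggregate_alarms(alarms, window_seconds=300):
    """Group raw alarms that share a resource and alarm name within the same time window.

    Hypothetical illustration only; field names and the windowing rule are assumptions.
    """
    groups = defaultdict(list)
    for alarm in alarms:
        # Alarms whose timestamps fall in the same fixed window share a bucket.
        bucket = alarm["timestamp"] // window_seconds
        groups[(alarm["resource"], alarm["name"], bucket)].append(alarm)

    # Emit one aggregated alarm per group, keeping a count of the raw alarms behind it.
    return [
        {
            "resource": resource,
            "name": name,
            "count": len(items),
            "first_seen": min(a["timestamp"] for a in items),
        }
        for (resource, name, _bucket), items in groups.items()
    ]


raw_alarms = [
    {"resource": "ecs-01", "name": "cpu_high", "timestamp": 10},
    {"resource": "ecs-01", "name": "cpu_high", "timestamp": 70},
    {"resource": "ecs-01", "name": "cpu_high", "timestamp": 130},
    {"resource": "ecs-02", "name": "disk_full", "timestamp": 40},
]
aggregated = aggregate_alarms(raw_alarms)
```

In this sketch, the three repeated `cpu_high` alarms on the same resource become a single aggregated alarm with a count of 3, which is the noise reduction effect described above.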

Incidents

An IT operations (ITOps) concept. An incident is an abnormal status or service interruption in an application that requires a quick response and must be handled through a standard process. In COC, incidents are created manually, converted from alarms, or generated automatically based on alarm conversion rules. There are five standard incident levels: P1, P2, P3, P4, and P5.

Aggregated Alarms

Alarms automatically generated when COC alarm conversion rules are triggered. You can use COC to clear aggregated alarms, convert them to incidents, and execute response plans.

Issues

An ITOps concept. Issues generally refer to the root causes of incidents, which are determined through systematic investigation.

War Rooms

In COC, a war room is a meeting set up to quickly recover services when a group fault or major fault occurs. It enables the O&M, R&D, and operations teams to work together and ensures quick service recovery. In a war room, you can use application diagnosis and response plans to quickly recover applications. In addition, you can create DingTalk, WeCom, and Lark war room groups.

Improvement

An ITOps concept. Based on incident analysis and alarm handling, the architecture, configuration, and process are systematically optimized to continuously improve application quality and efficiency.

Change

An ITOps concept. It is a general term for a series of operations, such as adding, deleting, modifying, and querying applications, resources, architectures, and configurations.

PRR

A production readiness review (PRR) in the O&M domain is a standardized process that systematically evaluates and verifies, before a service or application is rolled out, whether it meets production environment requirements such as high availability, maintainability, and disaster recovery capabilities.

SLI

SLI is short for service level indicator, which is the basic metric underlying the SLO and SLA. It directly reflects key quality dimensions of a service, such as latency and error rate.
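As a rough illustration of how such indicators can be derived from raw measurements, the sketch below computes an error rate and a latency percentile. The function names, the sample values, and the nearest-rank percentile method are assumptions made for this example, not something COC prescribes.

```python
def error_rate(total_requests, failed_requests):
    """Return the fraction of requests that failed (0.0 when there is no traffic)."""
    if total_requests == 0:
        return 0.0
    return failed_requests / total_requests


def latency_percentile(latencies_ms, percentile):
    """Return the nearest-rank percentile of a non-empty list of latencies.

    `percentile` is an integer from 1 to 100.
    """
    ordered = sorted(latencies_ms)
    # Integer ceiling of percentile * n / 100 avoids floating-point rounding issues.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical sample data for one measurement window.
sample_latencies_ms = [120, 85, 95, 300, 110, 90, 105, 100, 98, 102]
sli_error_rate = error_rate(total_requests=1000, failed_requests=5)
sli_p90_latency_ms = latency_percentile(sample_latencies_ms, 90)
```

A percentile is often preferred over an average for latency because a single slow request (300 ms in the sample) would otherwise distort the indicator.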

SLO

SLO is short for service level objective, which is a target for system stability and reliability measured by SLIs. It is the core basis of the SLA. Its core value lies in turning a vague notion of system stability into a quantifiable commitment (for example, "monthly availability ≥ 99.999%").
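To make an availability objective concrete, it helps to translate the percentage into the downtime it permits. The following sketch does that arithmetic, assuming a 30-day month; the function names are hypothetical and only illustrate the calculation.

```python
def allowed_downtime_seconds(slo_percent, period_days=30):
    """Return the downtime, in seconds, that an availability SLO permits over the period."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (100 - slo_percent) / 100


def meets_slo(downtime_seconds, slo_percent, period_days=30):
    """Return True if the observed downtime stays within the SLO's downtime budget."""
    return downtime_seconds <= allowed_downtime_seconds(slo_percent, period_days)


# A 99.999% monthly objective leaves roughly 26 seconds of downtime in 30 days.
budget = allowed_downtime_seconds(99.999)
within_budget = meets_slo(downtime_seconds=20, slo_percent=99.999)
over_budget = meets_slo(downtime_seconds=60, slo_percent=99.999)
```

For comparison, under the same assumption a 99.9% objective permits about 43 minutes of downtime per month, which shows how quickly the budget shrinks as the number of nines grows.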

SLA

SLA is short for service level agreement, which is a service quality commitment that clearly defines the performance metrics, availability standards, and liability clauses that the service provider must meet. Its core purpose is to balance user requirements and service capabilities through quantitative objectives (for example, availability ≥ 99.999%).
