Updated on 2022-06-02 GMT+08:00

Basic Concepts

Topology

A topology graphically displays the call and dependency relationships between applications. It is composed of circles, arrows, and resources. Each circle represents an application, and each section in the circle represents an instance. The fraction in each circle shows the number of active instances over the total number of instances. The data below the fraction indicates the service latency, call count, and error count. Each line with an arrow represents a call relationship. Thicker arrows indicate more calls. The two values above a line indicate the throughput and the overall latency, respectively. Throughput is the number of calls within the selected period. Application Performance Index (Apdex) is used in the topology to quantify user satisfaction with application performance. Different colors indicate different Apdex ranges, helping you quickly detect and locate faults.

Transaction

A transaction is usually an HTTP request. The process is as follows: user request > web server > database > web server > user request. A transaction is a one-off task that a user completes by using an application. For example, in an e-commerce application, querying a product is a transaction, and making a payment is also a transaction.

Tracing

APM traces and records service calls, and visually presents the execution tracks and statuses of service requests in distributed systems, so that you can quickly locate performance bottlenecks and faults.

Application

You can put services of the same type into one application for better performance management. For example, you can put the accounts, products, and payment services into the Mall application.

Apdex

Apdex is an open standard developed by the Apdex Alliance to measure application performance. It converts the application response time into a measure of user satisfaction with application performance. The Apdex value ranges from 0 to 1.

  • Apdex principles

Apdex defines the optimal threshold "T" for the application response time. "T" is determined based on performance expectations. Based on the actual response time and "T", user experience can be categorized as follows:

Satisfied: indicates that the actual response time is shorter than or equal to "T". For example, if "T" is 1.5s and the actual response time is 1s, the user is satisfied.

Tolerating: indicates that the actual response time is greater than "T", but shorter than or equal to "4T". For example, if "T" is 1s, the tolerable upper threshold for the response time is 4s.

Frustrated: indicates that the actual response time is greater than "4T".
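The three categories above can be sketched in Python as follows. This is a minimal illustration, not APM code: the function name classify_experience and its parameters are hypothetical, and both values are assumed to use the same time unit (seconds here).

```python
def classify_experience(response_time: float, t: float) -> str:
    """Categorize one response time against the Apdex threshold T.

    Hypothetical helper for illustration; both arguments share one unit.
    """
    if t <= 0:
        raise ValueError("T must be positive")
    if response_time <= t:
        return "satisfied"      # response time <= T
    if response_time <= 4 * t:
        return "tolerating"     # T < response time <= 4T
    return "frustrated"         # response time > 4T


print(classify_experience(1.0, 1.5))  # satisfied
print(classify_experience(4.0, 1.0))  # tolerating
print(classify_experience(4.5, 1.0))  # frustrated
```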

  • Apdex calculation method

In APM, "T" is the threshold set in Configuring Apdex Thresholds, and the application response latency equals the total service latency. The Apdex value ranges from 0 to 1. The calculation formula is as follows:

Apdex = (Number of normal calls x 1 + Number of slow calls x 0.5)/Total number of calls

In the preceding information:

Number of normal calls: indicates the number of successful calls that are completed within a time period of greater than 0 but less than "T".

Number of slow calls: indicates the number of successful calls that are completed within a time period of greater than or equal to "T" but less than "4T".

Number of extremely slow calls: indicates the number of successful calls that are completed within a time period of greater than "4T".

Total number of calls: indicates the total number of normal calls, slow calls, extremely slow calls, and incorrect calls.
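As a worked illustration of the formula, the following Python sketch computes the Apdex value from the four call counts defined above. The function name apdex and the sample counts are hypothetical and are not taken from APM.

```python
def apdex(normal: int, slow: int, extremely_slow: int, incorrect: int) -> float:
    """Apdex = (normal calls x 1 + slow calls x 0.5) / total calls."""
    # Total calls include extremely slow and incorrect calls,
    # which contribute nothing to the numerator.
    total = normal + slow + extremely_slow + incorrect
    if total == 0:
        raise ValueError("no calls in the selected period")
    return (normal * 1 + slow * 0.5) / total


# Sample counts chosen for illustration only.
score = apdex(normal=80, slow=10, extremely_slow=5, incorrect=5)
print(score)  # 0.85
```

With these sample counts, (80 x 1 + 10 x 0.5)/100 = 0.85.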


The Apdex value indicates the application performance status, that is, user satisfaction with application performance. Different Apdex values are marked by different colors. For details, see Table 1.

Table 1 Apdex values

Apdex Value        | Color  | Description
0.75 ≤ Apdex ≤ 1   | Green  | Fast response; good user experience
0.3 ≤ Apdex < 0.75 | Yellow | Slow response; fair user experience
0 ≤ Apdex < 0.3    | Red    | Very slow response; poor user experience
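The ranges in Table 1 can be expressed as a small lookup. This is a sketch only: the function name apdex_color is hypothetical, and the boundaries are copied from the table.

```python
def apdex_color(value: float) -> str:
    """Map an Apdex value to the color listed in Table 1."""
    if not 0 <= value <= 1:
        raise ValueError("Apdex value must be between 0 and 1")
    if value >= 0.75:
        return "green"   # fast response; good user experience
    if value >= 0.3:
        return "yellow"  # slow response; fair user experience
    return "red"         # very slow response; poor user experience


print(apdex_color(0.85))  # green
print(apdex_color(0.3))   # yellow
print(apdex_color(0.1))   # red
```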

  • Configuring an Apdex threshold

You can configure the Apdex threshold based on your service requirements. For details, see Configuring Apdex Thresholds.

TP99 Latency

TP99 latency is the minimum time within which 99% of requests are completed. In APM, latency refers to TP99 latency.

For example, assume that four requests take 10 ms, 100 ms, 500 ms, and 20 ms to process, respectively.

For these four requests, 99% of the requests is 4 x 99% = 3.96, which is rounded to 4. That is, all four requests must be covered. The minimum time within which all four requests are completed is 500 ms. Therefore, the TP99 latency is 500 ms.
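The example above can be reproduced with the following Python sketch. It assumes that the request count is rounded up to the next whole request (the nearest-rank method), which matches the example but is not confirmed as the exact rule that APM applies. The function name tp_latency is hypothetical.

```python
import math


def tp_latency(latencies_ms: list[float], percent: float = 99) -> float:
    """Return the smallest latency that covers `percent`% of the requests.

    Assumption: the number of covered requests is rounded up (nearest rank).
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    ordered = sorted(latencies_ms)
    # For 4 samples and 99%: ceil(3.96) = 4, so the 4th smallest value is used.
    rank = math.ceil(len(ordered) * percent / 100)
    return ordered[rank - 1]


print(tp_latency([10, 100, 500, 20]))  # 500
```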

Overall Latency/Service Latency

Latency refers to the period from initiating a request to getting a response. In APM, the overall latency refers to the total time consumed by a request, and the service latency refers to the time consumed by a service. For example, assume that service A calls service B, and service B calls service C, as shown in the following figure:

Overall latency = TA; Latency of service A = TA; Latency of service B = TB1 + TB2; Latency of service C = TC

Collection Probe

Probes use bytecode enhancement technology to track calls and generate data. The ICAgent collects this data, which is then displayed on the UI. If the memory monitoring mechanism is enabled and the instance memory usage is too high, probes enter the hibernation state and stop collecting data. For details about the types of data collected by probes, see Scope and Usage.

ICAgent

ICAgent is the collection agent of APM. It runs on the server where applications are deployed and collects data obtained by probes in real time. Before using APM, ensure that the ICAgent is installed according to Installing the ICAgent.