OPS06-06 Performing Distributed Tracing

A trace is a series of causally related events that records the end-to-end flow of a request through a distributed system.

Risk level
High
Key strategies
When a fault occurs, you need to trace the behavior of each component and the interactions between components. Distributed tracing helps you quickly locate the fault so that it can be rectified.
Design suggestions
Distributed tracing can be implemented by adding a trace identifier to requests. When the system receives a request, it attaches an identifier to the request and passes that identifier to every component that handles the request. Each component writes the identifier to its logs so that all log entries for one request can be correlated during troubleshooting. Open-source tools such as Jaeger, Zipkin, SkyWalking, and CAT can be used for distributed tracing. Huawei Cloud APM also provides tracing capabilities.
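The identifier propagation described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how Jaeger, Zipkin, SkyWalking, CAT, or Huawei Cloud APM implement tracing. The header name X-Trace-Id, the two service functions, and the log format are assumptions made for the example, and the "network call" between services is simulated with a plain function call.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical header name used to carry the identifier between components.
TRACE_HEADER = "X-Trace-Id"

# Holds the trace identifier for the request currently being handled.
_trace_id = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the current trace identifier onto every log record."""

    def filter(self, record):
        record.trace_id = _trace_id.get()
        return True


# Logs go to an in-memory buffer here; a real service would write to a file
# or a log collector instead.
log_output = io.StringIO()
handler = logging.StreamHandler(log_output)
handler.setFormatter(
    logging.Formatter("%(name)s trace_id=%(trace_id)s %(message)s")
)
handler.addFilter(TraceIdFilter())


def get_logger(name):
    """Returns a logger whose records include the trace identifier."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if handler not in logger.handlers:
        logger.addHandler(handler)
    return logger


def inventory_service(headers):
    """Downstream component (hypothetical): reuses the incoming identifier."""
    _trace_id.set(headers.get(TRACE_HEADER) or uuid.uuid4().hex)
    get_logger("inventory").info("reserving stock")
    return {"status": "ok"}


def order_service(headers):
    """Entry component (hypothetical): creates an identifier if none arrived."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    _trace_id.set(trace_id)
    get_logger("order").info("order received")
    # Simulated downstream call: the identifier travels in the headers.
    inventory_service({TRACE_HEADER: trace_id})
    return trace_id
```

Calling order_service({}) generates a new identifier, and log_output.getvalue() then contains one line from each component carrying the same trace_id value. Searching the logs for that value returns every entry produced while handling the request.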

For details, see Locating the Causes of Request Errors.