
[ADP-607] Event Tracing

Overview

In distributed systems, tracing how events flow and are processed is crucial for understanding system behavior, diagnosing issues, and optimizing performance. This ADP defines a standard method for implementing event tracing using W3C TraceContext and the CloudEvents distributed tracing extension.

Guidelines

  1. Services MUST use the W3C TraceContext standard for cross-service distributed tracing.

  2. Events SHOULD carry tracing information using the CloudEvents distributed tracing extension.

  3. Event producers MUST add tracing context when generating events.

  4. Event consumers MUST propagate tracing context when processing events.

  5. Tracing implementation SHOULD be transparent to application code, minimizing intrusiveness.

W3C TraceContext

W3C TraceContext defines two HTTP headers:

  1. traceparent: Carries the version, trace ID, parent (span) ID, and trace flags
  2. tracestate: Carries vendor-specific tracing information as a list of key=value pairs

Example:

```http
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
```
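
To make the traceparent layout concrete, the following is a minimal sketch of a parser for the four dash-separated fields (version, trace ID, parent ID, trace flags). The names `TraceParent` and `parse_traceparent` are hypothetical and not part of this ADP. The sketch only accepts the four-field layout shown in the example above; a production service would normally rely on a tracing library instead.

```python
import re
from typing import NamedTuple, Optional

# Four lowercase-hex fields: 2-char version, 32-char trace ID,
# 16-char parent ID, 2-char trace flags.
_TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    """Hypothetical container for a parsed traceparent value."""
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0 of the trace flags is the "sampled" flag.
        return bool(self.flags & 0x01)


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the value is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))
```

Returning `None` for malformed input lets the caller decide whether to start a new trace rather than fail the request.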

CloudEvents Distributed Tracing Extension

CloudEvents defines two extension attributes to support distributed tracing:

  1. traceparent: Corresponds to W3C TraceContext's traceparent
  2. tracestate: Corresponds to W3C TraceContext's tracestate

Example CloudEvent:

```json
{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "id" : "C234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "traceparent" : "00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01",
    "tracestate" : "rojo=00f067aa0ba902b7,congo=t61rcWkgMzE",
    "data" : {
        "message" : "Hello, World!"
    }
}
```
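
As an illustration of the producer and consumer guidelines, the sketch below shows one way a producer might attach the tracing attributes to a CloudEvent and how a consumer might carry the same trace forward into a follow-up event. All function names, the event ID scheme, and the follow-up event's `type` and `source` values are assumptions made for this example. In practice a tracing library would usually generate the span IDs.

```python
import json
import secrets
from typing import Any, Dict, Optional


def make_event(event_type: str, source: str, data: Dict[str, Any],
               traceparent: str,
               tracestate: Optional[str] = None) -> Dict[str, Any]:
    """Producer side: build a CloudEvent that carries the tracing extension."""
    event: Dict[str, Any] = {
        "specversion": "1.0",
        "type": event_type,
        "source": source,
        "id": secrets.token_hex(8),  # hypothetical ID scheme
        "traceparent": traceparent,
        "data": data,
    }
    if tracestate:
        event["tracestate"] = tracestate
    return event


def child_traceparent(incoming: str) -> str:
    """Keep the trace ID and flags, but record a fresh span as the parent."""
    version, trace_id, _old_parent, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def handle_event(raw: str) -> Dict[str, Any]:
    """Consumer side: read the context and propagate it to a follow-up event."""
    event = json.loads(raw)
    return make_event(
        event_type="com.example.someevent.processed",  # hypothetical type
        source="/myconsumer",                          # hypothetical source
        data={"handled": event["id"]},
        traceparent=child_traceparent(event["traceparent"]),
        tracestate=event.get("tracestate"),
    )
```

The trace ID stays constant across both events, so a tracing backend can stitch them into a single trace, while the new parent ID identifies the consumer's own span.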

Implementation Recommendations

  1. Use tracing libraries or frameworks that support W3C TraceContext.

  2. Ensure correct propagation of traceparent and tracestate when passing events between services.

  3. Include trace IDs in log entries to facilitate correlation between logs and traces.

  4. Consider implementing sampling strategies to control the volume of trace data.

  5. Use distributed tracing visualization tools to analyze and present trace data.
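
Recommendations 3 and 4 above can be sketched with the Python standard library alone. The example assumes the trace ID of the event being processed is kept in a context variable, and it uses a simple ratio-based sampling rule derived from the trace ID so that every service reaches the same decision for a given trace. The names `current_trace_id`, `TraceIdFilter`, and `should_sample` are hypothetical, and the sampling rule is only one of several possible strategies.

```python
import contextvars
import logging

# Holds the trace ID of the event currently being processed ("-" if none).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it


def should_sample(trace_id: str, ratio: float) -> bool:
    """Deterministic sampling: treat the low 64 bits of the trace ID
    as a uniform number and keep the trace if it falls below the ratio."""
    return int(trace_id[-16:], 16) < ratio * 2**64


def configure_logger(name: str) -> logging.Logger:
    """Build a logger whose lines include the trace ID for correlation."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A consumer would call `current_trace_id.set(...)` with the trace ID taken from the incoming event's traceparent before handling it, so that all log lines written during processing can be joined with the trace.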

Security Considerations

  1. Ensure that tracing information does not contain sensitive data.

  2. Implement appropriate access controls to limit access to trace data.

  3. Consider using sampling in production environments to reduce the volume of trace data, which also limits how much data is exposed if trace storage is compromised.
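
One possible way to act on the first consideration is to forward only tracestate entries from an allow-list of known vendor keys when events leave a trust boundary. The sketch below is an assumption for illustration, not a requirement of this ADP; `filter_tracestate` and the allow-list contents are hypothetical.

```python
from typing import Iterable


def filter_tracestate(tracestate: str, allowed_keys: Iterable[str]) -> str:
    """Keep only the key=value members whose key is on the allow-list."""
    allowed = set(allowed_keys)
    kept = []
    for member in tracestate.split(","):
        member = member.strip()
        key, sep, _value = member.partition("=")
        # Skip empty or malformed members and any key not explicitly allowed.
        if sep and key in allowed:
            kept.append(member)
    return ",".join(kept)
```

An allow-list is used here rather than a deny-list so that entries from unknown systems are dropped by default.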
