Traces and Spans in OpenTelemetry

A simple mental model for traces and spans in distributed systems.

The basic idea

When a request travels through multiple services, it leaves a trace. A trace is made of spans.

Trace: GET /checkout
├─ Span: API Gateway
├─ Span: Auth service
├─ Span: Cart service
├─ Span: Payment service
└─ Span: Database query

Each span is one unit of work with a start time, an end time, and context that links it to its trace and to its parent span.
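To make the span model concrete, here is a minimal Python sketch using only the standard library. The `Span` class and `new_span` helper are hypothetical teaching aids, not the OpenTelemetry SDK API. The sketch assumes the root span is named after the request and uses the W3C Trace Context ID sizes (16-byte trace ID, 8-byte span ID). The parent span ID is what turns a flat list of spans into the tree above.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical teaching model of a span, not the OpenTelemetry SDK class."""
    name: str
    trace_id: str                  # shared by every span in the same trace
    span_id: str                   # unique to this span
    parent_span_id: Optional[str]  # None marks the root span
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None

    def end(self) -> None:
        # Record when this unit of work finished.
        self.end_ns = time.time_ns()


def new_span(name: str, parent: Optional[Span] = None) -> Span:
    # A root span starts a new trace ID; a child inherits its parent's trace ID
    # and records the parent's span ID, which is what builds the tree.
    return Span(
        name=name,
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_span_id=parent.span_id if parent else None,
    )


# Rebuild the GET /checkout tree: one root span plus one child per service.
root = new_span("GET /checkout")
children: List[Span] = [
    new_span(service, parent=root)
    for service in ("API Gateway", "Auth service", "Cart service",
                    "Payment service", "Database query")
]
for child in children:
    child.end()
root.end()

for child in children:
    print(f"{child.name}: trace={child.trace_id[:6]}... parent={child.parent_span_id}")
```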

Logs tell you what happened. Metrics tell you how often. Traces tell you where the request spent time and how work moved across services.

In practice, this is what makes bottlenecks and hidden dependencies visible.
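One way to find where a request spent its time is to rank spans by self time (a span's duration minus the time covered by its children) rather than by total duration, because the root span always has the largest total. The sketch below is hypothetical: the timings are made up for illustration, the span IDs are placeholders, and it assumes child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FinishedSpan:
    name: str
    span_id: str
    parent_span_id: Optional[str]
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_time_ms(span: FinishedSpan, spans: List[FinishedSpan]) -> int:
    # Time spent in this span itself, excluding its direct children.
    # Assumes children run sequentially (no overlap), so the subtraction is exact.
    children = [s for s in spans if s.parent_span_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def bottleneck(spans: List[FinishedSpan]) -> FinishedSpan:
    # The span with the most self time is where the request actually waited.
    return max(spans, key=lambda s: self_time_ms(s, spans))


# Made-up timings for the GET /checkout trace, for illustration only.
spans = [
    FinishedSpan("GET /checkout", "root", None, 0, 300),
    FinishedSpan("API Gateway", "s1", "root", 0, 10),
    FinishedSpan("Auth service", "s2", "root", 10, 30),
    FinishedSpan("Cart service", "s3", "root", 30, 60),
    FinishedSpan("Payment service", "s4", "root", 60, 240),
    FinishedSpan("Database query", "s5", "root", 240, 290),
]
for s in spans:
    print(f"{s.name:16} total={s.duration_ms:3} ms  self={self_time_ms(s, spans):3} ms")
print("bottleneck:", bottleneck(spans).name)
```

Ranking by total duration would point at `GET /checkout`, which only says the request was slow; self time points at the Payment service.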

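Typical fields include the operation name, the duration, the trace ID shared by every span in the request, and a span ID unique to this span.

Example: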

{
  "name": "HTTP GET /users/:id",
  "duration_ms": 120,
  "trace_id": "3f7c1a...",
  "span_id": "a91bd2..."
}
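The example above is simplified. In OpenTelemetry a trace ID is 16 bytes and a span ID is 8 bytes, usually rendered as 32 and 16 lowercase hex characters, and spans exported over OTLP carry start and end timestamps in nanoseconds rather than a duration field. Here is a minimal sketch of producing a record in the example's shape; the helper names are hypothetical and the timestamp is arbitrary.

```python
import json
import secrets


def _random_nonzero_hex(num_bytes: int) -> str:
    # W3C Trace Context treats an all-zero ID as invalid, so retry in that (unlikely) case.
    while True:
        value = secrets.token_hex(num_bytes)
        if int(value, 16) != 0:
            return value


def new_trace_id() -> str:
    return _random_nonzero_hex(16)  # 16 bytes -> 32 lowercase hex chars


def new_span_id() -> str:
    return _random_nonzero_hex(8)   # 8 bytes -> 16 lowercase hex chars


def span_record(name: str, start_ns: int, end_ns: int,
                trace_id: str, span_id: str) -> dict:
    # Timestamps stay in nanoseconds; duration_ms is derived only for display.
    return {
        "name": name,
        "duration_ms": (end_ns - start_ns) // 1_000_000,
        "trace_id": trace_id,
        "span_id": span_id,
    }


start_ns = 1_700_000_000_000_000_000  # arbitrary example timestamp
record = span_record("HTTP GET /users/:id", start_ns, start_ns + 120_000_000,
                     new_trace_id(), new_span_id())
print(json.dumps(record, indent=2))
```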

A trace works only if every service forwards the trace context (the trace ID and the current span ID) to each downstream service it calls.

Without propagation, each service starts its own trace, and a single distributed request shows up as several disconnected fragments.
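OpenTelemetry's default propagator follows W3C Trace Context, which forwards context in a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. In practice the SDK does this for you (in Python, via `opentelemetry.propagate.inject` and `extract`); the hand-written sketch below only illustrates the mechanism. The function names are hypothetical, only version `00` is handled, and header names are assumed to be lowercased already.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# version-traceid-parentid-flags, e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(headers: Dict[str, str], trace_id: str, span_id: str,
           sampled: bool = True) -> None:
    # Caller side: the current span ID becomes the callee's parent ID.
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    # Callee side: return (trace_id, parent_span_id), or None if absent or invalid.
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    if int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
        return None  # all-zero IDs are invalid
    return trace_id, parent_id


def start_server_span(headers: Dict[str, str]) -> Tuple[str, str, Optional[str]]:
    # Returns (trace_id, span_id, parent_span_id) for the span handling this request.
    span_id = secrets.token_hex(8)
    context = extract(headers)
    if context is None:
        # No usable context: this service starts a brand-new trace (a fragment).
        return secrets.token_hex(16), span_id, None
    trace_id, parent_id = context
    return trace_id, span_id, parent_id


outgoing: Dict[str, str] = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(outgoing["traceparent"])      # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
print(start_server_span(outgoing))  # same trace ID, parent 00f067aa0ba902b7
print(start_server_span({}))        # new trace ID, no parent
```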

Automatic instrumentation gives fast coverage of HTTP clients, database calls, and frameworks, but it does not capture business intent.

The most valuable spans are often manual ones, created by hand around business operations.

These spans connect system behavior to domain behavior.

For the next iteration, I want to add one manual business span per critical user flow and compare how readable the traces are before and after.