Revamp runtime OpenTelemetry tracing: attempt-based spans with early publishing #4530

## Summary

Our current tracing experience has several issues that make it unsuitable for debugging live systems. As more customers request end-to-end tracing, we need to revisit how the runtime publishes OpenTelemetry spans.

## Current Problems

### 1. Spans per journal entry are confusing
- For `ctx.run`, the span duration includes internal retry timer backoffs and is only published when completed
- If an invocation is stuck in a retry loop, the span looks fine — you can't tell it's retrying
- Runtime-generated spans for `ctx.run` are **not correlatable** with user-created spans in the SDK
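
To make the first bullet concrete, here is a small arithmetic sketch. The numbers are made up for illustration (a `ctx.run` closure that runs three times with a fixed 3-second backoff), and the helper names are hypothetical, not runtime code.

```python
# Hypothetical numbers: three executions of the same ctx.run closure, in seconds.
attempt_durations = [0.2, 0.2, 0.2]
retry_backoff = 3.0  # assumed fixed backoff between executions


def entry_span_duration(durations, backoff):
    """Duration of one span that covers every execution plus the waits between them."""
    return sum(durations) + backoff * (len(durations) - 1)


def time_in_user_code(durations):
    """Time actually spent executing the closure."""
    return sum(durations)


total = entry_span_duration(attempt_durations, retry_backoff)
active = time_in_user_code(attempt_durations)
print(f"span duration: {total:.1f}s, time in user code: {active:.1f}s")
```

With these numbers the single span reports about 6.6 s while the closure itself ran for about 0.6 s. The remaining 6 s is backoff, which the span does not distinguish from work.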

### 2. Parent span published too late
- Until the entire invocation completes, OTel traces are not correctly shown/correlated because the parent span is published at the end
- This makes tracing useless for debugging things that are actively on fire
- The problem compounds with end-to-end tracing: every additional span emitted by SDKs and downstream services stays uncorrelated until the parent span arrives
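
The effect can be modeled with a few lines of data handling. This is only a sketch of what a tracing backend has to work with. `ExportedSpan` and `orphans` are hypothetical names, and how a given backend renders spans with a missing parent varies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExportedSpan:
    span_id: str
    parent_id: Optional[str]
    name: str


def orphans(exported: List[ExportedSpan]) -> List[ExportedSpan]:
    """Spans that reference a parent which has not been exported yet."""
    known = {s.span_id for s in exported}
    return [s for s in exported if s.parent_id is not None and s.parent_id not in known]


# While the invocation is running: children are exported, the parent ("inv") is not.
in_flight = [
    ExportedSpan("a", "inv", "user span from the SDK"),
    ExportedSpan("b", "inv", "call to another service"),
]
# Only after completion does the parent span arrive.
completed = in_flight + [ExportedSpan("inv", None, "invoke")]

print([s.name for s in orphans(in_flight)])
print([s.name for s in orphans(completed)])
```

While the invocation is in flight, both child spans are orphans. They become correlatable only once the parent is exported at the very end.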

### Root Cause
The current tracing was designed **pre-UI** and optimized for post-hoc viewing (a "logical view" of the invocation). But OpenTelemetry SDKs export a span only once it has ended, so a span that represents the whole invocation cannot be shown while the invocation is still running.

## Proposal

Shift philosophy to **representing what's physically happening, while it's happening**.

1. **Focus on invocation attempts**, not the whole invocation
2. **Publish spans as soon as an invocation attempt ends** — this helps correlation and allows introspection of in-progress invocations
3. **Correlate all invocation attempts under a parent span** (e.g. a "started" span)
4. **Use events instead of spans for journal entries** inside the invocation attempt span (e.g. `ctx.run` becomes an event, not a child span)
5. **Still generate service-to-service spans** as children (or linked — TBD) to enable navigating between service calls
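
Points 1 to 4 can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the runtime's implementation and not the OpenTelemetry API: `Span`, `Exporter`, and `InvocationTracer` are hypothetical names, span IDs are plain strings rather than real OTel IDs, and point 5 (service-to-service spans) is left out because the child-versus-link question is still open.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Span:
    name: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: Optional[float] = None
    # Journal entries are recorded as (timestamp, description) events.
    events: List[Tuple[float, str]] = field(default_factory=list)


class Exporter:
    """Stand-in for a real span exporter: remembers spans in publish order."""

    def __init__(self) -> None:
        self.published: List[Span] = []

    def publish(self, span: Span) -> None:
        self.published.append(span)


class InvocationTracer:
    def __init__(self, invocation_id: str, exporter: Exporter, now: float) -> None:
        self.invocation_id = invocation_id
        self.exporter = exporter
        self.attempt_count = 0
        self.current: Optional[Span] = None
        # Point 3: publish a short "started" span right away so that every
        # attempt span has an already-exported parent to attach to.
        self.parent = Span("started", f"{invocation_id}/started", None, now, now)
        exporter.publish(self.parent)

    def begin_attempt(self, now: float) -> None:
        # Point 1: the unit of tracing is the attempt, not the invocation.
        self.attempt_count += 1
        n = self.attempt_count
        self.current = Span(
            f"attempt {n}", f"{self.invocation_id}/attempt-{n}", self.parent.span_id, now
        )

    def journal_entry(self, now: float, description: str) -> None:
        # Point 4: a journal entry becomes an event on the attempt span.
        assert self.current is not None, "no attempt in progress"
        self.current.events.append((now, description))

    def end_attempt(self, now: float) -> None:
        # Point 2: the attempt span is published as soon as the attempt ends.
        assert self.current is not None, "no attempt in progress"
        self.current.end = now
        self.exporter.publish(self.current)
        self.current = None


exporter = Exporter()
tracer = InvocationTracer("inv_1", exporter, now=0.0)
tracer.begin_attempt(now=0.0)
tracer.journal_entry(0.1, "ctx.run")
tracer.end_attempt(now=0.2)    # attempt failed; its span is already visible
tracer.begin_attempt(now=3.2)  # next attempt after a 3-second backoff
```

At this point the exporter holds the "started" span and the first attempt span with its `ctx.run` event, even though the invocation is still in progress.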

## Expected Outcome

- Users get a **physical view** of what happened during the invocation via OTel traces
- The **logical view** remains the Restate UI
- Users can search spans by invocation ID and navigate between the Restate UI and Jaeger in either direction

### Example

An invocation that takes 3 attempts, with a 3-second retry interval between them, and that does a `ctx.run` followed by a call, would show:
- A parent span covering the full invocation
- 3 child attempt spans (one per attempt), each published as soon as the attempt ends
- Events within each attempt span for journal entries
- A service-to-service child/linked span for the outgoing call
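
The timing of this example can be worked out with a short sketch. The 3-second interval and the three attempts come from the example above. The 1-second attempt duration is a made-up value, and the sketch assumes the interval is measured from the end of one attempt to the start of the next.

```python
RETRY_INTERVAL = 3.0    # seconds, from the example
ATTEMPT_DURATION = 1.0  # seconds, hypothetical
ATTEMPTS = 3


def attempt_windows(attempts, duration, interval):
    """Return (attempt number, start, end) for each attempt, in seconds."""
    windows = []
    start = 0.0
    for n in range(1, attempts + 1):
        end = start + duration
        windows.append((n, start, end))
        # Assumed: the retry timer starts when the attempt ends.
        start = end + interval
    return windows


windows = attempt_windows(ATTEMPTS, ATTEMPT_DURATION, RETRY_INTERVAL)
for n, start, end in windows:
    print(f"attempt {n}: {start:.0f}s -> {end:.0f}s, span published at {end:.0f}s")
```

Under these assumptions the first attempt span is available after 1 s and the second after 5 s, rather than everything becoming visible only when the invocation finishes at 9 s.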
