OTel Phase 3 — Span instrumentation (session / llm / tool / scheduled) #104

> Part of the **Observability — OpenTelemetry Tracing v1** initiative (master tracking: #108). Effort: **M (5–7 engineer-days)**. Risk: **medium-high** (many call sites, semconv evolution, redaction guarantees, content gating). Depends on: **Phase 2** (#103).

## Goal

Emit a correct nested span tree from real call sites. Redaction on by default. GenAI conventions spec-correct so Datadog / New Relic / SigNoz auto-map.

## Files

| File | Change |
|---|---|
| forge-core runtime executor entry (agent loop `Execute`) | Start root `forge.session` span; end on return |
| `forge-core/runtime` `LLMExecutor` (LLM round-trip) | Child GenAI inference span (convention-named) via the semconv helper |
| `forge-core/runtime/genai_semconv.go` | **New.** Single helper that owns ALL `gen_ai.*` emission + the version switch |
| `forge-core/tools` registry `Execute` | Child `forge.tool_exec` span (generic tool name only) |
| `forge-core/scheduler` dispatch | Root `forge.scheduled_task` span per tick |
| forge-core egress decision + guardrail check sites | `span.AddEvent` on the active span (not child spans) |

## Span tree

```
forge.session                 (root; one per task / Execute)
 ├─ chat {model}              (GenAI inference span, CLIENT kind — per round-trip)
 ├─ forge.tool_exec           (per tool call)
 │    ├─ event: egress.allowed / egress.blocked  (domain attr)
 │    └─ event: guardrail.block / guardrail.redact (rule attr)
 └─ ...
forge.scheduled_task          (root; scheduler dispatch — no inbound ctx)
```
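
The egress events in the tree carry only a coarse domain attribute. A minimal sketch of how that reduction could look, assuming the decision site has the request URL at hand; `egressEvent` and the `domain` attribute key are hypothetical names for illustration, not existing forge-core API:

```go
package main

import (
	"fmt"
	"net/url"
)

// egressEvent returns the event name and the single coarse attribute that
// would be attached to the active span via span.AddEvent.
// The "domain" key is a hypothetical name used only in this sketch.
func egressEvent(rawURL string, allowed bool) (string, map[string]string) {
	name := "egress.blocked"
	if allowed {
		name = "egress.allowed"
	}
	domain := "unknown"
	if u, err := url.Parse(rawURL); err == nil && u.Hostname() != "" {
		domain = u.Hostname() // host only: no port, path, query, or credentials
	}
	return name, map[string]string{"domain": domain}
}

func main() {
	name, attrs := egressEvent("https://api.example.com/v1/search?q=secret", true)
	fmt.Println(name, attrs["domain"])
}
```

Keeping the reduction in one function means the path and query string never reach the tracing layer, regardless of the `Redact` setting.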

## `forge.session` attributes

`agent.id`, `agent.version`, `forge.task_id`, `forge.correlation_id`, `forge.channel` (if any), `forge.session.state` (set at end). Start the span with `Tracer().Start(ctx, ...)` and **thread the returned ctx through the loop** so children inherit the parent via ctx; use `trace.SpanFromContext(ctx)` where code only needs the active span (for example, to add an event). Do not start orphan spans.

## GenAI inference span — spec-correct `gen_ai.*` (Datadog / New Relic / SigNoz auto-map)

> **All `gen_ai.*` emission goes through `genai_semconv.go` — one place.** The conventions are still experimental and rename often (tokens went `prompt`/`completion` → `input`/`output`; `gen_ai.system` → `gen_ai.provider.name`). Centralizing means the next rename is a one-file change, not a call-site hunt. The helper reads `OTEL_SEMCONV_STABILITY_OPT_IN`: when it contains `gen_ai_latest_experimental`, emit the newest names; otherwise dual-emit (new + legacy) for older backends. This is the spec's own transition mechanism — follow it rather than picking one naming and freezing.

**Span name:** `chat {model}` (e.g. `chat claude-sonnet-4-6`); **span kind `CLIENT`**. This convention is what GenAI-aware backends key on for auto-recognition — do NOT name it `forge.llm_call`.

### Tier 1 — always-on metadata (non-sensitive; safe with `CaptureContent=false`)

- `gen_ai.operation.name` = `chat`
- `gen_ai.provider.name` (`anthropic` / `openai` / `gcp.vertex_ai` / `ollama` / …); **dual-emit `gen_ai.system`** unless opted into latest-only
- `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.response.id`, `gen_ai.response.finish_reasons`
- `gen_ai.request.temperature`, `gen_ai.request.top_p`, `gen_ai.request.max_tokens` (only when set)
- `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` (legacy `gen_ai.usage.prompt_tokens` / `gen_ai.usage.completion_tokens` only under dual-emit)
- `error.type` + `span.RecordError(err)` + `span.SetStatus(codes.Error, …)` on failure
- Forge-namespaced extras (won't collide with future `gen_ai.*`): `forge.llm.fallback_used` (bool), `forge.llm.fallback_provider`
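
The version switch for the provider and token attributes above could be sketched as a pure function. This is an illustration, not the contents of `genai_semconv.go`: `latestOnly` and `usageAttrs` are hypothetical names, the opt-in value is assumed to be a comma-separated list, and the sketch reuses the same value for `gen_ai.system` even though a real helper may need to map legacy provider values.

```go
package main

import (
	"fmt"
	"strings"
)

const optInLatest = "gen_ai_latest_experimental"

// latestOnly reports whether the opt-in value (assumed comma-separated)
// selects the newest names only.
func latestOnly(optIn string) bool {
	for _, v := range strings.Split(optIn, ",") {
		if strings.TrimSpace(v) == optInLatest {
			return true
		}
	}
	return false
}

// usageAttrs returns provider and token attributes. New names are always
// present; legacy names are added unless latest-only mode is selected.
func usageAttrs(optIn, provider string, in, out int) map[string]any {
	attrs := map[string]any{
		"gen_ai.provider.name":       provider,
		"gen_ai.usage.input_tokens":  in,
		"gen_ai.usage.output_tokens": out,
	}
	if !latestOnly(optIn) {
		// Dual-emit for older backends.
		attrs["gen_ai.system"] = provider
		attrs["gen_ai.usage.prompt_tokens"] = in
		attrs["gen_ai.usage.completion_tokens"] = out
	}
	return attrs
}

func main() {
	// A real helper would read os.Getenv("OTEL_SEMCONV_STABILITY_OPT_IN").
	fmt.Println(len(usageAttrs("", "anthropic", 120, 48)))                          // dual-emit
	fmt.Println(len(usageAttrs("gen_ai_latest_experimental", "anthropic", 120, 48))) // latest only
}
```

Because call sites only pass values and never attribute names, the next rename touches this one function.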

### Tier 2 — content (ONLY when `CaptureContent=true`; default off)

Emit content as **span events**, not large string attributes: `gen_ai.system.message`, `gen_ai.user.message`, `gen_ai.assistant.message`, `gen_ai.choice`. Events, not attributes, are where the spec landed, and they match our default-off posture. Downstream PII scrubbing is expected to happen in the Collector, so never treat emitted content as already safe.
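
The gate itself can be a single early return. A minimal sketch, assuming messages arrive with a role and a content string; the `message` and `event` types and `contentEvents` are hypothetical, and the response-side `gen_ai.choice` event is left out for brevity:

```go
package main

import "fmt"

type message struct{ Role, Content string }

// event is a toy stand-in for a span event: a name plus the captured content.
type event struct{ Name, Content string }

// eventNames maps a message role to the Tier 2 event name.
var eventNames = map[string]string{
	"system":    "gen_ai.system.message",
	"user":      "gen_ai.user.message",
	"assistant": "gen_ai.assistant.message",
}

// contentEvents returns the events to add to the inference span.
// With captureContent false it returns nothing, so no content can leak.
func contentEvents(captureContent bool, msgs []message) []event {
	if !captureContent {
		return nil
	}
	var out []event
	for _, m := range msgs {
		if name, ok := eventNames[m.Role]; ok {
			out = append(out, event{Name: name, Content: m.Content})
		}
	}
	return out
}

func main() {
	msgs := []message{{"system", "You are helpful."}, {"user", "Hi"}}
	fmt.Println(len(contentEvents(false, msgs)), len(contentEvents(true, msgs)))
}
```

Placing the check at the top of the only function that builds content events keeps the default-off guarantee independent of any call site.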

### Do NOT emit

`gen_ai.usage.cost` — it needs price tables that go stale; the Platform/backend derives cost from token counts. **No pricing logic in forge-core.**

## `forge.tool_exec` attributes

`forge.tool.name` (the registry name, e.g. `cli_execute`, `http_request`, `tavily_research`), `forge.skill` (skill id if skill-backed), `forge.tool.success` (bool), duration via span timing. **Never** set the underlying binary name, the raw command, or tool args when `Redact == true` (default). On failure: `RecordError` + error status.

### Redaction rule

When `cfg.Redact` (default true), tool spans omit args/commands/binary names; egress events keep only the domain (already coarse). `CaptureContent` is the only switch that can add prompt/response text, and it's independent of `Redact`.
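
One way to enforce this rule is an allow-list applied just before attributes reach the span. The sketch below assumes the tool call site builds a candidate attribute map; `toolAttrs` is a hypothetical helper, and the `forge.tool.args` and `forge.tool.binary` keys in `main` are invented purely to show what gets dropped.

```go
package main

import "fmt"

// redactedAllow lists the only forge.tool_exec attributes that survive
// when redaction is on.
var redactedAllow = map[string]bool{
	"forge.tool.name":    true,
	"forge.skill":        true,
	"forge.tool.success": true,
}

// toolAttrs filters candidate attributes. With redact true, anything not on
// the allow-list is dropped; with redact false, everything passes through.
func toolAttrs(redact bool, candidate map[string]any) map[string]any {
	out := make(map[string]any, len(candidate))
	for k, v := range candidate {
		if redact && !redactedAllow[k] {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	candidate := map[string]any{
		"forge.tool.name":    "cli_execute",
		"forge.tool.success": true,
		"forge.tool.args":    "--token abc", // hypothetical key, dropped when redacting
		"forge.tool.binary":  "curl",        // hypothetical key, dropped when redacting
	}
	fmt.Println(len(toolAttrs(true, candidate)), len(toolAttrs(false, candidate)))
}
```

An allow-list fails closed: a new attribute added at a call site stays out of redacted spans until someone explicitly lists it.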

## Verify

```bash
go build ./... && go test ./forge-core/...
go test ./forge-core/runtime/ -run GenAISemconv -v
# token rename + provider.name dual-emit covered

# With Jaeger from Phase 2 running:
FORGE_TRACING_ENABLED=true OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  forge run --port 8098 &
# send a task that triggers an LLM call + a tool call, then open http://localhost:16686
# Confirm:
#   - forge.session root with a "chat {model}" CLIENT span + forge.tool_exec children, correctly nested
#   - gen_ai.provider.name + gen_ai.usage.input_tokens/output_tokens present
#   - NO binary name / NO content on any span

# Latest-only mode:
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental ... forge run ...
# confirm legacy names dropped
```

## Anti-patterns to avoid

- Naming the inference span `forge.llm_call` instead of the `chat {model}` convention (breaks backend auto-mapping).
- Scattering `gen_ai.*` across call sites instead of `genai_semconv.go`.
- Freezing one naming version instead of honoring `OTEL_SEMCONV_STABILITY_OPT_IN`.
- Content as big attributes instead of events.
- Computing `gen_ai.usage.cost` in core.
- Flat sibling spans (broken nesting from not threading ctx).
- Binary names / commands in attributes (capability-enumeration guardrail violation).
- Content with `CaptureContent=false`.
- Starting spans in hooks (use call sites — hooks stay for audit only).

## Cross-reference with FWS-3 (#87)

FWS-3 captures token usage / duration / model / provider at the LLM call site for the **audit** path. This issue (Phase 3) is the OTel span side of that same capture point. The two share the call site by design: when an LLM call completes, one piece of code captures token counts and duration once, then writes to audit AND sets span attributes (a no-op when tracing is disabled). See FWS-3's "Relationship with OTel tracing" section.
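
The capture-once, write-twice shape might look like the following. Every type and function here (`llmCapture`, `sink`, `auditSink`, `spanSink`, `recordLLMCall`) is a hypothetical illustration; neither FWS-3 nor this issue prescribes these names.

```go
package main

import (
	"fmt"
	"time"
)

// llmCapture is the single record taken when an LLM call completes.
type llmCapture struct {
	Provider, Model           string
	InputTokens, OutputTokens int
	Duration                  time.Duration
}

type sink interface{ Record(c llmCapture) }

// auditSink stands in for the FWS-3 audit path.
type auditSink struct{ entries []llmCapture }

func (a *auditSink) Record(c llmCapture) { a.entries = append(a.entries, c) }

// spanSink stands in for the OTel span side; it does nothing when tracing is off.
type spanSink struct {
	enabled bool
	attrs   map[string]any
}

func (s *spanSink) Record(c llmCapture) {
	if !s.enabled {
		return
	}
	if s.attrs == nil {
		s.attrs = map[string]any{}
	}
	s.attrs["gen_ai.provider.name"] = c.Provider
	s.attrs["gen_ai.request.model"] = c.Model
	s.attrs["gen_ai.usage.input_tokens"] = c.InputTokens
	s.attrs["gen_ai.usage.output_tokens"] = c.OutputTokens
	// Duration is not set as an attribute: span timing already covers it.
}

// recordLLMCall fans one capture out to every sink.
func recordLLMCall(c llmCapture, sinks ...sink) {
	for _, s := range sinks {
		s.Record(c)
	}
}

func main() {
	audit, span := &auditSink{}, &spanSink{enabled: false}
	c := llmCapture{"anthropic", "claude-sonnet-4-6", 120, 48, 900 * time.Millisecond}
	recordLLMCall(c, audit, span)
	fmt.Println(len(audit.entries), len(span.attrs))
}
```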

