Skip to content

decide StreamStopped ordering vs session-end hooks and document teardown trade-offs from #3070 #3074

@aheritier

Description

@aheritier

Summary

Follow-up to #3070 (fix for #3069). The fix is correct, but it made two deliberate behavior trade-offs around StreamStopped that deserve an explicit decision and documentation:

  1. Ordering change: StreamStopped is now emitted before session-end hooks run. Before fix: guard elicitation channel close against in-flight sends #3070 it was emitted after executeSessionEndHooks (and before executeOnUserInputHooks/telemetry).
  2. Delivery change: StreamStopped is now emitted non-blockingly — it is dropped when the events buffer is full and the consumer has gone away (previously a blocking send: guaranteed delivery, but it could deadlock teardown — the very issue fix: guard elicitation channel close against in-flight sends #3070 fixed).

1. Ordering: StreamStopped before session-end hooks

Current order in finalizeEventChannel (pkg/runtime/loop.go):

nonBlocking(&channelSink{ch: events}).Emit(StreamStopped(...))  // 1. emit
r.executeSessionEndHooks(context.WithoutCancel(ctx), sess, a)   // 2. hooks
r.executeOnUserInputHooks(ctx, sess.ID, "stream stopped")       // 3. hooks
r.telemetry.RecordSessionEnd(ctx)                                // 4. telemetry
r.elicitation.restoreAndClose(events, prevElicitationCh)         // 5. close (last)

Consumers (TUI, A2A forwarders, persistence observers) now observe StreamStopped while session-end hooks are still executing. If any consumer treats the event (rather than the channel close) as "all cleanup done", that is a semantic shift. The channel close — which terminates for range — still happens last and remains the authoritative terminal signal.

Option A (recommended): move the emit to just before restoreAndClose (after hooks/telemetry). One-line move, restores the pre-#3070 semantics, keeps close-last:

r.executeSessionEndHooks(...)
r.executeOnUserInputHooks(...)
r.telemetry.RecordSessionEnd(ctx)
nonBlocking(&channelSink{ch: events}).Emit(StreamStopped(...))
r.elicitation.restoreAndClose(events, prevElicitationCh)

Option B: keep the current order and document that the channel close, not the event, is the terminal signal.

Either way, audit consumers of StreamStoppedEvent for assumptions about hook completion.

2. Documentation: accepted trade-offs in restoreAndClose

Two deliberate behaviors from #3070 are currently undocumented and should be captured in doc comments so future maintainers don't "fix" them:

  • restoreAndClose waits for a parked in-flight elicitation send (the sender holds the bridge RLock). If the consumer is truly gone and both buffers are full, teardown blocks instead of panicking — a goroutine leak is the accepted alternative to crashing the process. Document this in the restoreAndClose comment.
  • StreamStopped is best-effort under buffer-full/no-consumer conditions: it is dropped rather than deadlocking teardown. Document this in the finalizeEventChannel comment (and in StreamStoppedEvent's doc if consumers are expected to rely on it).

Scope

  • One-line emit reorder (if Option A) in pkg/runtime/loop.go
  • Doc comments in pkg/runtime/elicitation.go and pkg/runtime/loop.go
  • Optional: a test asserting StreamStopped is the last event delivered before close

Relations

Metadata

Metadata

Assignees

No one assigned

    Labels

    area/agentFor work that has to do with the general agent loop/agentic features of the apparea/sessionsFor features/issues/fixes related to session lifecycle (resume, persistence, export)

    Type

    No fields configured for Task.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions