Skip to content

events channel leaked and elicitation bridge left stale on early returns in runStreamLoop #3073

@aheritier

Description

@aheritier

Summary

runStreamLoop swaps this stream's events channel into the elicitation bridge before the deferred finalizeEventChannel cleanup is registered. Two early-return paths sit in that gap. On those paths the stream exits without ever restoring the bridge or closing the events channel: the observe forwarder goroutine never terminates, the external consumer waits forever on a channel that never closes, and the elicitation bridge is left pointing at a dead stream's channel.

This is pre-existing (it predates #3069 / #3070) but belongs to the same elicitation-channel lifecycle family. Found while reviewing #3070.

The gap

In pkg/runtime/loop.go, runStreamLoop:

prevElicitationCh := r.elicitation.swap(events)   // bridge now points at this stream

...

agentTools, err := r.getTools(ctx, a, sessionSpan, sink, true)
if err != nil {
    sink.Emit(ErrorWithCode(...))
    return                                        // ← early return #1: no finalize
}

...

if lastMsg.Role == chat.MessageRoleUser {
    stop, msg, ctxMsgs := r.executeUserPromptSubmitHooks(...)
    if stop {
        ...
        r.emitHookDrivenShutdown(ctx, a, sess, msg, sink)
        return                                    // ← early return #2: no finalize
    }
    ...
}

...

var streamReason string
defer func() {                                    // ← finalize only registered HERE
    r.finalizeEventChannel(ctx, sess, streamReason, prevElicitationCh, events)
}()

Consequences on the early-return paths

  1. events is never closed → observe's forwarder goroutine (for event := range inner) never exits → its out channel is never closed → any consumer doing for range RunStream(...) hangs forever. Run() is such a consumer.
  2. The elicitation bridge keeps pointing at the dead stream's channel. A later elicitation send targets a channel nobody drains: with the buffer full it parks forever holding the bridge RLock, which then wedges the next stream's teardown (restoreAndClose blocks on the write lock).
  3. For nested streams, the parent's elicitation channel is never restored.

Proposed fix

Register the cleanup defer immediately after the swap, so every exit path finalizes:

prevElicitationCh := r.elicitation.swap(events)

var streamReason string
defer func() {
    r.finalizeEventChannel(ctx, sess, streamReason, prevElicitationCh, events)
}()

finalizeEventChannel is already safe for these early paths (it emits StreamStopped non-blockingly and closes last, since #3070). The streamReason declaration just moves up with the defer.

Add a regression test: a RunStream whose getTools fails must still close its events channel (consumer's for range terminates) and restore the previous elicitation channel.

Relations

Metadata

Metadata

Assignees

No one assigned

    Labels

    area/agentFor work that has to do with the general agent loop/agentic features of the app

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions