fix: restore per-cycle span duration for execute_event_loop_cycle#1939

Open
Di-Is wants to merge 3 commits into strands-agents:main from Di-Is:fix/trace-span-base-exception-handling

Di-Is commented Mar 20, 2026

Description

Since v1.24.0 (PR #1293), execute_event_loop_cycle spans no longer reflect per-cycle duration. When a cycle performs tool use and recurses, the parent cycle's native OTel span stays open until all recursive children complete, producing cumulative bottom-up latency instead of per-step latency in observability backends (Langfuse, Jaeger, etc.).

The root cause: event_loop_cycle() is an async generator whose body was wrapped in use_span(end_on_exit=True). Because yield keeps the context manager open across recursive cycles, all span.end() calls fire simultaneously when the generator chain unwinds. The logical metadata (gen_ai.event.end_time attribute) is set at the correct time, but backends use the native OTel endTimeUnixNano from span.end().
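The effect described above can be reproduced without OpenTelemetry. The following is a minimal sketch, not the SDK's code: `FakeSpan`, `Clock`, `span_ended_on_exit`, and `cycle` are hypothetical stand-ins, and a logical clock replaces wall time so the result is deterministic. Each cycle does one tick of work, yet the outer spans report the cumulative total because the `with` block stays open across the recursive `yield`s.

```python
import asyncio
from contextlib import contextmanager


class Clock:
    """Logical clock so the printed durations are deterministic."""

    def __init__(self):
        self.now = 0

    def tick(self):
        self.now += 1


class FakeSpan:
    """Hypothetical stand-in for an OTel span; only tracks start and end."""

    def __init__(self, name, clock):
        self.name, self.clock = name, clock
        self.start = clock.now
        self.end_time = None

    def end(self):
        self.end_time = self.clock.now


@contextmanager
def span_ended_on_exit(span):
    # Mimics "end the span when the with-block exits"; not the real helper.
    try:
        yield span
    finally:
        span.end()


async def cycle(depth, clock, spans):
    span = FakeSpan(f"cycle-{depth}", clock)
    spans.append(span)
    with span_ended_on_exit(span):
        clock.tick()  # this cycle's own work: exactly one tick
        yield span.name
        if depth > 0:
            # Recursing inside the with-block keeps the parent span open.
            async for event in cycle(depth - 1, clock, spans):
                yield event


async def run_cycles(depth):
    clock, spans = Clock(), []
    async for _ in cycle(depth, clock, spans):
        pass
    return [(s.name, s.end_time - s.start) for s in spans]


cumulative = asyncio.run(run_cycles(2))
print(cumulative)  # [('cycle-2', 3), ('cycle-1', 2), ('cycle-0', 1)]
```

All three spans end at tick 3, when the generator chain unwinds, which matches the identical end times reported in the linked issue.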

Three commits, each reviewable independently:

  1. fix: restore explicit span.end() to fix span end_time regression
     • Core fix. Switches to end_on_exit=False and restores explicit span.end() calls in end_event_loop_cycle_span() and end_model_invoke_span(), with end_span_with_error() on exception paths.
  2. fix: handle BaseException in trace spans to prevent span leaks on KeyboardInterrupt
     • With end_on_exit=False, spans must be explicitly closed on all paths.
     • Adds except BaseException handlers and aclose() for async generators to cover KeyboardInterrupt and asyncio.CancelledError.
  3. perf: only force flush tracer provider when ending agent spans
     • Independent optimization. _end_span() previously called force_flush() on every span end; this limits it to agent span completion only.
     • Happy to split this into a separate PR if preferred.
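For reviewers, here is a rough sketch of the idea behind the core fix, assuming the span is ended explicitly before the cycle recurses. It does not use the SDK: `FakeSpan` and `cycle` are hypothetical names, and a dict-based logical clock stands in for wall time. With the explicit `end()` placed before the recursive call, each span covers only its own tick of work.

```python
import asyncio


class FakeSpan:
    """Hypothetical span stand-in driven by a shared logical clock."""

    def __init__(self, name, clock):
        self.name, self.clock = name, clock
        self.start = clock["now"]
        self.end_time = None
        self.error = None

    def end(self, error=None):
        if self.end_time is None:  # ending an already-ended span is a no-op
            self.end_time = self.clock["now"]
            self.error = error


async def cycle(depth, clock, spans):
    span = FakeSpan(f"cycle-{depth}", clock)
    spans.append(span)
    try:
        clock["now"] += 1  # this cycle's own work: exactly one tick
        yield span.name
    except Exception as exc:
        span.end(error=exc)  # error path must close the span as well
        raise
    span.end()  # explicit end BEFORE recursing into the next cycle
    if depth > 0:
        async for event in cycle(depth - 1, clock, spans):
            yield event


async def run_cycles(depth):
    clock, spans = {"now": 0}, []
    async for _ in cycle(depth, clock, spans):
        pass
    return [(s.name, s.end_time - s.start) for s in spans]


per_cycle = asyncio.run(run_cycles(2))
print(per_cycle)  # [('cycle-2', 1), ('cycle-1', 1), ('cycle-0', 1)]
```

The trade-off is the one noted in the second commit: once nothing ends the span automatically, every exit path has to do it by hand.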

Resolves #1930, #1938

Related Issues

Documentation PR

N/A

Type of Change

Bug fix

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Di-Is added 3 commits March 20, 2026 10:59
PR strands-agents#1293 wrapped event_loop_cycle() in use_span(end_on_exit=True) and removed
explicit span.end() calls. Because event_loop_cycle is an async generator,
yield keeps the context manager open across recursive cycles, causing all
execute_event_loop_cycle spans to share the same OTel end_time.

Switch to end_on_exit=False and explicitly call span.end() via _end_span()
in end_event_loop_cycle_span() and end_model_invoke_span(), restoring
end_span_with_error() in all exception paths.
…boardInterrupt

Trace spans were not properly closed when BaseException (e.g. KeyboardInterrupt,
asyncio.CancelledError) was raised. Add explicit BaseException handlers to close
spans and aclose() calls to ensure async generators are cleaned up.
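A minimal sketch of the pattern this commit describes, under the assumption that a span wraps an inner async generator. `FakeSpan`, `stream`, and `traced_stream` are hypothetical names for illustration, and `athrow()` is used only to simulate an interrupt arriving while the generator is suspended at a `yield`.

```python
import asyncio


class FakeSpan:
    """Hypothetical span stand-in that records how it was ended."""

    def __init__(self):
        self.ended = False
        self.error = None

    def end(self, error=None):
        if not self.ended:
            self.ended = True
            self.error = error


async def stream(log):
    # Inner async generator whose cleanup we want to guarantee.
    try:
        for i in range(3):
            yield i
    finally:
        log.append("stream closed")


async def traced_stream(span, log):
    inner = stream(log)
    try:
        async for event in inner:
            yield event
    except BaseException as exc:
        # Catches KeyboardInterrupt and asyncio.CancelledError too,
        # which `except Exception` would let slip past with the span open.
        span.end(error=exc)
        raise
    else:
        span.end()
    finally:
        # Close the inner generator even if it is still suspended.
        await inner.aclose()


async def interrupted():
    span, log = FakeSpan(), []
    gen = traced_stream(span, log)
    await gen.__anext__()  # consume one event, leaving the stream suspended
    try:
        await gen.athrow(KeyboardInterrupt())  # simulate Ctrl-C at the yield
    except KeyboardInterrupt:
        pass
    return span, log


span, log = asyncio.run(interrupted())
print(span.ended, type(span.error).__name__, log)
# True KeyboardInterrupt ['stream closed']
```

Note that `GeneratorExit` is also a `BaseException`, so in this sketch a consumer calling `aclose()` early would end the span through the error path as well.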
Reduce overhead by limiting force_flush calls to agent span completion
instead of every span end. Add flush parameter to _end_span() with
default False, passing True only from end_agent_span().
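The flush change can be sketched as follows. The method names `_end_span()`, `end_agent_span()`, `end_event_loop_cycle_span()`, `end_model_invoke_span()`, and the `flush` parameter come from this PR; the simplified signatures and the `FakeProvider`, `FakeSpan`, and `Tracer` classes are assumptions made for illustration and do not mirror the real tracer.

```python
class FakeProvider:
    """Hypothetical tracer provider that counts force_flush() calls."""

    def __init__(self):
        self.flush_count = 0

    def force_flush(self):
        self.flush_count += 1


class FakeSpan:
    """Hypothetical span stand-in."""

    def __init__(self):
        self.ended = False

    def end(self):
        self.ended = True


class Tracer:
    """Simplified tracer; real signatures take more arguments."""

    def __init__(self, provider):
        self.provider = provider

    def _end_span(self, span, flush=False):
        span.end()
        if flush:  # flushing is opt-in rather than done on every span end
            self.provider.force_flush()

    def end_model_invoke_span(self, span):
        self._end_span(span)

    def end_event_loop_cycle_span(self, span):
        self._end_span(span)

    def end_agent_span(self, span):
        self._end_span(span, flush=True)  # only the agent span flushes


provider = FakeProvider()
tracer = Tracer(provider)

agent_span = FakeSpan()
child_spans = []
for _ in range(3):  # three event loop cycles, each with one model call
    model_span, cycle_span = FakeSpan(), FakeSpan()
    tracer.end_model_invoke_span(model_span)
    tracer.end_event_loop_cycle_span(cycle_span)
    child_spans += [model_span, cycle_span]
tracer.end_agent_span(agent_span)

print(provider.flush_count)  # 1
```

In this sketch the six child spans end without a flush, and the single flush happens when the agent span completes.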
Di-Is requested a deployment to manual-approval March 20, 2026 02:46, with GitHub Actions (Waiting)
Di-Is changed the title from "Fix/trace span base exception handling" to "Fix: trace span base exception handling" Mar 20, 2026
Di-Is changed the title from "Fix: trace span base exception handling" to "fix: restore per-cycle span duration for execute_event_loop_cycle" Mar 20, 2026
Development

Successfully merging this pull request may close these issues.

[BUG] Regression: event_loop_cycle span duration becomes cumulative across recursive cycles