Skip to content

fix: handle missing usage metadata on premature Anthropic stream termination#1898

Open
giulio-leone wants to merge 2 commits intostrands-agents:mainfrom
giulio-leone:fix/anthropic-stream-usage-crash
Open

fix: handle missing usage metadata on premature Anthropic stream termination#1898
giulio-leone wants to merge 2 commits intostrands-agents:mainfrom
giulio-leone:fix/anthropic-stream-usage-crash

Conversation

@giulio-leone
Copy link
Contributor

@giulio-leone giulio-leone commented Mar 13, 2026

Problem

When the Anthropic API stream terminates before sending the message_stop event (e.g., network timeout, connection reset, server error mid-stream), AnthropicModel.stream() crashes with AttributeError.

Crash site:

usage = event.message.usage  # type: ignore
yield self.format_chunk({"type": "metadata", "usage": usage.model_dump()})

Two failure modes:

  1. Empty stream: the event variable is never assigned by the async for loop → UnboundLocalError
  2. Premature termination: the last event lacks a .message attribute or .message.usage is NoneAttributeError

Root Cause

The code unconditionally accesses event.message.usage after the async iteration loop, assuming the Anthropic stream always completes normally with usage metadata. That assumption fails when the connection drops before the final usage-bearing event arrives.

Fix

  1. Initialize event = None before the loop
  2. Keep the safe post-loop access pattern, but fail clearly instead of silently synthesizing zero-usage metadata
  3. Raise explicit RuntimeErrors for:
    • empty stream: Anthropic stream terminated before receiving any events
    • incomplete stream: Anthropic stream ended without usage metadata
  4. Split the stray rate-limit assertions back into their own regression test

This keeps normal successful streams unchanged, but ensures interrupted requests fail clearly instead of masking a broken connection.

Tests

Added / updated regression coverage for:

  • test_stream_premature_termination
  • test_stream_empty_no_events
  • test_stream_rate_limit_error

Local validation:

  • ruff check src/strands/models/anthropic.py tests/strands/models/test_anthropic.py
  • ruff format --check src/strands/models/anthropic.py tests/strands/models/test_anthropic.py
  • pytest -q tests/strands/models/test_anthropic.py44 passed
  • runtime proof script covering:
    • normal completion still yielding metadata
    • premature termination raising a clear error
    • empty stream raising a clear error

Fixes #1868

@giulio-leone
Copy link
Contributor Author

Friendly ping — handles missing usage metadata on premature Anthropic stream termination, preventing AttributeError crashes when the stream is interrupted.

@mkmeral mkmeral self-assigned this Mar 17, 2026
@mkmeral
Copy link
Contributor

mkmeral commented Mar 17, 2026

/strands review and test if with this pr the request is successful. I do want requests to fail, if the connection to model is not properly handled/cut

@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

"type": "metadata",
"usage": {"input_tokens": 0, "output_tokens": 0},
}
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The fix silently continues with synthetic zero-usage metadata when the stream terminates prematurely, which may mask connection failures.

Suggestion: Consider adding an optional behavior where users can choose to have connection issues raise an exception instead of silently continuing. This could be:

  1. A configuration option on AnthropicModel (e.g., strict_stream_completion=True)
  2. Raising a new StreamIncompleteException for the empty stream case specifically

The current graceful degradation is reasonable for partial responses, but for empty streams (no events at all), this likely indicates a severe connection issue that should probably raise an exception rather than return fake zero-usage metadata.

At minimum, consider differentiating between:

  • Premature termination (some events received): graceful degradation is reasonable
  • Empty stream (zero events): this is likely a connection failure and should raise an exception

Example approach:

if event is None:
    raise StreamIncompleteException("stream terminated without receiving any events")

This addresses the concern raised by @mkmeral about wanting requests to fail when connections are not properly handled.


# Should still yield a metadata event with zero usage
assert any("metadata" in str(e) for e in tru_events)
anthropic_client.messages.stream.side_effect = anthropic.RateLimitError(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: This test contains extraneous code (lines 791-797) that appears to be left over from a copy-paste error. The rate limit error test code runs within test_stream_empty_no_events but has nothing to do with testing empty streams.

Suggestion: Remove lines 791-797 from this test. The rate limit error scenario is already tested in a separate test function (or should be). The test should end at line 790 after verifying the metadata event is present.

@pytest.mark.asyncio
async def test_stream_empty_no_events(anthropic_client, model, agenerator, alist):
    """Test that stream handles an empty event sequence without crashing."""
    mock_context = unittest.mock.AsyncMock()
    mock_context.__aenter__.return_value = agenerator([])
    anthropic_client.messages.stream.return_value = mock_context

    messages = [{"role": "user", "content": [{"text": "hello"}]}]
    response = model.stream(messages, None, None)

    # Should not raise UnboundLocalError or AttributeError
    tru_events = await alist(response)

    # Should still yield a metadata event with zero usage
    assert any("metadata" in str(e) for e in tru_events)
    # END OF TEST - remove everything below this line

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: Request Changes

The fix correctly addresses the crash on premature stream termination, but there are two issues that should be addressed before merging.

Review Categories
  • Test Quality: The test_stream_empty_no_events test contains extraneous rate limit test code (lines 791-797) that should be removed or moved to a separate test function
  • Design Consideration: The fix silently continues with zero-usage metadata for empty streams, which may mask connection failures. Consider raising an exception for the empty stream case (event is None) since this likely indicates a severe connection issue rather than graceful premature termination

The core fix using getattr chain for safe attribute access is sound. Once the test cleanup is addressed, this will be ready to merge.

@giulio-leone
Copy link
Contributor Author

Addressed the requested changes on this branch:

  • empty stream now raises a clear RuntimeError instead of returning synthetic zero-usage metadata
  • incomplete stream without final usage now also fails clearly
  • the stray rate-limit assertions were split back into their own test

Local validation on the refreshed head:

  • ruff check
  • ruff format --check
  • pytest -q tests/strands/models/test_anthropic.py (44 passed)
  • runtime proof covering normal completion + both incomplete-stream failure modes

@giulio-leone
Copy link
Contributor Author

Leaving a focused follow-up while the fresh review jobs are still pending.

The visible bot review comments are tied to the older head ba341eb57734f2e2e4bf01cf182e860af53bb128. The latest head 503483038ae5cbfb6eaaed72b3ca096a3340c6f6 addresses both of those points:

  • the stray rate-limit assertions were split back out into their own test_stream_rate_limit_error
  • empty/incomplete Anthropic streams now fail clearly with RuntimeError instead of synthesizing zero-usage metadata

Local validation on the updated head is still clean:

  • ruff check
  • ruff format --check
  • pytest -q tests/strands/models/test_anthropic.py -> 44 passed
  • runtime proof script -> PASS

So if the old bot review state remains visible, it should now be read against the superseded commit rather than the current branch contents.

@giulio-leone giulio-leone force-pushed the fix/anthropic-stream-usage-crash branch from 5034830 to b2e2f9a Compare March 19, 2026 06:07
@github-actions github-actions bot added size/s and removed size/s labels Mar 19, 2026
@giulio-leone
Copy link
Contributor Author

Rebased this branch onto the latest main and re-ran the focused local verification for the incomplete-stream failure mode.

Local verification

  • tests/strands/models/test_anthropic.py::test_stream_premature_termination passes
  • tests/strands/models/test_anthropic.py::test_stream_empty_no_events passes
  • Runtime proof against the actual AnthropicModel.stream flow shows:
    • origin/main still crashes with AttributeError: Mock object has no attribute 'message' on premature termination
    • this branch now raises the intended clear RuntimeError("Anthropic stream ended without usage metadata")
    • this branch also raises the intended clear RuntimeError("Anthropic stream terminated before receiving any events") for empty streams

From the contributor side, this branch should now be in a merge-ready state again.

giulio-leone and others added 2 commits March 21, 2026 03:17
…ination

When the Anthropic API stream terminates before sending the
message_stop event (e.g. network timeout, connection reset), the code
crashes with AttributeError because event.message.usage is None.

The stream() method unconditionally accessed event.message.usage after
the async iteration loop, assuming a complete stream. Two failure modes:
1. Empty stream: 'event' variable is never assigned (UnboundLocalError)
2. Premature termination: event.message or event.message.usage is None

Fix: Initialize event=None before the loop, use safe attribute access
via getattr() chain, and emit zero-usage metadata with a warning log
when usage data is unavailable.

Added two regression tests:
- test_stream_premature_termination: stream ends without message.usage
- test_stream_empty_no_events: completely empty stream

Fixes strands-agents#1868
Raise clear runtime errors when the Anthropic stream ends before any events arrive or before final usage metadata is available, and split the stray rate-limit assertions back into their own test.

Validated with ruff check, ruff format --check, the full anthropic model test file, and a runtime proof script covering normal completion plus both incomplete-stream failure modes.
@giulio-leone giulio-leone force-pushed the fix/anthropic-stream-usage-crash branch from b2e2f9a to 72a0be9 Compare March 21, 2026 02:21
@github-actions github-actions bot added size/s and removed size/s labels Mar 21, 2026
@giulio-leone
Copy link
Contributor Author

Rebased onto the latest main and re-ran focused validation from refreshed head 72a0be91.

Two consecutive clean passes (no code changes between passes):

  • uv run hatch run test-format
  • uv run hatch run test-lint
  • uv run hatch run test -- tests/strands/models/test_anthropic.py -q
  • result: 44 passed on both passes
  • mypy ./src: Success: no issues found in 140 source files on both passes

Direct runtime proof against the actual AnthropicModel.stream() flow:

  • normal stream still succeeds and emits final metadata on both branch and origin/main
  • premature termination (last event has no .message):
    • origin/main still raises AttributeError: Mock object has no attribute message
    • this branch raises the intended clear RuntimeError("Anthropic stream ended without usage metadata")
  • empty stream:
    • origin/main still raises UnboundLocalError because event was never assigned
    • this branch raises the intended clear RuntimeError("Anthropic stream terminated before receiving any events")

So this PR keeps successful requests unchanged, but when the connection is interrupted it now fails clearly and explicitly instead of crashing with an internal attribute/local-variable error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] AttributeError on event.message.usage when Anthropic stream terminates before message_stop

2 participants