
[python] update anthropic llmobs tests for new cache ttl metrics #6480

Draft
Yun-Kim wants to merge 5 commits into main from yunkim/llmobs-anthropic-ttl-cache-metrics

Conversation


@Yun-Kim Yun-Kim commented Mar 12, 2026

Motivation

Account for the additional cache-creation TTL breakdown metrics in the Anthropic integration.
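For illustration, the token-metrics shape the updated tests assert on might look like this (keys are taken from the CI diff further down in this PR; the values are placeholders, not real measurements):

```python
# Illustrative only: the metrics payload shape the updated LLMObs tests
# expect, with the new cache-creation TTL breakdown keys. Values below
# are placeholders.
expected_metrics = {
    "input_tokens": 28,
    "output_tokens": 100,
    "cache_read_input_tokens": 0,
    "cache_write_input_tokens": 0,
    # New cache-creation TTL breakdown:
    "ephemeral_5m_input_tokens": 0,
    "ephemeral_1h_input_tokens": 0,
}

# The two new keys split cache-write tokens by cache TTL (5 minutes vs 1 hour).
print(sorted(expected_metrics))
```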

Changes

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from the RFC owner.
    • Framework is modified, or non-obvious usage of it? -> Get a review from the R&P team.

🚀 Once your PR is reviewed and the CI is green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything but tests/ or manifests/ is modified? I have approval from the R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

@Yun-Kim Yun-Kim requested a review from a team as a code owner March 12, 2026 14:22

github-actions bot commented Mar 12, 2026

CODEOWNERS have been resolved as:

manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
tests/integration_frameworks/llm/anthropic/test_anthropic_llmobs.py     @DataDog/ml-observability


@sabrenner sabrenner left a comment


I would:

  1. Update the manifest to mark these tests as a missing_feature, so that we can land these test changes.
  2. Update dd-trace-py to use the most recent system-tests hash (if that's still the process) - then your PR should pass and these missing_feature tests will XPASS.
  3. Then, once the feature is released, we can add the version to the manifest for these tests.
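The manifest change in step 1 might look roughly like this (a hypothetical sketch only; the exact nesting and declaration syntax follow whatever manifests/python.yml already uses, and the class name is taken from the failing tests in this PR):

```yaml
# Hypothetical sketch of a missing_feature entry in manifests/python.yml.
tests/:
  integration_frameworks/:
    llm/anthropic/test_anthropic_llmobs.py:
      TestAnthropicLlmObsMessages: missing_feature (cache TTL breakdown metrics not yet released)
```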

@Yun-Kim Yun-Kim requested review from a team as code owners March 12, 2026 19:47
@Yun-Kim Yun-Kim requested review from gnufede and quinna-h and removed request for a team March 12, 2026 19:47

datadog-prod-us1-4 bot commented Mar 12, 2026

⚠️ Tests


⚠️ Warnings

🧪 13 Tests failed

tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[False, anthropic-py@0.75.0] from system_tests_suite (Datadog)
assert {'_dd': {'apm...00, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 28, 'output_tokens': 100, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'apm_trace_id': '69bb0b60000000005c7a15bfecf73db9',
  +          'span_id': '16570317305115008918',
...
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[True, anthropic-py@0.75.0] from system_tests_suite (Datadog)
assert {'_dd': {'apm...00, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 28, 'output_tokens': 100, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'apm_trace_id': '69bb0b5c00000000f66340172acbe206',
  +          'span_id': '7918638595297211569',
...
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create[False, anthropic-py@0.75.0] from system_tests_suite (Datadog)
AssertionError: assert {'_dd': {'apm...11, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 14, 'output_tokens': 11, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'apm_trace_id': '69bb0b5500000000b067124339dc00f2',
  +          'span_id': '1412345837989640277',
...
… (10 more failing tests omitted)

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🔗 Commit SHA: 0bb83ed

@Yun-Kim Yun-Kim marked this pull request as draft March 12, 2026 21:36
@Yun-Kim Yun-Kim marked this pull request as ready for review March 18, 2026 20:18

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0bb83ed092


write_span_event, read_span_event = span_events

assert write_span_event["metrics"]["cache_write_input_tokens"] == 6163
assert write_span_event["metrics"]["ephemeral_5m_input_tokens"] == 6163


P1: Gate prompt-caching TTL assertion to Python

test_create_prompt_caching is shared across libraries, but this new assertion is unconditional. manifests/nodejs.yml:1590-1599 still enables TestAnthropicLlmObsMessages for Node.js and explicitly marks the other Anthropic methods as missing_feature because the TTL breakdown metrics are not released there yet; this prompt-caching test is not skipped, so Node.js runs will now fail with a missing ephemeral_5m_input_tokens metric even though only the Python manifest was updated.
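The gating Codex suggests could be sketched like this (hypothetical: `library_under_test` and `check_cache_metrics` stand in for whatever language selector and assertion helper the shared framework actually exposes; the token value 6163 is copied from the assertion quoted above):

```python
# Hypothetical sketch: only assert the TTL breakdown metric for libraries
# that already emit it. `library_under_test` is a stand-in for the shared
# framework's real language selector, not its actual API.
def check_cache_metrics(metrics: dict, library_under_test: str) -> None:
    # Asserted for every library: total cache-write tokens.
    assert metrics["cache_write_input_tokens"] == 6163
    # Only the Python tracer reports the 5-minute TTL breakdown so far.
    if library_under_test == "python":
        assert metrics["ephemeral_5m_input_tokens"] == 6163

# A Node.js payload without the TTL key still passes the shared check.
check_cache_metrics({"cache_write_input_tokens": 6163}, "nodejs")
# A Python payload must carry both metrics.
check_cache_metrics(
    {"cache_write_input_tokens": 6163, "ephemeral_5m_input_tokens": 6163},
    "python",
)
```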


@Yun-Kim Yun-Kim marked this pull request as draft March 18, 2026 20:35

Yun-Kim commented Mar 18, 2026

Keeping as draft until the Python tracer containing the fix is released.



4 participants