
[python] update anthropic llmobs tests for new cache ttl metrics #6480

Draft
Yun-Kim wants to merge 5 commits into main from yunkim/llmobs-anthropic-ttl-cache-metrics

Conversation


@Yun-Kim Yun-Kim commented Mar 12, 2026

Motivation

Account for the additional cache-creation TTL breakdown metrics in the Anthropic integration.
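For illustration, the token-metrics shape the updated tests assert on might look like this (keys are taken from the CI diff further down in this PR; the values are placeholders, not real measurements):

```python
# Illustrative only: the metrics payload shape the updated LLMObs tests
# expect, with the new cache-creation TTL breakdown keys. Values below
# are placeholders.
expected_metrics = {
    "input_tokens": 28,
    "output_tokens": 100,
    "cache_read_input_tokens": 0,
    "cache_write_input_tokens": 0,
    # New cache-creation TTL breakdown:
    "ephemeral_5m_input_tokens": 0,
    "ephemeral_1h_input_tokens": 0,
}

# The two new keys split cache-write tokens by cache TTL (5 minutes vs 1 hour).
print(sorted(expected_metrics))
```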

Changes

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from the RFC owner.
    • Framework is modified, or non-obvious usage of it? -> Get a review from the R&P team.

🚀 Once your PR is reviewed and the CI is green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything but tests/ or manifests/ is modified? I have approval from the R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

@Yun-Kim Yun-Kim requested a review from a team as a code owner March 12, 2026 14:22

github-actions bot commented Mar 12, 2026

CODEOWNERS have been resolved as:

manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
tests/integration_frameworks/llm/anthropic/test_anthropic_llmobs.py     @DataDog/ml-observability


@sabrenner sabrenner left a comment


I would:

  1. Update the manifest to mark these tests as a missing_feature, so that we can land these test changes.
  2. Update dd-trace-py to use the most recent system-tests hash (if that's still the process) - then your PR should pass and these missing_feature tests will XPASS.
  3. Then, once the feature is released, we can add the version to the manifest for these tests.
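The manifest change in step 1 might look roughly like this (a hypothetical sketch only; the exact nesting and declaration syntax follow whatever manifests/python.yml already uses, and the class name is taken from the failing tests in this PR):

```yaml
# Hypothetical sketch of a missing_feature entry in manifests/python.yml.
tests/:
  integration_frameworks/:
    llm/anthropic/test_anthropic_llmobs.py:
      TestAnthropicLlmObsMessages: missing_feature (cache TTL breakdown metrics not yet released)
```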

@Yun-Kim Yun-Kim requested review from a team as code owners March 12, 2026 19:47
@Yun-Kim Yun-Kim requested review from gnufede and quinna-h and removed request for a team March 12, 2026 19:47

datadog-prod-us1-4 bot commented Mar 12, 2026

⚠️ Tests


⚠️ Warnings

🧪 13 Tests failed

tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[False, anthropic-py@0.75.0] from system_tests_suite (Datadog)
assert {'_dd': {'apm...00, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 28, 'output_tokens': 100, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'apm_trace_id': '69bb0b60000000005c7a15bfecf73db9',
  +          'span_id': '16570317305115008918',
...
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[True, anthropic-py@0.75.0] from system_tests_suite (Datadog)
assert {'_dd': {'apm...00, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 28, 'output_tokens': 100, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'apm_trace_id': '69bb0b5c00000000f66340172acbe206',
  +          'span_id': '7918638595297211569',
...
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create[False, anthropic-py@0.75.0] from system_tests_suite (Datadog)
AssertionError: assert {'_dd': {'apm...11, ...}, ...} == {'_dd': <ANY>...Y>, ...}, ...}
  Omitting 9 identical items, use -vv to show
  Differing items:
  {'metrics': {'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'input_tokens': 14, 'output_tokens': 11, ...}} != {'metrics': {'cache_read_input_tokens': <ANY>, 'cache_write_input_tokens': <ANY>, 'ephemeral_1h_input_tokens': <ANY>, 'ephemeral_5m_input_tokens': <ANY>, ...}}
  Full diff:
    {
  -  '_dd': <ANY>,
  -  'duration': <ANY>,
  +  '_dd': {'apm_trace_id': '69bb0b5500000000b067124339dc00f2',
  +          'span_id': '1412345837989640277',
...
… (10 more failing tests omitted)

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🔗 Commit SHA: 0bb83ed

@Yun-Kim Yun-Kim marked this pull request as draft March 12, 2026 21:36
@Yun-Kim Yun-Kim marked this pull request as ready for review March 18, 2026 20:18

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0bb83ed092


write_span_event, read_span_event = span_events

assert write_span_event["metrics"]["cache_write_input_tokens"] == 6163
assert write_span_event["metrics"]["ephemeral_5m_input_tokens"] == 6163


P1: Gate prompt-caching TTL assertion to Python

test_create_prompt_caching is shared across libraries, but this new assertion is unconditional. manifests/nodejs.yml:1590-1599 still enables TestAnthropicLlmObsMessages for Node.js and explicitly marks the other Anthropic methods as missing_feature because the TTL breakdown metrics are not released there yet; this prompt-caching test is not skipped, so Node.js runs will now fail with a missing ephemeral_5m_input_tokens metric even though only the Python manifest was updated.
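The gating Codex suggests could be sketched like this (hypothetical: `library_under_test` and `check_cache_metrics` stand in for whatever language selector and assertion helper the shared framework actually exposes; the token value 6163 is copied from the assertion quoted above):

```python
# Hypothetical sketch: only assert the TTL breakdown metric for libraries
# that already emit it. `library_under_test` is a stand-in for the shared
# framework's real language selector, not its actual API.
def check_cache_metrics(metrics: dict, library_under_test: str) -> None:
    # Asserted for every library: total cache-write tokens.
    assert metrics["cache_write_input_tokens"] == 6163
    # Only the Python tracer reports the 5-minute TTL breakdown so far.
    if library_under_test == "python":
        assert metrics["ephemeral_5m_input_tokens"] == 6163

# A Node.js payload without the TTL key still passes the shared check.
check_cache_metrics({"cache_write_input_tokens": 6163}, "nodejs")
# A Python payload must carry both metrics.
check_cache_metrics(
    {"cache_write_input_tokens": 6163, "ephemeral_5m_input_tokens": 6163},
    "python",
)
```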


@Yun-Kim Yun-Kim marked this pull request as draft March 18, 2026 20:35

Yun-Kim commented Mar 18, 2026

Keeping as draft until the Python tracer containing the fix is released.



4 participants