Description
How do you use Sentry?
Sentry Saas (sentry.io)
Version
2.52.0
Steps to Reproduce
- Initialize Sentry with the LiteLLM integration and tracing enabled (see the sketch after this list)
- Make a completion call through LiteLLM to a provider that supports prompt caching (e.g., OpenAI or Anthropic)
- Inspect the resulting span data in Sentry's AI Agents dashboard
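For reference, a minimal reproduction along these lines; the placeholder DSN, the model name, and the assumption that the LiteLLM integration auto-enables when litellm is installed are illustrative, not taken verbatim from this report:

import litellm
import sentry_sdk

# Assumes the LiteLLM integration is picked up automatically once litellm is installed.
sentry_sdk.init(
    dsn="...",  # placeholder
    traces_sample_rate=1.0,
)

# Completion call against a provider that supports prompt caching (model name is illustrative).
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.usage)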
Expected Result
The span should include all available token usage detail attributes, just like the OpenAI and Anthropic integrations do:
- gen_ai.usage.input_tokens (total input tokens)
- gen_ai.usage.input_tokens.cached (cached input tokens, subset of total)
- gen_ai.usage.input_tokens.cache_write (cache write tokens, if available)
- gen_ai.usage.output_tokens (total output tokens)
- gen_ai.usage.output_tokens.reasoning (reasoning tokens, subset of total)
- gen_ai.usage.total_tokens
This data is necessary for Sentry to correctly calculate model costs using the formula documented here:
input cost = (input_tokens - cached_tokens) x input_rate + cached_tokens x cached_rate
Without the cached/reasoning token breakdown, all tokens are charged at the full standard rate, producing inaccurate cost estimates.
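A small worked example of the formula above; the token counts and per-token rates are made up purely to illustrate the effect:

# Hypothetical numbers, for illustration only.
input_tokens = 10_000            # total input tokens reported by the provider
cached_tokens = 8_000            # subset of input_tokens served from the prompt cache
input_rate = 2.50 / 1_000_000    # $ per regular input token (illustrative)
cached_rate = 1.25 / 1_000_000   # $ per cached input token (illustrative)

# Correct cost: cached tokens billed at the discounted cached rate.
input_cost = (input_tokens - cached_tokens) * input_rate + cached_tokens * cached_rate

# What happens when the cached breakdown is missing:
# every input token is billed at the full rate, overestimating the cost.
naive_cost = input_tokens * input_rate

print(f"{input_cost:.4f} vs {naive_cost:.4f}")  # 0.0150 vs 0.0250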
Actual Result
The LiteLLM integration's _success_callback only extracts three basic fields:
record_token_usage(
span,
input_tokens=getattr(usage, "prompt_tokens", None),
output_tokens=getattr(usage, "completion_tokens", None),
total_tokens=getattr(usage, "total_tokens", None),
)
The input_tokens_cached, input_tokens_cache_write, and output_tokens_reasoning parameters of record_token_usage() are never passed. As a result, cost calculations in the AI Agents dashboard overestimate costs for cache-heavy workloads (all input tokens are billed at the full rate) and misattribute output vs. reasoning token costs.
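For comparison, a rough sketch of what the callback could pass instead, assuming LiteLLM mirrors the OpenAI-style prompt_tokens_details / completion_tokens_details fields and the Anthropic-style cache_creation_input_tokens field; the attribute names on the usage object are assumptions and would need to be verified against LiteLLM's response model:

# Sketch only; the usage-detail attribute names below are assumed, not verified.
details_in = getattr(usage, "prompt_tokens_details", None)
details_out = getattr(usage, "completion_tokens_details", None)

record_token_usage(
    span,
    input_tokens=getattr(usage, "prompt_tokens", None),
    input_tokens_cached=getattr(details_in, "cached_tokens", None),
    # Anthropic-style cache writes, if LiteLLM surfaces them on the usage object.
    input_tokens_cache_write=getattr(usage, "cache_creation_input_tokens", None),
    output_tokens=getattr(usage, "completion_tokens", None),
    output_tokens_reasoning=getattr(details_out, "reasoning_tokens", None),
    total_tokens=getattr(usage, "total_tokens", None),
)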