Description
How do you use Sentry?
Sentry Saas (sentry.io)
Version
2.52.0
Steps to Reproduce
- Initialize Sentry with the LiteLLM integration and tracing enabled (see the sketch after this list)
- Make a completion call through LiteLLM to a provider that supports prompt caching (e.g., OpenAI or Anthropic)
- Inspect the resulting span data in Sentry's AI Agents dashboard
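For reference, a minimal reproduction along these lines; the placeholder DSN, the model name, and the assumption that the LiteLLM integration auto-enables when litellm is installed are illustrative, not taken verbatim from this report:

import litellm
import sentry_sdk

# Assumes the LiteLLM integration is picked up automatically once litellm is installed.
sentry_sdk.init(
    dsn="...",  # placeholder
    traces_sample_rate=1.0,
)

# Completion call against a provider that supports prompt caching (model name is illustrative).
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.usage)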
Expected Result
The span should include all available token usage detail attributes, just like the OpenAI and Anthropic integrations do:
- gen_ai.usage.input_tokens (total input tokens)
- gen_ai.usage.input_tokens.cached (cached input tokens, subset of total)
- gen_ai.usage.input_tokens.cache_write (cache write tokens, if available)
- gen_ai.usage.output_tokens (total output tokens)
- gen_ai.usage.output_tokens.reasoning (reasoning tokens, subset of total)
- gen_ai.usage.total_tokens
This data is necessary for Sentry to correctly calculate model costs using the formula documented here:
input cost = (input_tokens - cached_tokens) x input_rate + cached_tokens x cached_rate
Without the cached/reasoning token breakdown, all tokens are charged at the full standard rate, producing inaccurate cost estimates.
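A small worked example of the formula above; the token counts and per-token rates are made up purely to illustrate the effect:

# Hypothetical numbers, for illustration only.
input_tokens = 10_000            # total input tokens reported by the provider
cached_tokens = 8_000            # subset of input_tokens served from the prompt cache
input_rate = 2.50 / 1_000_000    # $ per regular input token (illustrative)
cached_rate = 1.25 / 1_000_000   # $ per cached input token (illustrative)

# Correct cost: cached tokens billed at the discounted cached rate.
input_cost = (input_tokens - cached_tokens) * input_rate + cached_tokens * cached_rate

# What happens when the cached breakdown is missing:
# every input token is billed at the full rate, overestimating the cost.
naive_cost = input_tokens * input_rate

print(f"{input_cost:.4f} vs {naive_cost:.4f}")  # 0.0150 vs 0.0250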
Actual Result
The LiteLLM integration's _success_callback only extracts three basic fields:
record_token_usage(
span,
input_tokens=getattr(usage, "prompt_tokens", None),
output_tokens=getattr(usage, "completion_tokens", None),
total_tokens=getattr(usage, "total_tokens", None),
)
The input_tokens_cached, input_tokens_cache_write, and output_tokens_reasoning parameters of record_token_usage() are never passed. As a result, cost calculations in the AI Agents dashboard overestimate costs for cache-heavy workloads (all input tokens are billed at the full rate) and misattribute output vs. reasoning token costs.
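For comparison, a rough sketch of what the callback could pass instead, assuming LiteLLM mirrors the OpenAI-style prompt_tokens_details / completion_tokens_details fields and the Anthropic-style cache_creation_input_tokens field; the attribute names on the usage object are assumptions and would need to be verified against LiteLLM's response model:

# Sketch only; the usage-detail attribute names below are assumed, not verified.
details_in = getattr(usage, "prompt_tokens_details", None)
details_out = getattr(usage, "completion_tokens_details", None)

record_token_usage(
    span,
    input_tokens=getattr(usage, "prompt_tokens", None),
    input_tokens_cached=getattr(details_in, "cached_tokens", None),
    # Anthropic-style cache writes, if LiteLLM surfaces them on the usage object.
    input_tokens_cache_write=getattr(usage, "cache_creation_input_tokens", None),
    output_tokens=getattr(usage, "completion_tokens", None),
    output_tokens_reasoning=getattr(details_out, "reasoning_tokens", None),
    total_tokens=getattr(usage, "total_tokens", None),
)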