feat(llma): pass raw provider usage metadata for backend cost calculations #411
Merged
Conversation
Add a raw_usage field to the TokenUsage type to capture raw provider usage metadata (OpenAI, Anthropic, Gemini). This enables the backend to extract modality-specific token counts (text vs. image vs. audio) for accurate cost calculations.

- Add raw_usage field to the TokenUsage TypedDict
- Update all provider converters to capture raw usage:
  - OpenAI: capture response.usage and chunk usage
  - Anthropic: capture usage from message_start and message_delta events
  - Gemini: capture usage_metadata from responses and chunks
- Pass raw usage as the $ai_usage property in PostHog events
- Update merge_usage_stats to handle raw_usage in both modes
- Add tests verifying $ai_usage is captured for all providers

The backend will extract provider-specific details and delete $ai_usage after processing to avoid bloating event properties.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
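The raw_usage addition described above can be sketched as a TypedDict extension. This is a hypothetical sketch: the exact field names on TokenUsage other than raw_usage are assumptions, not taken from the PR diff.

```python
from typing import Any, Dict, TypedDict


class TokenUsage(TypedDict, total=False):
    """Sketch of a TokenUsage shape with the new raw_usage field.

    Fields other than raw_usage are illustrative assumptions.
    """

    input_tokens: int
    output_tokens: int
    # Raw provider usage payload, forwarded to PostHog as $ai_usage so the
    # backend can extract modality-specific counts for cost calculations.
    raw_usage: Dict[str, Any]


# Example: an OpenAI-style usage payload attached verbatim.
usage: TokenUsage = {
    "input_tokens": 12,
    "output_tokens": 34,
    "raw_usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46},
}
```

Because the backend deletes $ai_usage after processing, the raw payload never needs a stable schema across providers; each converter can attach whatever its SDK reports.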
andrewm4894 (Member) reviewed on Jan 27, 2026 and left a comment:
just double checking one or two things
Address PR review feedback from @andrewm4894:

1. **Serialization**: Add a serialize_raw_usage() helper with a fallback chain:
   - .model_dump() for Pydantic models (OpenAI/Anthropic)
   - .to_dict() for protobuf-like objects
   - vars() for simple objects
   - str() as a last resort
   This ensures we never pass unserializable objects to the PostHog client.
2. **Data-loss prevention**: Merge raw_usage in incremental mode instead of replacing it. For Anthropic streaming, message_start carries the input token details and message_delta carries the output token details; merging preserves both instead of losing the input data.
3. **Test coverage**: Enhance tests to verify:
   - JSON serializability via json.dumps()
   - the expected structure of the raw_usage dicts
   - both non-streaming and streaming modes
   Also fix the Gemini test mocks to return proper dicts from model_dump().

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
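The fallback chain in point 1 could look roughly like the following. This is a sketch under the assumptions stated in the commit message, not the PR's actual implementation; the broad exception handling mirrors the "never pass unserializable objects" goal.

```python
from typing import Any, Dict, Union


def serialize_raw_usage(raw: Any) -> Union[Dict[str, Any], str]:
    """Best-effort conversion of a provider usage object to plain data.

    Fallback chain: .model_dump() (Pydantic) -> .to_dict() (protobuf-like)
    -> vars() (simple objects) -> str() (last resort).
    """
    for attr in ("model_dump", "to_dict"):
        method = getattr(raw, attr, None)
        if callable(method):
            try:
                return method()
            except Exception:
                pass  # fall through to the next strategy
    try:
        return vars(raw)  # works for objects with a __dict__
    except TypeError:
        return str(raw)  # last resort: always JSON-serializable
```

The chain is ordered from most faithful to least: model_dump() and to_dict() preserve nested structure, vars() flattens to the instance dict, and str() guarantees something serializable even for exotic types.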
Address PR feedback from @andrewm4894: serialize in the converters, not in utils.

**Problem:** Utils was receiving raw Pydantic/protobuf objects and serializing them, which leaked provider-specific knowledge into generic code.

**Solution:** Move serialization into the converters, where the provider context exists.

Converters (new):
- OpenAI: serialize_raw_usage(response.usage) → dict
- Anthropic: serialize_raw_usage(event.usage) → dict
- Gemini: serialize_raw_usage(metadata) → dict

Utils (simplified):
- Just passes dicts through; no serialization needed
- Merge operations work with dicts only

**Benefits:**
1. Type correctness: raw_usage is always Dict[str, Any]
2. Separation of concerns: converters handle provider formats
3. Fail fast: serialization errors surface in the converters, with context
4. Cleaner abstraction: utils knows nothing about Pydantic or protobuf

**Flow:** Provider object → converter serializes → dict → utils → PostHog

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
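Once converters hand utils plain dicts, the incremental merge described earlier (message_start input details plus message_delta output details) reduces to a dict update. A minimal sketch, assuming the function name and key names for illustration:

```python
from typing import Any, Dict


def merge_raw_usage(current: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Merge raw usage dicts in incremental (streaming) mode.

    Merging instead of replacing preserves input-token details from
    message_start while adding output-token details from message_delta.
    Incoming keys win on conflict, matching the "latest chunk" semantics.
    """
    merged = dict(current)
    merged.update(incoming)
    return merged


# Anthropic streaming example: two events contribute different halves.
start = {"input_tokens": 100, "cache_read_input_tokens": 20}  # message_start
delta = {"output_tokens": 55}                                 # message_delta
combined = merge_raw_usage(start, delta)
```

Since utils only ever sees dicts here, no isinstance checks against provider SDK types are needed in the generic code path.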
Fix mypy error: "Need type annotation for 'current_raw'". Extract the value first, then apply an explicit type annotation with a ternary conditional to satisfy mypy's type checker.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
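The extract-then-annotate pattern from this commit might look like the following. The surrounding function and key names are hypothetical; only the annotation technique reflects the commit message.

```python
from typing import Any, Dict


def get_current_raw(stats: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrates the mypy fix: pull the value out first, then bind it
    to an explicitly annotated name via a ternary conditional."""
    value = stats.get("raw_usage")
    # Without the explicit annotation, mypy cannot infer a single type
    # for current_raw across both branches and reports
    # 'Need type annotation for "current_raw"'.
    current_raw: Dict[str, Any] = value if isinstance(value, dict) else {}
    return current_raw
```

The isinstance guard doubles as a runtime narrowing check, so mypy accepts the dict branch without a cast.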
andrewm4894 approved these changes on Jan 28, 2026.