@richardsolomou

Add raw_usage field to TokenUsage type to capture raw provider usage metadata (OpenAI, Anthropic, Gemini). This enables the backend to extract modality-specific token counts (text vs image vs audio) for accurate cost calculations.

  • Add raw_usage field to TokenUsage TypedDict
  • Update all provider converters to capture raw usage:
    • OpenAI: capture response.usage and chunk usage
    • Anthropic: capture usage from message_start and message_delta events
    • Gemini: capture usage_metadata from responses and chunks
  • Pass raw usage as $ai_usage property in PostHog events
  • Update merge_usage_stats to handle raw_usage in both modes
  • Add tests verifying $ai_usage is captured for all providers

The backend will extract provider-specific details and delete $ai_usage after processing to avoid bloating event properties.
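
A rough sketch of the type change described above. Only `raw_usage` comes from this PR; the other field names and the sample numbers are illustrative assumptions, not copied from the SDK.

```python
from typing import Any, Dict, TypedDict


class TokenUsage(TypedDict, total=False):
    # Normalized counters; these two field names are assumed for illustration.
    input_tokens: int
    output_tokens: int
    # Provider usage payload kept as-is so the backend can read
    # modality-specific counts (text vs image vs audio) from it.
    raw_usage: Dict[str, Any]


# Sample value shaped like an OpenAI usage payload (numbers are made up).
example_usage: TokenUsage = {
    "input_tokens": 120,
    "output_tokens": 30,
    "raw_usage": {
        "prompt_tokens": 120,
        "completion_tokens": 30,
        "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 40},
    },
}
```

Declaring the TypedDict with `total=False` keeps `raw_usage` optional, so converters that have no usage object can simply omit the key.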

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@richardsolomou richardsolomou requested a review from a team January 27, 2026 11:56
@andrewm4894 andrewm4894 self-assigned this Jan 27, 2026
@andrewm4894 andrewm4894 left a comment

just double checking one or two things

@richardsolomou richardsolomou requested a review from a team January 27, 2026 16:47
Address PR review feedback from @andrewm4894:

1. **Serialization**: Add serialize_raw_usage() helper with fallback chain:
   - .model_dump() for Pydantic models (OpenAI/Anthropic)
   - .to_dict() for protobuf-like objects
   - vars() for simple objects
   - str() as last resort
   This ensures we never pass unserializable objects to the PostHog client.
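
A minimal sketch of the fallback chain listed in item 1, not the merged implementation. The `None` and `dict` short-circuits and the `{"raw": ...}` wrapper around the `str()` fallback are assumptions made here so the helper always returns a dict or `None`.

```python
from typing import Any, Dict, Optional


def serialize_raw_usage(raw_usage: Any) -> Optional[Dict[str, Any]]:
    """Best-effort conversion of a provider usage object to a plain dict."""
    if raw_usage is None:
        return None
    if isinstance(raw_usage, dict):
        return raw_usage
    # Steps 1-2: Pydantic models expose model_dump(); protobuf-like
    # objects expose to_dict().
    for method_name in ("model_dump", "to_dict"):
        method = getattr(raw_usage, method_name, None)
        if callable(method):
            result = method()
            if isinstance(result, dict):
                return result
    # Step 3: plain objects, copy the instance __dict__.
    try:
        return dict(vars(raw_usage))
    except TypeError:
        # vars() raises TypeError when the object has no __dict__.
        pass
    # Step 4: last resort, keep a string representation (wrapper key assumed).
    return {"raw": str(raw_usage)}
```

Checking that each method actually returns a dict means a mock whose `model_dump()` returns another mock falls through to the later steps instead of leaking into the event.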

2. **Data loss prevention**: Change from replacing to merging raw_usage in
   incremental mode. For Anthropic streaming, message_start has input token
   details and message_delta has output token details - merging preserves
   both instead of losing input data.
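
The merge in item 2 can be sketched as a shallow dict merge. `merge_raw_usage` is a hypothetical name, the sample payloads are made-up values shaped like Anthropic streaming usage, and letting later keys win on conflict is an assumption.

```python
from typing import Any, Dict, Optional


def merge_raw_usage(
    current: Optional[Dict[str, Any]],
    incoming: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Shallow-merge two raw usage dicts; keys from `incoming` win."""
    if incoming is None:
        return current
    if current is None:
        return dict(incoming)
    return {**current, **incoming}


# message_start carries the input-side details...
start_usage = {"input_tokens": 25, "cache_read_input_tokens": 10, "output_tokens": 1}
# ...and message_delta carries the final output count.
delta_usage = {"output_tokens": 42}

merged = merge_raw_usage(start_usage, delta_usage)
```

Replacing instead of merging would leave only `{"output_tokens": 42}` here, which is the input-data loss the review flagged.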

3. **Test coverage**: Enhanced tests to verify:
   - JSON serializability with json.dumps()
   - Expected structure of raw_usage dicts
   - Coverage for both non-streaming and streaming modes
   - Fixed Gemini test mocks to return proper dicts from model_dump()
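
The checks in item 3 might look roughly like the helper below. `check_ai_usage` is a hypothetical name and is not taken from the PR's test suite.

```python
import json
from typing import Any, Dict, Iterable


def check_ai_usage(properties: Dict[str, Any], expected_keys: Iterable[str]) -> None:
    """Assert that $ai_usage is a JSON-serializable dict with the expected keys."""
    raw = properties["$ai_usage"]
    assert isinstance(raw, dict), "raw usage should already be a dict"
    # json.dumps raises TypeError if any nested value is not serializable.
    json.dumps(raw)
    missing = set(expected_keys) - set(raw)
    assert not missing, f"missing keys: {sorted(missing)}"
```

A test would call it on the properties of a captured event, for example `check_ai_usage(props, ["prompt_tokens", "completion_tokens"])`.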

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@richardsolomou richardsolomou requested review from andrewm4894 and removed request for a team January 27, 2026 17:08
richardsolomou and others added 2 commits January 27, 2026 19:28
Address PR feedback from @andrewm4894 - serialize in converters, not utils.

**Problem:**
Utils was receiving raw Pydantic/protobuf objects and serializing them,
which meant provider-specific knowledge leaked into generic code.

**Solution:**
Move serialization into converters where provider context exists:

Converters (NEW):
- OpenAI: serialize_raw_usage(response.usage) → dict
- Anthropic: serialize_raw_usage(event.usage) → dict
- Gemini: serialize_raw_usage(metadata) → dict

Utils (SIMPLIFIED):
- Just passes dicts through, no serialization needed
- Merge operations work with dicts only

**Benefits:**
1. Type correctness: raw_usage is always Dict[str, Any]
2. Separation of concerns: converters handle provider formats
3. Fail fast: serialization errors in converters with context
4. Cleaner abstraction: utils doesn't know about Pydantic/protobuf

**Flow:**
Provider object → Converter serializes → dict → Utils → PostHog
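
The flow above can be sketched end to end as follows. `FakeUsage`, `convert_usage`, and `to_event_properties` are hypothetical stand-ins for a provider SDK object, a converter, and the generic utils layer.

```python
from typing import Any, Dict


class FakeUsage:
    """Stand-in for a provider SDK usage object (hypothetical)."""

    def __init__(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.prompt_tokens = prompt_tokens
        self.completion_tokens = completion_tokens

    def model_dump(self) -> Dict[str, Any]:
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
        }


def convert_usage(usage: FakeUsage) -> Dict[str, Any]:
    """Converter layer: the only place that touches the provider object."""
    return {
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "raw_usage": usage.model_dump(),  # serialized here, with provider context
    }


def to_event_properties(token_usage: Dict[str, Any]) -> Dict[str, Any]:
    """Generic layer: sees plain dicts only and passes raw usage through."""
    properties: Dict[str, Any] = {}
    if "raw_usage" in token_usage:
        properties["$ai_usage"] = token_usage["raw_usage"]
    return properties
```

With this split, a serialization failure surfaces inside the converter for the provider that caused it, and the generic layer has no dependency on Pydantic or protobuf.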

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix mypy error: "Need type annotation for 'current_raw'"

Extract value first, then apply explicit type annotation with ternary
conditional to satisfy mypy's type checker.
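
The pattern described in this commit might look like the sketch below. The variable name `current_raw` comes from the error message; the surrounding function and the `isinstance` check are assumptions.

```python
from typing import Any, Dict


def current_raw_usage(target: Dict[str, Any]) -> Dict[str, Any]:
    # Extract the value first; .get() on a Dict[str, Any] yields Any.
    existing = target.get("raw_usage")
    # Then annotate explicitly. The ternary gives the variable one declared
    # type, so mypy does not have to infer the type of an empty literal.
    current_raw: Dict[str, Any] = existing if isinstance(existing, dict) else {}
    return current_raw
```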

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@richardsolomou richardsolomou merged commit c32c783 into master Jan 28, 2026
31 of 32 checks passed
@richardsolomou richardsolomou deleted the richardsolomou/raw-usage-metadata branch January 28, 2026 09:51