refactor: use inline usage cost from OpenRouter instead of generation cost API #4328
… cost API

OpenRouter now includes cost directly in streaming/non-streaming responses via `usage.cost`. This eliminates the need for the separate generation cost polling endpoint, removing the backoff/retry logic and simplifying the billing flow significantly.

Closes CS-10506

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Host Test Results: 2 194 tests (+97), 2 179 ✅ (+97), 2h 4m 12s ⏱️ (-13m 7s). Results for commit ab107a2, compared against base commit 0bdc6eb. This pull request removes 11 and adds 108 tests. Note that renamed tests count towards both.
Realm Server Test Results: 1 files (±0), 1 suites (±0), 13m 51s ⏱️ (+1m 19s). Results for commit ab107a2, compared against base commit 0bdc6eb. This pull request removes 9 and adds 25 tests. Note that renamed tests count towards both.
When a user cancels a stream mid-way, the final chunk containing `usage.cost` never arrives. In this case, fall back to polling OpenRouter's `/generation` endpoint using the `generationId` to ensure credits are still deducted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d531ef2670
  return;
}

const generationId = response?.id;
Restore fallback generation ID extraction
When inline usage.cost is missing, this now falls back using only response.id, but the previous implementation also handled response.choices[0].id and response.usage.generation_id. For forwarded OpenRouter responses that do not include a top-level id, the fallback /generation?id=... lookup is skipped entirely, so those requests will not deduct credits even though they previously would have been billed.
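The lookup this comment asks to restore could look roughly like the sketch below. It is not the removed helper: the function name `extractGenerationId`, the `ForwardedResponse` type, and the order in which the three locations are checked are assumptions made for illustration. Only the three field paths come from the comment.

```typescript
// Hypothetical shape of a forwarded OpenRouter response. Only the fields
// named in the review comment are modelled here.
interface ForwardedResponse {
  id?: string;
  choices?: Array<{ id?: string }>;
  usage?: { generation_id?: string; cost?: number };
}

// Returns the first non-empty generation ID found. The lookup order
// (top-level id, then choices[0].id, then usage.generation_id) is an assumption.
function extractGenerationId(
  response: ForwardedResponse | null | undefined,
): string | undefined {
  const candidates = [
    response?.id,
    response?.choices?.[0]?.id,
    response?.usage?.generation_id,
  ];
  return candidates.find(
    (candidate): candidate is string =>
      typeof candidate === 'string' && candidate.length > 0,
  );
}
```

With a helper of this kind, a response without a top-level `id` would still yield an ID for the `/generation?id=...` lookup whenever one of the other two fields is present.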
jurgenwerk left a comment
I tried running this locally and saw that credits are deducted correctly.
However, when I stopped AI generation in the middle of a response, I did not see any credits being spent. I am not sure whether it worked this way before this change, but you might want to take a look.
  await spendUsageCost(this.pgAdapter, matrixUserId, costInUsd);
} else if (generationId) {
  log.info(
    `No inline cost for user ${matrixUserId}, falling back to generation cost API (generationId: ${generationId})`,
In which case is there no inline cost?
Pull request overview
Refactors OpenRouter billing to primarily use inline usage.cost from responses, with a fallback to the /generation?id= cost API when inline cost is unavailable (e.g., interrupted streams).
Changes:
- Update realm-server request-forward billing to deduct credits via inline `usage.cost` and retain `/generation` polling as a fallback.
- Simplify credit strategy interface/implementations to route all deductions through `saveUsageCost`.
- Update ai-bot and realm-server tests to cover inline-cost streaming and generation-cost fallback paths.
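To make the inline-cost path concrete, here is a minimal sketch of pulling `usage.cost` out of SSE text while proxying. It is not the code in `handle-request-forward.ts`: the helper name `extractInlineCost` is hypothetical, and the sketch assumes the caller has already buffered the stream into complete `data:` lines.

```typescript
// Hypothetical helper: scans SSE text made of complete lines and returns the
// last numeric usage.cost seen, or undefined if no chunk carried one.
function extractInlineCost(sseText: string): number | undefined {
  let cost: number | undefined;
  for (const line of sseText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue;
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '' || payload === '[DONE]') continue;
    try {
      const chunk = JSON.parse(payload) as { usage?: { cost?: unknown } } | null;
      const candidate = chunk?.usage?.cost;
      if (typeof candidate === 'number' && Number.isFinite(candidate)) {
        cost = candidate;
      }
    } catch {
      // Skip payloads that are not valid JSON; they carry no cost.
    }
  }
  return cost;
}
```

If the stream is interrupted before the chunk that carries `usage`, the helper returns `undefined`, which is the situation where the `/generation` fallback applies.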
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/realm-server/tests/request-forward-test.ts | Updates streaming test to use inline usage.cost and adds a fallback test for /generation polling. |
| packages/realm-server/lib/credit-strategies.ts | Refactors OpenRouter strategy to spend from inline cost first, then fallback to generation cost API. |
| packages/realm-server/handlers/handle-request-forward.ts | Captures usage.cost during SSE proxying and passes cost/generationId into saveUsageCost; simplifies non-stream deduction flow. |
| packages/billing/ai-billing.ts | Removes old saveUsageCost helper and exports fetchGenerationCostWithBackoff for shared fallback usage. |
| packages/ai-bot/main.ts | Switches ai-bot usage tracking to inline cost fast-path with generation-cost fallback. |
Comments suppressed due to low confidence (1)
packages/billing/ai-billing.ts:147
- fetchGenerationCostWithBackoff now appears to be the primary fallback path after removing saveUsageCost, but on terminal failure it only logs an error and returns null. Because this can lead to permanently unbilled generations, consider capturing this failure in Sentry (or otherwise surfacing it) and including enough context (generationId, possibly matrixUserId when available) to investigate billing gaps.
export async function fetchGenerationCostWithBackoff(
  generationId: string,
  openRouterApiKey: string,
): Promise<number | null> {
  let startedAt = Date.now();
  let delayMs = INITIAL_BACKOFF_MS;
  for (let attempt = 1; attempt <= MAX_FETCH_ATTEMPTS; attempt++) {
    try {
      let cost = await fetchGenerationCost(generationId, openRouterApiKey);
      if (cost !== null) {
        return cost;
      }
    } catch (error) {
      log.warn(
        `Attempt ${attempt} to fetch generation cost failed (generationId: ${generationId})`,
        error,
      );
    }
    let elapsed = Date.now() - startedAt;
    if (attempt === MAX_FETCH_ATTEMPTS || elapsed >= MAX_FETCH_RUNTIME_MS) {
      break;
    }
    let remainingTime = MAX_FETCH_RUNTIME_MS - elapsed;
    let sleepMs = Math.min(delayMs, remainingTime);
    await delay(sleepMs);
    delayMs = Math.min(delayMs * 2, MAX_BACKOFF_DELAY_MS);
  }
  log.error(
    `Failed to fetch generation cost within ${MAX_FETCH_ATTEMPTS} attempts or ${Math.round(MAX_FETCH_RUNTIME_MS / 60000)} minutes (generationId: ${generationId})`,
  );
  return null;
}
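One way to act on the suggestion above is to wrap the fetch so that a `null` result is reported with its context. The sketch below is only an illustration: `fetchCostOrReport`, `BillingGapContext`, and `BillingGapReporter` are hypothetical names, and the reporter is injected as a plain callback so that no particular error-tracking API is assumed.

```typescript
// Hypothetical context attached to a billing-gap report.
interface BillingGapContext {
  generationId: string;
  matrixUserId?: string;
}

// Hypothetical reporter signature; in practice it might wrap an error tracker.
type BillingGapReporter = (message: string, context: BillingGapContext) => void;

// Calls the supplied cost fetcher and reports when it resolves to null,
// i.e. when the generation may end up unbilled.
async function fetchCostOrReport(
  fetchCost: (generationId: string) => Promise<number | null>,
  report: BillingGapReporter,
  context: BillingGapContext,
): Promise<number | null> {
  const cost = await fetchCost(context.generationId);
  if (cost === null) {
    report(
      'Generation cost unavailable after retries; generation may be unbilled',
      context,
    );
  }
  return cost;
}
```

A caller could pass a closure over `fetchGenerationCostWithBackoff` and the API key as `fetchCost`, which would leave the existing retry logic unchanged.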
I also experienced the same issue on the main branch, where credits were not deducted after cancellation. However, after some investigation, I found that the issue is not caused by our code. It occurs because the cost data for canceled events takes longer to be recorded by OpenRouter. We have implemented a backoff strategy to minimize the issue, but it can still occasionally occur.
Summary
- OpenRouter now includes the cost inline in responses via `usage.cost`, so we primarily use that instead of polling a separate endpoint
- Removed the old `saveUsageCost` flow that always polled OpenRouter's `/generation` API, and replaced it with direct `spendUsageCost` calls using the inline cost
- Simplified the `CreditStrategy` interface by removing the separate `spendUsageCost` method; `saveUsageCost` now handles both inline cost extraction and fallback
- Removed helpers that are no longer needed (`saveUsageCost`, `extractGenerationIdFromResponse`)

Why we still need the generation cost API as a fallback
OpenRouter includes `usage.cost` in the final streaming chunk (the one with `finish_reason`). However, if a user cancels/stops the stream before that final chunk arrives, the cost is never received. Without a fallback, these interrupted generations would go unbilled. The generation cost API polling (`/generation?id=`) is retained as a fallback for this case: OpenRouter still tracks the cost server-side even for interrupted streams, so we can retrieve it after the fact.

Flow:
1. `usage.cost` available → use it directly (fast path, no extra API call)
2. `generationId` available → poll `/generation?id=` endpoint with backoff (fallback for cancelled streams)

Closes CS-10506
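The two-step flow described above can be sketched as a small decision function. This is an illustration under assumptions, not the code in this PR: `resolveCost`, `CostSource`, and `CostResolution` are hypothetical names, the generation-cost lookup is injected as a callback, and the final `none` branch (neither value available) is an assumption the description does not spell out.

```typescript
type CostSource = 'inline' | 'generation-api' | 'none';

interface CostResolution {
  source: CostSource;
  costInUsd: number | null;
}

// Decides where the cost comes from: inline usage.cost first, then the
// generation cost lookup when only a generationId is available.
async function resolveCost(
  inlineCost: number | undefined,
  generationId: string | undefined,
  fetchGenerationCost: (generationId: string) => Promise<number | null>,
): Promise<CostResolution> {
  if (typeof inlineCost === 'number' && Number.isFinite(inlineCost)) {
    // Fast path: no extra API call.
    return { source: 'inline', costInUsd: inlineCost };
  }
  if (generationId) {
    // Fallback for cancelled streams; the callback is expected to do the polling.
    const cost = await fetchGenerationCost(generationId);
    return { source: 'generation-api', costInUsd: cost };
  }
  // Assumed branch: nothing to bill from and nothing to look up.
  return { source: 'none', costInUsd: null };
}
```

Keeping the lookup behind a callback means the fast path can be exercised in tests without touching the network.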
Test plan
🤖 Generated with Claude Code