
refactor: use inline usage cost from OpenRouter instead of generation cost API#4328

Merged
FadhlanR merged 3 commits into main from cs-10506-refactor-ai-credits-spending
Apr 10, 2026

Conversation

Contributor

@FadhlanR FadhlanR commented Apr 3, 2026

Summary

  • OpenRouter now includes cost directly in responses via usage.cost, so we primarily use that instead of polling a separate endpoint
  • Removed the old saveUsageCost flow that always polled OpenRouter's /generation API, and replaced it with direct spendUsageCost calls using the inline cost
  • Simplified CreditStrategy interface by removing the separate spendUsageCost method — saveUsageCost now handles both inline cost extraction and fallback
  • ~140 lines of billing code removed (old saveUsageCost, extractGenerationIdFromResponse)

Why we still need the generation cost API as a fallback

OpenRouter includes usage.cost in the final streaming chunk (the one with finish_reason). However, if a user cancels/stops the stream before that final chunk arrives, the cost is never received. Without a fallback, these interrupted generations would go unbilled. The generation cost API polling (/generation?id=) is retained as a fallback for this case — OpenRouter still tracks the cost server-side even for interrupted streams, so we can retrieve it after the fact.

Flow:

  1. Inline usage.cost available → use it directly (fast path, no extra API call)
  2. No inline cost but generationId available → poll /generation?id= endpoint with backoff (fallback for cancelled streams)
  3. Neither available → log warning, skip deduction
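The three-path flow above can be sketched as a small decision helper. This is an illustrative sketch only — `resolveBillingPath` and `BillingInput` are hypothetical names, not identifiers from this PR:

```typescript
type BillingInput = {
  inlineCost?: number; // usage.cost from the final streaming chunk, if received
  generationId?: string; // id used for the /generation?id= fallback lookup
};

// Resolve which of the three paths above applies, returning the path taken
// so callers can deduct, poll, or log a warning accordingly.
function resolveBillingPath(input: BillingInput): 'inline' | 'fallback' | 'skip' {
  if (typeof input.inlineCost === 'number') {
    return 'inline'; // 1. fast path, no extra API call
  }
  if (input.generationId) {
    return 'fallback'; // 2. poll /generation?id= with backoff
  }
  return 'skip'; // 3. log warning, skip deduction
}
```

Note the `typeof` check rather than a truthiness check, so a legitimate zero-cost generation still takes the inline path instead of triggering a needless fallback poll.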

Closes CS-10506

Test plan

  • Verify AI chat generates responses and credits are deducted correctly (inline cost path)
  • Verify streaming responses that are cancelled mid-way still deduct credits (fallback path)
  • Verify non-streaming forwarded requests deduct credits from inline cost
  • Run realm-server request-forward tests (includes both inline and fallback test cases)

🤖 Generated with Claude Code

… cost API

OpenRouter now includes cost directly in streaming/non-streaming responses
via `usage.cost`. This eliminates the need for the separate generation cost
polling endpoint, removing the backoff/retry logic and simplifying the
billing flow significantly.

Closes CS-10506

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions bot commented Apr 3, 2026

Host Test Results

2 194 tests  +97   2 179 ✅ +97   2h 4m 12s ⏱️ - 13m 7s
    1 suites ± 0      15 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit ab107a2. ± Comparison against base commit 0bdc6eb.

This pull request removes 11 and adds 108 tests. Note that renamed tests count towards both.
Chrome ‑ Integration | Command | host command schema generation test > command schema generation: getInputJsonSchema for OpenCreatePRModalCommand
Chrome ‑ Integration | commands | open-create-pr-modal: dismissCreatePRModal clears the payload
Chrome ‑ Integration | commands | open-create-pr-modal: stores modal payload in operator mode state
Chrome ‑ Integration | commands | open-create-pr-modal: stores modal payload without listingName
Chrome ‑ Integration | components | create-pr-modal: cancel button dismisses the modal
Chrome ‑ Integration | components | create-pr-modal: does not show a separate realm field in modal
Chrome ‑ Integration | components | create-pr-modal: does not show change action when catalog chooser is unavailable
Chrome ‑ Integration | components | create-pr-modal: modal renders when payload is set
Chrome ‑ Integration | components | create-pr-modal: shows the listing pill in modal
Chrome ‑ Integration | components | create-pr-modal: submit shows success state
…
Chrome ‑ Acceptance | code submode | create-file tests > when a selected spec uses a prefix-form ref: can create new card definition in workspace A that extends a card from workspace B via prefix-form ref
Chrome ‑ Acceptance | markdown BFM card references: code mode restores embedded markdown card references after navigating away and back
Chrome ‑ Acceptance | markdown BFM card references: code mode shows overlays for markdown card references and clicking navigates
Chrome ‑ Acceptance | markdown BFM card references: interact mode shows overlays for markdown card references and clicking navigates
Chrome ‑ Acceptance | markdown BFM card references: math placeholders are rendered with KaTeX
Chrome ‑ Acceptance | markdown BFM card references: mermaid code blocks are rendered as SVG diagrams
Chrome ‑ Acceptance | markdown BFM card references: renders inline card reference in atom format and block card reference in embedded format
Chrome ‑ Acceptance | markdown BFM card references: shows fallback text for unresolvable card references
Chrome ‑ Integration | Command | host command schema generation test > command schema generation: getInputJsonSchema for CreateAndOpenSubmissionWorkflowCardCommand
Chrome ‑ Integration | Command | host command schema generation test > command schema generation: getInputJsonSchema for CreateSubmissionWorkflowCommand
…

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Apr 3, 2026

Realm Server Test Results

  1 files  ± 0    1 suites  ±0   13m 51s ⏱️ + 1m 19s
844 tests +16  844 ✅ +16  0 💤 ±0  0 ❌ ±0 
915 runs  +16  915 ✅ +16  0 💤 ±0  0 ❌ ±0 

Results for commit ab107a2. ± Comparison against base commit 0bdc6eb.

This pull request removes 9 and adds 25 tests. Note that renamed tests count towards both.
default ‑ extracts PR number from check_run event
default ‑ extracts PR number from check_suite event
default ‑ extracts PR number from pull_request event
default ‑ extracts realm from local Submission Card URL
default ‑ extracts realm from production Submission Card URL
default ‑ extracts realm from staging Submission Card URL
default ‑ returns null when no PR number found
default ‑ returns null when no Submission Card line exists
default ‑ should handle streaming requests
default ‑ can successfully run a command
default ‑ card responses reflect updated realm config without re-indexing
default ‑ cardTypeName extracts type from absolute URL
default ‑ cardTypeName extracts type from relative path
default ‑ cardTypeName handles deeply nested URLs
default ‑ cardTypeName returns Card for empty string
default ‑ cardTypeName returns single segment as type name
default ‑ cardTypeName strips .json extension before extracting
default ‑ cardTypeName strips trailing slash
default ‑ extracts branch name from check_run event
…

♻️ This comment has been updated with latest results.

When a user cancels a stream mid-way, the final chunk containing
usage.cost never arrives. In this case, fall back to polling
OpenRouter's /generation endpoint using the generationId to ensure
credits are still deducted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@FadhlanR FadhlanR marked this pull request as ready for review April 3, 2026 15:21
@FadhlanR FadhlanR requested a review from jurgenwerk April 3, 2026 15:21

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d531ef2670


return;
}

const generationId = response?.id;


P2: Restore fallback generation ID extraction

When inline usage.cost is missing, this now falls back using only response.id, but the previous implementation also handled response.choices[0].id and response.usage.generation_id. For forwarded OpenRouter responses that do not include a top-level id, the fallback /generation?id=... lookup is skipped entirely, so those requests will not deduct credits even though they previously would have been billed.
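One way to restore the broader extraction the comment describes is to probe each known field in order. This is a hypothetical sketch — `extractGenerationId` and the `OpenRouterResponse` shape below are illustrative, based only on the three field locations named above:

```typescript
type OpenRouterResponse = {
  id?: string;
  choices?: Array<{ id?: string }>;
  usage?: { cost?: number; generation_id?: string };
};

// Check each location the previous implementation reportedly handled,
// in priority order, so forwarded responses without a top-level id
// still yield an id for the /generation?id= fallback lookup.
function extractGenerationId(response?: OpenRouterResponse): string | undefined {
  return (
    response?.id ??
    response?.choices?.[0]?.id ??
    response?.usage?.generation_id
  );
}
```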


Contributor

@jurgenwerk jurgenwerk left a comment


I tried running this locally and I saw that credits are deducted correctly

However, when I stopped AI generation in the middle of AI response, I did not see any credits being spent. I am not sure if this was how it worked before this change but something you might want to take a look at.

await spendUsageCost(this.pgAdapter, matrixUserId, costInUsd);
} else if (generationId) {
log.info(
`No inline cost for user ${matrixUserId}, falling back to generation cost API (generationId: ${generationId})`,
Contributor


In which cases is there no inline cost?

Contributor

Copilot AI left a comment


Pull request overview

Refactors OpenRouter billing to primarily use inline usage.cost from responses, with a fallback to the /generation?id= cost API when inline cost is unavailable (e.g., interrupted streams).

Changes:

  • Update realm-server request-forward billing to deduct credits via inline usage.cost and retain /generation polling as a fallback.
  • Simplify credit strategy interface/implementations to route all deductions through saveUsageCost.
  • Update ai-bot and realm-server tests to cover inline-cost streaming and generation-cost fallback paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • packages/realm-server/tests/request-forward-test.ts: Updates streaming test to use inline usage.cost and adds a fallback test for /generation polling.
  • packages/realm-server/lib/credit-strategies.ts: Refactors OpenRouter strategy to spend from inline cost first, then fall back to the generation cost API.
  • packages/realm-server/handlers/handle-request-forward.ts: Captures usage.cost during SSE proxying and passes cost/generationId into saveUsageCost; simplifies the non-stream deduction flow.
  • packages/billing/ai-billing.ts: Removes the old saveUsageCost helper and exports fetchGenerationCostWithBackoff for shared fallback usage.
  • packages/ai-bot/main.ts: Switches ai-bot usage tracking to the inline-cost fast path with a generation-cost fallback.
Comments suppressed due to low confidence (1)

packages/billing/ai-billing.ts:147

  • fetchGenerationCostWithBackoff now appears to be the primary fallback path after removing saveUsageCost, but on terminal failure it only logs an error and returns null. Because this can lead to permanently unbilled generations, consider capturing this failure in Sentry (or otherwise surfacing it) and including enough context (generationId, possibly matrixUserId when available) to investigate billing gaps.
export async function fetchGenerationCostWithBackoff(
  generationId: string,
  openRouterApiKey: string,
): Promise<number | null> {
  let startedAt = Date.now();
  let delayMs = INITIAL_BACKOFF_MS;

  for (let attempt = 1; attempt <= MAX_FETCH_ATTEMPTS; attempt++) {
    try {
      let cost = await fetchGenerationCost(generationId, openRouterApiKey);
      if (cost !== null) {
        return cost;
      }
    } catch (error) {
      log.warn(
        `Attempt ${attempt} to fetch generation cost failed (generationId: ${generationId})`,
        error,
      );
    }

    let elapsed = Date.now() - startedAt;
    if (attempt === MAX_FETCH_ATTEMPTS || elapsed >= MAX_FETCH_RUNTIME_MS) {
      break;
    }

    let remainingTime = MAX_FETCH_RUNTIME_MS - elapsed;
    let sleepMs = Math.min(delayMs, remainingTime);
    await delay(sleepMs);
    delayMs = Math.min(delayMs * 2, MAX_BACKOFF_DELAY_MS);
  }

  log.error(
    `Failed to fetch generation cost within ${MAX_FETCH_ATTEMPTS} attempts or ${Math.round(MAX_FETCH_RUNTIME_MS / 60000)} minutes (generationId: ${generationId})`,
  );
  return null;
}


@FadhlanR
Contributor Author

I tried running this locally and I saw that credits are deducted correctly

However, when I stopped AI generation in the middle of AI response, I did not see any credits being spent. I am not sure if this was how it worked before this change but something you might want to take a look at.

I also experienced the same issue on the main branch, where credits were not deducted after cancellation. After some investigation, I found that the issue is not caused by our code: for cancelled generations, the cost data takes longer for OpenRouter to record. We have implemented a backoff strategy to minimize the issue, but it can still occasionally occur.

@FadhlanR FadhlanR merged commit 3a405f1 into main Apr 10, 2026
57 of 58 checks passed
