[Bug] remains_time drains passively without API calls + cache-read discount not verifiable in Token Plan Plus #47

@juancspjr

Environment

| Field | Value |
|---|---|
| Plan | Token Plan Plus ($20/month) |
| Model | MiniMax-M3 |
| Interface | Claude Code CLI (Anthropic-compatible endpoint) |
| Date | 2026-06-02 |
| Endpoints tested | https://www.minimax.io/v1/token_plan/remains, https://api.minimax.io/v1/chat/completions |

Bug 1 — CRITICAL: remains_time drains passively (no API calls made)

The remains_time field in /v1/token_plan/remains decreases continuously
even when zero API calls are being made. This strongly suggests
remains_time is a real-time countdown timer, not a token-consumption
counter — which is not documented anywhere.

Measured evidence (raw timestamps from log):

| Timestamp (UTC) | Event | remains_time (ms) | Delta |
|---|---|---|---|
| 23:11:05 | Snapshot, NO call | 2,933,989 | baseline |
| 23:11:29 | Snapshot, NO call | 2,910,400 | −23,589 ms in 24s |
| 23:11:40 | POST test call (192 tokens) | 2,899,177 | −11,223 ms in 11s |
| 23:11:47 | Snapshot, NO call | 2,891,691 | −7,486 ms in 7s |
| 23:12:04 | POST 2 long calls (701 tokens each) | 2,875,313 | −16,378 ms in 17s |

Between 23:11:05 → 23:11:29 (24 seconds, ZERO API calls):
remains_time dropped 23,589 ms

Between 23:11:40 → 23:11:47 (7 seconds, ZERO API calls):
remains_time dropped 7,486 ms
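
The drain rate implied by these two idle intervals can be checked with a little arithmetic. The following is a minimal sketch that uses only the snapshot values reported above; the function name drain_ratio is hypothetical, and the log timestamps have one-second resolution, so the result is approximate.

```shell
# drain_ratio BEFORE_MS AFTER_MS ELAPSED_S
# Prints how many ms of remains_time were lost per ms of wall-clock time.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
drain_ratio() {
  LC_ALL=C awk -v before="$1" -v after="$2" -v secs="$3" \
    'BEGIN { printf "%.2f\n", (before - after) / (secs * 1000) }'
}

drain_ratio 2933989 2910400 24   # idle, 23:11:05 -> 23:11:29, prints 0.98
drain_ratio 2899177 2891691 7    # idle, 23:11:40 -> 23:11:47, prints 1.07
```

A ratio close to 1.00 means remains_time falls by roughly one millisecond per millisecond of elapsed time. By the same formula, the two intervals that do contain API calls (11 s and 17 s) come out at 1.02 and 0.96.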

Expected behavior: remains_time should only decrease when tokens
are consumed via API calls, not as a real-time countdown.

Actual behavior: Balance drains continuously regardless of usage.

Impact: Token Plan Plus ($20/month) is exhausted after ~4–5 hours of
agentic coding work (Claude Code + MiniMax-M3), making it unusable
for professional AI-assisted development. Extrapolating from the
measured drain, the 5-hour window would be exhausted in ~50 minutes
under continuous agent load.


Bug 2 — Cache-read discount not verifiable (Transparency Bug)

The API correctly reports cached_tokens in the response body,
confirming the cache is active at the inference level. However, there
is no way to verify whether the published 10:1 cache discount
($0.03/M cache-read vs $0.30/M input) is actually applied to
Token Plan balance deductions.

Evidence — Two identical calls, vastly different cache rates:

Call A — LONG_CALL_1 (first call):

{
  "id": "066e963b985425f3648a2e5f8ed6bee3",
  "usage": {
    "total_tokens": 701,
    "prompt_tokens": 681,
    "completion_tokens": 20,
    "prompt_tokens_details": {
      "cached_tokens": 114
    }
  }
}

→ 16.7% of prompt was cached

Call B — LONG_CALL_2 (identical repeat, seconds later):

{
  "id": "066e9642367623db33cb5a8d7bc40a48",
  "usage": {
    "total_tokens": 701,
    "prompt_tokens": 681,
    "completion_tokens": 20,
    "prompt_tokens_details": {
      "cached_tokens": 667
    }
  }
}

→ 97.9% of prompt was cached
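
The two cache percentages follow directly from the usage objects above (cached_tokens divided by prompt_tokens). A small sketch of that calculation, with a hypothetical helper name cache_hit_pct:

```shell
# cache_hit_pct CACHED_TOKENS PROMPT_TOKENS
# Prints the share of the prompt served from cache, as a percentage.
cache_hit_pct() {
  LC_ALL=C awk -v cached="$1" -v prompt="$2" \
    'BEGIN { printf "%.1f\n", 100 * cached / prompt }'
}

cache_hit_pct 114 681   # Call A, prints 16.7
cache_hit_pct 667 681   # Call B, prints 97.9
```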

Expected cost difference (per published MiniMax pricing):

| | Call A | Call B |
|---|---|---|
| Non-cached input tokens | 567 @ $0.30/M | 14 @ $0.30/M |
| Cache-read tokens | 114 @ $0.03/M | 667 @ $0.03/M |
| Completion tokens | 20 @ $1.20/M | 20 @ $1.20/M |
| Expected cost | ~197.5 µUSD | ~48.2 µUSD |
| Expected ratio | 4.10x more expensive | baseline |
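
The expected costs can be reproduced from the token counts. This sketch assumes the per-million-token prices quoted in this report ($0.30 input, $0.03 cache-read, $1.20 output) and that non-cached input is prompt_tokens minus cached_tokens; the function name expected_cost_uusd is hypothetical.

```shell
# expected_cost_uusd PROMPT_TOKENS CACHED_TOKENS COMPLETION_TOKENS
# Prices are USD per million tokens, so tokens * price is already in micro-USD.
expected_cost_uusd() {
  LC_ALL=C awk -v prompt="$1" -v cached="$2" -v completion="$3" 'BEGIN {
    input_price = 0.30; cache_read_price = 0.03; output_price = 1.20
    cost = (prompt - cached) * input_price \
         + cached * cache_read_price \
         + completion * output_price
    printf "%.2f\n", cost
  }'
}

a=$(expected_cost_uusd 681 114 20)   # Call A
b=$(expected_cost_uusd 681 667 20)   # Call B
echo "Call A: $a uUSD"               # prints Call A: 197.52 uUSD
echo "Call B: $b uUSD"               # prints Call B: 48.21 uUSD
LC_ALL=C awk -v a="$a" -v b="$b" 'BEGIN { printf "ratio: %.2fx\n", a / b }'   # prints ratio: 4.10x
```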

Problem: The remains_time deduction observed across both calls
combined (16,378 ms) cannot be audited per call, so there is no way
to confirm that Call B was charged ~4.1x less than Call A. The
balance endpoint provides no per-call history.


Bug 3 — Missing billing fields in API response

Other providers expose cache billing explicitly for auditability.
MiniMax-M3 returns only cached_tokens, as informational metadata,
and does not expose the fields needed to verify billing:

Current MiniMax response:

"prompt_tokens_details": {
  "cached_tokens": 114
}

What Anthropic exposes (industry standard):

"usage": {
  "cache_creation_input_tokens": N,
  "cache_read_input_tokens": N,
  "input_tokens": N,
  "output_tokens": N
}

Requested addition to MiniMax response:

"prompt_tokens_details": {
  "cached_tokens": N,
  "cache_read_input_tokens": N,       ← ADD: billable cache reads
  "cache_creation_input_tokens": N,   ← ADD: cache write cost
  "non_cached_input_tokens": N        ← ADD: regular input cost
}

Steps to Reproduce

For Bug 1 (passive drain):

export KEY="your-token-plan-subscription-key"

# Snapshot 1
curl -s 'https://www.minimax.io/v1/token_plan/remains' \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json'

sleep 30  # wait 30 seconds and make NO API calls in the meantime

# Snapshot 2
curl -s 'https://www.minimax.io/v1/token_plan/remains' \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json'

# Observe: remains_time has decreased despite zero token consumption
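
To collect more than two data points, the snapshots can be taken in a loop. This is a rough sketch: the function names are hypothetical, KEY is the variable exported above, and because the response schema of /v1/token_plan/remains is not shown in this report, the raw response body is logged rather than parsed.

```shell
# fetch_remains: one request to the balance endpoint, raw body on stdout.
# Kept separate so it can be replaced by a stub when testing offline.
fetch_remains() {
  curl -s 'https://www.minimax.io/v1/token_plan/remains' \
    -H "Authorization: Bearer $KEY" \
    -H 'Content-Type: application/json'
}

# snapshot: print "<UTC time> <raw response body>" on one line.
snapshot() {
  printf '%s %s\n' "$(date -u +%H:%M:%S)" "$(fetch_remains)"
}

# log_snapshots COUNT INTERVAL_S: take COUNT snapshots, INTERVAL_S seconds apart.
log_snapshots() {
  count="$1"
  interval="$2"
  i=1
  while [ "$i" -le "$count" ]; do
    snapshot
    # Sleep between snapshots, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example (makes real requests): log_snapshots 5 30 | tee remains.log
```

Make no other API calls while the loop runs, so that any change in remains_time between consecutive lines reflects idle time only.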

For Bug 2 (cache discount not verifiable):

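# Shared request body; replace the placeholder with a 500+ token prompt
BODY='{"model":"MiniMax-M3","messages":[{"role":"user","content":"[500+ token prompt here]"}],"max_tokens":20}'

# Call 1: note usage.prompt_tokens_details.cached_tokens
curl -s 'https://api.minimax.io/v1/chat/completions' \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json' \
  -d "$BODY"

# Call 2: identical prompt, should hit the cache
curl -s 'https://api.minimax.io/v1/chat/completions' \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json' \
  -d "$BODY"

# Observe: cached_tokens jumps from ~17% to ~98% of prompt_tokens
# Then check remains_time: no way to verify that the ~4x cost difference was applied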

Related Issues


Questions for MiniMax Team

  1. Is remains_time a real-time countdown or a token-consumption counter?
    If it is time-based, what is the documented conversion to tokens/USD?
  2. Does the 10:1 cache discount apply to Token Plan subscriptions?
    If yes, why does the remains_time deduction not reflect this?
  3. Will cache_read_input_tokens be added to the API response body?
  4. Will a per-call billing breakdown be added to the Token Plan dashboard?

Full API log with all raw JSON responses available upon request.
Tested on: Ubuntu 22.04 / curl 7.81 / Token Plan Plus $20/month

minimax_evidence_REDACTED.log
