Environment
| Field | Value |
|---|---|
| Plan | Token Plan Plus ($20/month) |
| Model | MiniMax-M3 |
| Interface | Claude Code CLI (Anthropic-compatible endpoint) |
| Date | 2026-06-02 |
| Endpoints tested | https://www.minimax.io/v1/token_plan/remains / https://api.minimax.io/v1/chat/completions |
Bug 1 — CRITICAL: remains_time drains passively (no API calls made)
The remains_time field in /v1/token_plan/remains decreases continuously
even when zero API calls are being made. This strongly suggests
remains_time is a real-time countdown timer, not a token-consumption
counter — which is not documented anywhere.
Measured evidence (raw timestamps from log):
| Timestamp (UTC) | Event | remains_time (ms) | Delta |
|---|---|---|---|
| 23:11:05 | Snapshot (NO call) | 2,933,989 | baseline |
| 23:11:29 | Snapshot (NO call) | 2,910,400 | −23,589 ms in 24s |
| 23:11:40 | POST test call (192 tokens) | 2,899,177 | −11,223 ms |
| 23:11:47 | Snapshot (NO call) | 2,891,691 | −7,486 ms in 7s |
| 23:12:04 | POST 2 long calls (701 tokens each) | 2,875,313 | −16,378 ms |
Between 23:11:05 → 23:11:29 (24 seconds, ZERO API calls):
remains_time dropped 23,589 ms
Between 23:11:40 → 23:11:47 (7 seconds, ZERO API calls):
remains_time dropped 7,486 ms
Expected behavior: remains_time should only decrease when tokens
are consumed via API calls, not as a real-time countdown.
Actual behavior: Balance drains continuously regardless of usage.
Impact: Token Plan Plus ($20/month) exhausted in ~4–5 hours of
agentic coding work (Claude Code + MiniMax-M3), making it unusable
for professional AI-assisted development. Extrapolated rate: the
5-hour window would exhaust in ~50 minutes under continuous agent load.
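To make the countdown reading easy to check, here is a minimal sketch that divides the drop in remains_time by the wall-clock time elapsed for the two idle intervals measured above. `drain_ratio` is a hypothetical helper name, not part of any MiniMax tooling, and the timestamps only have one-second resolution, so the ratios are approximate.

```shell
#!/usr/bin/env bash
# Ratio of remains_time lost (ms) to wall-clock time elapsed (ms).
# A ratio near 1.0 means the field falls about 1 ms per elapsed ms,
# which is what a real-time countdown would produce.
# drain_ratio is a hypothetical helper used only for this illustration.
drain_ratio() {
  local dropped_ms=$1 elapsed_s=$2
  awk -v d="$dropped_ms" -v s="$elapsed_s" \
    'BEGIN { printf "%.2f\n", d / (s * 1000) }'
}

# Idle intervals from the log (no API calls in between).
drain_ratio 23589 24   # 23:11:05 -> 23:11:29, prints 0.98
drain_ratio 7486 7     # 23:11:40 -> 23:11:47, prints 1.07
```

Both ratios sit close to 1.0, which is consistent with a timer running at wall-clock speed rather than a usage counter.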
Bug 2 — Cache-read discount not verifiable (Transparency Bug)
The API correctly reports cached_tokens in the response body,
confirming the cache is active at the inference level. However, there
is no way to verify whether the published 10:1 cache discount
($0.03/M cache-read vs $0.30/M input) is actually applied to
Token Plan balance deductions.
Evidence — Two identical calls, vastly different cache rates:
Call A — LONG_CALL_1 (first call):
{
"id": "066e963b985425f3648a2e5f8ed6bee3",
"usage": {
"total_tokens": 701,
"prompt_tokens": 681,
"completion_tokens": 20,
"prompt_tokens_details": {
"cached_tokens": 114
}
}
}
→ 16.7% of prompt was cached
Call B — LONG_CALL_2 (identical repeat, seconds later):
{
"id": "066e9642367623db33cb5a8d7bc40a48",
"usage": {
"total_tokens": 701,
"prompt_tokens": 681,
"completion_tokens": 20,
"prompt_tokens_details": {
"cached_tokens": 667
}
}
}
→ 97.9% of prompt was cached
Expected cost difference (per published MiniMax pricing):
| | Call A | Call B |
|---|---|---|
| Non-cached input tokens | 567 @ $0.30/M | 14 @ $0.30/M |
| Cache-read tokens | 114 @ $0.03/M | 667 @ $0.03/M |
| Completion tokens | 20 @ $1.20/M | 20 @ $1.20/M |
| Expected cost | ~197.5 µUSD | ~48.2 µUSD |
| Expected ratio | 4.10x more expensive | baseline |
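The expected costs can be reproduced with a short sketch. It assumes the per-million-token prices quoted in this report ($0.30 input, $0.03 cache-read, $1.20 completion) and that cached tokens are a subset of prompt tokens; `expected_cost_uusd` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Expected cost of one call in micro-USD (µUSD).
# Prices are USD per million tokens, so tokens * price is already in µUSD.
# Assumes cached tokens are counted inside prompt_tokens.
expected_cost_uusd() {
  local prompt=$1 cached=$2 completion=$3
  awk -v p="$prompt" -v c="$cached" -v o="$completion" 'BEGIN {
    input_price = 0.30; cache_price = 0.03; output_price = 1.20
    printf "%.2f\n", (p - c) * input_price + c * cache_price + o * output_price
  }'
}

expected_cost_uusd 681 114 20   # Call A, prints 197.52
expected_cost_uusd 681 667 20   # Call B, prints 48.21

# Ratio of the two expected costs.
awk 'BEGIN { printf "%.2f\n", 197.52 / 48.21 }'   # prints 4.10
```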
Problem: The remains_time deduction for both calls combined
(16,378 ms) cannot be audited per-call, and there is no way to
confirm Call B was charged 4x less than Call A. The balance
endpoint provides no per-call history.
Bug 3 — Missing billing fields in API response
Other providers expose cache billing explicitly for auditability.
MiniMax-M3 returns cached_tokens only as informational metadata
and does not expose the fields needed to verify billing:
Current MiniMax response:
"prompt_tokens_details": {
"cached_tokens": 114
}
What Anthropic exposes (industry standard):
"usage": {
"cache_creation_input_tokens": N,
"cache_read_input_tokens": N,
"input_tokens": N,
"output_tokens": N
}
Requested addition to MiniMax response:
"prompt_tokens_details": {
"cached_tokens": N,
"cache_read_input_tokens": N, ← ADD: billable cache reads
"cache_creation_input_tokens": N, ← ADD: cache write cost
"non_cached_input_tokens": N ← ADD: regular input cost
}
Steps to Reproduce
For Bug 1 (passive drain):
export KEY="your-token-plan-subscription-key"
# Snapshot 1
curl -s 'https://www.minimax.io/v1/token_plan/remains' \
-H "Authorization: Bearer $KEY" \
-H 'Content-Type: application/json'
# Wait 30 seconds — make NO API calls
# Snapshot 2
curl -s 'https://www.minimax.io/v1/token_plan/remains' \
-H "Authorization: Bearer $KEY" \
-H 'Content-Type: application/json'
# Observe: remains_time has decreased despite zero token consumption
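The two snapshots can also be taken and compared by a script. This is a sketch only: it assumes the response body contains the field as `"remains_time": <integer>` (the full response shape is not shown in this report), and `extract_remains_time`, `snapshot`, and `measure_idle_drain` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Pull the first integer "remains_time" value out of a JSON body on stdin.
# Assumption: the field appears as "remains_time": <integer>. Adjust the
# pattern if the real response nests or formats the value differently.
extract_remains_time() {
  grep -o '"remains_time"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Query the balance endpoint once; KEY must already be exported.
snapshot() {
  curl -s 'https://www.minimax.io/v1/token_plan/remains' \
    -H "Authorization: Bearer $KEY" \
    -H 'Content-Type: application/json' | extract_remains_time
}

# Take two snapshots wait_s seconds apart without making any other call.
measure_idle_drain() {
  local wait_s=${1:-30} before after
  before=$(snapshot)
  sleep "$wait_s"
  after=$(snapshot)
  echo "remains_time dropped $((before - after)) ms in ${wait_s} s"
}
```

Running `measure_idle_drain 30` with no other traffic on the key should print a drop close to 30,000 ms if the field is a real-time countdown.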
For Bug 2 (cache discount not verifiable):
# Call 1 — note cached_tokens
curl -s 'https://api.minimax.io/v1/chat/completions' \
-H "Authorization: Bearer $KEY" \
-H 'Content-Type: application/json' \
-d '{"model":"MiniMax-M3","messages":[{"role":"user","content":"[500+ token prompt here]"}],"max_tokens":20}'
# Call 2 — identical prompt, should hit cache
# Same command as above — observe cached_tokens jumps from ~17% to ~98%
# Then check remains_time — no way to verify 4x cost difference was applied
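To compare the cache hit rate of the two calls without reading the JSON by eye, a small sketch can compute it from a saved response body. The field names are those in the responses quoted in this report; `json_int` and `cache_hit_pct` are hypothetical helpers, and the grep-based parsing is a simplification that only handles plain integer fields.

```shell
#!/usr/bin/env bash
# json_int FIELD: print the first integer value of "FIELD" found on stdin.
# Simplified parsing with grep; not a general JSON parser.
json_int() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*[0-9][0-9]*" \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Percentage of prompt tokens served from cache, for a chat/completions
# response body read from stdin.
cache_hit_pct() {
  local body prompt cached
  body=$(cat)
  prompt=$(printf '%s\n' "$body" | json_int prompt_tokens)
  cached=$(printf '%s\n' "$body" | json_int cached_tokens)
  awk -v p="$prompt" -v c="$cached" 'BEGIN { printf "%.1f\n", 100 * c / p }'
}
```

For example, piping the output of the curl command above into `cache_hit_pct` prints the share of the prompt that was cached for that call.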
Questions for MiniMax Team
- Is remains_time a real-time countdown or a token-consumption counter? If it is time-based, what is the documented conversion to tokens/USD?
- Does the 10:1 cache discount apply to Token Plan subscriptions? If yes, why does the remains_time deduction not reflect this?
- Will cache_read_input_tokens be added to the API response body?
- Will a per-call billing breakdown be added to the Token Plan dashboard?
Full API log with all raw JSON responses available upon request.
Tested on: Ubuntu 22.04 / curl 7.81 / Token Plan Plus $20/month
minimax_evidence_REDACTED.log