[Bug] remains_time drains passively without API calls + cache-read discount not verifiable in Token Plan Plus #47

## Environment

| Field | Value |
|-------|-------|
| Plan | Token Plan Plus ($20/month) |
| Model | MiniMax-M3 |
| Interface | Claude Code CLI (Anthropic-compatible endpoint) |
| Date | 2026-06-02 |
| Endpoints tested | `https://www.minimax.io/v1/token_plan/remains` / `https://api.minimax.io/v1/chat/completions` |

---

## Bug 1 — CRITICAL: remains_time drains passively (no API calls made)

The `remains_time` field in `/v1/token_plan/remains` decreases continuously
even when **zero chat-completion calls** are being made (the only requests
sent are the balance snapshots themselves). This strongly suggests
`remains_time` is a **real-time countdown timer**, not a token-consumption
counter, and that behavior is not documented anywhere.

### Measured evidence (raw timestamps from log):

| Timestamp (UTC) | Event | remains_time (ms) | Δ remains_time | Wall-clock time since previous row |
|---|---|---|---|---|
| 23:11:05 | Snapshot (no call) | 2,933,989 | baseline | n/a |
| 23:11:29 | Snapshot (no call) | 2,910,400 | **−23,589 ms** | 24 s |
| 23:11:40 | POST test call (192 tokens) | 2,899,177 | −11,223 ms | 11 s |
| 23:11:47 | Snapshot (no call) | 2,891,691 | **−7,486 ms** | 7 s |
| 23:12:04 | POST 2 long calls (701 tokens each) | 2,875,313 | −16,378 ms | 17 s |

Between 23:11:05 → 23:11:29 (**24 seconds, ZERO API calls**):
**remains_time dropped 23,589 ms**

Between 23:11:40 → 23:11:47 (**7 seconds, ZERO API calls**):
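**remains_time dropped 7,486 ms**

In every interval, including the two that contained POST calls, the drop
stays within about 7% of the elapsed wall-clock time (roughly 1 ms of
`remains_time` per ms of real time). Given the one-second resolution of
the timestamps, the calls themselves add no deduction distinguishable
from the clock.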
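The comparison above can be checked mechanically by dividing each `remains_time` drop by the wall-clock time elapsed since the previous snapshot. The sketch below is a minimal bash + awk helper; the name `drain_rates` is hypothetical (not part of any MiniMax tooling), and it assumes the input is `HH:MM:SS remains_time_ms` pairs copied from the table, all on the same UTC day.

```bash
#!/usr/bin/env bash
# drain_rates (hypothetical helper): reads "HH:MM:SS remains_time_ms" lines on
# stdin and, for every snapshot after the first, prints the wall-clock time
# elapsed since the previous snapshot, the drop in remains_time, and their
# ratio. A ratio near 1.00 means remains_time is tracking real time.
# Assumes all snapshots fall on the same UTC day.
drain_rates() {
  awk '
    function to_ms(t,    p) { split(t, p, ":"); return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 }
    NR > 1 {
      elapsed = to_ms($1) - prev_t
      drop = prev_r - $2
      printf "%s elapsed=%d ms drop=%d ms ratio=%.2f\n", $1, elapsed, drop, drop / elapsed
    }
    { prev_t = to_ms($1); prev_r = $2 }
  '
}

# Snapshots copied from the evidence table (thousands separators removed).
drain_rates <<'EOF'
23:11:05 2933989
23:11:29 2910400
23:11:40 2899177
23:11:47 2891691
23:12:04 2875313
EOF
```

On the logged values this prints ratios of 0.98, 1.02, 1.07 and 0.96, so every interval drains at about one millisecond per elapsed millisecond, whether or not a POST call happened in it.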

**Expected behavior:** `remains_time` should only decrease when tokens 
are consumed via API calls, not as a real-time countdown.

**Actual behavior:** Balance drains continuously regardless of usage.

**Impact:** Token Plan Plus ($20/month) was exhausted within ~4–5 hours of
agentic coding work (Claude Code + MiniMax-M3), making it unusable
for professional AI-assisted development. Extrapolated rate: at the drain
rate measured above, the baseline `remains_time` of 2,933,989 ms (~49 minutes)
would reach zero in about 50 minutes, far short of the 5-hour window.

---

## Bug 2 — Cache-read discount not verifiable (Transparency Bug)

The API **correctly reports** `cached_tokens` in the response body, 
confirming the cache is active at the inference level. However, there 
is **no way to verify** whether the published 10:1 cache discount 
(`$0.03/M` cache-read vs `$0.30/M` input) is actually applied to 
Token Plan balance deductions.

### Evidence — Two identical calls, vastly different cache rates:

**Call A** — LONG_CALL_1 (first call):
```json
{
  "id": "066e963b985425f3648a2e5f8ed6bee3",
  "usage": {
    "total_tokens": 701,
    "prompt_tokens": 681,
    "completion_tokens": 20,
    "prompt_tokens_details": {
      "cached_tokens": 114
    }
  }
}
```
→ 16.7% of the prompt was cached (114 / 681)

**Call B** — LONG_CALL_2 (identical repeat, seconds later):
```json
{
  "id": "066e9642367623db33cb5a8d7bc40a48",
  "usage": {
    "total_tokens": 701,
    "prompt_tokens": 681,
    "completion_tokens": 20,
    "prompt_tokens_details": {
      "cached_tokens": 667
    }
  }
}
```
→ **97.9% of the prompt was cached** (667 / 681)
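The cache-hit percentages are simply `cached_tokens / prompt_tokens`. The sketch below computes them from a saved response body using only `sed` and `awk`, so it does not need `jq`. The helper name `cache_share` is hypothetical, and the regex extraction assumes each field appears once in the flat `"field": number` shape shown in the responses above; it is not a real JSON parser.

```bash
#!/usr/bin/env bash
# cache_share (hypothetical helper): reads one chat-completion response body on
# stdin and prints cached_tokens / prompt_tokens as a percentage.
# The sed patterns assume each field appears once, as in the responses above;
# this is a quick regex extraction, not a real JSON parser.
cache_share() {
  local body prompt cached
  body=$(cat)
  prompt=$(printf '%s\n' "$body" | sed -n 's/.*"prompt_tokens": *\([0-9][0-9]*\).*/\1/p')
  cached=$(printf '%s\n' "$body" | sed -n 's/.*"cached_tokens": *\([0-9][0-9]*\).*/\1/p')
  awk -v p="$prompt" -v c="$cached" \
    'BEGIN { printf "cached %d of %d prompt tokens (%.1f%%)\n", c, p, 100 * c / p }'
}

# Call B's usage block from above, as a compact one-line body.
cache_share <<'EOF'
{"usage":{"total_tokens":701,"prompt_tokens":681,"completion_tokens":20,"prompt_tokens_details":{"cached_tokens":667}}}
EOF
```

For the Call B body this prints `cached 667 of 681 prompt tokens (97.9%)`; piping in the pretty-printed Call A body gives 16.7%.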

### Expected cost difference (per published MiniMax pricing):

| Component | Call A | Call B |
|---|---|---|
| Non-cached input tokens | 567 @ $0.30/M | 14 @ $0.30/M |
| Cache-read tokens | 114 @ $0.03/M | 667 @ $0.03/M |
| Completion tokens | 20 @ $1.20/M | 20 @ $1.20/M |
| **Expected cost** | **~197.5 µUSD** | **~48.2 µUSD** |
| **Expected ratio** | 4.10x more expensive | baseline |
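The expected-cost rows can be reproduced directly from the published prices, because a price of $X per million tokens is numerically X µUSD per token. The helper below (`expected_cost_uusd`, a hypothetical name) hard-codes the prices quoted in this report; it is a sketch of the arithmetic, not MiniMax's actual billing logic.

```bash
#!/usr/bin/env bash
# Published prices quoted in this report, in USD per million tokens
# (numerically equal to µUSD per token).
INPUT_PRICE=0.30
CACHE_READ_PRICE=0.03
OUTPUT_PRICE=1.20

# expected_cost_uusd PROMPT_TOKENS CACHED_TOKENS COMPLETION_TOKENS
# Prints the expected cost of one call in µUSD: non-cached prompt tokens at the
# input price, cached prompt tokens at the cache-read price, completion tokens
# at the output price.
expected_cost_uusd() {
  awk -v p="$1" -v c="$2" -v o="$3" \
      -v ip="$INPUT_PRICE" -v cp="$CACHE_READ_PRICE" -v op="$OUTPUT_PRICE" \
      'BEGIN { printf "%.2f\n", (p - c) * ip + c * cp + o * op }'
}

cost_a=$(expected_cost_uusd 681 114 20)   # Call A
cost_b=$(expected_cost_uusd 681 667 20)   # Call B
echo "Call A: ${cost_a} µUSD, Call B: ${cost_b} µUSD"
awk -v a="$cost_a" -v b="$cost_b" 'BEGIN { printf "ratio: %.2fx\n", a / b }'
```

This prints 197.52 µUSD for Call A, 48.21 µUSD for Call B and a ratio of 4.10x, matching the table.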

**Problem:** The `remains_time` deduction for both calls combined
(16,378 ms) cannot be audited per call, and there is no way to
confirm that Call B was charged ~4.1x less than Call A. The balance
endpoint provides no per-call history.

---

## Bug 3 — Missing billing fields in API response

Other providers expose cache billing explicitly for auditability. 
MiniMax-M3 only returns `cached_tokens` as informational metadata,
but does **not** expose the fields needed to verify billing:

**Current MiniMax response:**
```json
"prompt_tokens_details": {
  "cached_tokens": 114
}
```

**What Anthropic's Messages API exposes, for comparison:**
```json
"usage": {
  "cache_creation_input_tokens": N,
  "cache_read_input_tokens": N,
  "input_tokens": N,
  "output_tokens": N
}
```

**Requested addition to MiniMax response:**
```jsonc
"prompt_tokens_details": {
  "cached_tokens": N,
  "cache_read_input_tokens": N,       // ADD: billable cache reads
  "cache_creation_input_tokens": N,   // ADD: cache write cost
  "non_cached_input_tokens": N        // ADD: regular input cost
}
```
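To show why the fields have to come from the server, the sketch below derives what a client can compute from today's response. Only `non_cached_input_tokens` follows from existing fields; treating `cached_tokens` as billed cache reads is an assumption (exactly the one this report cannot verify), and nothing in the current response yields cache writes. The helper name `requested_fields` is hypothetical.

```bash
#!/usr/bin/env bash
# requested_fields PROMPT_TOKENS CACHED_TOKENS (hypothetical helper)
# Shows which of the requested billing fields a client can derive today from
# the current response. Assumption: all cached_tokens are billed as cache
# reads, which the API does not confirm.
requested_fields() {
  local prompt=$1 cached=$2
  echo "non_cached_input_tokens=$((prompt - cached))"
  echo "cache_read_input_tokens=${cached} (assumed; not confirmed by the API)"
  echo "cache_creation_input_tokens=unknown (no source field)"
}

requested_fields 681 114   # Call A
```

For Call A this yields 567 non-cached input tokens; the cache-read figure rests on the assumption above, and the cache-write figure cannot be filled in at all.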

---

## Steps to Reproduce

**For Bug 1 (passive drain):**
```bash
export KEY="your-token-plan-subscription-key"

# Snapshot 1
curl -s 'https://www.minimax.io/v1/token_plan/remains' \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json'

# Wait 30 seconds; make NO chat-completion calls in the meantime
sleep 30

# Snapshot 2
curl -s 'https://www.minimax.io/v1/token_plan/remains' \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json'

# Observe: remains_time has decreased despite zero token consumption
```
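The same reproduction can be scripted to collect several timestamped snapshots in one run. This is a sketch under assumptions: `poll_remains` is a hypothetical name, `$KEY` must already be exported, and the `sed` pattern assumes the response body contains a single `remains_time` value (the exact response layout is not documented in this report).

```bash
#!/usr/bin/env bash
# poll_remains INTERVAL_S COUNT (hypothetical helper)
# Polls the Token Plan balance endpoint COUNT times, INTERVAL_S seconds apart,
# and prints "HH:MM:SS remains_time" (UTC) per snapshot. Make no
# chat-completion calls while it runs. Assumes $KEY is set and that the body
# contains a single "remains_time" value.
poll_remains() {
  local interval=${1:-10} count=${2:-4} i body remains
  for ((i = 0; i < count; i++)); do
    body=$(curl -s 'https://www.minimax.io/v1/token_plan/remains' \
      -H "Authorization: Bearer $KEY" \
      -H 'Content-Type: application/json')
    remains=$(printf '%s\n' "$body" \
      | sed -n 's/.*"remains_time": *\([0-9][0-9]*\).*/\1/p')
    printf '%s %s\n' "$(date -u +%H:%M:%S)" "${remains:-?}"
    if ((i < count - 1)); then sleep "$interval"; fi
  done
}
```

Usage: `poll_remains 10 4`. The drop between consecutive lines can then be compared with the gap between their timestamps; a `?` means the field was not found in the body.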

**For Bug 2 (cache discount not verifiable):**
```bash
# Call 1 — note cached_tokens
curl -s 'https://api.minimax.io/v1/chat/completions' \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model":"MiniMax-M3","messages":[{"role":"user","content":"[500+ token prompt here]"}],"max_tokens":20}'

# Call 2 — identical prompt, should hit cache
# Same command as above; observe cached_tokens jump from ~17% to ~98% of prompt_tokens
# Then check remains_time: there is no way to verify the ~4.1x cost difference was applied
```
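A scripted variant sends the identical request twice and prints the token counts from each response. The helper name `repeat_call` and the payload file are hypothetical; the file is assumed to hold the same JSON as the `-d` argument above, with a real 500+ token prompt in place of the placeholder, and `$KEY` must be set.

```bash
#!/usr/bin/env bash
# repeat_call PAYLOAD_FILE (hypothetical helper)
# Sends the same chat-completion request twice and prints prompt_tokens and
# cached_tokens from each response. Assumes $KEY is set and PAYLOAD_FILE holds
# a request body like the one in the curl command above.
repeat_call() {
  local payload=$1 n body prompt cached
  for n in 1 2; do
    body=$(curl -s 'https://api.minimax.io/v1/chat/completions' \
      -H "Authorization: Bearer $KEY" \
      -H 'Content-Type: application/json' \
      -d @"$payload")
    prompt=$(printf '%s\n' "$body" | sed -n 's/.*"prompt_tokens": *\([0-9][0-9]*\).*/\1/p')
    cached=$(printf '%s\n' "$body" | sed -n 's/.*"cached_tokens": *\([0-9][0-9]*\).*/\1/p')
    echo "call ${n}: prompt_tokens=${prompt:-?} cached_tokens=${cached:-?}"
  done
}
```

Usage: `repeat_call payload.json`. If caching behaves as in Calls A and B, the second line should show a much higher `cached_tokens` than the first, while `remains_time` still offers no per-call figure to compare against.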

---

## Related Issues

- #44 — Claude Code CLI timeouts on MiniMax Anthropic API
- #43 — API Error 400 invalid message role with Claude Code CLI
- #42 — Unauthorized charge / refund request  
- openclaw/openclaw#52335 — usage tracker reports 0% left incorrectly

---

## Questions for MiniMax Team

1. Is `remains_time` a real-time countdown or a token-consumption counter?
   If it is time-based, what is the documented conversion to tokens/USD?
2. Does the 10:1 cache discount apply to Token Plan subscriptions?
   If yes, how can users verify that it is reflected in `remains_time` deductions?
3. Will `cache_read_input_tokens` be added to the API response body?
4. Will a per-call billing breakdown be added to the Token Plan dashboard?

---

*Full API log with all raw JSON responses available upon request.*  
*Tested on: Ubuntu 22.04 / curl 7.81 / Token Plan Plus $20/month*

[minimax_evidence_REDACTED.log](https://github.com/user-attachments/files/28530327/minimax_evidence_REDACTED.log)
