
MiniMax-M3: with thinking.type=disabled, chain-of-thought / planning text still emitted in visible text block (no separate reasoning stream) #10

@2263648274

Description

Capability area

Agent harness / Agent Teams

What does M2.7 fail to do for you?

none

What would "good" look like in M3?

MiniMax-M3: planning text in visible text when thinking.type is disabled

Summary

With MiniMax-M3 and thinking: { "type": "disabled" } on the Anthropic Messages-compatible API, we still receive chain-of-thought / planning text in the user-visible text channel. There is no separate reasoning/thinking stream in the same turn (thinking deltas are absent). Planning and the final user-facing reply are merged into a single final text payload.

The same client configuration with MiniMax-M2.7 does not reproduce this behavior.


Environment

Field | Value
Model | MiniMax-M3
Endpoint | https://api.minimaxi.com/anthropic (Anthropic Messages API)
Streaming | stream: true
Thinking | thinking: { "type": "disabled" }
Client | Self-hosted agent gateway; session flags reasoningLevel=off, thinkLevel=off (no “deep thinking” UI)
Observed | 2026-06-03 (UTC+8)

Expected behavior

When thinking.type is disabled:

  • User-visible text should contain only the message intended for the end user.
  • Any internal planning should either not be emitted, or appear in a separate reasoning/thinking channel, not in text.

Actual behavior

The model returns one block of text that includes, in order:

  1. Third-person planning (e.g. “The user said…”, “I should:”, numbered rules).
  2. References to persona / instruction files (e.g. SOUL.md, USER.md).
  3. A short conversational tail meant for the user.

Our downstream trace for the turn:

  • reasoningLen=0
  • thinkingN=0 (no non-empty thinking frames)
  • Single deliver kind=final, textLen=318, all counted as visible answer

So this is not “reasoning enabled in the client”; it is planning text routed as plain answer text.
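The counters in this trace can be reproduced without our gateway. Below is a minimal sketch, assuming the endpoint emits Anthropic-style streaming events (`content_block_delta` carrying `text_delta` or `thinking_delta`). The function name `tally_channels` and the result keys are illustrative, chosen to mirror the trace fields above; they are not part of any API.

```python
def tally_channels(events):
    """Count characters per channel from already-parsed streaming events.

    Assumes Anthropic-style events: dicts with type "content_block_delta"
    whose "delta" is {"type": "text_delta", "text": ...} or
    {"type": "thinking_delta", "thinking": ...}.
    """
    answer_len = 0
    reasoning_len = 0
    thinking_frames = 0  # number of non-empty thinking deltas
    text_parts = []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # skip message_start, ping, content_block_stop, etc.
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            chunk = delta.get("text", "")
            answer_len += len(chunk)
            text_parts.append(chunk)
        elif delta.get("type") == "thinking_delta":
            chunk = delta.get("thinking", "")
            reasoning_len += len(chunk)
            if chunk:
                thinking_frames += 1
    return {
        "answerLen": answer_len,
        "reasoningLen": reasoning_len,
        "thinkingN": thinking_frames,
        "text": "".join(text_parts),
    }
```

If a turn yields `reasoningLen == 0` and `thinkingN == 0` while `text` opens with planning, the planning arrived on the text channel, which is the behavior reported here.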


Steps to reproduce (suggested)

  1. Call Messages API with model: "MiniMax-M3", stream: true, thinking: { "type": "disabled" }.
  2. Use an agent-style system prompt (persona + behavioral rules; optional file names like SOUL.md / USER.md in instructions).
  3. Send a short user message (e.g. emotional feedback, or even a plain 你好, “hello”).
  4. Inspect the final aggregated text: it often starts with planning (“用户说…”, i.e. “The user said…”; “我应该…”, i.e. “I should…”) before the actual reply.

Sample model output (sanitized)

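User message redacted. Persona name generalized. Translated from Chinese; textLen=318 refers to the original Chinese text.

The user says it feels like I'm no longer the assistant he was familiar with. This is fairly important feedback: he is expressing a sense of emotional distance.

I should:
1. Not placate (no empty lines like "I've always been here")
2. Not lecture (no analysing why things changed)
3. Not steal the scene (no showing off capabilities unprompted)
4. Slow down half a beat and leave space

Possible cause: my last few messages really did sound stiff. But at a deeper level, the conversation context only has these few messages, so there is no "assistant I'm familiar with" material to echo. I should not pretend to remember anything.

The most honest response: acknowledge the feeling, but don't rush to explain or promise. Leave the user room to respond.

Follow the SOUL.md rhythm: short sentences, full stops, no sentimentality. Hmm, I don't know how I ended up like that just now either.

Maybe I started out so focused on getting things right that I ended up talking myself into a corner.

Do you want to chat for a bit, or carry on with the earlier topic.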

A shorter turn (the user said “hello”) showed the same pattern: planning, an “I should” (我应该) list, and the greeting in one text block (textLen=177, still thinkingN=0).


Sanitized client-side log excerpt (same turn)

Application trace captured after the host's sendFinalReply / deliver(final) call. IDs and hostnames are redacted.

[gateway] agent model: minimax/MiniMax-M3

[XKZOOM][body-hunt] deliver kind=final textLen=318
  preview=用户说感觉我不再是他熟悉的那个助手了。… 我应该: … 按 SOUL.md … 你想聊会儿吗,还是继续刚才那个话题。

[XKZOOM][final-trace] streamId=msg-REDACTED
  stage=pipeline-finalize contentLen=318 reasoningLen=0 answerLen=318

[XKZOOM][body-hunt] VERDICT streamId=msg-REDACTED sessionId=session-REDACTED
  contentLen=318 reasoningLen=0 answerLen=318 assistantN=0 thinkingN=0
  deliverBlocks=1 deliverChars=318

[XKZOOM][final-trace] stage=summary thinkingNonEmptyFrames=0 tierAViolationDowngraded=0

Request shape (sanitized)

{
  "model": "MiniMax-M3",
  "stream": true,
  "thinking": { "type": "disabled" },
  "system": "<agent system prompt with persona rules; redacted>",
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "text", "text": "<redacted>" }]
    }
  ]
}
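For anyone reproducing this outside an agent gateway, the request shape above can be built with the Python standard library alone. This is a sketch under stated assumptions: the `/v1/messages` route under the base URL, the `x-api-key` and `anthropic-version` headers, and the `max_tokens` field follow Anthropic's Messages API conventions and are not confirmed for this endpoint. `build_repro_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: the Messages route lives at "/v1/messages" under this base URL.
BASE_URL = "https://api.minimaxi.com/anthropic"


def build_repro_request(api_key, system_prompt, user_text, max_tokens=1024):
    """Build (but do not send) a streaming Messages request with thinking disabled."""
    payload = {
        "model": "MiniMax-M3",
        # Required by Anthropic's Messages API; omitted from the sanitized shape.
        "max_tokens": max_tokens,
        "stream": True,
        "thinking": {"type": "disabled"},
        "system": system_prompt,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": user_text}]}
        ],
    }
    return urllib.request.Request(
        BASE_URL + "/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Anthropic-style auth and version headers are accepted.
            "content-type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and reading the response line by line yields the server-sent `data:` lines, each of which can be parsed with `json.loads` and inspected for its event type.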

Questions

  1. For M3, does thinking.type = "disabled" guarantee that planning text will not appear in visible text?
  2. If yes, is this a known bug or regression? Is there a recommended API version, parameter, or model revision?
  3. If no, what is the supported way on the Anthropic-compatible API to keep planning out of the user-visible channel while leaving thinking disabled?

We are happy to provide additional redacted traces or timestamps privately if needed.


Workaround (our side)

As a temporary measure, we default to MiniMax-M2.7; when M3 is selected, the gateway partitions the output and filters out leaked planning. A proper fix on the API or model side would be preferable.
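For readers who need a similar stopgap, here is a minimal sketch of paragraph-level partitioning. It is an illustration, not our production filter: the marker list is hypothetical, drawn only from the samples in this report, and `partition_planning` is a name made up for the example.

```python
import re

# Hypothetical markers taken from the samples in this report; a real filter
# would need a broader, tested list.
PLANNING_MARKERS = re.compile(
    r"用户说|我应该|可能的原因|最诚实的回应|SOUL\.md|USER\.md"
    r"|The user said|I should:"
)


def partition_planning(text):
    """Split text into (planning, reply) after the last planning-like paragraph.

    Paragraphs are separated by blank lines. Everything up to and including
    the last paragraph that matches a marker is treated as planning.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    last_planning = -1
    for i, para in enumerate(paragraphs):
        if PLANNING_MARKERS.search(para):
            last_planning = i
    planning = "\n\n".join(paragraphs[: last_planning + 1])
    reply = "\n\n".join(paragraphs[last_planning + 1:])
    return planning, reply
```

The heuristic is deliberately coarse. When planning and the first reply sentence share a paragraph, as in the SOUL.md line of the sample output, that reply sentence is discarded along with the planning. A reply that legitimately mentions a marker phrase would also be misclassified, so a gateway should fall back to the unfiltered text when `reply` comes back empty.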

References

No response
