fix(parsing): vLLM-parity type ladder for XML tool-call arg coercion #52

Open
hallerite wants to merge 1 commit into main from fix/qwen35-arg-coercion-vllm-parity
@hallerite
Member

Summary

Closes #47. Renderers' XML-style parsers (Qwen3.5, GLM, MiniMax, Laguna) flagged <parameter=x>True</parameter> as INVALID_JSON for boolean params because json.loads("True") fails — even though both SGLang's Qwen3CoderDetector and vLLM's Qwen3CoderToolParser accept Pythonic literals via case-folded comparison. Models freely emit True/False at inference because the reference parsers normalize them, but the same outputs came back malformed through renderers.
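A minimal sketch of the mismatch described above, using only the standard library. The helper names `strict_bool` and `lenient_bool` are hypothetical and exist only for illustration; they are not the renderers or reference-parser functions.

```python
import json


def strict_bool(text: str):
    """Strict JSON parsing: only lowercase `true` / `false` are valid literals."""
    return json.loads(text)


def lenient_bool(text: str) -> bool:
    """Case-folded comparison, in the spirit of the reference parsers."""
    return text.lower() == "true"


# json.loads rejects the Pythonic spelling that models emit.
try:
    strict_bool("True")
    strict_failed = False
except json.JSONDecodeError:
    strict_failed = True

print(strict_failed)         # the strict path raised
print(lenient_bool("True"))  # the lenient path accepts the same text
```

The strict path is what produced INVALID_JSON for `True`; the lenient path is the behavior this PR aims to match.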

This swaps _coerce_arg_value from a string-or-json.loads dispatch to the full _convert_param_value ladder both reference parsers share:

| Declared type | Coercion |
| --- | --- |
| (none, no schema passed) | json.loads, fallback to raw text + INVALID_JSON (historical behavior preserved) |
| string / str / text / varchar / char / enum | return verbatim |
| int / uint / long / short / unsigned | int(text), fall back to string + INVALID_JSON |
| num / float | float(text) with SGLang's source-string int-demotion heuristic, fall back to string + INVALID_JSON |
| boolean / bool / binary | text.lower() == "true"; non-true/false → False + INVALID_JSON |
| object / array / dict* / list* / anyOf | json.loads, then ast.literal_eval fallback (Python literals like {'k': 1}), then string + INVALID_JSON |
| Unknown | ast.literal_eval, fall back to string + INVALID_JSON |
| null literal (any case) | None, except for declared string-family types (see deviation below) |
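The ladder in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `_coerce_arg_value` implementation: the function name `coerce_arg`, the `(value, invalid)` return shape, the prefix matching of type names, the order of the checks, and the exact int-demotion rule are all assumptions made for the example.

```python
import ast
import json
from typing import Any, Optional, Tuple

STRING_TYPES = {"string", "str", "text", "varchar", "char", "enum"}
BOOL_TYPES = {"boolean", "bool", "binary"}
INT_PREFIXES = ("int", "uint", "long", "short", "unsigned")
FLOAT_PREFIXES = ("num", "float")
CONTAINER_TYPES = {"object", "array", "anyof"}
CONTAINER_PREFIXES = ("dict", "list")


def coerce_arg(text: str, declared_type: Optional[str] = None) -> Tuple[Any, bool]:
    """Return (value, invalid); `invalid` stands in for the INVALID_JSON flag."""
    if declared_type is None:
        # No schema: plain json.loads with raw-text fallback.
        try:
            return json.loads(text), False
        except json.JSONDecodeError:
            return text, True

    kind = declared_type.strip().lower()
    if kind in STRING_TYPES:
        # Checked before null, so a string-typed "null" stays verbatim.
        return text, False
    if text.lower() == "null":
        return None, False
    if kind.startswith(INT_PREFIXES):
        try:
            return int(text), False
        except ValueError:
            return text, True
    if kind.startswith(FLOAT_PREFIXES):
        try:
            number = float(text)
        except ValueError:
            return text, True
        # Assumed demotion rule: only when the source text looks integral.
        looks_integral = "." not in text and "e" not in text.lower()
        if looks_integral and number.is_integer():
            return int(number), False
        return number, False
    if kind in BOOL_TYPES:
        lowered = text.lower()
        if lowered in ("true", "false"):
            return lowered == "true", False
        return False, True  # degenerate value: flag it
    if kind in CONTAINER_TYPES or kind.startswith(CONTAINER_PREFIXES):
        try:
            return json.loads(text), False
        except json.JSONDecodeError:
            pass  # fall through to Python-literal parsing
    # Container fallback and unknown types share the literal_eval step.
    try:
        return ast.literal_eval(text), False
    except (ValueError, SyntaxError):
        return text, True


print(coerce_arg("True", "boolean"))
print(coerce_arg("{'k': 1}", "object"))
print(coerce_arg("abc", "int"))
```

Returning the flag alongside the value keeps the degenerate cases (`yes` for a bool, `abc` for an int) visible to the caller instead of silently coercing them.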

INVALID_JSON is preserved as the verifier / RL-loss signal whenever the value had to degenerate (e.g. yes for a bool, abc for an int).

Deliberate deviation from vLLM/SGLang

Both reference parsers null-coerce "null" before checking type, so a string-typed arg of "null" returns Python None. Renderers keeps "null" verbatim for type: "string" so the existing tests/test_tool_arg_type_preservation.py::test_string_arg_preserves_type[*-string-null] contract holds across all five XML renderers. The XML wire format already can't distinguish the string "null" from JSON null, but when the schema explicitly says type: "string" we honour it.
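The deviation comes down to the order of two checks. The sketch below contrasts the two orderings; both function names are hypothetical, and each function is reduced to the null and string steps only.

```python
from typing import Optional


def null_before_type(text: str, declared_type: str) -> Optional[str]:
    """Ordering described for the reference parsers: null wins over the declared type."""
    if text.lower() == "null":
        return None
    return text


def type_before_null(text: str, declared_type: str) -> Optional[str]:
    """Ordering described for renderers: a declared string type is honoured first."""
    if declared_type == "string":
        return text
    if text.lower() == "null":
        return None
    return text


print(null_before_type("null", "string"))  # None
print(type_before_null("null", "string"))  # null
```

For any non-string declared type the two orderings agree; they differ only on a string-typed `null`.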

Reference parser sources

The two are functionally near-identical for this code path; vLLM is slightly more permissive (extra anyOf → object branch). Renderers now matches vLLM's superset, with the one string-null deviation noted above.

Test plan

  • pytest tests/test_arg_coercion.py — 43 new unit tests cover every branch of the ladder, including the issue's True/False regression through full parse_qwen35 round-trip.
  • pytest tests/test_tool_arg_type_preservation.py — 30/30 string-preservation cases pass across all five XML renderers (the string-null case still round-trips verbatim).
  • Full suite: 1371 passed, 53 skipped, 1 xfailed.

🤖 Generated with Claude Code

Renderers' Qwen3.5/GLM/MiniMax/Laguna XML parsers flagged
``<parameter=x>True</parameter>`` as INVALID_JSON for boolean params
because ``json.loads("True")`` fails. Both SGLang's
``Qwen3CoderDetector`` and vLLM's ``Qwen3CoderToolParser`` accept it via
``param_value.lower() == "true"`` — so Pythonic literals the model
freely emits round-trip cleanly at inference but came back malformed
through renderers.

Replaces the string-or-``json.loads`` dispatch with the full
``_convert_param_value`` ladder shared by both reference parsers:
case-insensitive ``null``, ``int()`` / ``float()`` for numeric families
with int demotion, case-folded bool, ``json.loads`` → ``ast.literal_eval``
for objects/arrays/anyOf, ``ast.literal_eval`` catch-all. INVALID_JSON
remains as the verifier / RL-loss signal for values that had to
degenerate (e.g. ``yes`` for bool, ``abc`` for int).

One deliberate deviation from vLLM/SGLang: declared ``string`` types
preserve ``"null"`` verbatim instead of coercing to Python ``None``, so
the existing ``test_tool_arg_type_preservation`` string-verbatim
contract still holds.

Fixes #47.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Qwen3.5 tool-call parser is stricter than SGLang qwen3_coder on boolean args
