Skip to content

Rectify: Codex Config Schema Drift Immunity — TOML Key Quoting + Pre-Flight Validation#3361

Merged
Trecek merged 7 commits into
developfrom
codex-config-schema-drift-immunity-pre-flight-validation-ses/3335
May 31, 2026
Merged

Rectify: Codex Config Schema Drift Immunity — TOML Key Quoting + Pre-Flight Validation#3361
Trecek merged 7 commits into
developfrom
codex-config-schema-drift-immunity-pre-flight-validation-ses/3335

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented May 31, 2026

Summary

Python's tomllib validates TOML syntax but cannot detect Codex's Rust serde schema rejections. The session config setup copies ~/.codex/config.toml verbatim via shutil.copy2, propagating schema-drifted configs that crash Codex at launch. Additionally, _serialize_toml() emits dotted TOML keys unquoted, corrupting configs on round-trip writes.

This is crash #6 in a chain of 6 crashes in the same code path, indicating systemic architectural weakness. The immunity plan addresses the root causes: (1) add binary-driven pre-flight validation via codex doctor --json, (2) fix the TOML serializer's key quoting, and (3) add round-trip fidelity tests that catch serialization corruption by construction.

Part A (this PR): TOML key quoting fix (REQ-REM-005) + Pre-flight validation via codex doctor --json (REQ-REM-001).

Requirements

REQ-REM-001: Pre-flight config validation via codex doctor --json

Add a pre-flight check inside CodexBackend.ensure_pre_launch() at codex.py:669-675 that runs codex doctor --json and parses the structured output for checks["config.load"]["status"]. If status is not "ok", return a user-facing error string including the check's summary and remediation fields.

Why --json instead of exit code: codex doctor exits 0 even for some config errors (they appear as "Notes" not "Failures"). The --json output provides per-check structured status that is reliable regardless of exit-code semantics.

Ordering: The codex doctor call must come AFTER ensure_codex_mcp_registered() (which may write to config.toml), so the doctor validates the post-write config state.

Edge cases:

  • codex doctor --json includes a websocket connectivity check with a 15-second timeout. Use timeout=20 for the subprocess to avoid killing it before doctor finishes. Treat subprocess.TimeoutExpired as a non-blocking warning, not an error.
  • OSError from the subprocess is NOT unreachable: catch OSError defensively.
  • The user-facing error message should include: (a) the summary and remediation from the JSON output, and (b) a suggestion to run codex doctor for full diagnostics.

REQ-REM-005: Fix _serialize_toml dotted-key quoting

_serialize_toml() in _codex_config.py must quote TOML keys that contain dots. Currently, a key like gpt-5.5 is emitted unquoted, causing TOML to interpret it as nested keys on re-parse. The fix: quote any key string containing a . character (e.g., emit "gpt-5.5" = 1 instead of gpt-5.5 = 1).

Conflict Resolution Decisions

The following files had merge conflicts that were automatically resolved.

None

Closes #3335

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260530-161302-198051/.autoskillit/temp/rectify/rectify_codex_config_schema_drift_immunity_2026-05-30_200000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step Model count uncached output cache_read peak_ctx turns cache_write time
rectify* opus[1m] 1 88 20.1k 3.5M 161.2k 282 169.4k 29m 28s
review_approach* sonnet 1 2.6k 7.6k 208.4k 48.4k 73 58.8k 5m 12s
dry_walkthrough* opus 1 40 6.2k 563.9k 56.2k 75 38.7k 5m 18s
implement* sonnet 1 574 19.8k 3.6M 103.3k 105 87.9k 14m 38s
audit_impl* sonnet 1 514 5.9k 203.0k 35.3k 35 28.1k 3m 0s
prepare_pr* sonnet 1 154.2k 5.8k 278.6k 31.0k 28 43.9k 2m 18s
compose_pr* sonnet 1 52.1k 1.8k 213.8k 31.0k 16 15.6k 46s
review_pr* sonnet 2 324 86.3k 2.2M 116.0k 139 195.4k 21m 27s
resolve_review* opus 2 125 33.2k 4.2M 99.3k 138 184.1k 25m 36s
Total 210.7k 186.6k 14.9M 161.2k 822.0k 1h 47m

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

Step LoC Changed cache_read/LoC cache_write/LoC output/LoC
rectify 0
review_approach 0
dry_walkthrough 0
implement 265 13533.1 331.8 74.5
audit_impl 0
prepare_pr 0
compose_pr 0
review_pr 0
resolve_review 45 93927.0 4092.2 736.8
Total 310 48186.2 2651.6 601.9

Model Usage Breakdown

Model steps uncached output cache_read cache_write time
opus[1m] 1 88 20.1k 3.5M 169.4k 29m 28s
sonnet 6 210.4k 127.1k 6.7M 429.7k 47m 23s
opus 2 165 39.3k 4.8M 222.9k 30m 55s

@Trecek Trecek force-pushed the codex-config-schema-drift-immunity-pre-flight-validation-ses/3335 branch from 93142be to 73c3d02 Compare May 31, 2026 00:59
Trecek and others added 7 commits May 30, 2026 18:21
…Flight Validation

- Add `_quote_key()` helper and `_BARE_KEY_RE` to `_codex_config.py`; apply to
  all 8 key-emission sites so dotted/special keys are quoted per TOML spec
- Add `_validate_codex_config()` to `codex.py`; calls `codex doctor --json`
  and checks `config.load` status after MCP registration in `ensure_pre_launch()`
- Add `TestSerializeTomlKeyQuoting` (8 tests) and 2 preservation tests to
  `test_codex_config.py` to catch round-trip corruption by construction
- Add `TestCodexEnsurePreLaunchConfigValidation` (8 tests) to
  `test_codex_backend.py` covering error, ok, timeout, OSError, ordering,
  malformed JSON, missing check, and integration scenarios

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Project arch rule requires `import regex as re` in src/ outside hooks/ and core/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…simplify exception handler

Add returncode != 0 guard before JSON parsing to avoid silent false negatives
when codex doctor exits non-zero. Simplify redundant `except (json.JSONDecodeError,
ValueError)` to `except json.JSONDecodeError` since JSONDecodeError is a subclass
of ValueError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add control character escaping (\b, \t, \n, \f, \r) in _quote_key() to prevent
malformed TOML basic strings when keys contain control characters. Per the TOML
spec, these must be escaped in basic strings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… TOML keys

Add test_round_trip_key_with_backslash and test_round_trip_key_with_dot_and_double_quote
to TestSerializeTomlKeyQuoting to cover the backslash-escape and combined special
character code paths in _quote_key.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use getattr for result.returncode and result.stdout to handle cases
where subprocess.run is mocked with minimal result objects in test
environments that don't provide all CompletedProcess attributes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ex_config

Cover the graceful-degradation path added in b2f0483 where codex doctor
exits non-zero — ensure ensure_pre_launch() returns [] without attempting
JSON parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Trecek Trecek force-pushed the codex-config-schema-drift-immunity-pre-flight-validation-ses/3335 branch from 73c3d02 to f8f2b4b Compare May 31, 2026 01:21
@Trecek Trecek added this pull request to the merge queue May 31, 2026
Merged via the queue into develop with commit 867c27e May 31, 2026
3 checks passed
@Trecek Trecek deleted the codex-config-schema-drift-immunity-pre-flight-validation-ses/3335 branch May 31, 2026 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant