Skip to content

Codex Backend Architectural Immunity — Capability Consumption, Symmetric Env, and NotImplementedError Arch Tests#3395

Open
Trecek wants to merge 5 commits into
developfrom
codex-backend-architectural-immunity-capability-consumption/3385
Open

Codex Backend Architectural Immunity — Capability Consumption, Symmetric Env, and NotImplementedError Arch Tests#3395
Trecek wants to merge 5 commits into
developfrom
codex-backend-architectural-immunity-capability-consumption/3385

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented May 31, 2026

Summary

Add 3 new arch/server tests (plus verify 1 existing) to enforce structural invariants that prevent recurring Codex backend bugs. The tests cover: (1) verifying existing BackendCapabilities consumption test coverage, (2) symmetric env var contract between build_skill_session_cmd and build_food_truck_cmd, (3) no raise NotImplementedError in registered backend classes, and (4) tool-list visibility delta after kitchen enable within a live client session. Also update tests/arch/CLAUDE.md to register the new test files.

Requirements

1. BackendCapabilities Consumption Contract

  • Purpose: Every field on BackendCapabilities must be consumed by at least one production read site in src/, OR registered in _FORWARD_DECLARED with a linked issue number
  • Location: tests/arch/test_capability_consumption.py — already exists. Verify coverage includes any new fields added by Rectify: Codex MCP Tool/Resource Visibility — notifications/tools/list_changed Immunity #3382 (supports_tool_list_changed). Currently has 4 entries in _FORWARD_DECLARED.
  • Note: The _FORWARD_DECLARED dict requires a real #NNNN issue reference per field — enforced by test_forward_declared_has_linked_issues().

2. Symmetric Command Builder Env Vars

  • Purpose: build_skill_session_cmd and build_food_truck_cmd on every backend must set the same required base env vars
  • Location: tests/arch/ (new test file or extend test_backend_coherence.py)
  • Rationale: G5 was caused by the two methods being written independently. build_skill_session_cmd sets AGENT_BACKEND_ENV_VAR, AUTOSKILLIT_HEADLESS_AUTO_GATE, AUTOSKILLIT_APPLICABLE_GUARDS — all missing from build_food_truck_cmd.
  • Implementation: Call both methods on each backend in BACKEND_REGISTRY. The food_truck_cmd requires plugin_source arg — use DirectInstall(plugin_dir=Path("/plugins")) per existing pattern.
  • Important: Some env vars are intentionally asymmetric. AUTOSKILLIT_SKILL_NAME is IL-1-only (should NOT be in food truck). The test must define a precise "required-in-both" subset: AUTOSKILLIT_HEADLESS, AUTOSKILLIT_SESSION_TYPE, MAX_MCP_OUTPUT_TOKENS, AGENT_BACKEND_ENV_VAR.

3. No NotImplementedError in Production Backends

  • Purpose: No method on any backend class registered in BACKEND_REGISTRY should raise NotImplementedError
  • Location: tests/arch/ (new test file)
  • Rationale: CodingAgentBackend is a typing.Protocol (with @runtime_checkable), NOT an ABC — so there are no legitimate abstract stubs. Any raise NotImplementedError in a registered backend class is a bug.
  • Implementation: AST-scan for raise NotImplementedError in classes from BACKEND_REGISTRY.values().
  • Note: This test is expected to fail until Rectify: CodexBackend Command Builder Gaps — Missing Env Vars, Sandbox Flags, and Stub Methods #3383 lands (which implements the actual fixes for validate_skill_content and list_plugins).

4. Notification-Support Integration Test

  • Purpose: Verify that after open_kitchen is called, the tool list visible to the client actually changes
  • Location: tests/server/ (extend test_server_init_gate.py)
  • Rationale: G1 was completely invisible to the existing test suite.
  • Implementation: Use from fastmcp.client import Client; async with Client(mcp) as client: tools = await client.list_tools() pattern.

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260530-215854-975848/.autoskillit/temp/make-plan/codex_backend_arch_immunity_plan_2026-05-30_220600.md

🤖 Generated with Claude Code via AutoSkillit

@Trecek Trecek force-pushed the codex-backend-architectural-immunity-capability-consumption/3385 branch from 823947a to dae82fa Compare May 31, 2026 06:40
Trecek and others added 4 commits May 30, 2026 23:59
- tests/arch/test_env_symmetry.py: Verify build_skill_session_cmd and
  build_food_truck_cmd share required env vars (xfail: AGENT_BACKEND_ENV_VAR
  missing from food_truck)
- tests/arch/test_no_not_implemented.py: Verify no BACKEND_REGISTRY class
  raises NotImplementedError (xfail: CodexBackend has two unimplemented methods)
- tests/server/test_server_init_gate.py: Add
  test_tool_list_changes_after_enable_within_session to verify kitchen tag
  enable propagates to client tool list
- tests/arch/CLAUDE.md: Register new test files
- resolve-review/SKILL.md: Add to Architectural Constraint Catalog

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
xfail without a tracking issue number has no discoverable removal path.
Adds #3385 reference so the xfail can be cleaned up when AUTOSKILLIT_AGENT_BACKEND
is wired into build_food_truck_cmd.
…otation

- assert new_tools & GATED_TOOLS instead of bare truthiness check — verifies
  that the newly visible tools are actually kitchen-gated tools, not arbitrary
  tools from a misconfigured tag.
- Add -> None return type to test_tool_list_changes_after_enable_within_session
  for consistency with the surrounding class.
…ntedError detection

- test_env_symmetry.py: assert BACKEND_REGISTRY before both loops so vacuous
  passes (empty registry → skipped assertions → false green) are detected
- test_no_not_implemented.py: extend AST check to catch bare `raise
  NotImplementedError` (ast.Name), which the existing ast.Call-only branch
  silently missed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Trecek Trecek force-pushed the codex-backend-architectural-immunity-capability-consumption/3385 branch from dae82fa to aafec3d Compare May 31, 2026 06:59
Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit PR Review — Verdict: changes_requested

Comment thread tests/server/test_server_init_gate.py
Comment thread tests/arch/test_no_not_implemented.py
Comment thread tests/arch/test_env_symmetry.py
Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit review: 1 critical issue (tab/space IndentationError in test_server_init_gate.py) and 2 warnings found. See inline comments.

Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observations accumulated from 1 local review rounds:


violations: list[str] = []
for name, cls in BACKEND_REGISTRY.items():
source = inspect.getsource(cls)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation from local review round 0:

[warning] defense: inspect.getsource(cls) can raise OSError (source unavailable, C extension, etc.) with no error context identifying which backend triggered it. Wrap in try/except and include name in the error to preserve context.

Evidence: Marginal improvement in test diagnostics for arch test code. Human judgment needed on whether defensive wrapping in test code is warranted.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid observation — flagged for design decision. The question is whether defensive try/except with backend name context is warranted in test code for inspect.getsource(). Human judgment needed on complexity vs. diagnostic value tradeoff.

violations: list[str] = []
for name, cls in BACKEND_REGISTRY.items():
source = inspect.getsource(cls)
tree = ast.parse(source)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation from local review round 0:

[warning] defense: ast.parse(source) can raise SyntaxError with no indication of which backend's source failed to parse. Wrap in try/except and include name in the error message.

Evidence: SyntaxError in well-maintained Python backends is unlikely but possible. Adding try/except with name improves triage. Human decision on whether worth adding complexity to test code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid observation — flagged for design decision. SyntaxError wrapping in ast.parse() for well-maintained Python backends is unlikely but possible. Human decision on whether the added complexity is worth the diagnostic improvement.

Ensure consistent env isolation across all tests in test_env_symmetry.py by
deleting AUTOSKILLIT_AGENT_BACKEND in the autouse _clean_env fixture — preventing
host-env state from leaking into test_skill_and_food_truck_share_required_env_vars.
@Trecek Trecek added this pull request to the merge queue May 31, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 31, 2026
@Trecek Trecek added this pull request to the merge queue May 31, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 31, 2026
@Trecek Trecek added this pull request to the merge queue May 31, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant