Codex Backend Architectural Immunity — Capability Consumption, Symmetric Env, and NotImplementedError Arch Tests#3395
Conversation
823947a to
dae82fa
Compare
- tests/arch/test_env_symmetry.py: Verify build_skill_session_cmd and build_food_truck_cmd share required env vars (xfail: AGENT_BACKEND_ENV_VAR missing from food_truck) - tests/arch/test_no_not_implemented.py: Verify no BACKEND_REGISTRY class raises NotImplementedError (xfail: CodexBackend has two unimplemented methods) - tests/server/test_server_init_gate.py: Add test_tool_list_changes_after_enable_within_session to verify kitchen tag enable propagates to client tool list - tests/arch/CLAUDE.md: Register new test files - resolve-review/SKILL.md: Add to Architectural Constraint Catalog Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
xfail without a tracking issue number has no discoverable removal path. Adds #3385 reference so the xfail can be cleaned up when AUTOSKILLIT_AGENT_BACKEND is wired into build_food_truck_cmd.
…otation - assert new_tools & GATED_TOOLS instead of bare truthiness check — verifies that the newly visible tools are actually kitchen-gated tools, not arbitrary tools from a misconfigured tag. - Add -> None return type to test_tool_list_changes_after_enable_within_session for consistency with the surrounding class.
…ntedError detection - test_env_symmetry.py: assert BACKEND_REGISTRY before both loops so vacuous passes (empty registry → skipped assertions → false green) are detected - test_no_not_implemented.py: extend AST check to catch bare `raise NotImplementedError` (ast.Name), which the existing ast.Call-only branch silently missed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dae82fa to
aafec3d
Compare
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit PR Review — Verdict: changes_requested
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit review: 1 critical issue (tab/space IndentationError in test_server_init_gate.py) and 2 warnings found. See inline comments.
Trecek
left a comment
There was a problem hiding this comment.
Observations accumulated from 1 local review rounds:
|
|
||
| violations: list[str] = [] | ||
| for name, cls in BACKEND_REGISTRY.items(): | ||
| source = inspect.getsource(cls) |
There was a problem hiding this comment.
Observation from local review round 0:
[warning] defense: inspect.getsource(cls) can raise OSError (source unavailable, C extension, etc.) with no error context identifying which backend triggered it. Wrap in try/except and include name in the error to preserve context.
Evidence: Marginal improvement in test diagnostics for arch test code. Human judgment needed on whether defensive wrapping in test code is warranted.
There was a problem hiding this comment.
Valid observation — flagged for design decision. The question is whether defensive try/except with backend name context is warranted in test code for inspect.getsource(). Human judgment needed on complexity vs. diagnostic value tradeoff.
| violations: list[str] = [] | ||
| for name, cls in BACKEND_REGISTRY.items(): | ||
| source = inspect.getsource(cls) | ||
| tree = ast.parse(source) |
There was a problem hiding this comment.
Observation from local review round 0:
[warning] defense: ast.parse(source) can raise SyntaxError with no indication of which backend's source failed to parse. Wrap in try/except and include name in the error message.
Evidence: SyntaxError in well-maintained Python backends is unlikely but possible. Adding try/except with name improves triage. Human decision on whether worth adding complexity to test code.
There was a problem hiding this comment.
Valid observation — flagged for design decision. SyntaxError wrapping in ast.parse() for well-maintained Python backends is unlikely but possible. Human decision on whether the added complexity is worth the diagnostic improvement.
Ensure consistent env isolation across all tests in test_env_symmetry.py by deleting AUTOSKILLIT_AGENT_BACKEND in the autouse _clean_env fixture — preventing host-env state from leaking into test_skill_and_food_truck_share_required_env_vars.
Summary
Add 3 new arch/server tests (plus verify 1 existing) to enforce structural invariants that prevent recurring Codex backend bugs. The tests cover: (1) verifying existing
BackendCapabilitiesconsumption test coverage, (2) symmetric env var contract betweenbuild_skill_session_cmdandbuild_food_truck_cmd, (3) noraise NotImplementedErrorin registered backend classes, and (4) tool-list visibility delta after kitchen enable within a live client session. Also updatetests/arch/CLAUDE.mdto register the new test files.Requirements
1. BackendCapabilities Consumption Contract
BackendCapabilitiesmust be consumed by at least one production read site insrc/, OR registered in_FORWARD_DECLAREDwith a linked issue numbertests/arch/test_capability_consumption.py— already exists. Verify coverage includes any new fields added by Rectify: Codex MCP Tool/Resource Visibility — notifications/tools/list_changed Immunity #3382 (supports_tool_list_changed). Currently has 4 entries in_FORWARD_DECLARED._FORWARD_DECLAREDdict requires a real#NNNNissue reference per field — enforced bytest_forward_declared_has_linked_issues().2. Symmetric Command Builder Env Vars
build_skill_session_cmdandbuild_food_truck_cmdon every backend must set the same required base env varstests/arch/(new test file or extendtest_backend_coherence.py)build_skill_session_cmdsetsAGENT_BACKEND_ENV_VAR,AUTOSKILLIT_HEADLESS_AUTO_GATE,AUTOSKILLIT_APPLICABLE_GUARDS— all missing frombuild_food_truck_cmd.BACKEND_REGISTRY. Thefood_truck_cmdrequiresplugin_sourcearg — useDirectInstall(plugin_dir=Path("/plugins"))per existing pattern.AUTOSKILLIT_SKILL_NAMEis IL-1-only (should NOT be in food truck). The test must define a precise "required-in-both" subset:AUTOSKILLIT_HEADLESS,AUTOSKILLIT_SESSION_TYPE,MAX_MCP_OUTPUT_TOKENS,AGENT_BACKEND_ENV_VAR.3. No NotImplementedError in Production Backends
BACKEND_REGISTRYshould raiseNotImplementedErrortests/arch/(new test file)CodingAgentBackendis atyping.Protocol(with@runtime_checkable), NOT an ABC — so there are no legitimate abstract stubs. Anyraise NotImplementedErrorin a registered backend class is a bug.raise NotImplementedErrorin classes fromBACKEND_REGISTRY.values().validate_skill_contentandlist_plugins).4. Notification-Support Integration Test
open_kitchenis called, the tool list visible to the client actually changestests/server/(extendtest_server_init_gate.py)from fastmcp.client import Client; async with Client(mcp) as client: tools = await client.list_tools()pattern.Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/impl-20260530-215854-975848/.autoskillit/temp/make-plan/codex_backend_arch_immunity_plan_2026-05-30_220600.md🤖 Generated with Claude Code via AutoSkillit