Skip to content

[FIX] CancelledError immunity — _cancellation_shield decorator + run_skill inner handler#3390

Merged
Trecek merged 11 commits into
developfrom
asyncio-cancellederror-from-mcp-transport-teardown-drops-ses/3373
May 31, 2026
Merged

[FIX] CancelledError immunity — _cancellation_shield decorator + run_skill inner handler#3390
Trecek merged 11 commits into
developfrom
asyncio-cancellederror-from-mcp-transport-teardown-drops-ses/3373

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented May 31, 2026

Summary

asyncio.CancelledError from MCP transport teardown (SIGTERM/SIGINT/SIGHUP) propagates through MCP tool handlers (run_skill, dispatch_food_truck, record_gate_dispatch) without producing a structured SkillResult. The retry FSM is never consulted, no SkillResult is returned, and pipeline orchestrators proceed with stale artifacts. Part A adds: a SkillResult.cancelled() factory, the CANCELLED retry reason enum variant, a _cancellation_shield decorator for all MCP tool handlers, an explicit inner except asyncio.CancelledError handler in run_skill, and tests that prove the gap and verify the fix. Part B (separate) covers recipe schema routing.

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260530-193133-478133/.autoskillit/temp/rectify/rectify_cancelled_error_skill_result_gap_2026-05-30_193500.md

Closes #3373

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step Model count uncached output cache_read peak_ctx turns cache_write time
rectify* opus[1m] 1 90 29.2k 2.8M 128.7k 235 112.4k 25m 14s
review_approach* sonnet 1 3.5k 7.5k 235.7k 51.5k 80 39.1k 5m 9s
dry_walkthrough* opus 2 86 16.3k 988.4k 62.4k 216 106.0k 10m 24s
implement* sonnet 2 1.1k 69.0k 16.5M 176.1k 375 276.5k 21m 59s
audit_impl* sonnet 2 864 28.6k 637.6k 76.8k 132 126.5k 11m 28s
make_plan* sonnet 1 95 6.1k 352.9k 42.4k 34 44.7k 2m 38s
prepare_pr* sonnet 1 120.6k 4.8k 226.2k 33.0k 24 55.3k 1m 30s
compose_pr* sonnet 1 108.5k 1.4k 246.5k 27.8k 20 16.5k 46s
review_pr* sonnet 2 324 60.8k 2.0M 95.9k 164 150.4k 15m 35s
resolve_review* opus 2 135 38.8k 6.9M 100.4k 174 224.6k 32m 55s
resolve_pre_review_conflicts* opus 1 37 6.2k 1.2M 85.5k 42 69.0k 2m 31s
Total 235.3k 268.7k 32.1M 176.1k 1.2M 2h 10m

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

Step LoC Changed cache_read/LoC cache_write/LoC output/LoC
rectify 0
review_approach 0
dry_walkthrough 0
implement 242 68208.6 1142.5 285.3
audit_impl 0
make_plan 0
prepare_pr 0
compose_pr 0
review_pr 0
resolve_review 26 265776.8 8636.6 1492.2
resolve_pre_review_conflicts 2345 502.7 29.4 2.6
Total 2613 12266.6 467.2 102.8

Model Usage Breakdown

Model steps uncached output cache_read cache_write time
opus[1m] 1 90 29.2k 2.8M 112.4k 25m 14s
sonnet 7 235.0k 178.3k 20.2M 709.0k 59m 8s
opus 3 258 61.3k 9.1M 399.5k 45m 51s

@Trecek Trecek force-pushed the asyncio-cancellederror-from-mcp-transport-teardown-drops-ses/3373 branch from 7335655 to 9bd95b7 Compare May 31, 2026 05:02
Trecek and others added 11 commits May 30, 2026 22:28
…ed phase)

- RetryReason.CANCELLED: transport-induced asyncio.CancelledError variant
- SkillResult.cancelled(): retriable factory (needs_retry=True, subtype='cancelled')
- fleet/state_types._ABANDON_REASONS: include CANCELLED (non-resumable teardown)
- test_retry_reason_values: add CANCELLED to exhaustiveness set
- test_retry_reason_has_cancelled_variant: new assertion
- test_skill_result_cancelled_factory: verify cancelled() factory fields
- test_retry_reason_value_count_is_16: updated from 15
- _ROUTING_EXCLUDED: add CANCELLED (transport event; not orchestrator-routed)
- _has_cancellation_shield: AST helper in tests/arch/_helpers.py
- test_all_mcp_tool_handlers_have_cancellation_shield: structural enforcement (fails until Phase 2)
- test_run_skill_returns_structured_result_on_cancelled_error: gap test (fails until Phase 2)
- test_dispatch_food_truck_returns_structured_result_on_cancelled_error: gap test (fails until Phase 2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…skill inner handler (green phase)

- _cancellation_shield: decorator catches asyncio.CancelledError at MCP tool boundary,
  returns schema-aware structured JSON (fleet_error, run_cmd, run_python, generic)
- run_skill: inner except asyncio.CancelledError returns SkillResult.cancelled() with
  resolved_command/effective_order_id provenance
- Applied @_cancellation_shield to all 52 @mcp.tool() handlers across 18 tool files
- Updated subpackage file count budget (server/tools: 22 → 23)
- Updated _LEGACY_JSON_WRITES line numbers for decorator insertion drift

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ror test

Strengthens test_dispatch_food_truck_returns_structured_result_on_cancelled_error
to verify the fleet_error envelope identifies a CancelledError specifically, not
just any failure envelope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… trim docstring duplication, fix run_python description

- Wrap logger.warning() inside shield in try/except Exception to prevent
  skipped return if logger raises
- Trim decorator docstring to usage-specific lines (removes duplication
  with module docstring)
- Give run_python its own description line instead of "same as run_cmd"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion without logger call

The reviewer's suggestion to wrap logger.warning() in try/except Exception
violates ARCH-003 (DS-001: broad except without logger call). Reclassified
as REJECT with category arch_violation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lScope

Wraps the logger.warning call in the inner CancelledError handler with
anyio.CancelScope(shield=True), consistent with _cancellation_shield.py pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…or test

Adds a human-readable message check so a regression producing blank or
misleading result text would be caught.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The standalone assertion (RetryReason.CANCELLED == "cancelled") is already
covered by test_retry_reason_values which asserts CANCELLED in the full set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dler

Fixes test_all_mcp_tool_handlers_have_cancellation_shield arch test —
lock_ingredients was the only @mcp.tool() handler missing the decorator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Trecek Trecek force-pushed the asyncio-cancellederror-from-mcp-transport-teardown-drops-ses/3373 branch from 9bd95b7 to b73f40c Compare May 31, 2026 05:28
@Trecek Trecek added this pull request to the merge queue May 31, 2026
Merged via the queue into develop with commit a19de6e May 31, 2026
3 checks passed
@Trecek Trecek deleted the asyncio-cancellederror-from-mcp-transport-teardown-drops-ses/3373 branch May 31, 2026 05:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant