fix: deterministic completion signal for codex-rescue (stop waiting on finished runs)#346
fix: deterministic completion signal for codex-rescue (stop waiting on finished runs)#346zknpr wants to merge 12 commits into
Conversation
… running signal Every Agent(codex:codex-rescue) return is suffixed by the Claude Code harness with `agentId: <id> (use SendMessage with to: '<id>' to continue this agent)`. This is a resume token for the same agent thread, not a "still running" or "in background" status indicator. Callers (including Claude itself) have repeatedly misread it as the latter and then incorrectly told users that Codex is still running in the background, when the work has actually completed. Adds one explicit caller-side rule in two places: - plugins/codex/skills/codex-result-handling/SKILL.md — the internal guidance the caller consults when presenting Codex output. - plugins/codex/commands/rescue.md Operating rules — the most common user-facing entry point (`/codex:rescue`). No code changes; both files are caller-side documentation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ops waiting on finished runs A synchronous codex-rescue return carried no machine-checkable completion signal — only raw Codex text plus the harness resume-token suffix — so the main thread inferred status from prose and concluded "still running". For long runs the Claude Code Bash tool auto-backgrounds the companion call past its ~600s cap and returns "Command running in background... you will be notified", which the subagent faithfully relays; the detached run then completes but is orphaned. This is model-independent (reproduced on Opus), so it is not fixable by subagent instructions or a model bump. Fix at the structural layer: - Companion stamps a deterministic sentinel: [[codex-task status=complete]] (foreground) / [[codex-task status=dispatched id=<jobId>]] (background); the background dispatch message no longer falsely promises a notification. - New PostToolUse hook (codex-rescue-completion-hook.mjs) injects an authoritative line into the main thread on every codex-rescue return: complete -> done/verify-on-disk; Bash auto-background -> still finishing detached, don't wait; dispatched -> poll /codex:status. - Reliable --background routing: explicit --background is honored, long open-ended work defaults to task --background, foreground only for short bounded runs. - Docs (rescue.md, codex-cli-runtime, codex-result-handling) tell the main thread to trust the sentinel/hook over prose. Tests: 92/92, incl. new task-status-token + hook coverage and updated command/ runtime assertions. Validated a real --background lifecycle end to end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d651c502a3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| function isSynchronousAgentReturn(toolResponse) { | ||
| const status = String(toolResponse?.status ?? "").toLowerCase(); | ||
| return status !== "async_launched" && status !== "running"; |
There was a problem hiding this comment.
Restrict completion context to successful Agent returns
Treating every non-running/async_launched status as synchronous completion causes false “COMPLETED and exited” guidance for failed rescue calls. In isSynchronousAgentReturn, statuses like failed/errored pass this check, and buildCompletionContext then falls back to COMPLETE_CONTEXT when no sentinel is present, so the parent thread is told the run finished successfully even when the Agent tool actually failed. This can suppress retries/escalation and mislead users about the true execution state.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in e1b8ae3. Success is now asserted only when the [[codex-task status=complete]] sentinel is present. A failure status (failed/error/timeout/...) or an empty return now yields a non-success line — "exited WITHOUT a success signal ... may have failed or produced no result ... re-run or escalate" — and a tokenless-but-otherwise-fine return gets a neutral "exited, verify on disk" line, never a false "COMPLETED ... final result". Every branch still emits "do not wait", so a failed run can't reintroduce the original hang. Added failed-status and empty-return test cases.
Addresses the PR openai#346 Codex review (P1): buildCompletionContext fell back to the "COMPLETED ... final result" line for any synchronous return without a sentinel — including failed/errored/empty returns — falsely signaling success and suppressing retry/escalation. Now success is asserted only when the [[codex-task status=complete]] sentinel is present. A failure status (failed/error/timeout/...) or an empty return (codex-rescue returns nothing when Codex can't be invoked) gets a failure-aware "exited without a success signal — may have failed, verify/retry" line; any other tokenless return gets a neutral "exited, verify on disk" line. Every branch still says "do not wait" so a failed run doesn't reintroduce the original waiting bug. Tests: 94/94 (added failed-status and empty-return cases; tokenless default is now neutral). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e1b8ae3407
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const message = String(parsedResult?.failureMessage ?? "").trim() || "Codex did not return a final message."; | ||
| return `${message}\n`; | ||
| return appendTaskStatusToken(message, completeToken); |
There was a problem hiding this comment.
Avoid stamping complete token on failure fallback output
When rawOutput is empty, this branch still appends [[codex-task status=complete]] to a failure fallback message, so non-zero/failed task runs can carry a synthetic "complete" sentinel. The new PostToolUse hook treats that token as authoritative and emits the "COMPLETED and exited" context before checking failure status, which can misclassify failed rescue executions as successful completions and suppress retry/escalation behavior.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 65d8320. renderTaskResult now appends the complete sentinel only on a successful exit (status 0); empty/failure fallback output renders without it, so a failed run no longer carries a success signal. Added a failure-no-sentinel test.
A foreground `task` ran the Codex turn in-process, so when the Bash tool auto-backgrounded a long foreground call and the owning subagent was reaped, the run was killed mid-task and the work was lost with no resumable record. Now the foreground path enqueues the same detached (`detached` + `unref`) worker the `--background` path uses and waits for it inline, printing the worker's stored result. If the inline wait is killed, the detached worker survives and the result is recoverable via /codex:status. Completed-run output is byte-identical (the worker stamps the [[codex-task status=complete]] sentinel). The inline wait degrades to a "still running" status after the existing wait cap instead of blocking forever. enqueueBackgroundTask now writes the job record before spawning to close a worker-startup race. Tests: 95/95 (new test proves foreground goes through the detached worker via the queued-job record + log line, with identical stdout). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
The task handler documents --wait as an explicit foreground choice, but when both flags are present this branch still routes to background because it only checks options.background and never errors on the conflict. In that input case, the command silently does the opposite of an explicit wait request, which can unexpectedly detach work and require manual status polling.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const snapshot = await waitForSingleJobSnapshot(cwd, job.id, { | ||
| pollIntervalMs: FOREGROUND_TASK_POLL_INTERVAL_MS | ||
| }); |
There was a problem hiding this comment.
Avoid timing out foreground task waits after 4 minutes
runForegroundTaskWorker waits on waitForSingleJobSnapshot without a timeout override, so it inherits the 240000ms default and exits non-zero once that deadline is hit even though the detached worker may still be running. Any foreground rescue that legitimately runs longer than ~4 minutes will be reported as failed/unfinished to the caller, which can trigger incorrect retry/escalation behavior and duplicate work while the original task continues in the background.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 65d8320. runForegroundTaskWorker now passes an unbounded timeout (timeoutMs: Infinity) instead of inheriting the 240s status default, so long xhigh foreground runs wait for the real result; the detached worker still provides survival if the waiting process is killed.
| status: "queued", | ||
| phase: "queued", | ||
| pid: child.pid ?? null, | ||
| pid: null, |
There was a problem hiding this comment.
Preserve worker PID for queued task cancellation
The queued job record now stores pid: null, and the detached worker PID is not persisted before returning. /codex:cancel relies on job.pid to terminate active jobs, so cancelling immediately after dispatch (while still queued) can mark the job as cancelled without actually stopping the worker; that worker can then start and finish anyway, potentially applying writes after the user thought cancellation succeeded.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 65d8320. After spawning the detached worker, the job record is updated with the worker pid (guarded by isActiveJobStatus to avoid clobbering a finished job), so /codex:cancel can terminate it. The pre-spawn record write is kept to avoid the worker-startup race.
| const message = String(parsedResult?.failureMessage ?? "").trim() || "Codex did not return a final message."; | ||
| return `${message}\n`; | ||
| return appendTaskStatusToken(message, completeToken); |
There was a problem hiding this comment.
Gate complete sentinel on successful task exit status
renderTaskResult now appends [[codex-task status=complete]] unconditionally, including fallback failure messages. runTrackedJob marks non-zero task exits as failed, but the stored rendered output still carries this success token, and the new PostToolUse hook treats that token as authoritative completion success. This lets failed/interrupted rescue runs be misclassified as completed, suppressing retry/escalation exactly when the task did not succeed.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 65d8320. The complete sentinel is now gated on a successful exit status (status 0) rather than on rawOutput presence, so non-zero/failed task exits no longer emit it.
…reground wait, worker pid Three regressions from e1b8ae3/c4722b5: - renderTaskResult stamped [[codex-task status=complete]] on failure/empty fallback output, so a failed run carried the success sentinel and the completion hook read it as success. The sentinel is now appended only on a successful exit (status 0); failure/empty output renders without it. - runForegroundTaskWorker inherited the 240s status-wait default, so foreground xhigh runs (routinely >4min) timed out instead of returning the result. The inline wait is now unbounded; the detached worker provides survival. - enqueueBackgroundTask left the queued record pid: null and never persisted the spawned worker pid, breaking /codex:cancel. The worker pid is now written after spawn, guarded against clobbering a finished job. Tests: 97/97 (added failure-no-sentinel + worker-pid coverage). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A companion --background job has no push channel to the main thread, and the subagent often paraphrases the [[...dispatched...]] sentinel away — so the hook could even report a still-running job as "done" (found in real testing). - The PostToolUse hook now uses companion STATE as ground truth: if the workspace has an active (queued/running) task job after a codex-rescue return, it injects a watcher instruction — arm `status <jobId> --wait` via the Bash tool with run_in_background=true, which exits and re-invokes Claude on terminal status. Robust against paraphrased/dropped dispatch tokens and the Bash auto-background case. - The companion --background dispatch message and the rescue/result-handling docs now carry the same watcher command. Gives background codex-rescue a reliable completion notification — the backgrounded `status --wait` watcher isn't subject to the 600s foreground cap. Tests: 98/98 (new state-seeded watcher test incl. a paraphrased-dispatch case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2071e212aa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const activeTaskJob = findNewestActiveTaskJob(getHookCwd(input)); | ||
| if (activeTaskJob) { | ||
| return buildWatcherContext(activeTaskJob.id); |
There was a problem hiding this comment.
Check completion token before inferring active task state
buildCompletionContext returns a watcher message as soon as any queued/running task exists in companion state, before it inspects the current tool output for [[codex-task status=complete]]. In a workspace with an unrelated background task already running, a newly completed codex:codex-rescue foreground run will be misclassified as still running, causing the parent thread to ignore the actual final result and keep waiting/polling. This regression is introduced by prioritizing global active-job state over the response’s explicit completion sentinel.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 11db7d5. buildCompletionContext now checks the completion/dispatch token BEFORE the active-task-state inference, so a return carrying the complete sentinel returns COMPLETE even when an unrelated active task job exists; the state inference is only a fallback for paraphrased/dropped dispatch tokens.
| [...loadState(cwd).jobs] | ||
| .filter((job) => job?.jobClass === "task" && ACTIVE_TASK_STATUSES.has(String(job.status ?? ""))) |
There was a problem hiding this comment.
Scope active-task detection to the current Claude session
findNewestActiveTaskJob scans all tracked task jobs in the workspace without any session filter, so the PostToolUse hook can attach a watcher for a different Claude session’s job. When two sessions run in the same repo, a rescue return in session A can be annotated with session B’s running job id, sending the user to poll/cancel the wrong job and masking the actual state of the current run.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 11db7d5. findNewestActiveTaskJob is now scoped to the hook input's session_id (matching job.sessionId), with an unscoped fallback only when no session_id is provided, so it won't point the watcher at another Claude session's active job.
| const BASH_AUTO_BACKGROUND_OUTPUT_PREFIX = "Output is being written to:"; | ||
|
|
||
| function buildWatcherContext(jobId) { | ||
| return `codex-rescue background job ${jobId} is RUNNING — there is no automatic push notification. To be notified, arm a watcher: run this via the Bash tool with run_in_background=true: node ${COMPANION_SCRIPT_PATH} status ${jobId} --wait --timeout-ms 1800000 — it blocks until the job is terminal, then exits and re-invokes you. If it returns and the job is still running, re-arm the same command. Do NOT treat the job as done until the watcher reports a terminal status.`; |
There was a problem hiding this comment.
Quote companion script path in watcher command text
The watcher guidance interpolates COMPANION_SCRIPT_PATH into a shell command without quoting it. On installations where the plugin path contains spaces (for example under Application Support), the suggested node ... status command is split into multiple arguments and fails, so users who follow the injected context cannot arm the completion watcher and may miss terminal job transitions.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 11db7d5. The watcher command now quotes the companion path: node "<COMPANION_SCRIPT_PATH>" status ... — handles paths with spaces.
…ession scope, quoting - Check the completion/dispatch token BEFORE inferring active-task state, so a return that actually completed isn't masked by an unrelated active job. - Scope findNewestActiveTaskJob to the hook's session_id (job.sessionId match), with an unscoped fallback when no session_id is present — avoids pointing the watcher at another Claude session's job. - Quote the companion path in the watcher command (handles paths with spaces). Tests: 100/100 (added complete-token-beats-active-job, session-scoping, and quoted-path cases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ackground/--wait conflict - Hook: the dispatched-token branch now emits the full watcher instruction (buildWatcherContext) instead of a bare "poll /codex:status", matching the state-inference fallback; removed the now-unused buildDispatchedContext. - Companion: handleTask now rejects conflicting --background + --wait with an error, mirroring the --resume/--fresh guard (PR openai#346 review P2 — previously --background silently won, ignoring the explicit foreground --wait). Tests: 101/101. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed the P2 from the |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 153bd63aec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const activeTaskJob = findNewestActiveTaskJob(getHookCwd(input), input?.session_id); | ||
| if (activeTaskJob) { | ||
| return buildWatcherContext(activeTaskJob.id); | ||
| } |
There was a problem hiding this comment.
Guard active-job fallback with invocation-specific evidence
This fallback infers that the current codex:codex-rescue call is still running whenever any task in the same session is queued/running, and it runs before the failure/empty-return handling. In a session with an unrelated background rescue already active, a new foreground rescue that exits without a sentinel (for example, Codex invocation failure or token loss in agent output) will be misclassified as still running and mapped to the wrong job ID, which can hide a failed run and send users to poll/cancel the wrong task. The fallback should only trigger when the current invocation is tied to that job (e.g., dispatched token/job ID or other invocation-specific signal).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 186e103. The global active-job fallback now only fires when THIS return contains invocation-specific background evidence (a BACKGROUND_INTENT_PATTERN match), so a sentinel-less/failed foreground return with an unrelated concurrent active task is no longer misclassified as still-running — it falls through to the failure/neutral branch. Session scoping retained.
| writeJobFile(job.workspaceRoot, job.id, queuedRecord); | ||
| upsertJob(job.workspaceRoot, queuedRecord); | ||
|
|
||
| const child = spawnDetachedTaskWorker(cwd, job.id); |
There was a problem hiding this comment.
Re-check cancellation before spawning detached worker
Fresh evidence in this commit: enqueueBackgroundTask now persists a cancellable queued record (pid: null) before spawn, but still unconditionally spawns the detached worker afterward. If /codex:cancel lands in that window, the job is marked cancelled with no PID to kill, and this code can still start the worker anyway, allowing writes to continue after the user saw a successful cancel. Re-read job state before spawn (or atomically claim start) and skip spawning when status is no longer active.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 186e103. enqueueBackgroundTask re-reads the job record immediately before spawnDetachedTaskWorker and skips the spawn (returning the terminal status, logging the skip) when /codex:cancel marked it non-active in the write->spawn window.
| const activeTaskJob = findNewestActiveTaskJob(getHookCwd(input), input?.session_id); | ||
| if (activeTaskJob) { | ||
| return buildWatcherContext(activeTaskJob.id); |
There was a problem hiding this comment.
Prefer Bash auto-background marker over global active-job fallback
When the Agent output explicitly contains Bash’s auto-background marker, that is invocation-specific evidence that this run detached past the foreground cap; however, the hook first checks for any active task in session state and returns a watcher for that job. In sessions with another active task, auto-backgrounded foreground runs will be misattributed to the unrelated job and lose the output-path guidance needed to follow the detached process. Check the auto-background marker before global state fallback (or require job correlation for fallback).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 186e103. The Bash auto-background marker (invocation-specific) is now checked BEFORE the global state fallback, so an auto-backgrounded run keeps its streamed-output-path guidance instead of being misattributed to an unrelated active job.
| const snapshot = await waitForSingleJobSnapshot(cwd, job.id, { | ||
| // PR #346 review: foreground xhigh waits must survive beyond the 240s status default. | ||
| timeoutMs: Infinity, | ||
| pollIntervalMs: FOREGROUND_TASK_POLL_INTERVAL_MS |
There was a problem hiding this comment.
Bound foreground waits when detached worker fails to start
Foreground task now always waits with timeoutMs: Infinity after dispatching a detached worker. If that worker never transitions the job out of queued/running (for example, process spawn failure/resource exhaustion or early worker bootstrap failure), this wait loop never terminates and the caller hangs indefinitely instead of surfacing an actionable error. Keep a finite guard timeout or detect launch failure before entering an unbounded wait.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 186e103. The foreground Infinity wait now detects when the detached worker pid has exited while the job is still queued/running (failWhenWorkerExits + pid-liveness) and bails with 'background worker exited before completing; check /codex:status ' instead of hanging. Liveness-based, so legitimately long xhigh runs still wait.
- Hook: gate the global active-job fallback on invocation-specific background evidence (BACKGROUND_INTENT_PATTERN) and check the Bash auto-background marker before it, so a sentinel-less foreground failure isn't misrouted to an unrelated concurrent task's job. (P1 + P2) - Companion: enqueueBackgroundTask re-reads the record before spawning and skips the worker if /codex:cancel already marked it terminal in the write->spawn window. (P1) - Companion: the foreground Infinity wait now bails with an actionable error when the detached worker pid exits while the job is still active (pid-liveness, so long runs still wait), plus a brief missing-job retry for the worker-writes-record race. (P2) Tests: 104/104. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 186e103504
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const statusToken = buildTaskDispatchedStatusToken(payload.jobId); | ||
| return [ | ||
| statusToken, | ||
| `${payload.title} dispatched as background job ${payload.jobId}.`, |
There was a problem hiding this comment.
Skip dispatched token when launch status is terminal
renderQueuedTaskLaunch always emits [[codex-task status=dispatched ...]] and “dispatched as background job ...” text, but enqueueBackgroundTask can now return a terminal payload (for example status: "cancelled") when cancellation wins before worker spawn. In that race, non-JSON callers are told the job was dispatched and should be watched even though no worker was launched, which misreports state and can send users into unnecessary polling/wait logic; only queued/running launches should produce the dispatched sentinel.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in a7aaff1. renderQueuedTaskLaunch is now status-aware: a terminal enqueue payload (e.g. cancelled/failed before a worker launches) renders a plain terminal message with no [[codex-task status=dispatched …]] sentinel and no watcher guidance; only queued/running launches emit the dispatched token.
Covered by task --background cancelled before launch renders terminal text without a dispatched token, which asserts the output contains neither the dispatched sentinel nor "dispatched as background job" and instead shows "was cancelled before a worker launched; no work started."
| writeJobFile(job.workspaceRoot, job.id, { | ||
| ...spawnedRecord, | ||
| pid: child.pid ?? null | ||
| }); |
There was a problem hiding this comment.
Re-check cancellation before writing spawned worker PID
After spawning the worker, this branch writes pid back based on a previously read spawnedRecord status. If /codex:cancel lands between that read and this write, cancellation can be overwritten with a queued record and live PID, effectively resurrecting a job the user already cancelled and allowing detached work to continue. Re-read state (or atomically compare-and-set) immediately before persisting the PID so cancelled terminal state cannot be clobbered during launch races.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in a7aaff1. persistSpawnedTaskWorkerPid now writes the pid and re-checks cancellation inside a single updateState read-modify-write: if the job is already terminal (a /codex:cancel won the post-spawn race), the pid is not written and the just-spawned worker is killed via terminateProcessTree instead of being resurrected by stale launch data. The terminal check (isTerminalJobRecord) consults both the summary state row and the per-job file, since either can be observed first during a race.
Covered by task --background does not persist a worker pid when cancellation wins after spawn, which injects the cancellation in the post-spawn window and asserts the job stays cancelled, pid stays null, and the orphaned worker process group is killed.
| } | ||
|
|
||
| async function runForegroundTaskWorker(cwd, job, request, options = {}) { | ||
| enqueueBackgroundTask(cwd, job, request); |
There was a problem hiding this comment.
Guard foreground path against detached spawn error events
Foreground task now always routes through enqueueBackgroundTask, which spawns a detached worker process; if that spawn fails (for example due transient ENOENT/resource errors), Node emits an error event on the child process and this path has no handler, so the parent process can crash before returning a structured failure or updating job state. Because this commit moved all foreground runs onto this launch path, the failure mode now affects normal synchronous rescue execution as well, not just optional background mode.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in a7aaff1. spawnDetachedTaskWorker now wraps spawn in try/catch for synchronous failures, attaches child.once("error", …) for async error events, and treats a missing child.pid as a launch failure. Any spawn failure routes through markTaskLaunchFailed, which atomically marks the job failed so the foreground waiter surfaces a structured error instead of letting an unhandled error event crash the parent.
Covered by foreground task reports a structured failure when the detached worker spawn fails, which emits an async ENOENT error event with no pid and asserts no ETIMEDOUT hang, no Unhandled 'error' event, and a failed payload with the spawn-failure message.
…penai#346 review) Resolve the launch/cancel race class and spawn-error crash flagged on 186e103 by routing every launch/cancel job-record mutation through a single atomic updateState read-modify-write instead of read-then-write checks: - P1 cancel-resurrect race (persistSpawnedTaskWorkerPid): the worker pid is now written and cancellation re-checked inside one updateState transition. If a concurrent /codex:cancel already made the job terminal, the pid is NOT written and the just-spawned worker is killed (terminateProcessTree) instead of being resurrected by stale launch data. - P1 unhandled detached spawn error (spawnDetachedTaskWorker): wrap spawn in try/catch for sync failures, attach child.once('error') for async failures, and treat a missing child.pid as a launch failure. Any spawn failure atomically marks the job failed (markTaskLaunchFailed) so the foreground waiter surfaces a structured error instead of crashing or hanging. - P2 dispatched sentinel on terminal launch (renderQueuedTaskLaunch): a terminal enqueue payload (cancelled/failed before worker launch) now renders a plain terminal message with NO dispatched sentinel or watcher guidance, so hooks and humans never treat a non-started job as live background work. New helpers isTerminalJobRecord/resolveLaunchStatus check both the summary state row and the per-job file, since either can be observed first during a race. Tests: 3 new runtime cases (cancel-after-spawn kills the orphan and keeps the job cancelled with no pid; spawn failure yields a structured failed payload with no hang/crash; cancel-before-launch renders terminal text with no dispatched token). Full suite 107/107 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7aaff1ebd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (token?.status === "complete") { | ||
| return COMPLETE_CONTEXT; |
There was a problem hiding this comment.
Ignore forged complete tokens on failed Agent responses
Treating [[codex-task status=complete]] as authoritative before checking the Agent tool status allows a failed codex:codex-rescue return to be misclassified as success whenever the model output itself contains that literal line (for example, if the prompt or generated text echoes it). In that case the hook emits the “COMPLETED and exited” guidance even though tool_response.status is failed, which can suppress retries/escalation and mislead the parent thread about the real run outcome.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in bb53f47. The structured Agent tool status is now authoritative over any [[codex-task …]] line in the body: buildCompletionContext short-circuits to the failure context via a new hasFailureStatus(tool_response) check before parsing the token, so a failed return whose text merely echoes [[codex-task status=complete]] is reported as a failure, not success. A genuine success (status completed/success/absent — none of which are in FAILURE_STATUSES) still falls through and honors the complete token, so no regression.
Covered by codex-rescue hook reports failure (not success) when a failed return echoes the complete token: status: "failed" + a standalone complete-token line asserts the emitted context matches /exited WITHOUT a success signal/ and not /COMPLETED and exited/.
| if (token?.status === "dispatched" && token.id) { | ||
| return buildWatcherContext(token.id); |
There was a problem hiding this comment.
Reject dispatched tokens from failed Agent responses
The hook trusts a [[codex-task status=dispatched id=...]] line even when the Agent call itself failed, so any failed return that echoes that literal token is reclassified as an active background launch and gets watcher instructions for that job id. This can hide real failures and redirect users to poll/cancel the wrong task, because the dispatched branch runs before failure-status handling.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in bb53f47 by the same early failure-status gate: because hasFailureStatus returns the failure context before token handling, the dispatched-token branch is never reached for a failed return, so an echoed [[codex-task status=dispatched id=…]] no longer redirects to watcher/poll instructions for a phantom job.
Covered by codex-rescue hook reports failure (not dispatch) when a failed return echoes a dispatched token: status: "failed" + a dispatched-token line asserts the failure context and that no status … --wait watcher instruction is emitted.
| function isSynchronousAgentReturn(toolResponse) { | ||
| const status = String(toolResponse?.status ?? "").toLowerCase(); | ||
| return status !== "async_launched" && status !== "running"; | ||
| } |
There was a problem hiding this comment.
Treat sub_agent_entered as a non-terminal Agent status
isSynchronousAgentReturn currently treats every status except running/async_launched as terminal, which includes Agent responses with status: "sub_agent_entered" (interactive subagent handoff). In that state the rescue agent has not finished, but this hook will emit “exited / do not wait” context, causing the parent flow to stop waiting while the subagent is still active.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in bb53f47. isSynchronousAgentReturn now treats sub_agent_entered as non-terminal (added to a NON_TERMINAL_AGENT_STATUSES set alongside async_launched/running), so the hook stays silent during an interactive subagent handoff instead of emitting "exited / do not wait."
One note in the interest of honesty: I could not independently confirm sub_agent_entered against the harness's Agent tool status enum, so I've implemented this as a defensive guard — it is a no-op if that status never occurs and correct if it does. If you can point me at the authoritative status list I'll align the set exactly (and add any sibling non-terminal statuses).
Covered by codex-rescue hook stays silent on sub_agent_entered handoff: a sub_agent_entered return asserts exit 0 and empty stdout (no additionalContext).
| @@ -774,20 +1186,23 @@ async function handleTask(argv) { | |||
| return; | |||
There was a problem hiding this comment.
Exit non-zero when background task launch returns failed
handleTask always returns success from the --background path after printing the payload, even when enqueueBackgroundTask reports a terminal failure (for example worker spawn/pid failure). This makes task --background exit with code 0 while returning { "status": "failed" }, so scripts and calling agents can treat a failed dispatch as successful and skip retry/escalation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in bb53f47. The --background branch of handleTask now sets process.exitCode = 1 when payload.status === "failed" (a worker spawn/pid failure from enqueueBackgroundTask), so a failed dispatch no longer exits 0 while printing { "status": "failed" }. A cancelled launch stays exit 0, since a user-initiated cancel before launch is not an error.
Covered by task --background exits non-zero when the worker spawn fails: a stubbed async spawn ENOENT asserts a non-zero exit, payload.status === "failed", and no Unhandled 'error' event.
…instead of hanging The background-completion watcher reaches detached workers through `status <jobId> --wait`, but that path called waitForSingleJobSnapshot WITHOUT failWhenWorkerExits — the dead-worker liveness check (hasExitedActiveWorker) was only wired into the foreground task path. So when a --background worker died without writing a terminal status (crash/kill/teardown), the job stayed `running` forever, the watcher polled that stale status until --timeout-ms elapsed (e.g. 30 min) and returned a misleading timeout, and the same stale `running` blocked /codex:result and kept the completion hook treating the job as active. Fix (reuse the proven foreground helpers, no new machinery): - handleStatus --wait now passes failWhenWorkerExits: true; on workerExited it reconciles via failActiveWorkerJob (persists status=failed, pid=null, errorMessage) and renders the crash, mirroring runForegroundTaskWorker. - A plain status <jobId> read also reconciles a dead active worker, so result, hooks, and later status callers see the durable failure (failActiveWorkerJob persists globally, so the first read repairs state for everyone). No false positives: hasExitedActiveWorker requires a recorded pid that is not alive, so pid-null/queued and live-pid jobs are never reconciled. Tests (+3, suite 110/110): a dead-pid running job makes status --wait return promptly as failed (no ETIMEDOUT) and persists failed/pid null; a plain status read reconciles the same; a live-pid (process.pid) running job still times out and is NOT reconciled (false-positive guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on-zero on failed launch (PR openai#346 review) Address the 4 findings from the Codex review of a7aaff1: - P1 (hook): the completion hook honored a [[codex-task status=complete]] / [[codex-task status=dispatched]] line in the response body BEFORE checking the Agent tool status, so a failed return whose text merely echoed the literal token (model output or prompt echo) was misclassified as success / active dispatch — suppressing retries and pointing watchers at a phantom job. The structured tool status is authoritative: a known failure status now short-circuits to the failure context before any token handling. A genuine success (status completed/success/none) still honors the complete token. - P2 (hook): isSynchronousAgentReturn treated every status except async_launched/running as terminal, including sub_agent_entered (interactive subagent handoff). The hook now treats sub_agent_entered as non-terminal and stays silent so it never tells the parent "exited" while the subagent is live. - P1 (companion): handleTask --background printed the launch payload and always returned exit 0, even when enqueueBackgroundTask reported status=failed (worker spawn/pid failure). It now sets a non-zero exit code on a failed launch so callers don't treat a failed dispatch as success (cancelled stays exit 0). Tests (+4, suite 114/114): a failed return echoing the complete token yields the failure context (not COMPLETED); a failed return echoing a dispatched token yields the failure context with no watcher instruction; a sub_agent_entered return emits no context; task --background exits non-zero with status=failed on spawn failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Problem
The main Claude Code thread treated a synchronous
codex:codex-rescuereturn as "still running" and waited for a notification that never came — for work that had already finished.Root cause (model-independent — reproduced on Opus): a synchronous rescue return carries no machine-checkable completion signal, only raw Codex text plus the harness resume-token suffix. For long runs, the Claude Code Bash tool auto-backgrounds the companion call past its ~600s cap and returns "Command running in background… you will be notified", which the subagent faithfully relays; the detached run then completes but is orphaned. Because a real timeout reads as "still running" to any model, this can't be fixed by subagent instructions or a model bump.
(Supersedes the original docs-only clarification on this branch, which only documented the resume-token suffix — that commit is retained underneath.)
Fix — a deterministic signal at a layer the model can't misread
[[codex-task status=complete]](foreground) /[[codex-task status=dispatched id=<jobId>]](background). The background dispatch message no longer falsely promises a notification.codex-rescue-completion-hook.mjs): on every codex-rescue return, injects an authoritative line into the main thread — complete → done/verify-on-disk; Bash auto-background → "still finishing detached, don't wait"; dispatched → poll/codex:status. Ignores non-codex-rescue agents and genuine async launches.--backgroundrouting: explicit--backgroundhonored; long/open-ended work defaults totask --background; foreground stays for short bounded runs. Resolves the ambiguous "strip--background" wording.rescue.md,codex-cli-runtime,codex-result-handling): trust the sentinel/hook over prose.Tests
92/92 (
node --test). New coverage: token scheme, hook behavior (incl. the Bash auto-background branch and other-subagent ignore), and the--backgrounddispatch token. Validated a real--backgroundlifecycle end-to-end (dispatch → running → completed → result).🤖 Generated with Claude Code