Skip to content

fix: deterministic completion signal for codex-rescue (stop waiting on finished runs)#346

Open
zknpr wants to merge 12 commits into
openai:mainfrom
zknpr:fix/clarify-agent-resume-token-suffix
Open

fix: deterministic completion signal for codex-rescue (stop waiting on finished runs)#346
zknpr wants to merge 12 commits into
openai:mainfrom
zknpr:fix/clarify-agent-resume-token-suffix

Conversation

@zknpr
Copy link
Copy Markdown

@zknpr zknpr commented May 22, 2026

Problem

The main Claude Code thread treated a synchronous codex:codex-rescue return as "still running" and waited for a notification that never came — for work that had already finished.

Root cause (model-independent — reproduced on Opus): a synchronous rescue return carries no machine-checkable completion signal, only raw Codex text plus the harness resume-token suffix. For long runs, the Claude Code Bash tool auto-backgrounds the companion call past its ~600s cap and returns "Command running in background… you will be notified", which the subagent faithfully relays; the detached run then completes but is orphaned. Because a real timeout reads as "still running" to any model, this can't be fixed by subagent instructions or a model bump.

(Supersedes the original docs-only clarification on this branch, which only documented the resume-token suffix — that commit is retained underneath.)

Fix — a deterministic signal at a layer the model can't misread

  • Companion sentinel: [[codex-task status=complete]] (foreground) / [[codex-task status=dispatched id=<jobId>]] (background). The background dispatch message no longer falsely promises a notification.
  • PostToolUse hook (codex-rescue-completion-hook.mjs): on every codex-rescue return, injects an authoritative line into the main thread — complete → done/verify-on-disk; Bash auto-background → "still finishing detached, don't wait"; dispatched → poll /codex:status. Ignores non-codex-rescue agents and genuine async launches.
  • Reliable --background routing: explicit --background honored; long/open-ended work defaults to task --background; foreground stays for short bounded runs. Resolves the ambiguous "strip --background" wording.
  • Docs (rescue.md, codex-cli-runtime, codex-result-handling): trust the sentinel/hook over prose.

Tests

92/92 (node --test). New coverage: token scheme, hook behavior (incl. the Bash auto-background branch and other-subagent ignore), and the --background dispatch token. Validated a real --background lifecycle end-to-end (dispatch → running → completed → result).

🤖 Generated with Claude Code

… running signal

Every Agent(codex:codex-rescue) return is suffixed by the Claude Code harness
with `agentId: <id> (use SendMessage with to: '<id>' to continue this agent)`.
This is a resume token for the same agent thread, not a "still running" or
"in background" status indicator. Callers (including Claude itself) have
repeatedly misread it as the latter and then incorrectly told users that
Codex is still running in the background, when the work has actually
completed.

Adds one explicit caller-side rule in two places:

- plugins/codex/skills/codex-result-handling/SKILL.md — the internal guidance
  the caller consults when presenting Codex output.
- plugins/codex/commands/rescue.md Operating rules — the most common
  user-facing entry point (`/codex:rescue`).

No code changes; both files are caller-side documentation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zknpr zknpr requested a review from a team May 22, 2026 19:45
…ops waiting on finished runs

A synchronous codex-rescue return carried no machine-checkable completion
signal — only raw Codex text plus the harness resume-token suffix — so the
main thread inferred status from prose and concluded "still running". For long
runs the Claude Code Bash tool auto-backgrounds the companion call past its
~600s cap and returns "Command running in background... you will be notified",
which the subagent faithfully relays; the detached run then completes but is
orphaned. This is model-independent (reproduced on Opus), so it is not fixable
by subagent instructions or a model bump.

Fix at the structural layer:
- Companion stamps a deterministic sentinel: [[codex-task status=complete]]
  (foreground) / [[codex-task status=dispatched id=<jobId>]] (background); the
  background dispatch message no longer falsely promises a notification.
- New PostToolUse hook (codex-rescue-completion-hook.mjs) injects an
  authoritative line into the main thread on every codex-rescue return:
  complete -> done/verify-on-disk; Bash auto-background -> still finishing
  detached, don't wait; dispatched -> poll /codex:status.
- Reliable --background routing: explicit --background is honored, long
  open-ended work defaults to task --background, foreground only for short
  bounded runs.
- Docs (rescue.md, codex-cli-runtime, codex-result-handling) tell the main
  thread to trust the sentinel/hook over prose.

Tests: 92/92, incl. new task-status-token + hook coverage and updated command/
runtime assertions. Validated a real --background lifecycle end to end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zknpr zknpr changed the title docs: clarify Agent harness's agentId suffix is a resume token, not a running signal fix: deterministic completion signal for codex-rescue (stop waiting on finished runs) May 24, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d651c502a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


function isSynchronousAgentReturn(toolResponse) {
const status = String(toolResponse?.status ?? "").toLowerCase();
return status !== "async_launched" && status !== "running";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict completion context to successful Agent returns

Treating every non-running/async_launched status as synchronous completion causes false “COMPLETED and exited” guidance for failed rescue calls. In isSynchronousAgentReturn, statuses like failed/errored pass this check, and buildCompletionContext then falls back to COMPLETE_CONTEXT when no sentinel is present, so the parent thread is told the run finished successfully even when the Agent tool actually failed. This can suppress retries/escalation and mislead users about the true execution state.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e1b8ae3. Success is now asserted only when the [[codex-task status=complete]] sentinel is present. A failure status (failed/error/timeout/...) or an empty return now yields a non-success line — "exited WITHOUT a success signal ... may have failed or produced no result ... re-run or escalate" — and a tokenless-but-otherwise-fine return gets a neutral "exited, verify on disk" line, never a false "COMPLETED ... final result". Every branch still emits "do not wait", so a failed run can't reintroduce the original hang. Added failed-status and empty-return test cases.

Addresses the PR openai#346 Codex review (P1): buildCompletionContext fell back to
the "COMPLETED ... final result" line for any synchronous return without a
sentinel — including failed/errored/empty returns — falsely signaling success
and suppressing retry/escalation.

Now success is asserted only when the [[codex-task status=complete]] sentinel
is present. A failure status (failed/error/timeout/...) or an empty return
(codex-rescue returns nothing when Codex can't be invoked) gets a failure-aware
"exited without a success signal — may have failed, verify/retry" line; any
other tokenless return gets a neutral "exited, verify on disk" line. Every
branch still says "do not wait" so a failed run doesn't reintroduce the
original waiting bug.

Tests: 94/94 (added failed-status and empty-return cases; tokenless default is
now neutral).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1b8ae3407

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/codex/scripts/lib/render.mjs Outdated
Comment on lines +327 to +328
const message = String(parsedResult?.failureMessage ?? "").trim() || "Codex did not return a final message.";
return `${message}\n`;
return appendTaskStatusToken(message, completeToken);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid stamping complete token on failure fallback output

When rawOutput is empty, this branch still appends [[codex-task status=complete]] to a failure fallback message, so non-zero/failed task runs can carry a synthetic "complete" sentinel. The new PostToolUse hook treats that token as authoritative and emits the "COMPLETED and exited" context before checking failure status, which can misclassify failed rescue executions as successful completions and suppress retry/escalation behavior.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 65d8320. renderTaskResult now appends the complete sentinel only on a successful exit (status 0); empty/failure fallback output renders without it, so a failed run no longer carries a success signal. Added a failure-no-sentinel test.

A foreground `task` ran the Codex turn in-process, so when the Bash tool
auto-backgrounded a long foreground call and the owning subagent was reaped,
the run was killed mid-task and the work was lost with no resumable record.

Now the foreground path enqueues the same detached (`detached` + `unref`)
worker the `--background` path uses and waits for it inline, printing the
worker's stored result. If the inline wait is killed, the detached worker
survives and the result is recoverable via /codex:status. Completed-run output
is byte-identical (the worker stamps the [[codex-task status=complete]]
sentinel). The inline wait degrades to a "still running" status after the
existing wait cap instead of blocking forever. enqueueBackgroundTask now writes
the job record before spawning to close a worker-startup race.

Tests: 95/95 (new test proves foreground goes through the detached worker via
the queued-job record + log line, with identical stdout).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review


P2 Badge Reject conflicting --background and --wait flags

The task handler documents --wait as an explicit foreground choice, but when both flags are present this branch still routes to background because it only checks options.background and never errors on the conflict. In that input case, the command silently does the opposite of an explicit wait request, which can unexpectedly detach work and require manual status polling.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +733 to +735
const snapshot = await waitForSingleJobSnapshot(cwd, job.id, {
pollIntervalMs: FOREGROUND_TASK_POLL_INTERVAL_MS
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid timing out foreground task waits after 4 minutes

runForegroundTaskWorker waits on waitForSingleJobSnapshot without a timeout override, so it inherits the 240000ms default and exits non-zero once that deadline is hit even though the detached worker may still be running. Any foreground rescue that legitimately runs longer than ~4 minutes will be reported as failed/unfinished to the caller, which can trigger incorrect retry/escalation behavior and duplicate work while the original task continues in the background.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 65d8320. runForegroundTaskWorker now passes an unbounded timeout (timeoutMs: Infinity) instead of inheriting the 240s status default, so long xhigh foreground runs wait for the real result; the detached worker still provides survival if the waiting process is killed.

status: "queued",
phase: "queued",
pid: child.pid ?? null,
pid: null,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve worker PID for queued task cancellation

The queued job record now stores pid: null, and the detached worker PID is not persisted before returning. /codex:cancel relies on job.pid to terminate active jobs, so cancelling immediately after dispatch (while still queued) can mark the job as cancelled without actually stopping the worker; that worker can then start and finish anyway, potentially applying writes after the user thought cancellation succeeded.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 65d8320. After spawning the detached worker, the job record is updated with the worker pid (guarded by isActiveJobStatus to avoid clobbering a finished job), so /codex:cancel can terminate it. The pre-spawn record write is kept to avoid the worker-startup race.

Comment thread plugins/codex/scripts/lib/render.mjs Outdated
Comment on lines +327 to +328
const message = String(parsedResult?.failureMessage ?? "").trim() || "Codex did not return a final message.";
return `${message}\n`;
return appendTaskStatusToken(message, completeToken);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate complete sentinel on successful task exit status

renderTaskResult now appends [[codex-task status=complete]] unconditionally, including fallback failure messages. runTrackedJob marks non-zero task exits as failed, but the stored rendered output still carries this success token, and the new PostToolUse hook treats that token as authoritative completion success. This lets failed/interrupted rescue runs be misclassified as completed, suppressing retry/escalation exactly when the task did not succeed.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 65d8320. The complete sentinel is now gated on a successful exit status (status 0) rather than on rawOutput presence, so non-zero/failed task exits no longer emit it.

zknpr and others added 2 commits May 25, 2026 01:38
…reground wait, worker pid

Three regressions from e1b8ae3/c4722b5:
- renderTaskResult stamped [[codex-task status=complete]] on failure/empty
  fallback output, so a failed run carried the success sentinel and the
  completion hook read it as success. The sentinel is now appended only on a
  successful exit (status 0); failure/empty output renders without it.
- runForegroundTaskWorker inherited the 240s status-wait default, so foreground
  xhigh runs (routinely >4min) timed out instead of returning the result. The
  inline wait is now unbounded; the detached worker provides survival.
- enqueueBackgroundTask left the queued record pid: null and never persisted
  the spawned worker pid, breaking /codex:cancel. The worker pid is now written
  after spawn, guarded against clobbering a finished job.

Tests: 97/97 (added failure-no-sentinel + worker-pid coverage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A companion --background job has no push channel to the main thread, and the
subagent often paraphrases the [[...dispatched...]] sentinel away — so the hook
could even report a still-running job as "done" (found in real testing).

- The PostToolUse hook now uses companion STATE as ground truth: if the
  workspace has an active (queued/running) task job after a codex-rescue return,
  it injects a watcher instruction — arm `status <jobId> --wait` via the Bash
  tool with run_in_background=true, which exits and re-invokes Claude on terminal
  status. Robust against paraphrased/dropped dispatch tokens and the Bash
  auto-background case.
- The companion --background dispatch message and the rescue/result-handling
  docs now carry the same watcher command.

Gives background codex-rescue a reliable completion notification — the
backgrounded `status --wait` watcher isn't subject to the 600s foreground cap.

Tests: 98/98 (new state-seeded watcher test incl. a paraphrased-dispatch case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2071e212aa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +162 to +164
const activeTaskJob = findNewestActiveTaskJob(getHookCwd(input));
if (activeTaskJob) {
return buildWatcherContext(activeTaskJob.id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Check completion token before inferring active task state

buildCompletionContext returns a watcher message as soon as any queued/running task exists in companion state, before it inspects the current tool output for [[codex-task status=complete]]. In a workspace with an unrelated background task already running, a newly completed codex:codex-rescue foreground run will be misclassified as still running, causing the parent thread to ignore the actual final result and keep waiting/polling. This regression is introduced by prioritizing global active-job state over the response’s explicit completion sentinel.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11db7d5. buildCompletionContext now checks the completion/dispatch token BEFORE the active-task-state inference, so a return carrying the complete sentinel returns COMPLETE even when an unrelated active task job exists; the state inference is only a fallback for paraphrased/dropped dispatch tokens.

Comment on lines +106 to +107
[...loadState(cwd).jobs]
.filter((job) => job?.jobClass === "task" && ACTIVE_TASK_STATUSES.has(String(job.status ?? "")))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Scope active-task detection to the current Claude session

findNewestActiveTaskJob scans all tracked task jobs in the workspace without any session filter, so the PostToolUse hook can attach a watcher for a different Claude session’s job. When two sessions run in the same repo, a rescue return in session A can be annotated with session B’s running job id, sending the user to poll/cancel the wrong job and masking the actual state of the current run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11db7d5. findNewestActiveTaskJob is now scoped to the hook input's session_id (matching job.sessionId), with an unscoped fallback only when no session_id is provided, so it won't point the watcher at another Claude session's active job.

const BASH_AUTO_BACKGROUND_OUTPUT_PREFIX = "Output is being written to:";

function buildWatcherContext(jobId) {
return `codex-rescue background job ${jobId} is RUNNING — there is no automatic push notification. To be notified, arm a watcher: run this via the Bash tool with run_in_background=true: node ${COMPANION_SCRIPT_PATH} status ${jobId} --wait --timeout-ms 1800000 — it blocks until the job is terminal, then exits and re-invokes you. If it returns and the job is still running, re-arm the same command. Do NOT treat the job as done until the watcher reports a terminal status.`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Quote companion script path in watcher command text

The watcher guidance interpolates COMPANION_SCRIPT_PATH into a shell command without quoting it. On installations where the plugin path contains spaces (for example under Application Support), the suggested node ... status command is split into multiple arguments and fails, so users who follow the injected context cannot arm the completion watcher and may miss terminal job transitions.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11db7d5. The watcher command now quotes the companion path: node "<COMPANION_SCRIPT_PATH>" status ... — handles paths with spaces.

zknpr and others added 2 commits May 25, 2026 11:21
…ession scope, quoting

- Check the completion/dispatch token BEFORE inferring active-task state, so a
  return that actually completed isn't masked by an unrelated active job.
- Scope findNewestActiveTaskJob to the hook's session_id (job.sessionId match),
  with an unscoped fallback when no session_id is present — avoids pointing the
  watcher at another Claude session's job.
- Quote the companion path in the watcher command (handles paths with spaces).

Tests: 100/100 (added complete-token-beats-active-job, session-scoping, and
quoted-path cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ackground/--wait conflict

- Hook: the dispatched-token branch now emits the full watcher instruction
  (buildWatcherContext) instead of a bare "poll /codex:status", matching the
  state-inference fallback; removed the now-unused buildDispatchedContext.
- Companion: handleTask now rejects conflicting --background + --wait with an
  error, mirroring the --resume/--fresh guard (PR openai#346 review P2 — previously
  --background silently won, ignoring the explicit foreground --wait).

Tests: 101/101.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zknpr
Copy link
Copy Markdown
Author

zknpr commented May 25, 2026

Addressed the P2 from the c4722b5 review body ("Reject conflicting --background and --wait flags") in 153bd63: handleTask now throws "Choose either --background or --wait, not both." when both flags are present — mirroring the existing --resume/--fresh guard — instead of silently backgrounding and ignoring the explicit foreground --wait.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 153bd63aec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +178 to +181
const activeTaskJob = findNewestActiveTaskJob(getHookCwd(input), input?.session_id);
if (activeTaskJob) {
return buildWatcherContext(activeTaskJob.id);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard active-job fallback with invocation-specific evidence

This fallback infers that the current codex:codex-rescue call is still running whenever any task in the same session is queued/running, and it runs before the failure/empty-return handling. In a session with an unrelated background rescue already active, a new foreground rescue that exits without a sentinel (for example, Codex invocation failure or token loss in agent output) will be misclassified as still running and mapped to the wrong job ID, which can hide a failed run and send users to poll/cancel the wrong task. The fallback should only trigger when the current invocation is tied to that job (e.g., dispatched token/job ID or other invocation-specific signal).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 186e103. The global active-job fallback now only fires when THIS return contains invocation-specific background evidence (a BACKGROUND_INTENT_PATTERN match), so a sentinel-less/failed foreground return with an unrelated concurrent active task is no longer misclassified as still-running — it falls through to the failure/neutral branch. Session scoping retained.

Comment on lines +677 to +680
writeJobFile(job.workspaceRoot, job.id, queuedRecord);
upsertJob(job.workspaceRoot, queuedRecord);

const child = spawnDetachedTaskWorker(cwd, job.id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-check cancellation before spawning detached worker

Fresh evidence in this commit: enqueueBackgroundTask now persists a cancellable queued record (pid: null) before spawn, but still unconditionally spawns the detached worker afterward. If /codex:cancel lands in that window, the job is marked cancelled with no PID to kill, and this code can still start the worker anyway, allowing writes to continue after the user saw a successful cancel. Re-read job state before spawn (or atomically claim start) and skip spawning when status is no longer active.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 186e103. enqueueBackgroundTask re-reads the job record immediately before spawnDetachedTaskWorker and skips the spawn (returning the terminal status, logging the skip) when /codex:cancel marked it non-active in the write->spawn window.

Comment on lines +178 to +180
const activeTaskJob = findNewestActiveTaskJob(getHookCwd(input), input?.session_id);
if (activeTaskJob) {
return buildWatcherContext(activeTaskJob.id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prefer Bash auto-background marker over global active-job fallback

When the Agent output explicitly contains Bash’s auto-background marker, that is invocation-specific evidence that this run detached past the foreground cap; however, the hook first checks for any active task in session state and returns a watcher for that job. In sessions with another active task, auto-backgrounded foreground runs will be misattributed to the unrelated job and lose the output-path guidance needed to follow the detached process. Check the auto-background marker before global state fallback (or require job correlation for fallback).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 186e103. The Bash auto-background marker (invocation-specific) is now checked BEFORE the global state fallback, so an auto-backgrounded run keeps its streamed-output-path guidance instead of being misattributed to an unrelated active job.

Comment on lines +747 to +750
const snapshot = await waitForSingleJobSnapshot(cwd, job.id, {
// PR #346 review: foreground xhigh waits must survive beyond the 240s status default.
timeoutMs: Infinity,
pollIntervalMs: FOREGROUND_TASK_POLL_INTERVAL_MS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bound foreground waits when detached worker fails to start

Foreground task now always waits with timeoutMs: Infinity after dispatching a detached worker. If that worker never transitions the job out of queued/running (for example, process spawn failure/resource exhaustion or early worker bootstrap failure), this wait loop never terminates and the caller hangs indefinitely instead of surfacing an actionable error. Keep a finite guard timeout or detect launch failure before entering an unbounded wait.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 186e103. The foreground Infinity wait now detects when the detached worker pid has exited while the job is still queued/running (failWhenWorkerExits + pid-liveness) and bails with 'background worker exited before completing; check /codex:status ' instead of hanging. Liveness-based, so legitimately long xhigh runs still wait.

- Hook: gate the global active-job fallback on invocation-specific background
  evidence (BACKGROUND_INTENT_PATTERN) and check the Bash auto-background marker
  before it, so a sentinel-less foreground failure isn't misrouted to an
  unrelated concurrent task's job. (P1 + P2)
- Companion: enqueueBackgroundTask re-reads the record before spawning and skips
  the worker if /codex:cancel already marked it terminal in the write->spawn
  window. (P1)
- Companion: the foreground Infinity wait now bails with an actionable error
  when the detached worker pid exits while the job is still active (pid-liveness,
  so long runs still wait), plus a brief missing-job retry for the
  worker-writes-record race. (P2)

Tests: 104/104.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 186e103504

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +622 to +625
const statusToken = buildTaskDispatchedStatusToken(payload.jobId);
return [
statusToken,
`${payload.title} dispatched as background job ${payload.jobId}.`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip dispatched token when launch status is terminal

renderQueuedTaskLaunch always emits [[codex-task status=dispatched ...]] and “dispatched as background job ...” text, but enqueueBackgroundTask can now return a terminal payload (for example status: "cancelled") when cancellation wins before worker spawn. In that race, non-JSON callers are told the job was dispatched and should be watched even though no worker was launched, which misreports state and can send users into unnecessary polling/wait logic; only queued/running launches should produce the dispatched sentinel.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a7aaff1. renderQueuedTaskLaunch is now status-aware: a terminal enqueue payload (e.g. cancelled/failed before a worker launches) renders a plain terminal message with no [[codex-task status=dispatched …]] sentinel and no watcher guidance; only queued/running launches emit the dispatched token.

Covered by task --background cancelled before launch renders terminal text without a dispatched token, which asserts the output contains neither the dispatched sentinel nor "dispatched as background job" and instead shows "was cancelled before a worker launched; no work started."

Comment on lines +767 to +770
writeJobFile(job.workspaceRoot, job.id, {
...spawnedRecord,
pid: child.pid ?? null
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-check cancellation before writing spawned worker PID

After spawning the worker, this branch writes pid back based on a previously read spawnedRecord status. If /codex:cancel lands between that read and this write, cancellation can be overwritten with a queued record and live PID, effectively resurrecting a job the user already cancelled and allowing detached work to continue. Re-read state (or atomically compare-and-set) immediately before persisting the PID so cancelled terminal state cannot be clobbered during launch races.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a7aaff1. persistSpawnedTaskWorkerPid now writes the pid and re-checks cancellation inside a single updateState read-modify-write: if the job is already terminal (a /codex:cancel won the post-spawn race), the pid is not written and the just-spawned worker is killed via terminateProcessTree instead of being resurrected by stale launch data. The terminal check (isTerminalJobRecord) consults both the summary state row and the per-job file, since either can be observed first during a race.

Covered by task --background does not persist a worker pid when cancellation wins after spawn, which injects the cancellation in the post-spawn window and asserts the job stays cancelled, pid stays null, and the orphaned worker process group is killed.

}

async function runForegroundTaskWorker(cwd, job, request, options = {}) {
enqueueBackgroundTask(cwd, job, request);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard foreground path against detached spawn error events

Foreground task now always routes through enqueueBackgroundTask, which spawns a detached worker process; if that spawn fails (for example due transient ENOENT/resource errors), Node emits an error event on the child process and this path has no handler, so the parent process can crash before returning a structured failure or updating job state. Because this commit moved all foreground runs onto this launch path, the failure mode now affects normal synchronous rescue execution as well, not just optional background mode.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a7aaff1. spawnDetachedTaskWorker now wraps spawn in try/catch for synchronous failures, attaches child.once("error", …) for async error events, and treats a missing child.pid as a launch failure. Any spawn failure routes through markTaskLaunchFailed, which atomically marks the job failed so the foreground waiter surfaces a structured error instead of letting an unhandled error event crash the parent.

Covered by foreground task reports a structured failure when the detached worker spawn fails, which emits an async ENOENT error event with no pid and asserts no ETIMEDOUT hang, no Unhandled 'error' event, and a failed payload with the spawn-failure message.

…penai#346 review)

Resolve the launch/cancel race class and spawn-error crash flagged on 186e103
by routing every launch/cancel job-record mutation through a single atomic
updateState read-modify-write instead of read-then-write checks:

- P1 cancel-resurrect race (persistSpawnedTaskWorkerPid): the worker pid is now
  written and cancellation re-checked inside one updateState transition. If a
  concurrent /codex:cancel already made the job terminal, the pid is NOT written
  and the just-spawned worker is killed (terminateProcessTree) instead of being
  resurrected by stale launch data.
- P1 unhandled detached spawn error (spawnDetachedTaskWorker): wrap spawn in
  try/catch for sync failures, attach child.once('error') for async failures,
  and treat a missing child.pid as a launch failure. Any spawn failure
  atomically marks the job failed (markTaskLaunchFailed) so the foreground
  waiter surfaces a structured error instead of crashing or hanging.
- P2 dispatched sentinel on terminal launch (renderQueuedTaskLaunch): a terminal
  enqueue payload (cancelled/failed before worker launch) now renders a plain
  terminal message with NO dispatched sentinel or watcher guidance, so hooks and
  humans never treat a non-started job as live background work.

New helpers isTerminalJobRecord/resolveLaunchStatus check both the summary state
row and the per-job file, since either can be observed first during a race.

Tests: 3 new runtime cases (cancel-after-spawn kills the orphan and keeps the
job cancelled with no pid; spawn failure yields a structured failed payload with
no hang/crash; cancel-before-launch renders terminal text with no dispatched
token). Full suite 107/107 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7aaff1ebd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +171 to +172
if (token?.status === "complete") {
return COMPLETE_CONTEXT;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Ignore forged complete tokens on failed Agent responses

Treating [[codex-task status=complete]] as authoritative before checking the Agent tool status allows a failed codex:codex-rescue return to be misclassified as success whenever the model output itself contains that literal line (for example, if the prompt or generated text echoes it). In that case the hook emits the “COMPLETED and exited” guidance even though tool_response.status is failed, which can suppress retries/escalation and mislead the parent thread about the real run outcome.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bb53f47. The structured Agent tool status is now authoritative over any [[codex-task …]] line in the body: buildCompletionContext short-circuits to the failure context via a new hasFailureStatus(tool_response) check before parsing the token, so a failed return whose text merely echoes [[codex-task status=complete]] is reported as a failure, not success. A genuine success (status completed/success/absent — none of which are in FAILURE_STATUSES) still falls through and honors the complete token, so no regression.

Covered by codex-rescue hook reports failure (not success) when a failed return echoes the complete token: status: "failed" + a standalone complete-token line asserts the emitted context matches /exited WITHOUT a success signal/ and not /COMPLETED and exited/.

Comment on lines +176 to +177
if (token?.status === "dispatched" && token.id) {
return buildWatcherContext(token.id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject dispatched tokens from failed Agent responses

The hook trusts a [[codex-task status=dispatched id=...]] line even when the Agent call itself failed, so any failed return that echoes that literal token is reclassified as an active background launch and gets watcher instructions for that job id. This can hide real failures and redirect users to poll/cancel the wrong task, because the dispatched branch runs before failure-status handling.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bb53f47 by the same early failure-status gate: because hasFailureStatus returns the failure context before token handling, the dispatched-token branch is never reached for a failed return, so an echoed [[codex-task status=dispatched id=…]] no longer redirects to watcher/poll instructions for a phantom job.

Covered by codex-rescue hook reports failure (not dispatch) when a failed return echoes a dispatched token: status: "failed" + a dispatched-token line asserts the failure context and that no status … --wait watcher instruction is emitted.

Comment on lines +90 to +93
function isSynchronousAgentReturn(toolResponse) {
const status = String(toolResponse?.status ?? "").toLowerCase();
return status !== "async_launched" && status !== "running";
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat sub_agent_entered as a non-terminal Agent status

isSynchronousAgentReturn currently treats every status except running/async_launched as terminal, which includes Agent responses with status: "sub_agent_entered" (interactive subagent handoff). In that state the rescue agent has not finished, but this hook will emit “exited / do not wait” context, causing the parent flow to stop waiting while the subagent is still active.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bb53f47. isSynchronousAgentReturn now treats sub_agent_entered as non-terminal (added to a NON_TERMINAL_AGENT_STATUSES set alongside async_launched/running), so the hook stays silent during an interactive subagent handoff instead of emitting "exited / do not wait."

One note in the interest of honesty: I could not independently confirm sub_agent_entered against the harness's Agent tool status enum, so I've implemented this as a defensive guard — it is a no-op if that status never occurs and correct if it does. If you can point me at the authoritative status list I'll align the set exactly (and add any sibling non-terminal statuses).

Covered by codex-rescue hook stays silent on sub_agent_entered handoff: a sub_agent_entered return asserts exit 0 and empty stdout (no additionalContext).

@@ -774,20 +1186,23 @@ async function handleTask(argv) {
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Exit non-zero when background task launch returns failed

handleTask always returns success from the --background path after printing the payload, even when enqueueBackgroundTask reports a terminal failure (for example worker spawn/pid failure). This makes task --background exit with code 0 while returning { "status": "failed" }, so scripts and calling agents can treat a failed dispatch as successful and skip retry/escalation.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bb53f47. The --background branch of handleTask now sets process.exitCode = 1 when payload.status === "failed" (a worker spawn/pid failure from enqueueBackgroundTask), so a failed dispatch no longer exits 0 while printing { "status": "failed" }. A cancelled launch stays exit 0, since a user-initiated cancel before launch is not an error.

Covered by task --background exits non-zero when the worker spawn fails: a stubbed async spawn ENOENT asserts a non-zero exit, payload.status === "failed", and no Unhandled 'error' event.

zknpr and others added 2 commits May 25, 2026 13:26
…instead of hanging

The background-completion watcher reaches detached workers through
`status <jobId> --wait`, but that path called waitForSingleJobSnapshot
WITHOUT failWhenWorkerExits — the dead-worker liveness check (hasExitedActiveWorker)
was only wired into the foreground task path. So when a --background worker died
without writing a terminal status (crash/kill/teardown), the job stayed `running`
forever, the watcher polled that stale status until --timeout-ms elapsed (e.g. 30
min) and returned a misleading timeout, and the same stale `running` blocked
/codex:result and kept the completion hook treating the job as active.

Fix (reuse the proven foreground helpers, no new machinery):
- handleStatus --wait now passes failWhenWorkerExits: true; on workerExited it
  reconciles via failActiveWorkerJob (persists status=failed, pid=null,
  errorMessage) and renders the crash, mirroring runForegroundTaskWorker.
- A plain status <jobId> read also reconciles a dead active worker, so result,
  hooks, and later status callers see the durable failure (failActiveWorkerJob
  persists globally, so the first read repairs state for everyone).

No false positives: hasExitedActiveWorker requires a recorded pid that is not
alive, so pid-null/queued and live-pid jobs are never reconciled.

Tests (+3, suite 110/110): a dead-pid running job makes status --wait return
promptly as failed (no ETIMEDOUT) and persists failed/pid null; a plain status
read reconciles the same; a live-pid (process.pid) running job still times out
and is NOT reconciled (false-positive guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on-zero on failed launch (PR openai#346 review)

Address the 4 findings from the Codex review of a7aaff1:

- P1 (hook): the completion hook honored a [[codex-task status=complete]] /
  [[codex-task status=dispatched]] line in the response body BEFORE checking the
  Agent tool status, so a failed return whose text merely echoed the literal
  token (model output or prompt echo) was misclassified as success / active
  dispatch — suppressing retries and pointing watchers at a phantom job. The
  structured tool status is authoritative: a known failure status now
  short-circuits to the failure context before any token handling. A genuine
  success (status completed/success/none) still honors the complete token.
- P2 (hook): isSynchronousAgentReturn treated every status except
  async_launched/running as terminal, including sub_agent_entered (interactive
  subagent handoff). The hook now treats sub_agent_entered as non-terminal and
  stays silent so it never tells the parent "exited" while the subagent is live.
- P1 (companion): handleTask --background printed the launch payload and always
  returned exit 0, even when enqueueBackgroundTask reported status=failed (worker
  spawn/pid failure). It now sets a non-zero exit code on a failed launch so
  callers don't treat a failed dispatch as success (cancelled stays exit 0).

Tests (+4, suite 114/114): a failed return echoing the complete token yields the
failure context (not COMPLETED); a failed return echoing a dispatched token yields
the failure context with no watcher instruction; a sub_agent_entered return emits
no context; task --background exits non-zero with status=failed on spawn failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant