Skip to content

Don't wedge a non-concurrent schedule behind a dead job#47

Closed
dereuromark wants to merge 2 commits into
masterfrom
feature/no-wedge-on-dead-job
Closed

Don't wedge a non-concurrent schedule behind a dead job#47
dereuromark wants to merge 2 commits into
masterfrom
feature/no-wedge-on-dead-job

Conversation

@dereuromark
Copy link
Copy Markdown
Owner

Problem

A non-concurrent row (allow_concurrent = false) gates its next dispatch on QueuedJobsTable::isQueued($reference, $task), which counts any row with completed IS NULL.

If a worker fetches the scheduled job and then dies (OOM, PHP timeout, container kill) without completing or failing it, that row stays completed IS NULL forever. From then on:

  • every cron tick is held back, and
  • the admin "Run" action returns false ("The job could not be added to the queue"),

with no way out short of manually deleting the ghost row. The schedule is permanently wedged behind a job that will never finish — and to an operator it just looks "stuck running".

Change

run() and runOnce() now decide hold-back through a new isBlockedByActiveJob() helper:

  • A job fetched longer ago than the queue's own requeue timeout (Queue.defaultRequeueTimeout) is one the queue would already re-offer to a worker, so it is presumed dead and no longer blocks dispatch.
  • Jobs not yet fetched, or fetched within the window, still block — genuine concurrency protection is unchanged.
  • When Queue.defaultRequeueTimeout is not configured, the original strict isQueued() semantics are kept, so behaviour is unchanged for installs that have not opted into a timeout.

Relationship to cakephp-queue#503

This mirrors the optional $staleTimeout parameter proposed for QueuedJobsTable::isQueued() in dereuromark/cakephp-queue#503. This PR is intentionally self-contained (no version bump, green on the current released queue) by doing the staleness query locally. Once a queue release carrying that parameter is required, isBlockedByActiveJob() can collapse to a single isQueued($ref, $task, $staleTimeout) call.

Tests

testRunIgnoresStaleQueuedJob (a 5-minutes-ago fetched, never-completed job no longer blocks) and testRunBlocksOnFreshQueuedJob (a 10-seconds-ago fetched job still blocks). Full SchedulerRowsTableTest green; phpcs and phpstan clean on the changed files.

A non-concurrent row (allow_concurrent = false) gates its next dispatch
on QueuedJobsTable::isQueued(), which counts any row with
`completed IS NULL`. If a worker fetches the scheduled job and then dies
(OOM, timeout, kill) without completing or failing it, that row stays
`completed IS NULL` forever — so every subsequent tick AND the admin
"Run" action keep returning false ("could not be added to the queue").
The schedule is permanently stuck behind a job that will never finish.

run() and runOnce() now go through isBlockedByActiveJob(): a job
fetched longer ago than the queue's own requeue timeout
(`Queue.defaultRequeueTimeout`) is one the queue would re-offer to a
worker, so it is presumed dead and no longer blocks dispatch.
Not-yet-fetched jobs and jobs fetched within the window still block, so
genuine concurrency protection is unchanged. When the timeout is not
configured the original strict isQueued() semantics are kept, so
behaviour is unchanged for installs that have not opted in.

Mirrors the optional stale-timeout added to
QueuedJobsTable::isQueued() in dereuromark/cakephp-queue#503; once a
release carrying that parameter is required this helper can collapse to
a single isQueued($ref, $task, $staleTimeout) call.
JSON_THROW_ON_ERROR was passed as the $depth parameter (#3) instead of
$flags (#4). Add the default depth of 512 so the flag lands correctly.
@dereuromark
Copy link
Copy Markdown
Owner Author

Closing in favor of dereuromark/cakephp-queue#504. Once the queue records terminal 'aborted' state, plain isQueued() stops counting dead jobs, so a non-concurrent schedule self-heals with no scheduler-side change needed. The staleness guard here also conflated still-retrying jobs with dead ones.

@dereuromark dereuromark deleted the feature/no-wedge-on-dead-job branch May 28, 2026 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant