Don't wedge a non-concurrent schedule behind a dead job#47
Closed
dereuromark wants to merge 2 commits into
Closed
Conversation
A non-concurrent row (allow_concurrent = false) gates its next dispatch
on QueuedJobsTable::isQueued(), which counts any row with
`completed IS NULL`. If a worker fetches the scheduled job and then dies
(OOM, timeout, kill) without completing or failing it, that row stays
`completed IS NULL` forever — so every subsequent tick AND the admin
"Run" action keep returning false ("could not be added to the queue").
The schedule is permanently stuck behind a job that will never finish.
run() and runOnce() now go through isBlockedByActiveJob(): a job
fetched longer ago than the queue's own requeue timeout
(`Queue.defaultRequeueTimeout`) is one the queue would re-offer to a
worker, so it is presumed dead and no longer blocks dispatch.
Not-yet-fetched jobs and jobs fetched within the window still block, so
genuine concurrency protection is unchanged. When the timeout is not
configured the original strict isQueued() semantics are kept, so
behaviour is unchanged for installs that have not opted in.
Mirrors the optional stale-timeout added to
QueuedJobsTable::isQueued() in dereuromark/cakephp-queue#503; once a
release carrying that parameter is required this helper can collapse to
a single isQueued($ref, $task, $staleTimeout) call.
Owner
Author
|
Closing in favor of dereuromark/cakephp-queue#504. Once the queue records terminal 'aborted' state, plain isQueued() stops counting dead jobs, so a non-concurrent schedule self-heals with no scheduler-side change needed. The staleness guard here also conflated still-retrying jobs with dead ones. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A non-concurrent row (
allow_concurrent = false) gates its next dispatch onQueuedJobsTable::isQueued($reference, $task), which counts any row withcompleted IS NULL.If a worker fetches the scheduled job and then dies (OOM, PHP timeout, container kill) without completing or failing it, that row stays
completed IS NULLforever. From then on:with no way out short of manually deleting the ghost row. The schedule is permanently wedged behind a job that will never finish — and to an operator it just looks "stuck running".
Change
run()andrunOnce()now decide hold-back through a newisBlockedByActiveJob()helper:Queue.defaultRequeueTimeout) is one the queue would already re-offer to a worker, so it is presumed dead and no longer blocks dispatch.Queue.defaultRequeueTimeoutis not configured, the original strictisQueued()semantics are kept, so behaviour is unchanged for installs that have not opted into a timeout.Relationship to cakephp-queue#503
This mirrors the optional
$staleTimeoutparameter proposed forQueuedJobsTable::isQueued()in dereuromark/cakephp-queue#503. This PR is intentionally self-contained (no version bump, green on the current released queue) by doing the staleness query locally. Once a queue release carrying that parameter is required,isBlockedByActiveJob()can collapse to a singleisQueued($ref, $task, $staleTimeout)call.Tests
testRunIgnoresStaleQueuedJob(a 5-minutes-ago fetched, never-completed job no longer blocks) andtestRunBlocksOnFreshQueuedJob(a 10-seconds-ago fetched job still blocks). FullSchedulerRowsTableTestgreen; phpcs and phpstan clean on the changed files.