Persist terminal aborted state so dead jobs stop counting as queued#504
Merged
Merged
Conversation
A job with `completed IS NULL` is treated as queued/in-progress by
isQueued(), the admin "in_progress" filter, and the status badge. That
conflates three different things: pending, actively running, and
permanently failed. Once a job exhausts its retries it will never run
again, yet it keeps `completed IS NULL` forever — so:
* isQueued() keeps returning true, which wedges any non-concurrent
caller (e.g. a QueueScheduler row) permanently behind the dead job;
* the admin shows it as "Running" instead of failed.
getFailedStatus() already distinguishes a retryable failure
("requeued") from a terminal one ("aborted"); this only persists that
verdict so the rest of the system can see it.
* Add `QueuedJobsTable::STATUS_ABORTED` and `markJobAborted()`.
* Processor stamps it when getFailedStatus() === aborted (alongside the
existing maxAttemptsExhausted event).
* isQueued() and the "in_progress" search filter exclude aborted rows
(null/other statuses are unaffected — fully backward compatible).
* The status badge renders aborted as "Failed", checked before the
`fetched` branch so a job whose worker died on its last attempt no
longer shows a stuck "Running".
Retryable failures are untouched: while attempts remain the job is
genuinely in flight and still counts as queued, so there is no early
re-dispatch or pile-up. flushFailedJobs() continues to prune old rows.
This was referenced May 28, 2026
dereuromark
added a commit
to dereuromark/cakephp-queue-scheduler
that referenced
this pull request
May 28, 2026
…#48) * Rerun terminally-failed scheduled jobs in place, with failure backoff A non-concurrent row gates its next dispatch on QueuedJobsTable::isQueued(). Once cakephp-queue records terminal "aborted" state, a dead job stops counting as queued, so the schedule resumes — but it would then queue a brand-new job every tick, leaving one failed row behind per interval (e.g. ~1440/day for an every-minute task). #1 — Rerun in place: when the row's previous dispatch terminally failed (queue status = aborted), run() resets that same job (clears attempts and the stale aborted status) instead of creating a new one. Only one recycled row per logical scheduled job, no pile-up. Checked before the in-flight hold-back so it works whether or not the installed queue release already excludes aborted jobs from isQueued() — an aborted job is terminal and never blocks. Still-retrying jobs are untouched: while attempts remain the job is genuinely in flight and the next tick is held back, so there is no early re-dispatch. #2 — Backoff: a new `consecutive_failures` counter tracks aborts without an intervening success. After `QueueScheduler.maxConsecutiveFailures` (default 0 = unlimited) the row is disabled and a `QueueScheduler.Row.disabled` event is fired so the host app can alert. A successful/fresh dispatch resets the counter to 0. The disable write commits while run() still reports "not dispatched": the transactional() callback returns true (commit) for both a dispatch and a backoff-disable, and only false (rollback) for hold-back / lost compare-and-swap. The actual dispatched/not-dispatched result is tracked in an outer flag so the disable is not rolled back. Adds the `consecutive_failures` column (migration + fixture + entity), three SchedulerRowsTable tests, and documents the config key in docs/README.md and config/app.example.php. Pairs with dereuromark/cakephp-queue#504 (which persists the terminal aborted state); dormant and behaviour-preserving on older queue releases. * Fix json_decode() arg order in SchedulerRow::_getJobData() JSON_THROW_ON_ERROR was passed as the 3rd argument ($depth) instead of the 4th ($flags), so the throw-on-error behaviour never actually engaged and a malformed param would have returned null rather than throwing. Pass an explicit depth (512, the PHP default) and the flag in its proper slot. Newer phpstan flags this as argument.invalidConstant; the bug predates this branch but only surfaced now via a phpstan release. * Harden scheduler failure backoff - maxConsecutiveFailures now counts reruns granted rather than aborts: the cap is checked before incrementing, so re-enabling a disabled row recovers through the in-place rerun path on every queue version. Previously a cap of 1 could never retry, and re-enabling immediately re-disabled the row. - Reset the consecutive_failures streak in beforeSave() when a row is re-enabled, granting it a fresh round of reruns. - Dispatch the QueueScheduler.Row.disabled event and log the warning after the transaction commits, so a listener never runs mid-transaction. - Tidy a detached test docblock left over from the earlier insertion. * Reference cakephp-queue#505 in the reset-status shim comment
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Supersedes #503 (which discounted stale-fetched jobs — wrong, since a job with retries left is one the queue itself re-runs, so the scheduler would re-dispatch a duplicate and pile up failed rows). This instead records the terminal verdict the queue already computes.
Problem
A row with
completed IS NULLis treated as queued/in-progress byisQueued(), the admin "in_progress" filter, and the status badge. That conflates three states: pending, actively running, and permanently failed. Once a job exhausts its retries it never runs again, yet keepscompleted IS NULLforever, so:isQueued()keeps returning true → any non-concurrent caller (notably a QueueScheduler row) wedges permanently behind the dead job: both the cron tick and the admin "Run" return "could not be added to the queue";Change
getFailedStatus()already distinguishes a retryable failure (requeued) from a terminal one (aborted). This persists that verdict so the rest of the system can act on it:QueuedJobsTable::STATUS_ABORTEDandmarkJobAborted().Processorstamps it whengetFailedStatus() === aborted(next to the existingQueue.Job.maxAttemptsExhaustedevent).isQueued()and thein_progresssearch filter exclude aborted rows. Null / other statuses are unaffected — fully backward compatible.fetchedbranch, so a job whose worker died on its last attempt no longer shows a stuck "Running".Retryable failures are deliberately untouched: while attempts remain the job is genuinely in flight and still counts as queued, so there is no early re-dispatch and no pile-up.
flushFailedJobs()continues to prune old rows.Tests
testIsQueuedExcludesAborted,testMarkJobAborted, and an assertion in the ProcessormaxAttemptsExhaustedtest that the exhausted job is stampedaborted. FullQueuedJobsTableTest+ProcessorTestgreen; phpcs and phpstan clean on the changed files.Downstream
Pairs with dereuromark/cakephp-queue-scheduler#47 — but with this change that scheduler PR is no longer needed: plain
isQueued()stops counting dead jobs, so the schedule self-heals with no scheduler change. I'll close #47.