Tasks dispatched with immediate=True, deferred=True fail on immediate timeout instead of falling back to deferred execution #7446

@vkukk

Version

pulpcore 3.105.0

Describe the bug

When a task is dispatched with both immediate=True and deferred=True, and the immediate execution exceeds IMMEDIATE_TIMEOUT (5 seconds, hardcoded in pulpcore/constants.py), the task is marked as failed instead of falling back to deferred (background) execution.

The deferred=True flag is currently only used as a fallback when resources are blocked at dispatch time (dispatch() in pulpcore/tasking/tasks.py):

if execute_now:
    if are_resources_available(colliding_resources, task):
        send_wakeup_signal = True if resources else False
        task.unblock()
        with using_workdir():
            execute_task(task)
    elif deferred:  # Resources are blocked and can be deferred
        task.app_lock = None
        task.save()
    else:  # Can't be deferred
        task.set_canceling()
        task.set_canceled(TASK_STATES.CANCELED, "Resources temporarily unavailable.")

However, once the task starts executing immediately, there is no equivalent fallback. If the task times out during immediate execution, _add_timeout_to() raises RuntimeError, _execute_task() catches it, and the task is unconditionally marked as failed via task.set_failed(). The deferred=True flag is never consulted:

def _add_timeout_to(coro_fn, task_pk):
    async def _wrapper():
        try:
            return await asyncio.wait_for(coro_fn(), timeout=IMMEDIATE_TIMEOUT)
        except asyncio.TimeoutError:
            msg_template = "Immediate task %s timed out after %s seconds."
            error_msg = msg_template % (task_pk, IMMEDIATE_TIMEOUT)
            _logger.info(error_msg)
            raise RuntimeError(error_msg)
    return _wrapper

In summary: deferred=True handles the "can't start" case (resources blocked), but not the "can't finish in time" case (immediate timeout).
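
The gap can be reproduced outside Pulp with a small standalone model. This is only a sketch, not pulpcore code: add_timeout_to mirrors the wrapper quoted above, while run_immediate, quick_update, slow_cleanup and the shortened timeout value are hypothetical stand-ins chosen for illustration.

```python
import asyncio

# Shortened stand-in for pulpcore's IMMEDIATE_TIMEOUT (5 seconds) so the demo runs quickly.
IMMEDIATE_TIMEOUT = 0.05


def add_timeout_to(coro_fn, task_pk):
    """Wrap coro_fn so that exceeding the timeout raises RuntimeError, as in the excerpt above."""
    async def _wrapper():
        try:
            return await asyncio.wait_for(coro_fn(), timeout=IMMEDIATE_TIMEOUT)
        except asyncio.TimeoutError:
            raise RuntimeError(
                "Immediate task %s timed out after %s seconds." % (task_pk, IMMEDIATE_TIMEOUT)
            )
    return _wrapper


def run_immediate(coro_fn, task_pk, deferred):
    """Hypothetical model of the immediate path: `deferred` is accepted but never consulted."""
    try:
        asyncio.run(add_timeout_to(coro_fn, task_pk)())
        return "completed"
    except RuntimeError:
        return "failed"


async def quick_update():
    # Stands in for an update that finishes well within the timeout.
    await asyncio.sleep(0)


async def slow_cleanup():
    # Stands in for deleting many repository versions.
    await asyncio.sleep(1)


print(run_immediate(quick_update, "task-1", deferred=True))
print(run_immediate(slow_cleanup, "task-2", deferred=True))
```

In this model the second call ends up "failed" even though deferred=True was passed, which is the behaviour this issue is about.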

To reproduce

  1. Create an RPM repository and accumulate many repository versions over time (100+), e.g. by having a high or unlimited retain_repo_versions.
  2. PATCH the repository to lower retain_repo_versions to a small value (e.g. 10).
  3. The resulting ageneral_update task is dispatched with immediate=True, deferred=True.
  4. Pulp begins deleting excess repository versions to enforce the new retention limit.
  5. The cleanup cannot complete within 5 seconds, and the task fails with a timeout.
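
For step 2, the PATCH request could look roughly like the sketch below. The host name and repository UUID are placeholders, the helper build_retention_patch is hypothetical, and the request is only constructed, not sent, since authentication depends on the deployment.

```python
import json
import urllib.request

# Placeholders: replace with your Pulp base URL and the repository's pulp_href.
BASE_URL = "https://pulp.example.com"
REPO_HREF = "/pulp/api/v3/repositories/rpm/rpm/REPOSITORY-UUID/"


def build_retention_patch(base_url, repo_href, retain_repo_versions):
    """Build (but do not send) a PATCH request that lowers retain_repo_versions."""
    body = json.dumps({"retain_repo_versions": retain_repo_versions}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + repo_href,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


request = build_retention_patch(BASE_URL, REPO_HREF, 10)
print(request.get_method(), request.full_url)
# Sending the request (for example with urllib.request.urlopen) additionally
# requires credentials; that part is intentionally left out of this sketch.
```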

Server logs

The task starts, begins deleting versions, hits the 5-second timeout, and is marked as failed — despite being dispatched with deferred: True:

pulp [...]: pulpcore.tasking.tasks:INFO: Starting task id: 019cdcee-e196-7843-8d30-ccd065542cf0 in domain: default, task_type: pulpcore.app.tasks.base.ageneral_update, immediate: True, deferred: True
pulp [...]: pulpcore.app.models.repository:INFO: Deleting repository version <Repository: almalinux-9-appstream; Version: 127> due to version retention limit.
pulp [...]: pulpcore.app.models.repository:INFO: Deleting repository version <Repository: almalinux-9-appstream; Version: 126> due to version retention limit.
pulp [...]: pulpcore.tasking.tasks:INFO: Immediate task 019cdcee-e196-7843-8d30-ccd065542cf0 timed out after 5 seconds.
pulp [...]: pulpcore.tasking.tasks:INFO: Task[pulpcore.app.tasks.base.ageneral_update] 019cdcee-e196-7843-8d30-ccd065542cf0 failed (RuntimeError: Immediate task 019cdcee-e196-7843-8d30-ccd065542cf0 timed out after 5 seconds.) in domain: default

A comparable repository with fewer excess versions (90 vs 117+) completes successfully within the timeout:

pulp [...]: pulpcore.tasking.tasks:INFO: Starting task id: 019cdcee-fab2-775a-925f-2cb63feff039 in domain: default, task_type: pulpcore.app.tasks.base.ageneral_update, immediate: True, deferred: True
pulp [...]: pulpcore.app.models.repository:INFO: Deleting repository version <Repository: almalinux-9-crb; Version: 90> due to version retention limit.
pulp [...]: pulpcore.tasking.tasks:INFO: Task completed 019cdcee-fab2-775a-925f-2cb63feff039 in domain: default, task_type: pulpcore.app.tasks.base.ageneral_update, immediate: True, deferred: True, execution_time: 1148158 μs

Expected behavior

When a task dispatched with immediate=True, deferred=True times out during immediate execution, it should fall back to deferred execution (which has no IMMEDIATE_TIMEOUT constraint) instead of failing. This would allow operations with unbounded runtime — such as repository version cleanup after lowering retain_repo_versions — to complete successfully.
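
A minimal sketch of the requested fallback, using a toy in-memory task rather than pulpcore's models. ToyTask, execute_with_fallback, the state strings and the shortened timeout are all assumptions made for illustration; an actual fix would have to hook into pulpcore's own dispatch and worker machinery.

```python
import asyncio

# Shortened stand-in for the 5-second IMMEDIATE_TIMEOUT so the demo runs quickly.
IMMEDIATE_TIMEOUT = 0.05


class ToyTask:
    """Hypothetical in-memory task record; not pulpcore's Task model."""

    def __init__(self, deferred):
        self.deferred = deferred
        self.state = "waiting"
        self.error = None


async def execute_with_fallback(task, coro_fn):
    """Run coro_fn immediately; on timeout, hand the task back to the queue if deferrable."""
    task.state = "running"
    try:
        await asyncio.wait_for(coro_fn(), timeout=IMMEDIATE_TIMEOUT)
    except asyncio.TimeoutError:
        if task.deferred:
            # Proposed behaviour: leave the task waiting for a background worker.
            task.state = "waiting"
        else:
            # Behaviour for tasks that cannot be deferred: fail on timeout.
            task.state = "failed"
            task.error = "Immediate task timed out after %s seconds." % IMMEDIATE_TIMEOUT
    else:
        task.state = "completed"
    return task


async def slow_cleanup():
    # Stands in for a cleanup whose runtime is not bounded by the timeout.
    await asyncio.sleep(1)


deferrable = asyncio.run(execute_with_fallback(ToyTask(deferred=True), slow_cleanup))
strict = asyncio.run(execute_with_fallback(ToyTask(deferred=False), slow_cleanup))
print(deferrable.state, strict.state)
```

One design point worth noting: asyncio.wait_for cancels the wrapped coroutine when the timeout expires, so part of the work may already have been done. A fallback along these lines would therefore rely on the task being safe to run again from the start by a background worker; whether that holds for a given task is outside the scope of this sketch.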

Context

This issue is particularly impactful when changing repository version retention. A repository that has accumulated hundreds of versions needs to delete all excess versions when retain_repo_versions is lowered. This is an inherently unbounded operation whose duration depends on the number of excess versions — there is no way for the operator to control or predict whether it will complete within 5 seconds.
