Description
Version
pulpcore 3.105.0
Describe the bug
When a task is dispatched with both immediate=True and deferred=True, and the immediate execution exceeds IMMEDIATE_TIMEOUT (5 seconds, hardcoded in pulpcore/constants.py), the task is marked as failed instead of falling back to deferred (background) execution.
The deferred=True flag is currently only used as a fallback when resources are blocked at dispatch time (dispatch() in pulpcore/tasking/tasks.py):
    if execute_now:
        if are_resources_available(colliding_resources, task):
            send_wakeup_signal = True if resources else False
            task.unblock()
            with using_workdir():
                execute_task(task)
        elif deferred:  # Resources are blocked and can be deferred
            task.app_lock = None
            task.save()
        else:  # Can't be deferred
            task.set_canceling()
            task.set_canceled(TASK_STATES.CANCELED, "Resources temporarily unavailable.")

However, once the task starts executing immediately, there is no equivalent fallback. If the task times out during immediate execution, _add_timeout_to() raises RuntimeError, which is caught by _execute_task(), and the task is unconditionally marked as failed via task.set_failed(); the deferred=True flag is never consulted:
    def _add_timeout_to(coro_fn, task_pk):
        async def _wrapper():
            try:
                return await asyncio.wait_for(coro_fn(), timeout=IMMEDIATE_TIMEOUT)
            except asyncio.TimeoutError:
                msg_template = "Immediate task %s timed out after %s seconds."
                error_msg = msg_template % (task_pk, IMMEDIATE_TIMEOUT)
                _logger.info(error_msg)
                raise RuntimeError(error_msg)

        return _wrapper

In summary: deferred=True handles the "can't start" case (resources blocked), but not the "can't finish in time" case (immediate timeout).
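The timeout path can be exercised in isolation, outside Pulp. The sketch below copies the shape of the wrapper quoted above (minus logging) and runs it against a hypothetical `delete_excess_versions` coroutine that stands in for the retention cleanup. The timeout is shortened to 0.05 seconds for the demo, and `deleted` and `run_demo` are illustrative names, not pulpcore code. It also shows that `asyncio.wait_for` cancels the coroutine partway, so some work has already happened by the time the RuntimeError is raised, which matches the two deletions visible in the server logs.

```python
import asyncio

IMMEDIATE_TIMEOUT = 0.05  # shortened for the demo; the issue reports 5 seconds

deleted = []  # records which (pretend) versions were removed before the timeout


def add_timeout_to(coro_fn, task_pk):
    """Same shape as the wrapper quoted in the issue, without logging."""
    async def _wrapper():
        try:
            return await asyncio.wait_for(coro_fn(), timeout=IMMEDIATE_TIMEOUT)
        except asyncio.TimeoutError:
            msg_template = "Immediate task %s timed out after %s seconds."
            raise RuntimeError(msg_template % (task_pk, IMMEDIATE_TIMEOUT))

    return _wrapper


async def delete_excess_versions():
    # Hypothetical stand-in for the cleanup: 100 versions, 10 ms each.
    for version in range(100, 0, -1):
        deleted.append(version)
        await asyncio.sleep(0.01)


def run_demo():
    """Run the wrapped coroutine; return (error message or None, versions deleted)."""
    deleted.clear()
    try:
        asyncio.run(add_timeout_to(delete_excess_versions, "demo-task")())
    except RuntimeError as exc:
        return str(exc), len(deleted)
    return None, len(deleted)


if __name__ == "__main__":
    print(run_demo())
```

The caller only ever sees a generic RuntimeError, so without extra information it cannot tell a timeout apart from any other failure.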
To reproduce
- Create an RPM repository and accumulate many repository versions over time (100+), e.g. by having a high or unlimited retain_repo_versions.
- PATCH the repository to lower retain_repo_versions to a small value (e.g. 10).
- The resulting ageneral_update task is dispatched with immediate=True, deferred=True.
- Pulp begins deleting excess repository versions to enforce the new retention limit.
- The cleanup cannot complete within 5 seconds, and the task fails with a timeout.
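For the PATCH step, a request along the following lines could be used. This is only a sketch: the base URL and the repository href are placeholders to be replaced with values from your own installation, authentication is left out, and the helper name `build_retention_patch` is made up for illustration. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_retention_patch(base_url, repo_href, retain):
    """Build (but do not send) a PATCH that lowers retain_repo_versions.

    base_url and repo_href are placeholders supplied by the caller.
    """
    body = json.dumps({"retain_repo_versions": retain}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + repo_href,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host and href; substitute the repository's real pulp_href.
    req = build_retention_patch(
        "https://pulp.example.com",
        "/pulp/api/v3/repositories/rpm/rpm/<uuid>/",
        10,
    )
    print(req.get_method(), req.full_url, req.data)
```

Sending it would mean adding credentials and passing the request to `urllib.request.urlopen`; on a repository with enough excess versions, the resulting task should then show the timeout described above.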
Server logs
The task starts, begins deleting versions, hits the 5-second timeout, and is marked as failed — despite being dispatched with deferred: True:
pulp [...]: pulpcore.tasking.tasks:INFO: Starting task id: 019cdcee-e196-7843-8d30-ccd065542cf0 in domain: default, task_type: pulpcore.app.tasks.base.ageneral_update, immediate: True, deferred: True
pulp [...]: pulpcore.app.models.repository:INFO: Deleting repository version <Repository: almalinux-9-appstream; Version: 127> due to version retention limit.
pulp [...]: pulpcore.app.models.repository:INFO: Deleting repository version <Repository: almalinux-9-appstream; Version: 126> due to version retention limit.
pulp [...]: pulpcore.tasking.tasks:INFO: Immediate task 019cdcee-e196-7843-8d30-ccd065542cf0 timed out after 5 seconds.
pulp [...]: pulpcore.tasking.tasks:INFO: Task[pulpcore.app.tasks.base.ageneral_update] 019cdcee-e196-7843-8d30-ccd065542cf0 failed (RuntimeError: Immediate task 019cdcee-e196-7843-8d30-ccd065542cf0 timed out after 5 seconds.) in domain: default
A comparable repository with fewer excess versions (90 vs 117+) completes successfully within the timeout:
pulp [...]: pulpcore.tasking.tasks:INFO: Starting task id: 019cdcee-fab2-775a-925f-2cb63feff039 in domain: default, task_type: pulpcore.app.tasks.base.ageneral_update, immediate: True, deferred: True
pulp [...]: pulpcore.app.models.repository:INFO: Deleting repository version <Repository: almalinux-9-crb; Version: 90> due to version retention limit.
pulp [...]: pulpcore.tasking.tasks:INFO: Task completed 019cdcee-fab2-775a-925f-2cb63feff039 in domain: default, task_type: pulpcore.app.tasks.base.ageneral_update, immediate: True, deferred: True, execution_time: 1148158 μs
Expected behavior
When a task dispatched with immediate=True, deferred=True times out during immediate execution, it should fall back to deferred execution (which has no IMMEDIATE_TIMEOUT constraint) instead of failing. This would allow operations with unbounded runtime — such as repository version cleanup after lowering retain_repo_versions — to complete successfully.
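One possible shape for this fallback is sketched below. It is not a patch against pulpcore: `ImmediateTimeout`, `execute_immediately`, and the plain list used as a background queue are all hypothetical, the state names are just strings, and the timeout is shortened for the demo. The idea is to raise a dedicated exception on timeout so the caller can tell it apart from a genuine failure, and to consult the deferred flag before deciding the outcome.

```python
import asyncio

IMMEDIATE_TIMEOUT = 0.05  # shortened for the demo; the issue reports 5 seconds


class ImmediateTimeout(Exception):
    """Hypothetical marker: the immediate attempt ran out of time."""


async def _run_with_timeout(coro_fn):
    try:
        return await asyncio.wait_for(coro_fn(), timeout=IMMEDIATE_TIMEOUT)
    except asyncio.TimeoutError:
        # Distinguish "too slow" from "broken" for the caller.
        raise ImmediateTimeout() from None


def execute_immediately(coro_fn, deferred, background_queue):
    """Try to run now; on timeout, requeue if deferred, otherwise fail."""
    try:
        asyncio.run(_run_with_timeout(coro_fn))
    except ImmediateTimeout:
        if deferred:
            # Hand the task over to background execution (no timeout there).
            background_queue.append(coro_fn)
            return "waiting"
        return "failed"
    except Exception:
        return "failed"
    return "completed"


async def slow_cleanup():
    # Stand-in for deleting many repository versions.
    await asyncio.sleep(1)


async def quick_update():
    await asyncio.sleep(0)


if __name__ == "__main__":
    queue = []
    print(execute_immediately(slow_cleanup, True, queue))   # requeued
    print(execute_immediately(slow_cleanup, False, queue))  # failed
    print(execute_immediately(quick_update, True, queue))   # completed
```

Because `asyncio.wait_for` cancels the coroutine when the timeout expires, the deferred run starts from wherever the immediate attempt stopped. A real implementation would need the task to tolerate being restarted after partial work.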
Context
This issue is particularly impactful when changing repository version retention. A repository that has accumulated hundreds of versions needs to delete all excess versions when retain_repo_versions is lowered. This is an inherently unbounded operation whose duration depends on the number of excess versions — there is no way for the operator to control or predict whether it will complete within 5 seconds.