Source of Truth
Specification details live in PR #658 under plans/645/, especially task-admission.md, request-admission.md, capacity-model.md, observability.md, benchmark-plan.md, and issue-map.md. This issue tracks the design-spike slice and gates only.
Design Scope
Produce a decision-ready design for future resource-vector and provider-aware task-admission policies. This issue is design-only unless a follow-up implementation issue is explicitly created.
defining how the accepted bounded-borrow policy from #650 ("Add bounded-borrow admission policy for heavy-root async workloads") should carry forward from coarse task-stage resources into provider/model/resource-vector resources, including strict share, bounded solo prefill, peer-pressure yielding, and borrow-debt repayment semantics;
defining per-generation-resource zero-inflight idle metrics for endpoint/GPU idleness evidence, including per-resource, total, max, and baseline-delta reporting;
liveness, release/accounting, cancellation/error, telemetry, and benchmark requirements for a later implementation;
recording rejected alternatives.
Quality Gates
No resource-vector implementation lands as part of this design issue.
The design preserves the task/request admission split and consumes request pressure only through read-only snapshots if needed.
The design keeps FairTaskQueue responsible for ready ordering, not resource ownership.
The design explains whether bounded-borrow semantics remain policy-compatible once resource identity becomes provider/model/resource-vector aware, or explicitly records any required policy revision.
Benchmark matrix covers asymmetric provider/model capacities, different dominant resources, neutral single-resource workloads, heavy-root compatibility, and correlated scheduler/request traces.
Benchmark evidence must report per-generation-resource zero-inflight idle time: for each provider/model/resource, the total workflow wall time where that resource has no in-flight generation request while the workflow is still active. Reports should include per-resource idle, total idle, max idle, and deltas versus the named baseline.
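The idle-time gate above can be illustrated with a minimal sketch. It is only one possible reading of the metric, not the accepted design: `ResourceKey`, `zero_inflight_idle`, and `idle_report` are hypothetical names, "total idle" is assumed to be the sum across resources, "max idle" is assumed to be the largest per-resource value, and deltas are reported only for resources present in the baseline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds of workflow wall time


@dataclass(frozen=True)
class ResourceKey:
    """Hypothetical provider/model/resource identity used only for this sketch."""
    provider: str
    model: str
    resource: str


def zero_inflight_idle(workflow: Interval, requests: Iterable[Interval]) -> float:
    """Wall time inside the workflow window with no in-flight request."""
    wf_start, wf_end = workflow
    # Clip every request interval to the active workflow window.
    clipped = sorted((max(s, wf_start), min(e, wf_end)) for s, e in requests)
    busy = 0.0
    cursor = wf_start  # end of the busy time counted so far
    for start, end in clipped:
        if end <= start:
            continue  # interval lies entirely outside the workflow window
        start = max(start, cursor)  # do not double count overlapping requests
        if end > start:
            busy += end - start
            cursor = end
    return (wf_end - wf_start) - busy


def idle_report(
    workflow: Interval,
    inflight: Dict[ResourceKey, List[Interval]],
    baseline: Dict[ResourceKey, float],
) -> dict:
    """Per-resource, total, and max idle, plus deltas versus a named baseline."""
    per_resource = {k: zero_inflight_idle(workflow, v) for k, v in inflight.items()}
    return {
        "per_resource": per_resource,
        "total": sum(per_resource.values()),  # assumption: sum across resources
        "max": max(per_resource.values(), default=0.0),
        "delta_vs_baseline": {
            k: idle - baseline[k] for k, idle in per_resource.items() if k in baseline
        },
    }


# Illustrative data only; the provider and model names are made up.
a = ResourceKey("provider-a", "model-x", "generation")
b = ResourceKey("provider-b", "model-y", "generation")
report = idle_report(
    workflow=(0.0, 10.0),
    inflight={a: [(0.0, 4.0), (3.0, 6.0)], b: [(2.0, 3.0), (8.0, 12.0)]},
    baseline={a: 5.0, b: 7.0},
)
```

In this example, resource `a` is busy for 6 of the 10 seconds and resource `b` for 3 (the request that outlives the workflow is clipped), so the per-resource idle values are 4.0 and 7.0. Whether intervals should be clipped this way, and whether total idle is a sum or a union across resources, are choices the design document would need to settle.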
Validation
Close with an accepted design document or plan update that cross-references plans/645, lists open questions, records rejected alternatives, and defines the exact follow-up implementation issue(s) and evidence gates.
Priority Level
Low
Epic: #645
Depends on: #641, #646, #644, #657, #649, #650
Related: #647, #648, #654
Target branch: epic/645-async-scheduling while the epic is active.
This issue owns:
SchedulerResourceKey/SchedulerResourceRequest shape for a later implementation;