Design resource-vector and provider-aware admission policies #651

## Priority Level

Low

Epic: #645
Depends on: #641, #646, #644, #657, #649, #650
Related: #647, #648, #654
Target branch: `epic/645-async-scheduling` while the epic is active.

## Source of Truth

Specification details live in PR #658 under `plans/645/`, especially `task-admission.md`, `request-admission.md`, `capacity-model.md`, `observability.md`, `benchmark-plan.md`, and `issue-map.md`. This issue tracks the design-spike slice and gates only.

## Design Scope

Produce a decision-ready design for future resource-vector and provider-aware task-admission policies. This issue is design-only unless a follow-up implementation issue is explicitly created.

This issue owns:

- deciding whether resource-vector internals stay scheduler-internal or require any public metadata additions beyond #641;
- the minimal `SchedulerResourceKey` / `SchedulerResourceRequest` shape for a later implementation;
- mapping resolved metadata from #641/#646 into scheduler resource requests without exposing scheduler internals as plugin API;
- defining how task-stage provider/model awareness avoids duplicating request-stage AIMD admission from #657;
- defining how the accepted #650 bounded-borrow policy should carry forward from coarse task-stage resources into provider/model/resource-vector resources, including strict share, bounded solo prefill, peer-pressure yielding, and borrow-debt repayment semantics;
- defining per-generation-resource zero-inflight idle metrics for endpoint/GPU idleness evidence, including per-resource, total, max, and baseline-delta reporting;
- liveness, release/accounting, cancellation/error, telemetry, and benchmark requirements for a later implementation;
- recording rejected alternatives.
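As a starting point for the shape discussion, the sketch below shows one possible minimal form of `SchedulerResourceKey` and `SchedulerResourceRequest`, plus a mapping from resolved metadata to a request. Everything beyond the two type names is an assumption for illustration: the `provider` / `model` / `resource` fields, the `fits` helper, the `request_from_metadata` function, and the metadata field names are hypothetical and are not taken from `plans/645/` or from #641/#646.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SchedulerResourceKey:
    """Hypothetical scheduler-internal identity of one admission resource."""
    provider: str
    model: str
    resource: str  # assumed kind label, e.g. "generation"


@dataclass(frozen=True)
class SchedulerResourceRequest:
    """Hypothetical demand vector: units requested per resource key."""
    demand: Mapping[SchedulerResourceKey, int]

    def fits(self, available: Mapping[SchedulerResourceKey, int]) -> bool:
        # A request is admissible only if every component fits at once.
        return all(available.get(key, 0) >= units for key, units in self.demand.items())


def request_from_metadata(meta: Mapping[str, str]) -> SchedulerResourceRequest:
    """Map resolved task metadata (field names assumed) to a one-unit request."""
    key = SchedulerResourceKey(
        provider=meta["provider"],
        model=meta["model"],
        resource=meta.get("resource", "generation"),
    )
    return SchedulerResourceRequest({key: 1})
```

In this sketch the key and request types never appear in the metadata itself, which is one way to keep scheduler internals out of the plugin API; whether that boundary holds for real workloads is one of the questions this design has to answer.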

## Quality Gates

- No resource-vector implementation lands as part of this design issue.
- The design preserves the task/request admission split and consumes request pressure only through read-only snapshots if needed.
- The design keeps `FairTaskQueue` responsible for ready ordering, not resource ownership.
- The design explains whether bounded-borrow semantics remain policy-compatible once resource identity becomes provider/model/resource-vector aware, or explicitly records any required policy revision.
- The future implementation acceptance bar includes liveness, no permit leaks, multi-resource fairness tests, stale/retry/salvage behavior, correlated telemetry compatibility, and #649 benchmark evidence.
- Benchmark matrix covers asymmetric provider/model capacities, different dominant resources, neutral single-resource workloads, heavy-root compatibility, and correlated scheduler/request traces.
- Benchmark evidence must report per-generation-resource zero-inflight idle time: for each provider/model/resource, the total workflow wall time during which that resource has no in-flight generation request while the workflow is still active. Reports should include per-resource idle, total idle, max idle, and deltas versus the named baseline.
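The zero-inflight idle metric in the last gate can be computed from request intervals. The sketch below is one possible reading of that definition, not the benchmark harness from #649: the interval representation, the string resource names, and the `zero_inflight_idle` / `idle_report` functions are assumptions made for illustration, and the workflow-active window is taken as a single `(start, end)` pair.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) of one in-flight generation request


def zero_inflight_idle(window: Interval, inflight: Iterable[Interval]) -> float:
    """Time inside `window` during which no request interval is active."""
    w_start, w_end = window
    busy = 0.0
    cursor = w_start  # everything before the cursor is already accounted for
    for start, end in sorted(inflight):
        # Clip to the window and to the part not yet counted (handles overlaps).
        start, end = max(start, cursor), min(end, w_end)
        if end > start:
            busy += end - start
            cursor = end
    return (w_end - w_start) - busy


def idle_report(
    window: Interval,
    inflight_by_resource: Mapping[str, Iterable[Interval]],
    baseline: Optional[Mapping[str, float]] = None,
) -> Dict[str, object]:
    """Per-resource, total, and max idle time, plus deltas versus a baseline."""
    per_resource = {
        name: zero_inflight_idle(window, intervals)
        for name, intervals in inflight_by_resource.items()
    }
    report: Dict[str, object] = {
        "per_resource": per_resource,
        "total": sum(per_resource.values()),
        "max": max(per_resource.values(), default=0.0),
    }
    if baseline is not None:
        # Only resources present in the baseline get a delta (an assumption).
        report["delta_vs_baseline"] = {
            name: per_resource[name] - baseline[name]
            for name in per_resource
            if name in baseline
        }
    return report
```

For example, with a workflow window of `(0, 10)`, a resource with overlapping requests `(0, 4)` and `(2, 6)` is busy for 6 time units and idle for 4, while a resource with no requests is idle for the full 10.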

## Validation

Close with an accepted design document or plan update that cross-references `plans/645`, lists open questions, records rejected alternatives, and defines the exact follow-up implementation issue(s) and evidence gates.

