[TRTLLM-12339][chore] Early-dispatch cross-KV path in KVCacheManager by eopXD · Pull Request #1 · cascade812/TensorRT-LLM

eopXD · 2026-06-11T05:55:00Z

What

Readability refactor of KVCacheManager.prepare_resources / update_resources, stacked on top of guiju/en_de3 (PR NVIDIA#13919). No functional change; the refactor is purely structural.

The cross-attention support currently expresses the common self-attention path as the negation of the rare cross case (`if not is_cross` at ~5 sites across the two methods). Because `is_cross` is a per-instance constant (derived from kv_cache_type at construction), this both inverts the narrative, so that the dominant path reads as "not the special case", and re-checks a fixed value on every call.

Change

- prepare_resources: early-dispatch the cross pool to a dedicated _prepare_cross_resources() and return, leaving the self-attention body unconditional. Shared context-sequence collection moves into _collect_context_sequences() (single source of truth, no copy-paste of the loop).
- update_resources: flip the cross guard from `if not is_cross` to a positive `kv_cache_type != CROSS` check with an intent-revealing comment; the rewind/relocation body is otherwise unchanged (no helper extraction, since the guarded block is small enough to read inline).
- Collapse the now-redundant batch_ctx_requests list (identical to batch_llm_requests once the guard is removed).
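The shape described in the list above can be sketched with a toy class. This is only an illustration of the control flow, not the code in the PR: `ToyKVCacheManager`, `CacheType`, the request dictionaries, and the step names other than `add_sequence_batch` and `refresh_blocks` are hypothetical, and the order of steps on the self-attention path is an assumption.

```python
from enum import Enum, auto


class CacheType(Enum):
    # Hypothetical stand-in for the real kv_cache_type values.
    SELF = auto()
    CROSS = auto()


class ToyKVCacheManager:
    """Toy model of the refactored control flow; not the real class."""

    def __init__(self, kv_cache_type):
        self.kv_cache_type = kv_cache_type
        self.calls = []  # records which steps ran, in order

    def _collect_context_sequences(self, requests):
        # Shared by both paths: select requests still in the context phase.
        self.calls.append("collect")
        return [r for r in requests if r["is_context"]]

    def _prepare_cross_resources(self, requests):
        # Allocate-once subset: encoder K/V is written once and never grows.
        ctx = self._collect_context_sequences(requests)
        self.calls.append(f"add_sequence_batch({len(ctx)})")
        self.calls.append("refresh_blocks")

    def prepare_resources(self, requests):
        # Early dispatch: the rare cross pool leaves immediately.
        if self.kv_cache_type == CacheType.CROSS:
            self._prepare_cross_resources(requests)
            return
        # Self-attention body, now unconditional (step order is assumed).
        ctx = self._collect_context_sequences(requests)
        self.calls.append(f"add_sequence_batch({len(ctx)})")
        self.calls.append("grow_decode_tokens")
        self.calls.append("reserve_draft_tokens")
        self.calls.append("refresh_blocks")
        self.calls.append("scheduler_bookkeeping")

    def update_resources(self, requests):
        # Rewind/relocation only applies to pools that grow during decode.
        if self.kv_cache_type != CacheType.CROSS:
            self.calls.append("rewind")
```

With this layout, a reader following `prepare_resources` top to bottom sees one branch for the cross pool and then the full self-attention path without further guards.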

Equivalence

- The cross instance still performs exactly the allocate-once subset: collect → add_sequence_batch → refresh_blocks, with no decode-time token growth, draft-token reserve, or scheduler bookkeeping (encoder K/V is written once and never grows).
- The self/draft instance runs the full path unchanged; the old `if not is_cross` blocks always executed for it.
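One way to reason about the equivalence claim is to compare the sequence of steps each shape performs for both kinds of instance. The sketch below is a simplified model written for this purpose: `prepare_old`, `prepare_new`, and the step names `grow_decode_tokens`, `reserve_draft_tokens`, and `scheduler_bookkeeping` are hypothetical placeholders, and the ordering of steps is an assumption rather than a transcription of the real method.

```python
def collect_context_sequences(requests):
    # Shared helper: ids of requests still in the context phase.
    return tuple(r["id"] for r in requests if r["is_context"])


def prepare_old(is_cross, requests):
    """Pre-refactor shape: the common path is guarded by `if not is_cross`."""
    trace = [("add_sequence_batch", collect_context_sequences(requests))]
    if not is_cross:
        trace.append(("grow_decode_tokens",))
    if not is_cross:
        trace.append(("reserve_draft_tokens",))
    trace.append(("refresh_blocks",))
    if not is_cross:
        trace.append(("scheduler_bookkeeping",))
    return trace


def prepare_cross(requests):
    # Allocate-once subset for the cross pool.
    return [
        ("add_sequence_batch", collect_context_sequences(requests)),
        ("refresh_blocks",),
    ]


def prepare_new(is_cross, requests):
    """Post-refactor shape: dispatch the cross pool early, then run unguarded."""
    if is_cross:
        return prepare_cross(requests)
    return [
        ("add_sequence_batch", collect_context_sequences(requests)),
        ("grow_decode_tokens",),
        ("reserve_draft_tokens",),
        ("refresh_blocks",),
        ("scheduler_bookkeeping",),
    ]


def traces_match(requests):
    # Both shapes must perform the same steps for cross and self instances.
    return all(
        prepare_old(flag, requests) == prepare_new(flag, requests)
        for flag in (True, False)
    )
```

Calling `traces_match` on a mixed batch of context and generation requests checks that removing the guards changes only where the branch is taken, not which steps run.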

Verification

- pre-commit (isort / yapf v0.43.0 / autoflake / ruff-legacy / codespell) clean on the changed file.
- python -m py_compile clean.
- Behavioral tests (T5/BART integration, dual-pool scheduler units) need a GPU and the built wheel and were not run on my local (Darwin) box; they will run under the CI of NVIDIA/TensorRT-LLM#13919 ([TRTLLM-12339][feat] Support T5 encoder-decoder models in the PyTorch backend) when this is folded in.

Intent is to make the dual-pool blending read top-to-bottom as the common case. If you'd rather fold it in differently — or not at all — please feel free; happy to adjust.

prepare_resources had accumulated `if not is_cross` guards that expressed the common self-attention path as the negation of the rare cross-attention case. `is_cross` is a per-instance constant (derived from kv_cache_type at construction), so branching on it at every step re-checks something fixed for the object's lifetime.

Restructure so the common path reads as the primary narrative:

- prepare_resources: dispatch the cross pool to a dedicated _prepare_cross_resources() and return early, leaving the self-attention body unconditional. Shared context-sequence collection is factored into _collect_context_sequences().
- update_resources: flip the cross guard from `if not is_cross` to a positive `kv_cache_type != CROSS` check with an intent-revealing comment; the rewind/relocation body is otherwise unchanged.

No functional change: the cross instance still performs exactly the allocate-once subset (collect + add_sequence_batch + refresh_blocks), and the self/draft instance runs the full path as before. The now-redundant batch_ctx_requests list (identical to batch_llm_requests once the guard is removed) is collapsed.

Signed-off-by: Yueh-Ting Chen <yuehtingc@nvidia.com>

github-actions Bot assigned eopXD Jun 11, 2026

eopXD mentioned this pull request Jun 11, 2026: [TRTLLM-12339][feat] Support T5 encoder-decoder models in the PyTorch backend NVIDIA/TensorRT-LLM#13919 (Open, 1 task)

eopXD wants to merge 1 commit into cascade812:guiju/en_de3 from eopXD:eopxd/kvcm-cross-early-dispatch
