[KVCache] Support request-level prefix cache disable#7854
Conversation
|
Thanks for your contribution! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #7854 +/- ##
==========================================
Coverage ? 63.84%
==========================================
Files ? 462
Lines ? 64388
Branches ? 9874
==========================================
Hits ? 41106
Misses ? 20487
Partials ? 2795
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 Paddle-CI-Agent | pr_review |
2026-05-19 18:08:20
📋 Review 摘要
PR 概述:新增请求级 prefix cache 禁用控制,允许单个请求绕过全局 prefix caching 的匹配、写入和释放路径
变更范围:engine/request.py、engine/resource_manager.py、engine/sched/resource_manager_v1.py、entrypoints/openai/protocol.py、docs
影响面 Tag:KVCache Engine APIServer
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| 🟡 建议 | tests/v1/test_resource_manager_v1.py:726 |
测试未设置 kvcache_storage_backend,未能真正覆盖 disable_prefix_caching 的 skip 分支 |
| ❓ 疑问 | fastdeploy/engine/sched/resource_manager_v1.py:1433 |
get_prefix_cached_blocks 早返回路径计算 cache_info 缺少 enc_dec_block_num |
| ❓ 疑问 | fastdeploy/engine/sched/resource_manager_v1.py:1726 |
finish_requests 引入的 kvcache_storage_backend guard 超出本 PR 功能范围,属隐式行为变更 |
📝 PR 规范检查
PR 标题含合法 Tag [KVCache],格式规范;描述五个章节完整,Checklist 勾选合理(Accuracy Tests 未勾且在正文已说明不涉及精度测试)。规范合规 ✓
总体评价
整体设计清晰,通过 _enable_prefix_cache_for_request 辅助方法统一控制请求级禁用,代码侵入面小;全链路(protocol → request → resource_manager)传递完整,legacy 和 v1 两套实现均同步更新。主要关注点:一处测试有效性需补强,两处细节请作者确认后可合入。
|
|
||
| manager.finish_requests([request.request_id]) | ||
|
|
||
| manager.cache_manager.write_cache_to_storage.assert_not_called() |
There was a problem hiding this comment.
🟡 建议 测试未能有效验证 disable_prefix_caching 的跳过逻辑
本测试未设置 manager.config.cache_config.kvcache_storage_backend,其默认值为 falsy,因此 finish_requests 外层的 if self.config.cache_config.kvcache_storage_backend: 直接短路,根本不进入写入循环。assert_not_called 通过,但验证的是「没有 backend 时不写入」而非「disable_prefix_caching=True 时跳过写入」,目标分支实际上未被覆盖。
建议在测试中加上 backend 设置后再验证:
manager.config.cache_config.kvcache_storage_backend = "mock_backend"
# ... 其余不变 ...
manager.finish_requests([request.request_id])
manager.cache_manager.write_cache_to_storage.assert_not_called()
manager.cache_manager.write_cache_to_storage_decode.assert_not_called()| """ | ||
| if not self._enable_prefix_cache_for_request(request): | ||
| block_size = self.config.cache_config.block_size | ||
| request.cache_info = [0, (request.need_prefill_tokens + block_size - 1) // block_size] |
There was a problem hiding this comment.
❓ 疑问 cache_info 计算可能遗漏 enc_dec_block_num
早返回路径使用:
request.cache_info = [0, (request.need_prefill_tokens + block_size - 1) // block_size]而 preallocate_resource_in_p 中使用:
need_prealloc_prefill_blocks = (
request.need_prefill_tokens + block_size - 1
) // block_size + self.config.cache_config.enc_dec_block_num对于 enc_dec 模型(enc_dec_block_num > 0),两处计算不一致,可能导致后续块分配数不足。请确认 cache_info[1] 是否需要加上 enc_dec_block_num。
| else: | ||
| # P instance / Mixed instance uses standard write method (relies on Radix Tree) | ||
| self.cache_manager.write_cache_to_storage(req) | ||
| if self.config.cache_config.kvcache_storage_backend: |
There was a problem hiding this comment.
❓ 疑问 引入 kvcache_storage_backend guard 超出本 PR 范围
原始代码对 need_postprocess_reqs 中的每个请求无条件调用 write_cache_to_storage*,新代码在最外层加了 if self.config.cache_config.kvcache_storage_backend: 检查。
这是一处隐式的行为变更(超出 disable_prefix_caching 特性范围):当 kvcache_storage_backend 未配置时,原来仍会调用写入方法(结果视方法实现而定),新代码则完全跳过。若 write_cache_to_storage 在 kvcache_storage_backend=None 时是安全的 no-op,此改动影响有限;若方法内有实际操作,则可能改变现有语义。
建议:若此改动是有意修复,请在 PR 描述的 Modifications 中单独说明;否则考虑拆分以保持变更内聚。
CI报告基于以下代码生成(30分钟更新一次): 1 任务总览✅ 所有 Required 任务全部通过,可以合并(2 个可选任务失败,不阻塞合并)。
2 任务状态汇总2.1 Required 任务:10/10 通过
2.2 可选任务 — 29/31 通过
3 失败详情(仅 Required)无 Required 失败任务。 |
Motivation
支持请求级禁用 prefix caching。部分请求需要跳过 prefix cache 的匹配、写入和释放复用路径,以避免污染或复用全局缓存;默认值保持 False,继续遵循全局 prefix caching 配置。
Modifications
disable_prefix_caching参数,并贯通到内部Request序列化/反序列化。Usage or Command
# 单测 /root/paddlejob/inference-public/chengyanfu/.venv/py310/bin/python -m pytest \ tests/engine/test_request.py \ tests/engine/test_resource_manager.py \ tests/v1/test_resource_manager_v1.py -q请求示例:
Accuracy Tests
不涉及模型计算逻辑或算子变更,未执行精度测试。
Checklist
pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.