[KVCache] Add e2e v1 cache manager tests for prefix caching and swap mechanism verification#7814
Conversation
## Motivation
新增 CacheManager 端到端测试,验证前缀缓存(Prefix Caching)在 FastDeploy 推理服务中的实际运行效果。测试覆盖冷启动、重复请求命中、共享前缀命中、metrics 端点及非流式请求等场景,确保缓存功能在生产环境中正常运行。
## Modifications
- 新增 tests/e2e/test_cache_manager.py,包含 5 个端到端测试用例
- test_cache_cold_start: 冷启动请求验证 cached_tokens=0
- test_cache_hit_on_repeat: 重复请求验证缓存命中(cached_tokens > 0)
- test_cache_shared_prefix: 多轮对话共享 system prompt 前缀验证缓存命中
- test_cache_metrics_endpoint: /metrics 端点包含缓存相关指标
- test_cache_non_stream: 非流式请求缓存命中验证
## Usage or Command
```bash
# 启动服务(需要 MODEL_PATH 环境变量)
export MODEL_PATH=/path/to/models
python -m pytest tests/e2e/test_cache_manager.py -vv -s
# 关键启动参数
python -m fastdeploy.entrypoints.openai.api_server \
--model <model_path> \
--enable-prefix-caching \
--swap-space 20 \
--max-model-len 32768 \
--max-num-seqs 128
```
|
Thanks for your contribution! |
CI报告基于以下代码生成(30分钟更新一次): 1 任务总览所有 Required 任务均已通过 ✅,PR 可正常合并。有 4 个 Optional 任务仍在等待中(不影响合并)。
2 任务状态汇总2.1 Required任务:2/2 通过
2.2 可选任务 — 7/11 通过
3 失败详情(仅 required)无 required 失败任务。 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #7814 +/- ##
==========================================
Coverage ? 63.52%
==========================================
Files ? 461
Lines ? 64580
Branches ? 9897
==========================================
Hits ? 41026
Misses ? 20728
Partials ? 2826
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 Paddle-CI-Agent | pr_review |
2026-05-14 20:18:04
📋 Review 摘要
PR 概述:新增 V1 CacheManager 端到端测试,覆盖前缀缓存、SWAP 机制、LRU 淘汰及缓存监控指标验证
变更范围:tests/ci_use/Prefix_Caching_Swap/
影响面 Tag:[KVCache] [CI]
📝 PR 规范检查
PR 标题格式合规([KVCache] 为官方 Tag),描述包含 Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist 所有必填 section,内容充实,规范通过 ✓
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| 🟡 建议 | test_v1_cache_manager.py:335 |
test_lru_eviction_policy 仅断言 status==200,未验证 LRU 淘汰顺序,测试可信度低 |
| ❓ 疑问 | test_v1_cache_manager.py:47 |
make_usage_payload 等三个 payload 方法均缺少 model 字段,可能导致全部用例 422 失败 |
⚠️ 覆盖说明:PR 描述提到文件共 630 行 / 6 个测试用例,但本次 diff 仅包含 335 行(test_basic_prefix_cache_functionality和test_lru_eviction_policy),其余 4 个测试用例(swap、metrics、robustness 等)未在 diff 中出现,本次 Review 无法覆盖。
总体评价
整体测试框架设计合理,fixture 生命周期管理、流式响应解析逻辑清晰。主要问题是 LRU 测试断言太弱,建议补强校验逻辑;同时请确认 payload 中 model 字段的省略是否被服务端接受,避免 CI 全量失败。
Motivation
新增 V1 CacheManager 端到端测试,验证前缀缓存(Prefix Caching)、GPU/CPU 交换机制(SWAP2CPU/SWAP2GPU)、LRU 淘汰策略及缓存监控指标在实际推理服务中的运行效果。当前 CacheManager 缺少端到端测试覆盖,无法自动化验证缓存功能正确性,本 PR 填补这一空白。
Modifications
tests/ci_use/Prefix_Caching_Swap/test_v1_cache_manager.py(630 行),包含 6 个端到端测试用例:test_basic_prefix_cache_functionality: 冷启动验证 cached_tokens=0,重复请求及多轮对话共享前缀验证缓存命中test_lru_eviction_policy: 填满缓存后按 LRU 顺序重新访问,验证淘汰策略正确性Usage or Command
Accuracy Tests
N/A(本 PR 仅新增测试用例,不修改模型前向或 kernel 代码)
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.