[KVCache] Add e2e v1 cache manager tests for prefix caching and swap mechanism verification #7814

Merged
EmmonsCurse merged 3 commits into PaddlePaddle:develop from kevincheng2:feature/e2e-cache-manager-tests
May 15, 2026

@kevincheng2 kevincheng2 commented May 14, 2026

Motivation

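Adds end-to-end tests for the V1 CacheManager that verify prefix caching, the GPU/CPU swap mechanism (SWAP2CPU/SWAP2GPU), the LRU eviction policy, and cache monitoring metrics in a real inference service. CacheManager currently has no end-to-end test coverage, so cache correctness cannot be verified automatically; this PR fills that gap.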

Modifications

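  • Adds tests/ci_use/Prefix_Caching_Swap/test_v1_cache_manager.py (630 lines), containing 6 end-to-end test cases:
    • test_basic_prefix_cache_functionality: verifies cached_tokens=0 on a cold start, then verifies cache hits for a repeated request and for multi-turn conversations that share a prefix
    • test_lru_eviction_policy: fills the cache, then re-accesses entries in LRU order to verify that the eviction policy is correct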
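
The cold-start and cache-hit checks described above can be sketched as follows. This is a minimal illustration, not the code from test_v1_cache_manager.py: it assumes the server reports the count at `usage["prompt_tokens_details"]["cached_tokens"]`, as OpenAI-style responses do, and the helper names `cached_tokens` and `check_cold_then_warm` are hypothetical.

```python
from typing import Any, Mapping


def cached_tokens(usage: Mapping[str, Any]) -> int:
    """Return the cached-token count from a usage dict, or 0 if it is absent.

    Assumption: the count lives under usage["prompt_tokens_details"], as in
    OpenAI-style responses. The real test file may read it differently.
    """
    details = usage.get("prompt_tokens_details") or {}
    return int(details.get("cached_tokens") or 0)


def check_cold_then_warm(first_usage: Mapping[str, Any],
                         second_usage: Mapping[str, Any]) -> None:
    """Assert that the first request missed the cache and the repeat hit it."""
    # Cold start: nothing has been cached yet.
    assert cached_tokens(first_usage) == 0
    # Repeat of the same prompt: at least part of the prefix is reused.
    assert cached_tokens(second_usage) > 0
    # A request cannot reuse more tokens than its prompt contains.
    assert cached_tokens(second_usage) <= second_usage["prompt_tokens"]
```

Returning 0 when the field is missing keeps the cold-start assertion meaningful even if the server omits the details block on a miss.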

Usage or Command

# Start the FastDeploy inference service (with the V1 CacheManager enabled)
ENABLE_V1_KVCACHE_MANAGER=1 python -m fastdeploy.entrypoints.openai.api_server \
    --model <model_path> \
    --port 12211 \
    --max-model-len 128 \
    --num-gpu-blocks-override 4 \
    --swap-space 10 \
    --enable-prefix-caching \
    --metrics-port 9090 \
    --cache-queue-port 8081 \
    --engine-worker-queue-port 8082

# Run the e2e tests
pytest tests/ci_use/Prefix_Caching_Swap/test_v1_cache_manager.py -v
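
For readers who want to start the server from a pytest fixture instead of a shell, the launch command above could be assembled roughly like this. It is a sketch that only mirrors the flags and environment variable shown above; the function name `build_server_launch` is hypothetical, and the actual fixture in the test file may be structured differently.

```python
import os
from typing import Dict, List, Tuple


def build_server_launch(model_path: str, port: int = 12211) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for the api_server command shown above.

    Hypothetical helper: every flag and value is copied from the PR description.
    """
    # Enable the V1 cache manager through the environment, as in the shell example.
    env = dict(os.environ, ENABLE_V1_KVCACHE_MANAGER="1")
    cmd = [
        "python", "-m", "fastdeploy.entrypoints.openai.api_server",
        "--model", model_path,
        "--port", str(port),
        "--max-model-len", "128",
        "--num-gpu-blocks-override", "4",
        "--swap-space", "10",
        "--enable-prefix-caching",
        "--metrics-port", "9090",
        "--cache-queue-port", "8081",
        "--engine-worker-queue-port", "8082",
    ]
    return cmd, env
```

The pair can be passed to `subprocess.Popen(cmd, env=env)`. Returning the arguments instead of launching the process keeps the helper easy to inspect without a GPU.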

Accuracy Tests

N/A (this PR only adds test cases; it does not modify model forward or kernel code)

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

## Motivation

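Adds end-to-end tests for CacheManager that verify how prefix caching behaves in a running FastDeploy inference service. The tests cover cold start, cache hits on repeated requests, cache hits on shared prefixes, the metrics endpoint, and non-streaming requests, to ensure that caching works correctly in a production environment.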

## Modifications

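- Adds tests/e2e/test_cache_manager.py, containing 5 end-to-end test cases
  - test_cache_cold_start: a cold-start request verifies cached_tokens=0
  - test_cache_hit_on_repeat: a repeated request verifies a cache hit (cached_tokens > 0)
  - test_cache_shared_prefix: multi-turn conversations sharing a system prompt prefix verify a cache hit
  - test_cache_metrics_endpoint: the /metrics endpoint exposes cache-related metrics
  - test_cache_non_stream: verifies a cache hit for a non-streaming request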
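
A check like test_cache_metrics_endpoint needs to find cache-related names in the text returned by /metrics. The sketch below assumes the endpoint serves the Prometheus text exposition format and uses a simplified parser (it does not handle `}` inside quoted label values). The function names and the idea of matching on the substring "cache" are assumptions for illustration; the PR does not list the actual metric names.

```python
import re
from typing import Dict, List

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each metric name to its last seen sample value, ignoring labels."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if not match:
            continue
        try:
            samples[match.group(1)] = float(match.group(3))
        except ValueError:
            continue  # Not a numeric sample; ignore it.
    return samples


def cache_metric_names(text: str, keyword: str = "cache") -> List[str]:
    """Return the sorted metric names that contain the keyword."""
    return sorted(name for name in parse_metrics(text) if keyword in name.lower())
```

A test could then assert that `cache_metric_names(response_text)` is non-empty, or compare it against the specific names the service is expected to export.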

## Usage or Command

```bash
# Start the service (requires the MODEL_PATH environment variable)
export MODEL_PATH=/path/to/models
python -m pytest tests/e2e/test_cache_manager.py -vv -s

# Key launch arguments
python -m fastdeploy.entrypoints.openai.api_server \
    --model <model_path> \
    --enable-prefix-caching \
    --swap-space 20 \
    --max-model-len 32768 \
    --max-num-seqs 128
```

paddle-bot Bot commented May 14, 2026

Thanks for your contribution!

PaddlePaddle-bot commented May 14, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-14 20:05:22

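The CI report is generated from the following code (updated every 30 minutes):


1 Task overview

All Required tasks have passed ✅, so the PR can be merged. 4 Optional tasks are still pending (they do not block the merge).

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 13 (0) | 13 | 9 | 0 | 0 | 4 | 0 |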

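2 Task status summary

2.1 Required tasks: 2/2 passed

Required tasks block the merge; failures must be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| | The remaining 2 required tasks all passed | - | - | - | - | - |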

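2.2 Optional tasks: 7/11 passed

Optional tasks do not block the merge; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| ⏸️ | FD-Clone-Linux / code-clone | - | - | - |
| ⏸️ | FD-Clone-Linux-ILUVATAR / code-clone | - | - | - |
| ⏸️ | FD-Clone-Linux-XPU / code-clone | - | - | - |
| ⏸️ | CI_HPU | - | - | - |
| | The remaining 7 optional tasks passed | - | - | - |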

3 Failure details (required only)

No required tasks failed.

PaddlePaddle-bot

This comment was marked as outdated.

codecov-commenter commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@cb2d7c0). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7814   +/-   ##
==========================================
  Coverage           ?   63.52%           
==========================================
  Files              ?      461           
  Lines              ?    64580           
  Branches           ?     9897           
==========================================
  Hits               ?    41026           
  Misses             ?    20728           
  Partials           ?     2826           
| Flag | Coverage Δ |
|---|---|
| GPU | 73.21% <ø> (?) |
| XPU | 7.13% <ø> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

@kevincheng2 kevincheng2 changed the title from "[Tests] add e2e cache manager tests for prefix caching verification" to "[KVCache] Add e2e v1 cache manager tests for prefix caching and swap mechanism verification" on May 14, 2026
PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-14 20:18:04

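📋 Review summary

PR overview: Adds end-to-end tests for the V1 CacheManager, covering prefix caching, the swap mechanism, LRU eviction, and cache monitoring metrics
Change scope: tests/ci_use/Prefix_Caching_Swap/
Impact tags: [KVCache] [CI]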

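📝 PR convention check

The PR title format is compliant ([KVCache] is an official tag), and the description contains all required sections (Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist) with substantive content. Convention check passed ✓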

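Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | test_v1_cache_manager.py:335 | test_lru_eviction_policy only asserts status==200 and does not verify the LRU eviction order, so the test gives little confidence |
| ❓ Question | test_v1_cache_manager.py:47 | make_usage_payload and the other two payload methods all omit the model field, which may cause every test case to fail with 422 |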

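⚠️ Coverage note: The PR description says the file has 630 lines and 6 test cases, but this diff contains only 335 lines (test_basic_prefix_cache_functionality and test_lru_eviction_policy). The other 4 test cases (swap, metrics, robustness, etc.) do not appear in the diff, so this review cannot cover them.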

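Overall assessment

The test framework is well designed overall, and the fixture lifecycle management and streaming-response parsing logic are clear. The main problem is that the LRU test's assertions are too weak; strengthening the verification logic is recommended. Please also confirm whether the server accepts payloads that omit the model field, to avoid failures across the whole CI run.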
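
One way to strengthen the LRU assertions is to compare the observed hit/miss pattern against a small reference model. The sketch below is a toy LRU over whole prefixes, written only for illustration: `LRUModel` is a hypothetical name, it is not FastDeploy's CacheManager, and it ignores block granularity and the CPU swap tier, where an entry evicted from GPU may still be served.

```python
from collections import OrderedDict
from typing import Hashable, List


class LRUModel:
    """Toy reference model of an LRU cache with a fixed number of slots."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Keys are ordered from least to most recently used.
        self._entries: "OrderedDict[Hashable, None]" = OrderedDict()

    def access(self, key: Hashable) -> bool:
        """Touch key; return True on a hit, False on a miss (which inserts it)."""
        if key in self._entries:
            self._entries.move_to_end(key)  # Mark as most recently used.
            return True
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)  # Evict the least recently used.
        self._entries[key] = None
        return False

    def resident(self) -> List[Hashable]:
        """Return resident keys, least recently used first."""
        return list(self._entries)
```

A test could replay its request sequence through the model and assert that requests the model predicts as hits report cached_tokens > 0, while predicted misses report 0, subject to the simplifications noted above.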

Comment thread tests/ci_use/Prefix_Caching_Swap/test_v1_cache_manager.py
Comment thread tests/ci_use/Prefix_Caching_Swap/test_v1_cache_manager.py
@EmmonsCurse EmmonsCurse merged commit 45fb3d1 into PaddlePaddle:develop May 15, 2026
40 of 43 checks passed
4 participants