[xpu] fix interrupt error by zhupengyang · Pull Request #7805 · PaddlePaddle/FastDeploy

zhupengyang · 2026-05-13T12:37:30Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run pre-commit before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-13T12:37:38Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-13T12:46:21Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-15 10:36:49

This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 7de938c
Merge base: c2df4c6 (branch: develop)
View full diff
CI details

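1 Job overview

⏳ Required jobs in progress: 2 required jobs are still running; wait for them to finish before merging. 2/4 required jobs have passed so far.

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
21(0)	21	14	1	4	0	2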

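2 Job status summary

2.1 Required jobs: 2/4 passed

Required jobs block merging; failures should be handled first.

Status	Job	Duration	Root cause	Suggested fix	Log	Rerun
⏳	`xpu_4cards_case_test / run_xpu_4cards_cases`	-	Running	-	Job	-
⏳	`xpu_8cards_case_test / run_xpu_8cards_cases`	-	Running	-	Job	-
✅	The other 2 required jobs passed	-	-	-	-	-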

2.2 Optional jobs: 12/17 passed

Optional jobs do not block merging; failures are for reference only.

Status	Job	Duration	Log	Rerun
❌	`Check PR Template`	15s	Job	-
⏳	`xpu_unit_test / run_xpu_unit_test`	-	Job	-
⏳	`Trigger Jenkins for PR`	-	Job	-
✅	The other 12 optional jobs passed	-	-	-

3 Failure details (required only)

No required jobs failed.

codecov-commenter · 2026-05-13T14:04:52Z

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@c2df4c6). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/xpu_model_runner.py	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             develop   #7805   +/-   ##
=========================================
  Coverage           ?   7.12%           
=========================================
  Files              ?     462           
  Lines              ?   64274           
  Branches           ?    9851           
=========================================
  Hits               ?    4577           
  Misses             ?   59608           
  Partials           ?      89
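As a quick sanity check on the figures above, the reported percentage can be reproduced from the raw counts. This is only a sketch, assuming coverage is computed as hits divided by the sum of hits, misses, and partials; the variable names are illustrative and not part of any Codecov API.

```python
# Counts copied from the Codecov coverage diff above.
hits, misses, partials = 4577, 59608, 89

# Assumption: total tracked lines = hits + misses + partials.
lines = hits + misses + partials

# Assumption: coverage = hits / lines, reported to two decimal places.
coverage_pct = round(100 * hits / lines, 2)

print(lines)         # total lines, should match the "Lines" row
print(coverage_pct)  # percentage, should match the "Coverage" row
```

Under that assumption the totals agree with the `Lines` and `Coverage` rows of the report.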

Flag	Coverage Δ
XPU	`7.12% <0.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

EmmonsCurse · 2026-05-15T02:16:39Z

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci build_gpu

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-15 10:24:44

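📋 Review summary

PR overview: In non-EP mode, the XPU ModelRunner exits early when num_tokens is 0, which fixes the interrupt error
Scope of change: fastdeploy/worker/xpu_model_runner.py
Impact tags: [XPU] [BugFix]

📝 PR convention check

The title contains a non-standard [not merge] prefix, and the [xpu] tag does not follow the capitalization convention; the sections of the PR description have not been filled in.

Suggested title (can be copied as-is):

[XPU][BugFix] Fix interrupt error when num_tokens is 0

Suggested PR description (can be copied as-is):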

## Motivation

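In non-Expert-Parallel mode, when `ids_remove_padding` contains 0 tokens, the XPU ModelRunner calls the model forward directly and triggers an interrupt error. Adding early-exit logic to `execute_model` avoids running inference on an empty batch.

## Modifications

- `fastdeploy/worker/xpu_model_runner.py`: after the `padding_cudagraph_inputs()` call in `execute_model`, add an early return when `num_tokens <= 0`. In EP mode a rank must still take part in collective communication even with 0 tokens, so the early return applies only in non-EP mode.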

## Usage or Command

N/A

## Accuracy Tests

N/A

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

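Issues

Level	File	Summary
📝 PR convention	-	The title contains a non-standard `[not merge]` prefix and `[xpu]` does not follow the capitalization convention; the sections of the description have not been filled in
❓ Question	`fastdeploy/worker/xpu_model_runner.py:1304`	The `< 0` branch of `num_tokens <= 0` is unreachable; consider changing it to `== 0` for semantic accuracy

Overall assessment

The fix logic is correct: in non-EP mode it returns early when the token count is 0, which is consistent with the existing EP-mode comment, and the change is small and low-risk. Please fill in the PR description and fix the title format before merging.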
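The early-return pattern described in this review can be sketched roughly as follows. This is not the code from the PR: `ToyRunner`, `enable_expert_parallel`, `forward_calls`, and `_forward` are hypothetical stand-ins, and only the names `execute_model`, `ids_remove_padding`, and `num_tokens` come from the review text. The sketch uses `== 0` as the reviewer suggests, since a length cannot be negative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyRunner:
    """Hypothetical stand-in for the XPU model runner; not FastDeploy code."""

    enable_expert_parallel: bool = False  # assumed flag name, for illustration
    forward_calls: int = 0

    def _forward(self, token_ids: List[int]) -> List[int]:
        # Placeholder for the real model forward pass.
        self.forward_calls += 1
        return [t + 1 for t in token_ids]

    def execute_model(self, ids_remove_padding: List[int]) -> Optional[List[int]]:
        num_tokens = len(ids_remove_padding)
        # Non-EP: an empty batch has nothing to compute, so skip the forward.
        # EP: every rank must still join the collective ops, so fall through.
        if num_tokens == 0 and not self.enable_expert_parallel:
            return None
        return self._forward(ids_remove_padding)


runner = ToyRunner()
empty_result = runner.execute_model([])        # early return, no forward call
normal_result = runner.execute_model([1, 2])   # regular path

ep_runner = ToyRunner(enable_expert_parallel=True)
ep_result = ep_runner.execute_model([])        # EP still runs the forward
```

In this toy version the non-EP runner never reaches `_forward` for an empty batch, while the EP runner still does, mirroring the distinction the review draws between the two modes.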

zhupengyang had a problem deploying to Metax_ci May 13, 2026 12:37 — with GitHub Actions Failure

zhupengyang force-pushed the fix_interrupt_error branch from ae4f8f5 to ab12fdc Compare May 13, 2026 16:26

zhupengyang had a problem deploying to Metax_ci May 13, 2026 16:26 — with GitHub Actions Failure

zhupengyang force-pushed the fix_interrupt_error branch from ab12fdc to 442afa5 Compare May 15, 2026 02:07

zhupengyang had a problem deploying to Metax_ci May 15, 2026 02:07 — with GitHub Actions Failure

[xpu] fix interrupt error

7de938c

zhupengyang force-pushed the fix_interrupt_error branch from 442afa5 to 7de938c Compare May 15, 2026 02:20

zhupengyang had a problem deploying to Metax_ci May 15, 2026 02:20 — with GitHub Actions Failure

zhupengyang changed the title ~~[not merge][xpu] fix interrupt error~~ [xpu] fix interrupt error May 15, 2026

PaddlePaddle-bot reviewed May 15, 2026

[xpu] fix interrupt error #7805
zhupengyang wants to merge 1 commit into PaddlePaddle:develop from zhupengyang:fix_interrupt_error

4 participants
