
[Others] update flash mask version#7819

Open
BingooYang wants to merge 2 commits into
PaddlePaddle:develop from
BingooYang:up_flashmaske

Conversation

@BingooYang
Contributor

@BingooYang BingooYang commented May 14, 2026

Motivation

Upgrading flashinfer to version 0.6.11 requires nvidia-cutlass-dsl>=4.4.2 (https://github.com/PaddlePaddle/FastDeploy/pull/7799). The old flash mask version pins nvidia-cutlass-dsl==4.4.2, which produces a conflict, so this PR upgrades the flash mask version.

Modifications

Upgrade the flash mask version.
Version information is recorded at: https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/YeqWcBGW4m/EUBpKxHfTurV5G

Usage or Command

NA

Accuracy Tests

NA

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented May 14, 2026

Thanks for your contribution!

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

PaddlePaddle-bot commented May 14, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-20 03:21:58

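The CI report is generated from the following code (refreshed every 30 minutes):


1 Task overview

2 required tasks failed and must be fixed before this PR can be merged.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 42 (0) | 42 | 38 | 3 | 0 | 1 | 0 |

2 Task status summary

2.1 Required tasks: 8/10 passed

Required tasks block merging; failures must be fixed first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h19m | PR issue: 7 new lines in flash_attn_backend.py have 0% coverage | Add unit tests for L88-99 or request an exemption | Job | - |
| ❌ | Approval | 8s | PR issue: the new logger.info triggers the log-modification approval rule | Ask xyxinyang or zyyzghb to approve | Job | - |
| ✅ | The other 8 required tasks passed | - | - | - | - | - |

2.2 Optional tasks: 30/32 passed

Optional tasks do not block merging; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 1m47s | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| ✅ | The other 30 optional tasks passed | - | - | - |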

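3 Failure details (required tasks only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage below threshold (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

  • Status: ❌ Failed
  • Error type: coverage below threshold
  • Confidence: high
  • Root cause summary: the 7 lines of logic that the PR adds to flash_attn_backend.py have 0% coverage
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed cases: none (all unit tests passed; the coverage check failed)

Root cause details:
The PR adds try/except logic for a conditional flash mask import at lines 88-99 of flash_attn_backend.py. None of the added lines (L88, L89, L91, L92, L96, L98, L99) is covered by any unit test, so the diff coverage is 0% (all 7 of 7 lines are violations), below the 80% coverage threshold. This sets COVERAGE_EXIT_CODE=9, and the job finally exits with code 9. Note that the unit tests themselves all passed (TEST_EXIT_CODE=0); only the coverage check failed.

Key logs: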

COVERAGE_EXIT_CODE: 9
GPU Patch Coverage Details:
{"src_stats": {"fastdeploy/model_executor/layers/attention/flash_attn_backend.py":
  {"percent_covered": 0.0, "violation_lines": [88, 89, 91, 92, 96, 98, 99],
   "covered_lines": []}},
 "total_num_lines": 7, "total_num_violations": 7, "total_percent_covered": 0}
##[error]Process completed with exit code 9.
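
The 0% figure follows directly from the violation count in this log. The snippet below is a minimal sketch of that arithmetic, assuming the report is a JSON object shaped like the excerpt above and using the 80% threshold quoted in the root cause details. The function name `patch_coverage_ok` is hypothetical; this is not the script the CI actually runs.

```python
import json

# Report excerpt copied from the log above.
REPORT = """
{"src_stats": {"fastdeploy/model_executor/layers/attention/flash_attn_backend.py":
  {"percent_covered": 0.0, "violation_lines": [88, 89, 91, 92, 96, 98, 99],
   "covered_lines": []}},
 "total_num_lines": 7, "total_num_violations": 7, "total_percent_covered": 0}
"""


def patch_coverage_ok(report_json: str, threshold: float = 80.0) -> bool:
    """Return True when the share of covered changed lines meets the threshold."""
    report = json.loads(report_json)
    total = report["total_num_lines"]
    violations = report["total_num_violations"]
    # With no changed lines there is nothing to cover, so treat that as passing.
    percent = 100.0 if total == 0 else 100.0 * (total - violations) / total
    return percent >= threshold


print(patch_coverage_ok(REPORT))  # False: 0 of the 7 changed lines are covered
```

Under these assumptions, covering 6 of the 7 lines (about 85.7%) would clear the threshold, while covering 5 (about 71.4%) would not.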

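Suggested fixes:

  1. Add mock-based unit tests for fastdeploy/model_executor/layers/attention/flash_attn_backend.py L88-99 that cover both the True and False branches of is_flash_mask_available(), as well as the ImportError/ModuleNotFoundError exception branch
  2. If the CI environment cannot simulate the GPU-related imports, request a coverage exemption in the CI configuration for this file's conditional import code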
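
The first suggestion can be sketched without a GPU or the real package by placing fake modules in `sys.modules`. The real code at L88-99 is not shown in this thread, so `load_fa4` below is a hypothetical stand-in that mirrors the import paths quoted in the review, and `fake_modules` is an illustrative helper. Treat it as one possible test shape, not the project's test suite.

```python
import importlib
import sys
import types
from unittest import mock


def load_fa4():
    """Hypothetical stand-in for the conditional import; returns the kernel or None."""
    try:
        ops = importlib.import_module("paddlefleet.ops")
        if not ops.is_flash_mask_available():
            return None
        interface = importlib.import_module("paddlefleet.ops.flash_mask.cute.interface")
        return interface.flashmask_attention
    except ImportError:  # also catches ModuleNotFoundError, its subclass
        return None


def fake_modules(available: bool) -> dict:
    """Build fake module objects so no real paddlefleet install is needed."""
    ops = types.ModuleType("paddlefleet.ops")
    ops.is_flash_mask_available = lambda: available
    interface = types.ModuleType("paddlefleet.ops.flash_mask.cute.interface")
    interface.flashmask_attention = lambda *args, **kwargs: "fa4-called"
    return {
        "paddlefleet.ops": ops,
        "paddlefleet.ops.flash_mask.cute.interface": interface,
    }


def test_available_branch():
    with mock.patch.dict(sys.modules, fake_modules(True)):
        assert load_fa4()() == "fa4-called"


def test_unavailable_branch():
    with mock.patch.dict(sys.modules, fake_modules(False)):
        assert load_fa4() is None


def test_import_error_branch():
    # A None entry in sys.modules makes the import raise ModuleNotFoundError.
    with mock.patch.dict(sys.modules, {"paddlefleet.ops": None}):
        assert load_fa4() is None


if __name__ == "__main__":
    test_available_branch()
    test_unavailable_branch()
    test_import_error_branch()
```

`mock.patch.dict` restores `sys.modules` when each block exits, so the fakes do not leak into other tests.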

Fix summary: add unit tests for flash_attn_backend.py L88-99 or request an exemption

Related change: fastdeploy/model_executor/layers/attention/flash_attn_backend.py L88-99 (new try/except logic for the conditional flash mask import)
Link: view logs

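Approval: code convention (confidence: high)

Approval

  • Status: ❌ Failed
  • Error type: code convention
  • Confidence: high
  • Root cause summary: the logger.info added by the PR triggers the log-modification approval rule, which requires approval from a designated RD
  • Analyzer: generic analysis (fallback)

Root cause details:
The check_approval.sh script detected a logging behavior change in the PR diff (a new logger.info(f"The current platform[sm{get_sm_version()}] can't import Flash Attention V4.")). This triggers the rule that any change to .info/.debug/.error/log_request logging behavior requires a review approval from one of the FastDeploy RDs (xyxinyang or zyyzghb). That condition is not currently met, so the script exits with code 6.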
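
A rule of this kind can be approximated by scanning the added lines of a unified diff. The internals of check_approval.sh are not shown in this thread, so everything below is an assumption made for illustration: the regular expression, the function names, and the simplification that only added lines are inspected (the rule as described covers any modification).

```python
import re

# Assumed pattern: a call to .info/.debug/.error, or any mention of log_request.
LOG_CALL = re.compile(r"\.(info|debug|error)\s*\(|\blog_request\b")


def added_log_lines(diff_text: str) -> list[str]:
    """Return the added lines of a unified diff that contain a logging call."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with '+'; a '+++' prefix marks the file header.
        if line.startswith("+") and not line.startswith("+++") and LOG_CALL.search(line):
            hits.append(line[1:].strip())
    return hits


def needs_log_approval(diff_text: str, approvers: set[str], required: set[str]) -> bool:
    """True when the diff touches logging and no required reviewer has approved."""
    return bool(added_log_lines(diff_text)) and not (approvers & required)


# Illustrative diff fragment; the log message is shortened from the one quoted above.
DIFF = '''+++ b/fastdeploy/model_executor/layers/attention/flash_attn_backend.py
+    logger.info("The current platform can't import Flash Attention V4.")
+    fa4 = None
'''

print(needs_log_approval(DIFF, {"zoooo0820"}, {"xyxinyang", "zyyzghb"}))  # True
```

In this sketch an approval from a reviewer outside the required set does not satisfy the check, which matches the situation the report describes.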

Suggested fixes:

  1. Ask xyxinyang (zhouchong) or zyyzghb (zhangyongyue) to approve this PR

Fix summary: ask xyxinyang or zyyzghb to approve this PR

Related change: the PR adds a logger.info(...) statement (a logging behavior change)
Link: view logs

@codecov-commenter

codecov-commenter commented May 14, 2026

Codecov Report

❌ Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@dad5a43). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...el_executor/layers/attention/flash_attn_backend.py | 0.00% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7819   +/-   ##
==========================================
  Coverage           ?   63.33%           
==========================================
  Files              ?      462           
  Lines              ?    64371           
  Branches           ?     9872           
==========================================
  Hits               ?    40769           
  Misses             ?    20835           
  Partials           ?     2767           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 72.44% <0.00%> (?) |
| XPU | 7.12% <0.00%> (?) |
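
The project-level figure in the diff block can be checked by hand. Codecov counts a partially covered line as not covered, so coverage is hits divided by the sum of hits, misses, and partials. The helper name `coverage_percent` below is illustrative only.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage; partially covered lines count as not covered."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


# Numbers taken from the coverage diff above.
hits, misses, partials = 40769, 20835, 2767
assert hits + misses + partials == 64371  # matches the reported Lines total

print(f"{coverage_percent(hits, misses, partials):.2f}%")  # 63.33%
```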


zoooo0820
zoooo0820 previously approved these changes May 14, 2026
Collaborator

@zoooo0820 zoooo0820 left a comment


LGTM

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-18 10:27:43

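The CI report is generated from the following code (refreshed every 30 minutes):


1 Task overview

⏳ Required tasks in progress: 5 running, 1 waiting, no required failures so far.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 37 (0) | 37 | 27 | 1 | 7 | 2 | 0 |

2 Task status summary

2.1 Required tasks: 3/9 passed

Required tasks block merging; failures must be fixed first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ⏳ | Extracted partial CE model tasks to run in CI. / run_ce_cases | - | Running | - | Job | - |
| ⏳ | Run Base Tests / base_tests | - | Running | - | Job | - |
| ⏳ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | - | Running | - | Job | - |
| ⏳ | xpu_4cards_case_test / run_xpu_4cards_cases | - | Running | - | Job | - |
| ⏳ | xpu_8cards_case_test / run_xpu_8cards_cases | - | Running | - | Job | - |
| ⏸️ | Run Four Cards Tests / run_4_cards_tests | - | Waiting | - | - | - |
| ✅ | The other 3 required tasks passed | - | - | - | - | - |

2.2 Optional tasks: 24/28 passed

Optional tasks do not block merging; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Trigger Jenkins for PR | 10m44s | Job | - |
| ⏳ | Run iluvatar Tests / run_iluvatar_cases | - | Job | - |
| ⏳ | xpu_unit_test / run_xpu_unit_test | - | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| ✅ | The other 24 optional tasks passed | - | - | - |

3 Failure details (required tasks only)

No required tasks have failed.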


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-19 22:59:58

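📋 Review summary

PR overview: moves flash mask from a standalone package to paddlefleet.ops and removes the old pinned-version dependency from requirements.txt.
Scope of change: fastdeploy/model_executor/layers/attention/flash_attn_backend.py, requirements.txt
Impact tag: [OP]

Issues

| Level | File | Summary |
| --- | --- | --- |
| 🔴 Bug | flash_attn_backend.py:102 | fa4 is not assigned in the inner exception branch, but the outer code still uses it, causing a NameError crash |

📝 PR convention check

The title format is compliant, every required section of the PR description is filled in, and the checklist is ticked. The PR meets the conventions. ✓

Overall assessment

The intent of the PR is clear and the dependency migration is reasonable. However, the new nested exception handling contains a definite variable-scope bug: when flash_mask is unavailable, fa4 is never assigned, and the outer code then raises a NameError on access and crashes the process. This must be fixed before merging.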

logger.info(f"The current platform[sm{get_sm_version()}] can't import Flash Attention V4.")

global flashmask_attention_v4
flashmask_attention_v4 = fa4

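🔴 Bug: fa4 is not assigned in the exception branch, but the outer code still accesses it

When is_flash_mask_available() returns False, or when an ImportError/ModuleNotFoundError is caught by the inner except, the variable fa4 is never assigned. Line 102 of the outer try block, flashmask_attention_v4 = fa4, then raises NameError: name 'fa4' is not defined. The outer except ImportError cannot catch a NameError, so the program crashes, which makes the behavior when flash_mask is completely unavailable worse than before.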
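
The failure mode can be reproduced in isolation. The sketch below is a stripped-down stand-in, not the code from flash_attn_backend.py: `buggy_setup` is a hypothetical function and the string "kernel" stands in for the imported flashmask_attention. Inside a function, Python reports the unbound name as UnboundLocalError, a subclass of NameError; neither is an ImportError, so the outer handler does not apply.

```python
def buggy_setup(available: bool):
    """Mimics the nested try/except shape described in the review comment."""
    try:
        try:
            if not available:
                raise ModuleNotFoundError("flash_mask not available.")
            fa4 = "kernel"  # stands in for the imported flashmask_attention
        except (ImportError, ModuleNotFoundError):
            pass  # the failure is logged and swallowed here; fa4 stays unbound
        return fa4  # raises when the inner except branch ran
    except ImportError:
        return None  # never reached: the error above is not an ImportError


print(buggy_setup(True))  # kernel

try:
    buggy_setup(False)
except NameError as exc:
    print(type(exc).__name__)  # UnboundLocalError
```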

Suggested fix:

try:
    paddle.enable_compat(scope={"cutlass"})
    fa4 = None
    try:
        from paddlefleet.ops import is_flash_mask_available
        if is_flash_mask_available():
            from paddlefleet.ops.flash_mask.cute.interface import (
                flashmask_attention as fa4,
            )
        else:
            raise ModuleNotFoundError("flash_mask not available.")
    except (ImportError, ModuleNotFoundError):
        logger.info(f"The current platform[sm{get_sm_version()}] can't import Flash Attention V4.")

    if fa4 is not None:
        global flashmask_attention_v4
        flashmask_attention_v4 = fa4
        FLASH_ATTN_VERSION = 4
        logger.info("The current platform supports Flash Attention V4.")
except ImportError:
    logger.info(f"The current platform[sm{get_sm_version()}] can't import Flash Attention V4.")

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ BingooYang
❌ root


root does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


5 participants