[Refactor] Refactor RL unittest part 2: add pr-smoke unit test by YanhuiDua · Pull Request #1892 · InternLM/xtuner

YanhuiDua · 2026-06-08T11:56:15Z

Summary

本 PR 新增 Qwen3.5 VLM MoE 的 PR-real smoke，并将部分已被 smoke 或 pr_fast 覆盖的根目录 RL UT 删除 / 迁移，减少重复的大型 RL 测试。

PR-real Smoke 覆盖范围

新增测试：

tests/rl/fast/pr_real_smoke/test_qwen35_vl_moe_async_smoke.py

该 smoke 实际跑 2 step RLColocateTrainer.fit()，配置如下：

模型：Qwen3.5 VLM MoE 35B-A3B，QWEN3_5_MOE_PATH
后端：LMDeploy，8 GPU，rollout TP=1，EP=2
数据：Geometry3K long-tail 多模态数据，GEO3K_LONGTAIL_DATA_PATH + GEO3K_MEDIA_ROOT
Judger：GEO3KJudgerConfig(judger_name="hiyouga/geometry3k")
训练链路：AsyncReplayBuffer + AsyncProduceStrategy + partial rollout + GRPO

主要验证：

真实 async RL 训练链路能完整跑 2 step；
每步产生 64 条训练 rollout；
async produce 的 completed / aborted / leftover / failed / filtered / expired 数量正确；
train / rollout 权重同步每步执行一次；
mismatch KL 指标低于不开R3 的阈值；
VLM rollout state 保留 mm_info、image_data、response_ids、logprobs、reward["score"]；
生成 train_rollout_1.jsonl、train_rollout_2.jsonl，且 response 非空、长度合法。

说明：pr_real_smoke测试无法保证精准的精度，精度通过e2e测试来保证，并且单测会额外提供针对具体功能的测试脚本，例如ckpt的save与resume，dense模型的rl训练流程，sglang的确定性训练，内存监控等等。

YanhuiDua · 2026-06-08T12:14:52Z

@claude review

claude · 2026-06-08T12:17:25Z

+
+    def tearDown(self):
+        if ray.is_initialized():
+            ray.shutdown()


Claude: Warning: Module-level os.environ["XTUNER_USE_LMDEPLOY"] (and lines 121-123 for the other vars) will raise KeyError during test collection if the env var is unset. This means any pytest run that discovers this file without the CI environment configured will fail to collect all tests in the directory, not just skip this test.

Consider using os.environ.get(...) with a fallback and adding a pytest.importorskip-style guard or a module-level skipIf condition, similar to how test_lmdeploy_abort_integration.py (now deleted) used @unittest.skipUnless(...) at the class level.

Alternatively, if this file is only ever collected by a CI job that always has these vars, this is acceptable — but worth a brief comment to clarify the intent.

claude · 2026-06-08T12:17:29Z

    if item_path.parent.name == "rl":
        patch_rollout_config_dist_port_base()
    _cleanup_hf_dynamic_modules()
+
+
+@pytest.hookimpl(hookwrapper=True)
+def pytest_runtest_protocol(item, nextitem):
+    del nextitem
+    item_path = Path(str(getattr(item, "path", None) or item.fspath)).resolve()
+    if not item_path.is_relative_to(_RL_TESTS_ROOT):
+        yield
+        return
+
+    import ray
+
+    original_init = ray.init
+    ray.init = init_with_local_address(original_init)
+    try:
+        yield
+    finally:


Claude: Nit: The global monkey-patch of ray.init during pytest_runtest_protocol works but is fragile — if a test spawns threads that call ray.init, or if pytest-xdist is ever used, the restoration in the finally block may race. Since this only affects RL tests and the wrapper is additive (setdefault), the practical risk is low.

One minor improvement: use functools.wraps(ray_init) in init_with_local_address so that introspection/debugging still shows the original signature and docstring.

claude · 2026-06-08T12:18:06Z

+                finish_reasons.append(row.get("finish_reason"))
+
+        self.assertGreater(max(response_lens), min(response_lens), "expected response length diversity across rollout artifacts")
+        self.assertIn("stop", finish_reasons, "expected at least one rollout sample to stop before max_tokens")


Claude: Nit: This line exceeds the 119-character max line length configured in pyproject.toml. Consider wrapping:

Suggested change

self.assertIn("stop", finish_reasons, "expected at least one rollout sample to stop before max_tokens")

self.assertGreater(

max(response_lens), min(response_lens),

"expected response length diversity across rollout artifacts",

)

claude · 2026-06-08T12:18:30Z

Claude: ## Summary

This PR adds a new PR-real smoke test for Qwen3.5 VLM MoE async RL training, consolidates unit tests that were already covered by smoke/pr_fast suites into tests/rl/fast/, and patches ray.init to default to address="local" for all RL tests.

Issues

Warning

[tests/rl/fast/pr_real_smoke/test_qwen35_vl_moe_async_smoke.py:61-124] Module-level os.environ["..."] access will cause KeyError during pytest collection if CI env vars are missing, instead of a clean skip. Consider os.environ.get() with a module-level unittest.skipUnless guard.
[tests/conftest.py:73-92] The pytest_runtest_protocol hook globally monkey-patches ray.init without functools.wraps, and could be fragile under parallel test execution (minor practical risk).

Nit

[tests/rl/fast/pr_real_smoke/test_qwen35_vl_moe_async_smoke.py:440] Line exceeds 119-char max (code suggestion provided).

Coverage Note

The deleted test_lmdeploy_abort_integration.py and test_rollout_pause_simulation.py tested specific abort-drain timing, LMDeploy metrics verification, and Ray-level pause latency under over-concurrency. The new smoke test validates the happy-path async RL loop but doesn't directly replicate these adversarial abort scenarios. If these behaviors are verified by e2e tests elsewhere, this is fine — otherwise worth noting the coverage gap.

Verdict

COMMENT — the consolidation is clean and the new smoke test is well-structured. The inline findings are minor.

claude Bot reviewed Jun 8, 2026

View reviewed changes

YanhuiDua force-pushed the refactor_rl_ut-part2 branch from 7f7c381 to c6867af Compare June 8, 2026 12:40

YanhuiDua added 4 commits June 12, 2026 00:34

Add Qwen3.5 VLM MoE PR smoke

bf94ce5

Consolidate RL smoke-covered tests

a1dc515

use ray.init(address='local') in rl ut

30d7b7d

Reorganize RL PR tests under tests/rl

28043de

YanhuiDua force-pushed the refactor_rl_ut-part2 branch from c6867af to 28043de Compare June 12, 2026 00:35

YanhuiDua added 2 commits June 12, 2026 05:45

fix ci

8d1437b

fix ci

7a93215

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Refactor] Refactor RL unittest part 2: add pr-smoke unit test#1892

[Refactor] Refactor RL unittest part 2: add pr-smoke unit test#1892
YanhuiDua wants to merge 6 commits into
InternLM:mainfrom
YanhuiDua:refactor_rl_ut-part2

YanhuiDua commented Jun 8, 2026 •

edited

Loading

Uh oh!

YanhuiDua commented Jun 8, 2026

Uh oh!

claude Bot Jun 8, 2026

Uh oh!

claude Bot Jun 8, 2026

Uh oh!

claude Bot Jun 8, 2026

Uh oh!

claude Bot commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

YanhuiDua commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

PR-real Smoke 覆盖范围

Uh oh!

YanhuiDua commented Jun 8, 2026

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented Jun 8, 2026

Issues

Warning

Nit

Coverage Note

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

YanhuiDua commented Jun 8, 2026 •

edited

Loading