Skip to content

[Refactor] Refactor RL unittest part 2: add pr-smoke unit test#1892

Open
YanhuiDua wants to merge 6 commits into
InternLM:mainfrom
YanhuiDua:refactor_rl_ut-part2
Open

[Refactor] Refactor RL unittest part 2: add pr-smoke unit test#1892
YanhuiDua wants to merge 6 commits into
InternLM:mainfrom
YanhuiDua:refactor_rl_ut-part2

Conversation

@YanhuiDua

@YanhuiDua YanhuiDua commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

本 PR 新增 Qwen3.5 VLM MoE 的 PR-real smoke,并将部分已被 smoke 或 pr_fast 覆盖的根目录 RL UT 删除 / 迁移,减少重复的大型 RL 测试。

PR-real Smoke 覆盖范围

新增测试:

tests/rl/fast/pr_real_smoke/test_qwen35_vl_moe_async_smoke.py

该 smoke 实际跑 2 step RLColocateTrainer.fit(),配置如下:

  • 模型:Qwen3.5 VLM MoE 35B-A3B,QWEN3_5_MOE_PATH
  • 后端:LMDeploy,8 GPU,rollout TP=1,EP=2
  • 数据:Geometry3K long-tail 多模态数据,GEO3K_LONGTAIL_DATA_PATH + GEO3K_MEDIA_ROOT
  • Judger:GEO3KJudgerConfig(judger_name="hiyouga/geometry3k")
  • 训练链路:AsyncReplayBuffer + AsyncProduceStrategy + partial rollout + GRPO

主要验证:

  • 真实 async RL 训练链路能完整跑 2 step;
  • 每步产生 64 条训练 rollout;
  • async produce 的 completed / aborted / leftover / failed / filtered / expired 数量正确;
  • train / rollout 权重同步每步执行一次;
  • mismatch KL 指标低于 不开R3 的 阈值;
  • VLM rollout state 保留 mm_infoimage_dataresponse_idslogprobsreward["score"]
  • 生成 train_rollout_1.jsonltrain_rollout_2.jsonl,且 response 非空、长度合法。

说明:pr_real_smoke测试无法保证精准的精度,精度通过e2e测试来保证,并且单测会额外提供针对具体功能的测试脚本,例如ckpt的save与resume,dense模型的rl训练流程,sglang的确定性训练,内存监控等等。

@YanhuiDua

Copy link
Copy Markdown
Collaborator Author

@claude review


def tearDown(self):
if ray.is_initialized():
ray.shutdown()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Warning: Module-level os.environ["XTUNER_USE_LMDEPLOY"] (and lines 121-123 for the other vars) will raise KeyError during test collection if the env var is unset. This means any pytest run that discovers this file without the CI environment configured will fail to collect all tests in the directory, not just skip this test.

Consider using os.environ.get(...) with a fallback and adding a pytest.importorskip-style guard or a module-level skipIf condition, similar to how test_lmdeploy_abort_integration.py (now deleted) used @unittest.skipUnless(...) at the class level.

Alternatively, if this file is only ever collected by a CI job that always has these vars, this is acceptable — but worth a brief comment to clarify the intent.

Comment thread tests/conftest.py Outdated
Comment on lines +73 to +92
if item_path.parent.name == "rl":
patch_rollout_config_dist_port_base()
_cleanup_hf_dynamic_modules()


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_protocol(item, nextitem):
del nextitem
item_path = Path(str(getattr(item, "path", None) or item.fspath)).resolve()
if not item_path.is_relative_to(_RL_TESTS_ROOT):
yield
return

import ray

original_init = ray.init
ray.init = init_with_local_address(original_init)
try:
yield
finally:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Nit: The global monkey-patch of ray.init during pytest_runtest_protocol works but is fragile — if a test spawns threads that call ray.init, or if pytest-xdist is ever used, the restoration in the finally block may race. Since this only affects RL tests and the wrapper is additive (setdefault), the practical risk is low.

One minor improvement: use functools.wraps(ray_init) in init_with_local_address so that introspection/debugging still shows the original signature and docstring.

finish_reasons.append(row.get("finish_reason"))

self.assertGreater(max(response_lens), min(response_lens), "expected response length diversity across rollout artifacts")
self.assertIn("stop", finish_reasons, "expected at least one rollout sample to stop before max_tokens")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Nit: This line exceeds the 119-character max line length configured in pyproject.toml. Consider wrapping:

Suggested change
self.assertIn("stop", finish_reasons, "expected at least one rollout sample to stop before max_tokens")
self.assertGreater(
max(response_lens), min(response_lens),
"expected response length diversity across rollout artifacts",
)

@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude: ## Summary

This PR adds a new PR-real smoke test for Qwen3.5 VLM MoE async RL training, consolidates unit tests that were already covered by smoke/pr_fast suites into tests/rl/fast/, and patches ray.init to default to address="local" for all RL tests.

Issues

Warning

  • [tests/rl/fast/pr_real_smoke/test_qwen35_vl_moe_async_smoke.py:61-124] Module-level os.environ["..."] access will cause KeyError during pytest collection if CI env vars are missing, instead of a clean skip. Consider os.environ.get() with a module-level unittest.skipUnless guard.
  • [tests/conftest.py:73-92] The pytest_runtest_protocol hook globally monkey-patches ray.init without functools.wraps, and could be fragile under parallel test execution (minor practical risk).

Nit

  • [tests/rl/fast/pr_real_smoke/test_qwen35_vl_moe_async_smoke.py:440] Line exceeds 119-char max (code suggestion provided).

Coverage Note

The deleted test_lmdeploy_abort_integration.py and test_rollout_pause_simulation.py tested specific abort-drain timing, LMDeploy metrics verification, and Ray-level pause latency under over-concurrency. The new smoke test validates the happy-path async RL loop but doesn't directly replicate these adversarial abort scenarios. If these behaviors are verified by e2e tests elsewhere, this is fine — otherwise worth noting the coverage gap.

Verdict

COMMENT — the consolidation is clean and the new smoke test is well-structured. The inline findings are minor.

@YanhuiDua YanhuiDua force-pushed the refactor_rl_ut-part2 branch from 7f7c381 to c6867af Compare June 8, 2026 12:40
@YanhuiDua YanhuiDua force-pushed the refactor_rl_ut-part2 branch from c6867af to 28043de Compare June 12, 2026 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant