[Refactor] Refactor RL unittest part 2: add pr-smoke unit test#1892
[Refactor] Refactor RL unittest part 2: add pr-smoke unit test#1892YanhuiDua wants to merge 6 commits into
Conversation
|
@claude review |
|
|
||
| def tearDown(self): | ||
| if ray.is_initialized(): | ||
| ray.shutdown() |
There was a problem hiding this comment.
Claude: Warning: Module-level os.environ["XTUNER_USE_LMDEPLOY"] (and lines 121-123 for the other vars) will raise KeyError during test collection if the env var is unset. This means any pytest run that discovers this file without the CI environment configured will fail to collect all tests in the directory, not just skip this test.
Consider using os.environ.get(...) with a fallback and adding a pytest.importorskip-style guard or a module-level skipIf condition, similar to how test_lmdeploy_abort_integration.py (now deleted) used @unittest.skipUnless(...) at the class level.
Alternatively, if this file is only ever collected by a CI job that always has these vars, this is acceptable — but worth a brief comment to clarify the intent.
| if item_path.parent.name == "rl": | ||
| patch_rollout_config_dist_port_base() | ||
| _cleanup_hf_dynamic_modules() | ||
|
|
||
|
|
||
| @pytest.hookimpl(hookwrapper=True) | ||
| def pytest_runtest_protocol(item, nextitem): | ||
| del nextitem | ||
| item_path = Path(str(getattr(item, "path", None) or item.fspath)).resolve() | ||
| if not item_path.is_relative_to(_RL_TESTS_ROOT): | ||
| yield | ||
| return | ||
|
|
||
| import ray | ||
|
|
||
| original_init = ray.init | ||
| ray.init = init_with_local_address(original_init) | ||
| try: | ||
| yield | ||
| finally: |
There was a problem hiding this comment.
Claude: Nit: The global monkey-patch of ray.init during pytest_runtest_protocol works but is fragile — if a test spawns threads that call ray.init, or if pytest-xdist is ever used, the restoration in the finally block may race. Since this only affects RL tests and the wrapper is additive (setdefault), the practical risk is low.
One minor improvement: use functools.wraps(ray_init) in init_with_local_address so that introspection/debugging still shows the original signature and docstring.
| finish_reasons.append(row.get("finish_reason")) | ||
|
|
||
| self.assertGreater(max(response_lens), min(response_lens), "expected response length diversity across rollout artifacts") | ||
| self.assertIn("stop", finish_reasons, "expected at least one rollout sample to stop before max_tokens") |
There was a problem hiding this comment.
Claude: Nit: This line exceeds the 119-character max line length configured in pyproject.toml. Consider wrapping:
| self.assertIn("stop", finish_reasons, "expected at least one rollout sample to stop before max_tokens") | |
| self.assertGreater( | |
| max(response_lens), min(response_lens), | |
| "expected response length diversity across rollout artifacts", | |
| ) |
|
Claude: ## Summary This PR adds a new PR-real smoke test for Qwen3.5 VLM MoE async RL training, consolidates unit tests that were already covered by smoke/pr_fast suites into IssuesWarning
Nit
Coverage NoteThe deleted VerdictCOMMENT — the consolidation is clean and the new smoke test is well-structured. The inline findings are minor. |
7f7c381 to
c6867af
Compare
c6867af to
28043de
Compare
Summary
本 PR 新增 Qwen3.5 VLM MoE 的 PR-real smoke,并将部分已被 smoke 或 pr_fast 覆盖的根目录 RL UT 删除 / 迁移,减少重复的大型 RL 测试。
PR-real Smoke 覆盖范围
新增测试:
tests/rl/fast/pr_real_smoke/test_qwen35_vl_moe_async_smoke.py该 smoke 实际跑 2 step
RLColocateTrainer.fit(),配置如下:QWEN3_5_MOE_PATHGEO3K_LONGTAIL_DATA_PATH+GEO3K_MEDIA_ROOTGEO3KJudgerConfig(judger_name="hiyouga/geometry3k")AsyncReplayBuffer+AsyncProduceStrategy+ partial rollout + GRPO主要验证:
mm_info、image_data、response_ids、logprobs、reward["score"];train_rollout_1.jsonl、train_rollout_2.jsonl,且 response 非空、长度合法。说明:pr_real_smoke测试无法保证精准的精度,精度通过e2e测试来保证,并且单测会额外提供针对具体功能的测试脚本,例如ckpt的save与resume,dense模型的rl训练流程,sglang的确定性训练,内存监控等等。