Speed up slow unit/gpu/example tests by kevalmorabia97 · Pull Request #1616 · NVIDIA/Model-Optimizer

kevalmorabia97 · 2026-06-03T09:04:00Z

What does this PR do?

Type of change: test infrastructure / test speedups + CI stabilization

Make the test suite faster, tests/unit hermetic, and the CI lanes stable, without losing coverage. Most changes are mechanical test/infra edits; the buckets below cover the diff broadly.

Unit tests — hermetic (no HF Hub): toy local datasets/configs + the local tiny tokenizer (with a checked-in chat template) replace Hub assets; tests/unit/conftest.py enforces offline mode. Genuinely-HF tests moved to tests/gpu* (e.g. the new tests/gpu/torch/utils/test_dataset_utils.py). CONTRIBUTING.md documents the hermetic-unit-test expectation.

Unit-test speedups (no coverage loss): speculative (disable CPU torch.compile), calibrator (fewer histogram bins), ONNX conv/dynamo (smaller shapes + representative subset), Ruler/sparse-attention (local tokenizer), data-parallel autoquant (world size 4→2). Shared tiny_tokenizer fixture. The distributed test helper now uses a private spawn context instead of mutating the global start method (avoids cross-test contamination).

Rarely-used autonas/fastnas tests: heavy parametrize cases marked @pytest.mark.manual, one representative kept per test (fastnas preferred); lighter sibling tests still cover core behavior. The legacy FSDP1 NAS distributed test is also dropped: FastNAS/AutoNAS aren't used with either FSDP1 or FSDP2, and FSDP1 is superseded by the newer FSDP2 API — so we keep a single FSDP2 case as a sanity check and drop FSDP1, leaving the suite leaner.

gpu_megatron: deduplicate distributed worker pools by world_size within a module (saves a redundant pool spin-up in multi-pool files; module-scoped, no cross-module reuse).

Example tests: reduce per-test work via args that default to current behavior (tests pass the fast values) — torch_onnx TRT optimization level, diffusers calibration/inference steps, eagle sample_size, megatron_bridge iters/calib, llm_sparsity data slice, export safetensors-structure calib_size. Also enable the recently added gpt-oss example tests in CI.

Per-test timeouts: pytest-timeout with a default per-directory timeout (60s unit / 300s gpu+example) enforced in tests/conftest.py (timeout_func_only in pyproject.toml), so a new test cannot silently exceed the budget — an unmapped test dir crashes collection. A few inherently slow tests carry explicit higher per-test overrides (CUDA-compile, autotune, dflash).

CUDA kernel pre-compilation: a dedicated tests/gpu/_extensions test JIT-builds the conv3d implicit-GEMM kernel up front (collected before the functional tests in the same process) so the one-time build cost no longer lands on — and time out — the first functional test that uses it. Mirrored into the llm_ptq/vlm_ptq example lanes.

Test relocation & optional-dependency guards: vLLM sparsity plugin test moved to tests/gpu_vllm (drops the in-test importorskip); diffusers-dependent unit test guarded with importorskip("diffusers") for partial-install lanes; gpt_oss example test dir renamed to gpt-oss to match the CI matrix.

Diffusers test models: shared model-path constants in tests/_test_utils/examples/models.py consolidated/renamed and point at tiny hf-internal-testing test pipes (SDXL/SD3/FLUX) so cachify/quantize/export tests run on toy weights; local_ids normalized.

Shared dataset utils: examples/llm_sparsity/.../hf_pts.py now uses get_dataset_dataloader (drops the bespoke cnn_dailymail-only get_calib_dataloader; supports any registered/HF/JSONL dataset, includes attention_mask); data_prep.py gains --max_samples.

CI workflows: container image bumps (pytorch 26.04→26.05, TRT-LLM rc16→rc17) and tightened lane timeouts (unit 30→15 min, gpu lanes trimmed, onnx example lane 45 min).

Imports at top of file: in-function imports across the test suite are moved to module top per the coding guideline, conservatively — optional deps stay guarded (in-function or behind a module-level importorskip) in tests/unit since the partial-install lane runs without them, and build/hardware-availability imports (apex, triton, megatron/transformer_engine, tensorrt_llm) plus _test_utils lazy guards are left in place.

Kernel warning filters: the repeated filterwarnings blanket-ignore in six tests/gpu/torch/kernels/** modules is consolidated into a scoped hook in tests/gpu/torch/kernels/conftest.py (kernel tests only — the rest of the suite keeps surfacing warnings).

Eagle example speedups: torch.compile (eagle recipe default) added ~2 min to every eagle training test; it's now disabled in the eagle example tests except one smoke (test_llama_eagle3[1-False]), and the downstream resume / AR-validate / export tests point at the compile-free checkpoint. Measured: test_ar_validate 139s→17s, offline training 142s→22s, streaming 140s→23s — the compile path is still smoke-tested once.

Example lanes install editable (-e): so example scripts launched as subprocesses resolve modelopt to the same source path as the test process and reuse the pre-compiled CUDA-extension cache instead of recompiling (~2 min/test); verified in the TRT-LLM container.

Tiny test tokenizer: get_tiny_tokenizer defaults to left padding (what decoder-LM calibration expects) and ships a terse generation-tagged chat template — replacing a verbose ChatML one that inflated tokenized length on the 128-vocab tokenizer and broke the offline-PTQ example tests' max-seq-len filter.

Restored Hub-download coverage: the live (ungated) HF dataset round-trips exercising get_dataset_samples' download branch now live in tests/gpu/torch/utils/test_dataset_utils.py (they had been dropped from the hermetic unit file without a counterpart).

Individual file changes not explicitly called out above fall under this general test/CI cleanup.

Testing

Unit + the touched gpu_megatron files validated locally; example/GPU lanes validated in CI.

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ (tests + example CLI args default to prior behavior)
If you copied code from any other sources or added a new PIP dependency: N/A
Did you write any new necessary tests?: N/A (optimizes/relocates existing tests)
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: ❌ (pending)

Summary by CodeRabbit

Release Notes

Chores
- Updated container image versions for PyTorch (26.04→26.05), TensorRT-LLM (1.3.0rc16→1.3.0rc17), and ONNX/TensorRT (26.04→26.05).
Tests
- Enhanced test isolation: unit tests now run hermetically without HuggingFace Hub access.
- Optimized test runtime via smaller model/dataset parameters and parallel test caching.
- Added CUDA extension availability tests and extended dataset utility coverage.
Documentation
- Updated testing guidelines in CONTRIBUTING.md to emphasize offline test design.
Chores
- Added pytest timeout configuration and improved CI/CD workflow efficiency with editable installs.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Optimize the slowest CPU unit tests (no coverage loss): - speculative: disable torch.compile (no-op on CPU) + module-scope fixture + smaller eagle arch (43s+25s -> <0.1s each) - calibrator: reduce histogram num_bins 2048->512/256 (~25s -> ~3s) - onnx conv/convtranspose: shrink structure-only model shapes (~21s -> ~1s) - ruler dataset: use local tiny tokenizer instead of gpt2 (hermetic + faster) - onnx dynamo export: one dtype + representative variant subset (~70s -> ~7s) - dist/autoquant: fork (not spawn) for gloo on CPU runners + world_size 4->2 Share a tiny_tokenizer fixture in tests/unit/torch/conftest.py. Gate slow autonas/fastnas tests behind @pytest.mark.manual, keeping one representative case per test enabled for sanity (preferring fastnas over autonas); core NAS behavior stays covered by lighter sibling tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

coderabbitai · 2026-06-03T09:04:44Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR reduces unit-test workload and enforces hermetic offline execution by replacing remote model/dataset dependencies with local fixtures, shrinking model dimensions and calibration samples across tests, curating NAS/Dynamo test matrices, adding worker pool caching to reduce distributed setup overhead, and updating CI container versions and job timeouts.

Changes

Test Infrastructure and Execution Updates

Layer / File(s)	Summary
Offline policy and HF hub enforcement `CONTRIBUTING.md`, `tests/unit/conftest.py`, `pyproject.toml`	Adds hermetic unit-test requirements to CONTRIBUTING.md, enables offline HF behavior via environment variables and constants in unit conftest, and configures pytest to apply timeouts only to test execution (excluding fixture setup/teardown).
Tiny tokenizer fixture and Jinja chat template `tests/_test_utils/torch/transformers_models.py`, `tests/_test_utils/torch/tokenizer/chat_template.jinja`, `tests/_test_utils/torch/tokenizer/tokenizer_config.json`, `tests/conftest.py`	Updates get_tiny_tokenizer to accept configurable pad_side, introduces Jinja chat template with generation masking support, removes inline chat_template from tokenizer config, and refactors conftest to use lazy imports.
Local model configuration fixtures `tests/unit/onnx/test_gqa_graph_surgery.py`, `tests/unit/torch/speculative/plugins/test_hf_speculative.py`, `tests/unit/torch/speculative/plugins/test_hf_speculative_lora.py`, `tests/_test_utils/examples/models.py`, `tests/examples/diffusers/test_cache_diffusion.py`, `tests/examples/diffusers/test_diffusers.py`, `tests/examples/diffusers/test_export_diffusers_hf_ckpt.py`	Replaces remote Hugging Face model dependencies with local config directories for Qwen2 GQA graph surgery, shrinks EAGLE speculative model dimensions for unit tests, removes typos and consolidates SDXL/Diffusers model path constants.
Local dataset fixtures and offline loading `tests/unit/torch/utils/test_dataset_utils.py`, `examples/llm_sparsity/weight_sparsity/data_prep.py`, `examples/llm_sparsity/weight_sparsity/hf_pts.py`	Refactors dataset tests around local synthetic datasets, adds max_samples and dataset CLI knobs, removes HF Hub dependency from unit tests, switches hf_pts.py to modelopt dataset loader with left padding tokenizer configuration.
GPU Nemotron dataset registry validation `tests/gpu/torch/utils/test_dataset_utils.py`	Adds GPU test module covering Nemotron dataset registry shape validation, gated smoke download tests for 7 registry keys, and end-to-end Hub round-trip coverage for HF dataset loading and JSONL dump/reload equivalence.
Smaller model and calibration workloads `tests/_test_utils/onnx/lib_test_models.py`, `tests/_test_utils/torch/quantization/tensor_quantizer_common.py`, `tests/gpu/torch/export/test_unified_hf_export_and_check_safetensors.py`, `tests/unit/onnx/quantization/test_convtranspose_qdq.py`, `tests/unit/torch/quantization/test_calibrator.py`, `tests/unit/torch/speculative/plugins/test_hf_dflash.py`, `tests/examples/llm_sparsity/weight_sparsity/test_llama_sparsify.py`, `tests/examples/megatron_bridge/test_*.py`	Reduces ONNX model tensor shapes, histogram binning counts (512–256 bins), model intermediate dimensions, test matrices, and calibration samples; updates training iterations and inference step counts.
Distributed execution helpers `tests/_test_utils/torch/distributed/utils.py`, `tests/gpu_megatron/conftest.py`, `tests/unit/torch/quantization/test_autoquant.py`	Refactors multiprocessing context to avoid global state mutation via local spawn contexts, implements Megatron worker pool caching across fixtures to reduce repeated setup, and reduces distributed quantization test to 2 ranks.
Curated NAS and Dynamo export test matrices `tests/gpu/torch/nas/test_distributed_model.py`, `tests/gpu/torch/nas/test_search_space_with_vision_models.py`, `tests/unit/torch/nas/test_full_algorithms_vision.py`, `tests/unit/torch/deploy/utils/test_torch_onnx_utils.py`	Reduces default test matrices to representative cases (single model per test, one numeric type), marks additional parametrized combinations as manual-only, removes redundant FSDP1 test, and simplifies backward verification.
GPU extension compilation and timeout management `tests/gpu/_extensions/test_torch_extensions.py`, `tests/gpu/_extensions/test_torch_kernels.py`, `tests/examples/llm_ptq/_extensions/test_torch_extensions.py`, `tests/examples/vlm_ptq/_extensions/test_torch_extensions.py`, `tests/gpu/onnx/quantization/autotune/test_workflow.py`	Adds module-level 180s timeouts for CUDA extension JIT compilation, introduces conv3d implicit-GEMM precompilation test, and applies explicit timeout parametrization for TensorRT build tests.
Example script runtime knobs `examples/diffusers/quantization/diffusion_trt.py`, `examples/torch_onnx/torch_quant_to_onnx.py`	Adds parameterizable CLI options (num_inference_steps, trt_builder_optimization_level) to reduce execution workload and allow fine-grained test control.
Example test and data-prep tuning `tests/examples/diffusers/test_.py`, `tests/examples/megatron_bridge/test_.py`, `tests/examples/gpt-oss/test_gpt_oss_qat.py`, `tests/examples/speculative_decoding/test_eagle.py`, `tests/examples/torch_onnx/test_torch_quant_to_onnx.py`, `tests/examples/llm_sparsity/weight_sparsity/test_llama_sparsify.py`	Updates example tests to use reduced calibration samples (4–64), fewer training iterations (2–64), smaller model configurations, local model generation, parameterized inference steps, and extended timeouts for slower tests.
CI workflow container versions and job timeouts `.github/workflows/example_tests.yml`, `.github/workflows/gpu_tests.yml`, `.github/workflows/unit_tests.yml`, `.github/workflows/_example_tests_runner.yml`, `tests/conftest.py`	Updates PyTorch (26.04→26.05), TensorRT-LLM (1.3.0rc16→1.3.0rc17), and TensorRT (26.04→26.05) container tags; reduces job timeout-minutes from 30 to 15; applies default per-group timeout markers during test collection; switches example test runner to editable pip installs.
Test dependency gates `tests/unit/torch/quantization/plugins/test_diffusers_wan_conv3d.py`, `tests/gpu_vllm/torch/sparsity/attention_sparsity/test_vllm_plugin.py`	Adds diffusers import-skip guard to quantization tests and removes vllm conditional skip (making vllm import unconditional).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

ChenhanYu
jenchen13
yueshen2016
Edwardf0t1

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch kmorabia/test-optimization

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🧹 Nitpick comments (2)

tests/unit/torch/speculative/plugins/test_hf_speculative_lora.py (1)
52-56: 💤 Low value

Module-scoped fixture is appropriate for this optimization.

The scope change from function to module reduces setup overhead. Note that test_forward_returns_loss calls model.train() on the shared fixture, which could create test order dependencies. However, the subsequent tests don't appear to assume eval mode, so this should be fine in practice.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/speculative/plugins/test_hf_speculative_lora.py` around
lines 52 - 56, You changed the lora_eagle_model fixture scope to module, which
is fine for performance but test_forward_returns_loss calls model.train() and
may leave the shared model in train mode for other tests; update
test_forward_returns_loss (or any test that calls model.train()) to restore the
model state after running (e.g., call model.eval() or use a context that resets
training mode) so the shared fixture is not mutated across tests, referencing
the lora_eagle_model fixture and the test_forward_returns_loss test to locate
the change.
tests/unit/conftest.py (1)
24-27: ⚡ Quick win

Add a comment explaining the conditional import.

The import of huggingface_hub.constants is guarded to handle it being an optional dependency, but a brief comment naming the reason would improve clarity.
📝 Suggested comment
+# Guard optional dependency: huggingface_hub may not be installed
 with contextlib.suppress(ImportError):
     import huggingface_hub.constants as _hf_constants
As per coding guidelines, "Add a brief comment in those cases naming the reason" when placing imports inside functions or blocks rather than at module top.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/conftest.py` around lines 24 - 27, Add a short explanatory comment
above the guarded import inside the contextlib.suppress block explaining why
huggingface_hub.constants is imported conditionally (it is an optional
dependency in tests) and that setting _hf_constants.HF_HUB_OFFLINE = True forces
offline mode for tests; reference the guarded import (import
huggingface_hub.constants as _hf_constants), the suppressing construct
(contextlib.suppress(ImportError)), and the flag (_hf_constants.HF_HUB_OFFLINE)
so reviewers understand the intent.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/example_tests.yml:
- Line 62: The non-PR job's TensorRT-LLM image tag is out of sync: update the
docker_image value used by the trtllm-non-pr job to match the trtllm-pr job's
"nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc17" so both jobs use the same image
version; locate the trtllm-non-pr job's docker_image entry and replace the
"1.3.0rc16" tag with "1.3.0rc17".

---

Nitpick comments:
In `@tests/unit/conftest.py`:
- Around line 24-27: Add a short explanatory comment above the guarded import
inside the contextlib.suppress block explaining why huggingface_hub.constants is
imported conditionally (it is an optional dependency in tests) and that setting
_hf_constants.HF_HUB_OFFLINE = True forces offline mode for tests; reference the
guarded import (import huggingface_hub.constants as _hf_constants), the
suppressing construct (contextlib.suppress(ImportError)), and the flag
(_hf_constants.HF_HUB_OFFLINE) so reviewers understand the intent.

In `@tests/unit/torch/speculative/plugins/test_hf_speculative_lora.py`:
- Around line 52-56: You changed the lora_eagle_model fixture scope to module,
which is fine for performance but test_forward_returns_loss calls model.train()
and may leave the shared model in train mode for other tests; update
test_forward_returns_loss (or any test that calls model.train()) to restore the
model state after running (e.g., call model.eval() or use a context that resets
training mode) so the shared fixture is not mutated across tests, referencing
the lora_eagle_model fixture and the test_forward_returns_loss test to locate
the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f9689516-fd34-4372-aa9e-089960550c5f

📥 Commits

Reviewing files that changed from the base of the PR and between 862ed5e and 9c259c6.

📒 Files selected for processing (27)

.github/workflows/example_tests.yml
.github/workflows/gpu_tests.yml
CONTRIBUTING.md
tests/_test_utils/onnx/lib_test_models.py
tests/_test_utils/torch/distributed/utils.py
tests/_test_utils/torch/quantization/tensor_quantizer_common.py
tests/_test_utils/torch/tokenizer/chat_template.jinja
tests/_test_utils/torch/tokenizer/tokenizer_config.json
tests/examples/gpt_oss/test_gpt_oss_qat.py
tests/gpu/torch/export/test_unified_hf_export_and_check_safetensors.py
tests/gpu/torch/nas/test_distributed_model.py
tests/gpu/torch/nas/test_search_space_with_vision_models.py
tests/gpu/torch/utils/test_dataset_utils.py
tests/gpu_megatron/conftest.py
tests/unit/conftest.py
tests/unit/onnx/quantization/test_convtranspose_qdq.py
tests/unit/onnx/test_gqa_graph_surgery.py
tests/unit/torch/conftest.py
tests/unit/torch/deploy/utils/test_torch_onnx_utils.py
tests/unit/torch/nas/test_full_algorithms_vision.py
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_calibrator.py
tests/unit/torch/sparsity/attention_sparsity/test_sparse_attention_calibration.py
tests/unit/torch/speculative/plugins/test_hf_dflash.py
tests/unit/torch/speculative/plugins/test_hf_speculative.py
tests/unit/torch/speculative/plugins/test_hf_speculative_lora.py
tests/unit/torch/utils/test_dataset_utils.py

💤 Files with no reviewable changes (1)

tests/_test_utils/torch/tokenizer/tokenizer_config.json

Reduce per-test work (no tests removed/disabled; example-script args default to current behavior and only tests pass the fast values): - torch_onnx: add --trt_builder_optimization_level (test uses 0) for fast trtexec builds - diffusers: fewer calib/inference steps; add diffusion_trt --num-inference-steps; smaller cache_diffusion steps - speculative_decoding: cap eagle training to training.sample_size=64 - megatron_bridge: fewer train_iters / calib samples / shorter seq_length - llm_sparsity: data_prep --max_samples; small finetune data slice Reuse modelopt.torch.utils.dataset_utils in hf_pts.py: drop the bespoke get_calib_dataloader and call get_dataset_dataloader directly, so calibration supports any registered/HF/JSONL dataset (via a new --dataset arg) and includes attention_mask instead of input_ids only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

fork in the long-lived pytest process inherits locks held by background threads (OpenMP / torch intra-op pools), deadlocking the child (e.g. in dist.init_process_group) and hanging the job. Revert spawn_multiprocess_job to spawn; the world_size reduction remains the speedup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 · 2026-06-03T09:58:06Z

/claude review

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🧹 Nitpick comments (1)

examples/llm_sparsity/weight_sparsity/data_prep.py (1)

58-64: 💤 Low value

Truncation logic is correct. The min(args.max_samples, len(ds)) bound prevents over-selection, and rebuilding via type(dataset)({...}) preserves the DatasetDict type. Reconstructing the container is a bit indirect; an in-place update reads more directly, but this is optional.

♻️ Optional readability tweak

-    if args.max_samples is not None:
-        dataset = type(dataset)(
-            {
-                split: ds.select(range(min(args.max_samples, len(ds))))
-                for split, ds in dataset.items()
-            }
-        )
+    if args.max_samples is not None:
+        for split, ds in dataset.items():
+            dataset[split] = ds.select(range(min(args.max_samples, len(ds))))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_sparsity/weight_sparsity/data_prep.py` around lines 58 - 64, The
truncation is fine but reconstructing the DatasetDict is indirect; instead
modify the existing dataset in-place: when args.max_samples is not None iterate
the DatasetDict entries and replace each split with
ds.select(range(min(args.max_samples, len(ds)))) so you update the existing
dataset object rather than calling type(dataset)(...). Locate the block that
checks args.max_samples and change it to assign dataset[split] = ds.select(...)
for each split (keeping the same min(args.max_samples, len(ds)) bound).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/diffusers/quantization/diffusion_trt.py`:
- Around line 189-194: The CLI argument --num-inference-steps is not
range-checked; add validation at parse time by enforcing it is a positive
integer: either supply a custom argparse type/validator for
"num-inference-steps" or, immediately after parse_args(), check
args.num_inference_steps > 0 and call parser.error(...) (or raise a clear
ValueError) with a descriptive message; update references where
"num-inference-steps" is declared in parser.add_argument and any subsequent uses
to rely on this validated value.

In `@examples/llm_sparsity/weight_sparsity/hf_pts.py`:
- Around line 107-114: Replace the hardcoded max_sample_length=512 when
constructing the calibration dataloader so it uses args.model_max_length (i.e.,
call get_dataset_dataloader with max_sample_length=args.model_max_length), and
enable packing by passing pack=True to get_dataset_dataloader for calibration;
also ensure padding_side is not forcing right-padding for calibration (use
tokenizer.padding_side or explicitly set padding_side="left") to avoid degrading
calibration accuracy. Reference: the calib_dataloader construction that calls
get_dataset_dataloader and the args.model_max_length/tokenizer padding settings.

In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 509-515: Validate the documented 0-5 range for the CLI option by
enforcing it at the argparse boundary: update the parser.add_argument call for
"--trt_builder_optimization_level" to restrict allowed values (e.g., use
choices=range(0,6) or a custom type/validator) so only integers 0 through 5 are
accepted, or add an explicit post-parse check that raises a clear argparse-style
error if args.trt_builder_optimization_level is outside 0-5; target the
parser.add_argument invocation for "--trt_builder_optimization_level" to
implement this validation.

---

Nitpick comments:
In `@examples/llm_sparsity/weight_sparsity/data_prep.py`:
- Around line 58-64: The truncation is fine but reconstructing the DatasetDict
is indirect; instead modify the existing dataset in-place: when args.max_samples
is not None iterate the DatasetDict entries and replace each split with
ds.select(range(min(args.max_samples, len(ds)))) so you update the existing
dataset object rather than calling type(dataset)(...). Locate the block that
checks args.max_samples and change it to assign dataset[split] = ds.select(...)
for each split (keeping the same min(args.max_samples, len(ds)) bound).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f35294c0-7823-4dab-8a27-16720df79e40

📥 Commits

Reviewing files that changed from the base of the PR and between 9c259c6 and a694e89.

📒 Files selected for processing (14)

.github/workflows/example_tests.yml
examples/diffusers/quantization/diffusion_trt.py
examples/llm_sparsity/weight_sparsity/data_prep.py
examples/llm_sparsity/weight_sparsity/hf_pts.py
examples/torch_onnx/torch_quant_to_onnx.py
tests/examples/diffusers/test_cache_diffusion.py
tests/examples/diffusers/test_diffusers.py
tests/examples/diffusers/test_export_diffusers_hf_ckpt.py
tests/examples/llm_sparsity/weight_sparsity/test_llama_sparsify.py
tests/examples/megatron_bridge/test_distill.py
tests/examples/megatron_bridge/test_prune_minitron.py
tests/examples/megatron_bridge/test_quantize_export.py
tests/examples/speculative_decoding/test_eagle.py
tests/examples/torch_onnx/test_torch_quant_to_onnx.py

✅ Files skipped from review due to trivial changes (1)

tests/examples/diffusers/test_export_diffusers_hf_ckpt.py

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/example_tests.yml

codecov · 2026-06-03T10:03:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.57%. Comparing base (e40b4d6) to head (50e26aa).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1616      +/-   ##
==========================================
- Coverage   74.58%   67.57%   -7.02%     
==========================================
  Files         482      482              
  Lines       52943    52943              
==========================================
- Hits        39489    35777    -3712     
- Misses      13454    17166    +3712

Flag	Coverage Δ
examples	`41.86% <ø> (-0.64%)`	⬇️
gpu	`28.38% <ø> (-23.17%)`	⬇️
regression	`15.20% <ø> (+0.09%)`	⬆️
unit	`53.93% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add per-test timeouts via pytest-timeout, keyed by test dir (unit 60s, gpu/example/regression 300s), call-phase only; per-test overrides where a one-time CUDA JIT compile is expected. - Pre-compile the conv3d implicit-GEMM CUDA kernel in a dedicated _extensions test so the build cost doesn't land on the first kernel test. - Relocate tests to the lane that owns their dependency: rename the gpt-oss example test dir to match the example, move the vLLM sparse-attention plugin test to gpu_vllm (drop its importorskip), and guard the diffusers import in the partial-install unit lane. - Fix the eagle example override (data.sample_size, not training.sample_size). - Consolidate diffusers test model-path constants into a single SDXL_PATH and use tiny pipelines for the plumbing tests (incl. cache_diffusion); normalize local_id basenames. - llm_sparsity hf_pts.py: reuse get_dataset_dataloader, honor --model_max_length, and left-pad for calibration. - Skip the upstream-broken nemotron-sft-agentic-v2 gated smoke dataset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

coderabbitai

🧹 Nitpick comments (2)

tests/_test_utils/examples/models.py (1)

15-24: ⚡ Quick win

Make exported model-path constants explicit with __all__.

This helper is a shared constants surface; listing exports explicitly prevents accidental API drift when constants are renamed or added.
♻️ Proposed change
 import os
 
+__all__ = [
+    "BART_PATH",
+    "T5_PATH",
+    "MIXTRAL_PATH",
+    "WHISPER_PATH",
+    "TINY_LLAMA_PATH",
+    "QWEN_VL_PATH",
+    "SDXL_PATH",
+    "SD3_PATH",
+    "FLUX_SCHNELL_PATH",
+    "PIXART_PATH",
+]
+
 _LOCAL_ROOT = os.getenv("MODELOPT_LOCAL_MODEL_ROOT")
As per coding guidelines: "Define the public API with __all__ at the top of each module and re-export via from .module import * in package __init__.py files to keep the public API explicit and safe".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/_test_utils/examples/models.py` around lines 15 - 24, Add an explicit
__all__ to declare the module's public API so exports don't drift; for this
file, add a top-level __all__ list that includes the model-path symbols you
intend to export (e.g., "_LOCAL_ROOT" and "_select_path") and ensure any package
__init__.py re-exports use from .models import *; update the module to include
that __all__ and keep the existing names (_LOCAL_ROOT, _select_path) unchanged.

tests/gpu/torch/utils/test_dataset_utils.py (1)

63-65: ⚡ Quick win

Use xfail(strict=True) instead of a permanent skip for the known upstream issue.

A hard skip can mask upstream recovery indefinitely. xfail(strict=True) keeps the failure expected but surfaces an unexpected pass when the upstream split is fixed.

Suggested change

-        pytest.param(k, marks=pytest.mark.skip(reason="upstream search-split schema cast failure"))
+        pytest.param(
+            k,
+            marks=pytest.mark.xfail(
+                reason="upstream search-split schema cast failure",
+                strict=True,
+            ),
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/gpu/torch/utils/test_dataset_utils.py` around lines 63 - 65, The test
case for the known upstream schema cast failure is being permanently skipped,
which can hide when the issue is fixed. Update the `pytest.param` entry in
`test_dataset_utils.py` for `nemotron-sft-agentic-v2` to use
`pytest.mark.xfail(strict=True)` instead of `pytest.mark.skip`, keeping the
failure expected while still surfacing an unexpected pass.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/_test_utils/examples/models.py`:
- Around line 15-24: Add an explicit __all__ to declare the module's public API
so exports don't drift; for this file, add a top-level __all__ list that
includes the model-path symbols you intend to export (e.g., "_LOCAL_ROOT" and
"_select_path") and ensure any package __init__.py re-exports use from .models
import *; update the module to include that __all__ and keep the existing names
(_LOCAL_ROOT, _select_path) unchanged.

In `@tests/gpu/torch/utils/test_dataset_utils.py`:
- Around line 63-65: The test case for the known upstream schema cast failure is
being permanently skipped, which can hide when the issue is fixed. Update the
`pytest.param` entry in `test_dataset_utils.py` for `nemotron-sft-agentic-v2` to
use `pytest.mark.xfail(strict=True)` instead of `pytest.mark.skip`, keeping the
failure expected while still surfacing an unexpected pass.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ab4c67a8-95c5-4a9a-93ad-f0f7420106b3

📥 Commits

Reviewing files that changed from the base of the PR and between d4f9a2c and 91997ff.

📒 Files selected for processing (22)

.github/workflows/example_tests.yml
.github/workflows/gpu_tests.yml
.github/workflows/unit_tests.yml
examples/llm_sparsity/weight_sparsity/hf_pts.py
examples/torch_onnx/torch_quant_to_onnx.py
pyproject.toml
tests/_test_utils/examples/models.py
tests/conftest.py
tests/examples/diffusers/test_cache_diffusion.py
tests/examples/diffusers/test_diffusers.py
tests/examples/diffusers/test_export_diffusers_hf_ckpt.py
tests/examples/gpt-oss/test_gpt_oss_qat.py
tests/examples/llm_ptq/_extensions/test_torch_extensions.py
tests/examples/speculative_decoding/test_eagle.py
tests/examples/vlm_ptq/_extensions/test_torch_extensions.py
tests/gpu/_extensions/test_torch_extensions.py
tests/gpu/_extensions/test_torch_kernels.py
tests/gpu/onnx/quantization/autotune/test_workflow.py
tests/gpu/torch/utils/test_dataset_utils.py
tests/gpu_vllm/torch/sparsity/attention_sparsity/test_vllm_plugin.py
tests/regression/torch/speculative/test_dflash.py
tests/unit/torch/quantization/plugins/test_diffusers_wan_conv3d.py

💤 Files with no reviewable changes (2)

tests/examples/gpt-oss/test_gpt_oss_qat.py
tests/gpu_vllm/torch/sparsity/attention_sparsity/test_vllm_plugin.py

✅ Files skipped from review due to trivial changes (1)

tests/gpu/_extensions/test_torch_extensions.py

kevalmorabia97 · 2026-06-03T11:51:52Z

/claude review

cjluo-nv

Bot review — DM the bot to share feedback.

Large but cohesive test-infra PR (52 files, +632/-374) with many small, individually-reasonable changes. Design review: this PR doesn't introduce a new abstraction — it leans on already-present tooling (pytest-timeout, mp.get_context, the existing DistributedWorkerPool, the existing tests/_test_utils/torch/tokenizer/ tiny tokenizer, existing create_tiny_*_dir helpers) rather than building a new framework. No design red flags.

A few semantic-shift items a human owner should confirm rather than auto-approve:

examples/llm_sparsity/weight_sparsity/hf_pts.py switches padding_side from "right" to "left" for calibration. The comment explains why (get_dataset_dataloader warns otherwise), but this is a behavior change to an example CLI, not just a test edit. Worth confirming the resulting calibrated checkpoints are still equivalent for downstream users who script against this example.
tests/gpu/torch/nas/test_distributed_model.py drops the FSDP1 test_fsdp parametrize entirely (only FSDP2 remains). The PR body says "redundant parametrize cases marked manual" but this case is deleted, not marked manual — confirm FSDP1 NAS coverage is intentionally retired.
tests/unit/torch/utils/test_dataset_utils.py: the live HF integration tests (@pytest.mark.integration against hf-internal-testing/dataset_with_data_files and the gated Nemotron smoke) are replaced with toy local-dir round-trips. The new gpu-lane file tests/gpu/torch/utils/test_dataset_utils.py keeps the Nemotron smoke but does not re-add a live load_dataset(<hub-id>) round-trip — i.e. nothing in CI exercises the actual HF-download branch of get_dataset_samples anymore. Confirm that's intentional.
Unit-test wall-clock lane budget cut 30→15 min — aggressive given the matrix of OS/Torch/TF combos; one slow run could fail the lane. Worth confirming a few real CI runs landed comfortably under 15 min before merging.
60s default unit-test timeout is reasonable, but pytest_collection_modifyitems raises pytest.UsageError for any unmapped top-level tests/<group>/ directory. That's good fail-loud behavior, but means anyone adding a new tests/<group>/ will get a confusing collection error until they edit tests/conftest.py. Documenting this in CONTRIBUTING.md alongside the new hermetic-unit-test note would help.

No correctness bugs spotted in the diff (the renamed model-path constants, the mp.get_context("spawn") switch, the _pool_cache worker dedup, the chat-template move into chat_template.jinja, and the symlinked _extensions/test_torch_extensions.py files all look right).

Follow-ups from PR review and CI triage: - Restore live HF dataset round-trips (TestHfTinyDataset) to the gpu lane (tests/gpu/torch/utils/test_dataset_utils.py). The hermetic-unit-test work dropped the ungated hf-internal-testing round-trips without relocating them, leaving the get_dataset_samples Hub-download branch unexercised in CI. - Move the tiny_tokenizer fixture up to the root tests/conftest.py so both unit and gpu lanes share it; drop the duplicate pad_tokenizer fixture and use tiny_tokenizer in test_gptq and test_dataset_utils. - get_tiny_tokenizer: default to left padding (pad_token already set) so callers no longer hand-roll it; document the override. - Example lanes: install editable (-e) so example scripts launched as subprocesses resolve modelopt to the same source path as the test process, letting the pre-compiled CUDA extension cache be reused instead of recompiling. - speculative_decoding: give test_offline_eagle3_training[kimi-k2.5] an explicit timeout(360) (~256s on main + margin) now that a default per-test example timeout exists. - Replace the tiny tokenizer's verbose ChatML+think chat template with a terse generation-tagged Q:/A: template. The ChatML special tokens byte-exploded on the 128-vocab tokenizer (98 vs 28 tokens), pushing compute_hidden_states test conversations past max-seq-len and breaking the offline-PTQ tests; the terse template keeps answer-only-loss generation-tag masking that unit tests need. - CONTRIBUTING: brief note on the per-test timeout defaults. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 · 2026-06-03T13:18:25Z

/claude review

cjluo-nv

Bot review — DM the bot to share feedback.

Large but cohesive test-infra PR (54 files, +763/-425). Design check passes — leverages existing tooling (pytest-timeout, mp.get_context, DistributedWorkerPool, the tiny tokenizer at tests/_test_utils/torch/tokenizer/, create_tiny_*_dir), no new abstraction introduced. Hermetic-unit-test goal is well-executed: tests/unit/conftest.py forces offline mode, the tiny_tokenizer fixture is shared, and Hub-only tests are relocated to tests/gpu/torch/utils/test_dataset_utils.py (live _HF_TINY round-trips and gated Nemotron smoke preserved there). Renames (SXDL_PATH→SDXL_PATH, SDXL_1_0_PATH→SDXL_PATH, LLAVA_PATH removal) all line up with their callers. The pytest_collection_modifyitems fail-loud on unmapped tests/<group>/ dirs is a nice guard, and timeout_func_only=true correctly excludes fixture setup from the budget.

A few items worth a human glance before merging:

💬 Author addressed the padding_side "right"→"left" flip in examples/llm_sparsity/weight_sparsity/hf_pts.py with a comment ("Left padding is recommended for calibration; get_dataset_dataloader warns otherwise") — still warrants sign-off because this is a user-facing behavior change to an example CLI, not just a test edit, and downstream users scripting against this script will get different calibrated checkpoints.
💬 PR body explains FSDP1 test_fsdp in tests/gpu/torch/nas/test_distributed_model.py is deleted (not marked @manual) because "FastNAS/AutoNAS aren't used with either FSDP1 or FSDP2" — flagging anyway because this is a coverage drop rather than a speedup, worth a human confirming the FSDP1-NAS path is indeed unused in production.
Unit-test lane budget cut 30→15 min is aggressive given the OS/Torch/TF matrix; one slow run can fail the lane. Worth confirming a few real CI runs landed comfortably under 15 min before merging.
Adding a new top-level tests/<group>/ will fail collection with pytest.UsageError until _DEFAULT_TIMEOUT is updated. CONTRIBUTING.md was updated to mention this — good — but consider whether the error message should mention the file path more explicitly (it already does: "add '{group}' to _DEFAULT_TIMEOUT in tests/conftest.py").
tests/gpu/torch/utils/test_dataset_utils.py permanently skips nemotron-sft-agentic-v2 for an upstream schema cast failure — xfail(strict=True) would surface upstream recovery automatically; minor.

No correctness bugs spotted in the diff itself. The mp.get_context("spawn") switch (dropping the global set_start_method mutation), the _pool_cache worker-pool dedup in gpu_megatron/conftest.py, the chat-template move into chat_template.jinja, and the symlinked _extensions/test_torch_extensions.py files all look correct.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/gpu/torch/utils/test_dataset_utils.py (1)
107-112: ⚡ Quick win

Move datasets import to module top (or justify the in-function import).

from datasets import load_dataset is placed inside _hf_dump_to_jsonl without a documented reason. Since these tests already depend on datasets transitively through get_dataset_samples/get_dataset_dataloader, a deferred import doesn't guard anything here and only delays surfacing an import error from collection time to mid-test. Either hoist it to the top with the other imports, or add a brief comment naming the concrete reason (e.g., optional-dependency guard).
♻️ Proposed change
 import json
+from datasets import load_dataset

 import pytest
 from huggingface_hub import get_token
 def _hf_dump_to_jsonl(name: str, split: str, path) -> str:
-    from datasets import load_dataset
-
     ds = load_dataset(name, split=split)
     ds.to_json(str(path), lines=True)
     return str(path)
As per coding guidelines: "Place all imports at module top in both source and test files... The only acceptable in-function imports are for circular imports or optional dependencies... and those should carry a brief comment naming the reason."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/gpu/torch/utils/test_dataset_utils.py` around lines 107 - 112, The
in-function import "from datasets import load_dataset" inside _hf_dump_to_jsonl
delays import errors and violates the module-top import guideline; either move
the import to the module top with the other imports (and update any import
ordering) or, if this is truly an optional/circular dependency, keep it but add
a brief comment above the import inside _hf_dump_to_jsonl explaining the
specific reason (e.g., "optional dependency guard" or "avoid circular import
with X"). Ensure the symbol load_dataset remains referenced by _hf_dump_to_jsonl
after the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/_example_tests_runner.yml:
- Line 52: The step currently expands inputs.pip_install_extras directly into
the shell command, which can allow shell injection; change it to pass the input
into an environment variable (e.g., PIP_INSTALL_EXTRAS) and validate it against
a small allowlist or a strict regex before using it, then reference the
validated env var in the pip command (e.g., python -m pip install -e
".${PIP_INSTALL_EXTRAS}" or build the install path safely). Ensure validation in
the workflow uses explicit allowed tokens or rejects any string containing
unsafe characters like ; & | $() ` and only permits known extras (or an exact
pattern such as ^\[.*\]$ or comma-separated alphanumerics), so callers cannot
inject arbitrary shell content.

---

Nitpick comments:
In `@tests/gpu/torch/utils/test_dataset_utils.py`:
- Around line 107-112: The in-function import "from datasets import
load_dataset" inside _hf_dump_to_jsonl delays import errors and violates the
module-top import guideline; either move the import to the module top with the
other imports (and update any import ordering) or, if this is truly an
optional/circular dependency, keep it but add a brief comment above the import
inside _hf_dump_to_jsonl explaining the specific reason (e.g., "optional
dependency guard" or "avoid circular import with X"). Ensure the symbol
load_dataset remains referenced by _hf_dump_to_jsonl after the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c7f6c7f0-2648-4139-acc1-274baed74f65

📥 Commits

Reviewing files that changed from the base of the PR and between 91997ff and 99c2ef5.

📒 Files selected for processing (10)

.github/workflows/_example_tests_runner.yml
CONTRIBUTING.md
tests/_test_utils/torch/tokenizer/chat_template.jinja
tests/_test_utils/torch/transformers_models.py
tests/conftest.py
tests/examples/speculative_decoding/test_eagle.py
tests/gpu/_extensions/test_torch_kernels.py
tests/gpu/torch/quantization/test_gptq.py
tests/gpu/torch/utils/test_dataset_utils.py
tests/unit/torch/utils/test_dataset_utils.py

✅ Files skipped from review due to trivial changes (2)

tests/_test_utils/torch/tokenizer/chat_template.jinja
CONTRIBUTING.md

🚧 Files skipped from review as they are similar to previous changes (4)

tests/examples/speculative_decoding/test_eagle.py
tests/gpu/_extensions/test_torch_kernels.py
tests/conftest.py
tests/unit/torch/utils/test_dataset_utils.py

realAsma · 2026-06-03T14:49:02Z

        (None, False),                       # tiny_llama (from fixture), no FakeBase
-        ("moonshotai/Kimi-K2.5", True),      # remote HF repo, FakeBaseModel
+        pytest.param(
+            "moonshotai/Kimi-K2.5", True,    # remote HF repo, FakeBaseModel


@h-guo18 can we use tiny Kimi local models here?

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

cjluo-nv · 2026-06-03T16:20:41Z

        model_max_length=model_max_length,
-        padding_side="right",
+        # Left padding is recommended for calibration (get_dataset_dataloader warns otherwise).
+        padding_side="left",


@kaix-nv could you review this change?

cjluo-nv · 2026-06-03T16:21:15Z

+        type=int,
+        default=4,
+        choices=range(6),
+        help="trtexec --builderOptimizationLevel (0-5). Lower is much faster to build "


@ajrasane could you review?

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

shengliangxu

LGTM.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 and others added 3 commits June 3, 2026 00:25

Disable HF Hub access for unit tests and enable gpt_oss example tests

34b2319

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Optimize gpu test dist setup

9c259c6

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 requested a review from a team as a code owner June 3, 2026 09:04

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread .github/workflows/example_tests.yml

coderabbitai Bot approved these changes Jun 3, 2026

View reviewed changes

kevalmorabia97 requested review from a team as code owners June 3, 2026 09:48

kevalmorabia97 requested review from ajrasane and rohansjoshi June 3, 2026 09:48

kevalmorabia97 requested review from ChenhanYu, Edwardf0t1, cjluo-nv, jingyu-ml, kaix-nv, realAsma and shengliangxu June 3, 2026 09:54

kevalmorabia97 changed the title ~~Make unit tests hermetic and speed up slow unit/GPU tests~~ Make unit tests hermetic and speed up slow unit/GPU/example tests Jun 3, 2026

kevalmorabia97 changed the title ~~Make unit tests hermetic and speed up slow unit/GPU/example tests~~ Speed up slow unit/gpu/example tests Jun 3, 2026

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread examples/diffusers/quantization/diffusion_trt.py

Comment thread examples/llm_sparsity/weight_sparsity/hf_pts.py

Comment thread examples/torch_onnx/torch_quant_to_onnx.py

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

cjluo-nv reviewed Jun 3, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread .github/workflows/_example_tests_runner.yml

realAsma reviewed Jun 3, 2026

View reviewed changes

Move all test imports to the top of the file

c9d3f80

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

cjluo-nv reviewed Jun 3, 2026

View reviewed changes

kevalmorabia97 requested review from a team and h-guo18 June 3, 2026 16:21

kevalmorabia97 added 2 commits June 3, 2026 09:23

Disable torch compile in most eagle tests to speedup ~1mins each

3db9e8f

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Move test_sparse_attn_worker from unit to gpu_vllm

28ba072

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kaix-nv approved these changes Jun 3, 2026

View reviewed changes

shengliangxu approved these changes Jun 3, 2026

View reviewed changes

kevalmorabia97 requested a review from a team as a code owner June 3, 2026 17:20

make llm_eval tests faster by limiting eval samples and output len

50e26aa

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 force-pushed the kmorabia/test-optimization branch from 03435f9 to 50e26aa Compare June 3, 2026 17:28

Conversation

kevalmorabia97 commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Testing

Before your PR is "Ready for review"

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kevalmorabia97 commented Jun 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

kevalmorabia97 commented Jun 3, 2026

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

kevalmorabia97 commented Jun 3, 2026

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

realAsma Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

shengliangxu left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

kevalmorabia97 commented Jun 3, 2026 •

edited

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

codecov Bot commented Jun 3, 2026 •

edited

Loading