[1/N] Refactor llm_qat example: YAML configs + ModelOptArgParser by realAsma · Pull Request #1172 · NVIDIA/Model-Optimizer

realAsma · 2026-04-02T23:32:31Z

What does this PR do?

Type of change: new example

Refactors examples/llm_qat from a monolithic launch/script flow into a modular, config-driven Hugging Face QAT/QAD workflow. The high-level user flow is now:

Quantize a base model with a ModelOpt PTQ recipe.
Train or evaluate the quantized checkpoint with QAT, QAD, LoRA QAT, QLoRA, or fine-tuning configs.
Export the trained checkpoint for deployment.

Highlights:

Replaces the legacy examples/llm_qat/launch.sh + examples/llm_qat/main.py path with separate quantize.py and train.py entrypoints.
Adds YAML-driven argument parsing through ModelOptArgParser, including --config <yaml> defaults, CLI overrides, and generated examples/llm_qat/ARGUMENTS.md.
Adds declarative configs under examples/llm_qat/configs/ for training modes, dataset blends, and Accelerate backends.
Adds dataset_utils.py for weighted multi-source dataset blending, Hugging Face streaming, local dataset loading, distributed rank-aware loading, tokenization caching, pre-tokenization, chat templating, and assistant-token label masking.
Moves the Hugging Face QAD flow into the shared trainer path through DistillArguments, QADTrainer, and teacher-model distillation kwargs.
Updates QuantizationArguments to prefer recipe paths via --recipe, while keeping legacy --quant_cfg available with deprecation warnings for in-trainer quantization.
Updates QATTrainer handling for pre-quantized checkpoints, FSDP2 TensorQuantizer buffers, and recipe-resolved PTQ configs.
Adds the general/ptq/int4_blockwise_weight_only recipe and refreshes docs for NVFP4, FP8, INT4, FSDP2, DDP, DeepSpeed, QLoRA, and LLaMA-Factory integration.
Adds focused parser, dataset tokenization, assistant-mask, and example workflow coverage.

Usage

From the repo root:

cd examples/llm_qat

# 1. Quantize
python quantize.py \
  --model_name_or_path Qwen/Qwen3-8B \
  --dataset_config configs/dataset/blend.yaml \
  --recipe general/ptq/nvfp4_default-kv_fp8 \
  --output_dir qwen3-8b-quantized

# 2. QAT train
accelerate launch --config-file configs/accelerate/fsdp2.yaml train.py \
  --config configs/train/qat_nvfp4.yaml \
  --model_name_or_path qwen3-8b-quantized \
  --output_dir qwen3-8b-qat-nvfp4

# 3. QAD train
accelerate launch --config-file configs/accelerate/fsdp2.yaml train.py \
  --config configs/train/qad_nvfp4.yaml \
  --model_name_or_path qwen3-8b-quantized \
  --teacher_model Qwen/Qwen3-8B \
  --output_dir qwen3-8b-qad-nvfp4

Dataset blends can be pre-tokenized and cached before training:

cd examples/llm_qat
python dataset_utils.py \
  --dataset_config configs/dataset/blend.yaml \
  --model_name_or_path Qwen/Qwen3-8B

Testing

Focused coverage added or updated:

tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
tests/examples/llm_qat/test_dataset_tokenization.py
tests/examples/llm_qat/test_assistant_mask.py
tests/examples/llm_qat/test_llm_qat.py

Recorded validation:

pytest tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
pytest tests/examples/llm_qat/test_llm_qat.py::test_dataset_utils_pretokenize
pre-commit run --all-files
pytest tests/examples/llm_qat/test_llm_qat.py full GPU/backend suite

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: No for the legacy examples/llm_qat CLI/file layout (main.py, launch.sh, and FSDP1 config are removed); library quant_cfg usage remains available but is deprecated for this workflow.
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: Yes. train.py/simple_qat_train.py retain the upstream Alpaca attribution where applicable; examples/llm_qat/requirements.txt switches tensorboardX to tensorboard.
Did you write any new necessary tests?: Yes. Parser, dataset tokenization, assistant masking, pre-tokenization, and QAT/QAD workflow tests were added or updated.
Did you update Changelog?: Yes.
Did you get Claude approval on this PR?: N/A in this description update.

Additional Information

This PR intentionally changes the llm_qat example surface. Existing users should move from examples/llm_qat/main.py and examples/llm_qat/launch.sh to examples/llm_qat/quantize.py, examples/llm_qat/train.py, and the YAML configs under examples/llm_qat/configs/.

coderabbitai · 2026-04-02T23:33:37Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Splits the monolithic llm_qat launcher into modular quantize.py and train.py, adds ModelOptArgParser with --generate_docs and YAML --config support, centralizes dataclass argument definitions, moves to recipe-based quantization and adds YAML-driven dataset blending; several configs, docs, and tests updated; FSDP1 removed.

Changes

Cohort / File(s)	Summary
Argument Parser & API `modelopt/torch/opt/plugins/transformers.py`	Adds `ModelOptHFArguments`, `ModelOptArgParser` (YAML `--config`, `--generate_docs`, docs generator), and updates `__all__`.
Quantization Trainer API `modelopt/torch/quantization/plugins/transformers_trainer.py`	Makes `QuantizationArguments` inherit `ModelOptHFArguments`, adds `recipe` field, deprecates `quant_cfg` with runtime warning.
Shared CLI Dataclasses `examples/llm_qat/arguments.py`	New centralized dataclass modules: `ModelArguments`, `DataArguments`, `TrainingArguments`, `QuantizeArguments` built on `ModelOptHFArguments`.
PTQ Script `examples/llm_qat/quantize.py`	New standalone PTQ script: loads recipe, builds calibration DataLoader, runs `mtq.quantize`/`mtq.compress`, saves quantized model.
Trainer Script `examples/llm_qat/train.py`	Refactored to use `ModelOptArgParser` + shared arguments; removes inline quant args; dataset construction now YAML-driven; distill/runtime warnings adjusted.
Dataset blending & utils `examples/llm_qat/dataset_utils.py`, `examples/llm_qat/configs/dataset/*`	New YAML-config driven dataset blending, streaming/rank-aware loading, tokenization factories, caching, and docs/examples.
Utils refactor `examples/llm_qat/utils.py`	Removes legacy Daring-Anteater path; `make_supervised_data_module()` now accepts dataset YAML path and delegates to `dataset_utils`.
Launcher removal / config changes `examples/llm_qat/launch.sh`, `examples/llm_qat/accelerate_config/fsdp1.yaml`	Deletes monolithic launcher and removes FSDP1 config; FSDP2 becomes default.
LLamaFactory & examples `examples/llm_qat/llama_factory/*`, `examples/llm_qat/simple_qat_train.py`	Switches quant arg from `quant_cfg`→`recipe`, adds recipe-to-config resolver, updates defaults/docs and launch helper behavior.
Configs & templates `examples/llm_qat/configs/train/*.yaml`	Adds training config templates for PTQ eval, QAT, QAD, and fine-tune.
Docs & READMEs `examples/llm_qat/README.md`, `examples/llm_qad/README.md`, `examples/vllm_serve/README.md`, `examples/llm_qat/ARGUMENTS.md`	Rewrites README to QAT/QAD flow (`quantize.py` → `train.py` → `export.py`), adds argument reference (auto-generated), minor README redirects.
Notebook update `examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb`	Adjusts QAT workflow text to reference new scripts.
Pre-commit `.pre-commit-config.yaml`	Adds local hook `generate-arguments-md` to regenerate `ARGUMENTS.md`; updates license hook exclusion to `train.py`.
Tests `tests/examples/llm_qat/test_llm_qat.py`, `tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py`	Refactors tests to PTQ→train split, updates accelerate config selection, removes fsdp1 parametrization; adds unit tests for `ModelOptArgParser` including `--generate_docs` and YAML `--config` behavior.
Large new module `examples/llm_qat/dataset_utils.py`	Extensive new dataset blending/tokenization implementation (see above) with distributed cache/merge logic.
Recipe additions `modelopt_recipes/general/ptq/*`	Adds new PTQ recipes (e.g., int4 blockwise weight-only) referenced by `recipe` fields.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant CLI as CLI
participant Parser as ModelOptArgParser
participant Config as RecipeLoader/Config
participant Tokenizer as Tokenizer
participant Data as DataLoader/BlendBuilder
participant Model as Model (HF)
participant MTQ as ModelOpt MTQ
participant FS as Filesystem

CLI->>Parser: invoke (args / --config / --generate_docs)
alt --generate_docs
    Parser->>FS: write ARGUMENTS.md
    Parser-->>CLI: exit(0)
else quantize flow
    Parser->>Config: load recipe
    Parser->>Model: load pretrained (+ tokenizer)
    Parser->>Data: build calibration dataset
    Data->>Tokenizer: tokenize batches
    Parser->>Model: move to device
    Model->>MTQ: forward_loop over calibration batches
    MTQ->>Model: apply quantization
    MTQ->>FS: print summary
    alt compress
        MTQ->>Model: compress
    end
    Model->>FS: save quantized model + tokenizer
end

mermaid
sequenceDiagram
participant CLI as CLI
participant Parser as ModelOptArgParser
participant Data as BlendBuilder
participant Trainer as HFTrainer (QAT/QAD)
participant Teacher as Teacher Model (optional)
participant FS as Filesystem

CLI->>Parser: parse training args & dataset_config
Parser->>Data: build/train & eval datasets (distributed/cache)
Parser->>Trainer: construct trainer (distill config if requested)
alt distillation
    Trainer->>Teacher: load teacher checkpoint
    Trainer->>Trainer: distillation training loop
else standard QAT/QAD
    Trainer->>Trainer: training loop
end
Trainer->>FS: save checkpoint(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.37% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[1/N] Refactor llm_qat example: YAML configs + ModelOptArgParser' clearly summarizes the main changes: refactoring the llm_qat example to use YAML configurations and introducing a new ModelOptArgParser, which are the primary objectives.
Security Anti-Patterns	✅ Passed	The pull request adheres to all security coding practices outlined in SECURITY.md. All model/tokenizer loading uses safe defaults without hardcoding trust_remote_code=True, all YAML parsing uses yaml.safe_load(), no dangerous patterns introduced, and no new risky PIP dependencies added.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch asma/new-qat-1

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-02T23:37:16Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1172/
Built to branch `gh-pages` at 2026-06-03 00:32 UTC. Preview will be ready when the GitHub Pages deployment is complete.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py (1)

70-79: Consider adding error-handling tests.

The empty YAML test is good. Consider adding tests for:

Non-existent --config file path (should raise appropriate error)
Malformed YAML syntax (should raise parse error, not crash)

Example test for missing config file

def test_missing_config_file(self, tmp_path):
    parser = ModelOptArgParser((_ModelArgs, _TrainArgs))
    with pytest.raises(FileNotFoundError):
        parser.parse_args_into_dataclasses(
            args=["--config", str(tmp_path / "nonexistent.yaml")]
        )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py` around lines 70 -
79, Add two new unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py that exercise error
handling in ModelOptArgParser.parse_args_into_dataclasses: one test (e.g.,
test_missing_config_file) should call parser.parse_args_into_dataclasses with
args=["--config", str(tmp_path/"nonexistent.yaml")] and assert it raises
FileNotFoundError using pytest.raises; the other test (e.g.,
test_malformed_yaml_config) should write malformed YAML to a temp file and
assert parser.parse_args_into_dataclasses raises the YAML parsing exception
(yaml.YAMLError or the parser-specific parse error) using pytest.raises;
reference ModelOptArgParser, parse_args_into_dataclasses, and use
tmp_path/pytest.raises to locate where to add these tests.

examples/llm_qat/main.py (1)

197-198: Consider validating quant_cfg against known configurations.

Using getattr(mtq, quant_args.quant_cfg) with a user-provided string could potentially access unintended module attributes. While this is a CLI argument (not untrusted network input), adding validation against known configs would be more defensive.

🛡️ Optional: Add validation against known configs

     if quant_args.quant_cfg is not None:
+        if not hasattr(mtq, quant_args.quant_cfg) or not quant_args.quant_cfg.endswith("_CFG"):
+            raise ValueError(
+                f"Unknown quant_cfg: {quant_args.quant_cfg}. "
+                "See modelopt.torch.quantization.config for available options."
+            )
         quant_args.quant_cfg = getattr(mtq, quant_args.quant_cfg)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/main.py` around lines 197 - 198, The current line uses
getattr(mtq, quant_args.quant_cfg) directly which may resolve unintended
attributes; validate quant_args.quant_cfg against an explicit whitelist of
allowed config names before calling getattr. Update the logic around
quant_args.quant_cfg: define the set/list of permitted config names (e.g., known
keys in mtq), check that quant_args.quant_cfg is in that list, and only then
call getattr(mtq, quant_args.quant_cfg); if not allowed, raise a clear error or
exit with a helpful message. Ensure you reference the same symbol names
(quant_args.quant_cfg and mtq) so the check is applied right where the getattr
is used.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.pre-commit-config.yaml:
- Around line 130-142: The pre-commit hook runs examples/llm_qat/main.py but
heavy module-level imports (torch, transformers, modelopt modules) run before
the argument parser (parse_args_into_dataclasses()) and train(), causing
slowness and CUDA failures; fix by lazy-loading those heavy imports (move
torch/transformers/modelopt imports inside train() or into helper functions
called only when not generating docs) or add a tiny dedicated script that
imports only the argument dataclasses for the --generate_docs flow, or check
sys.argv for "--generate_docs" at module import time and skip heavy imports;
reference the examples/llm_qat/main.py entrypoint, the train() function, and
parse_args_into_dataclasses() when making the change.

In `@examples/llm_qat/ARGUMENTS.md`:
- Line 35: Update the metadata help string for
TrainingArguments.model_max_length in examples/llm_qat/main.py to use the
hyphenated form "right-padded" instead of "right padded"; locate the
TrainingArguments definition or the argument registration for model_max_length
(symbol: TrainingArguments.model_max_length or the argument name
"--model_max_length") and modify its help/metadata string accordingly so the
generated ARGUMENTS.md shows "right-padded".

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 73-85: The "--config" handling may access args[idx + 1] when
"--config" is last and raise IndexError; update the block that detects
"--config" to check that idx + 1 < len(args) before reading args[idx + 1], and
if missing raise a clear error (e.g., raise ValueError or use the argparse error
mechanism) with a message like "Missing value for --config" so callers get a
friendly error instead of an IndexError; ensure the rest of the flow (setting
config_path, loading YAML, and calling self.set_defaults(**config)) only runs
when a valid path is present.

---

Nitpick comments:
In `@examples/llm_qat/main.py`:
- Around line 197-198: The current line uses getattr(mtq, quant_args.quant_cfg)
directly which may resolve unintended attributes; validate quant_args.quant_cfg
against an explicit whitelist of allowed config names before calling getattr.
Update the logic around quant_args.quant_cfg: define the set/list of permitted
config names (e.g., known keys in mtq), check that quant_args.quant_cfg is in
that list, and only then call getattr(mtq, quant_args.quant_cfg); if not
allowed, raise a clear error or exit with a helpful message. Ensure you
reference the same symbol names (quant_args.quant_cfg and mtq) so the check is
applied right where the getattr is used.

In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py`:
- Around line 70-79: Add two new unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py that exercise error
handling in ModelOptArgParser.parse_args_into_dataclasses: one test (e.g.,
test_missing_config_file) should call parser.parse_args_into_dataclasses with
args=["--config", str(tmp_path/"nonexistent.yaml")] and assert it raises
FileNotFoundError using pytest.raises; the other test (e.g.,
test_malformed_yaml_config) should write malformed YAML to a temp file and
assert parser.parse_args_into_dataclasses raises the YAML parsing exception
(yaml.YAMLError or the parser-specific parse error) using pytest.raises;
reference ModelOptArgParser, parse_args_into_dataclasses, and use
tmp_path/pytest.raises to locate where to add these tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9730ae6e-e2ad-4302-b952-ad714e48536f

📥 Commits

Reviewing files that changed from the base of the PR and between 18ddcb7 and e764ad9.

📒 Files selected for processing (13)

.pre-commit-config.yaml
examples/llm_qat/ARGUMENTS.md
examples/llm_qat/README.md
examples/llm_qat/configs/finetune.yaml
examples/llm_qat/configs/ptq_eval.yaml
examples/llm_qat/configs/qad_nvfp4.yaml
examples/llm_qat/configs/qat_nvfp4.yaml
examples/llm_qat/launch.sh
examples/llm_qat/main.py
modelopt/torch/opt/plugins/transformers.py
modelopt/torch/quantization/plugins/transformers_trainer.py
tests/examples/llm_qat/test_llm_qat.py
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py

💤 Files with no reviewable changes (1)

examples/llm_qat/launch.sh

codecov · 2026-04-02T23:44:51Z

Codecov Report

❌ Patch coverage is 84.56376% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.38%. Comparing base (f21977a) to head (3196989).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/opt/plugins/transformers.py	89.21%	11 Missing ⚠️
...torch/quantization/plugins/transformers_trainer.py	72.97%	10 Missing ⚠️
modelopt/torch/distill/plugins/huggingface.py	80.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1172       +/-   ##
===========================================
- Coverage   77.41%   67.38%   -10.03%     
===========================================
  Files         480      481        +1     
  Lines       52506    52904      +398     
===========================================
- Hits        40645    35647     -4998     
- Misses      11861    17257     +5396

Flag	Coverage Δ
examples	`41.59% <39.59%> (+0.78%)`	⬆️
gpu	`27.84% <23.48%> (-32.55%)`	⬇️
regression	`15.20% <23.48%> (+0.06%)`	⬆️
unit	`53.96% <71.14%> (+0.22%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/llm_qat/main.py (1)

77-79: ⚠️ Potential issue | 🟡 Minor

Grammar: "right padded" should be "right-padded".

This is the source of the grammar issue in the generated ARGUMENTS.md.

📝 Proposed fix

     model_max_length: int = field(
         default=4096,
         metadata={
             "help": (
-                "Maximum sequence length. Sequences will be right padded (and possibly truncated)."
+                "Maximum sequence length. Sequences will be right-padded (and possibly truncated)."
             )
         },
     )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/main.py` around lines 77 - 79, Update the help string that
currently reads "Maximum sequence length. Sequences will be right padded (and
possibly truncated)." to use a hyphenated adjective: change it to "Maximum
sequence length. Sequences will be right-padded (and possibly truncated)."
Locate the string in examples/llm_qat/main.py where the argument's "help" value
for "Maximum sequence length" is defined (the dict entry with key "help" or the
parser.add_argument call providing that help text) and replace the phrase "right
padded" with "right-padded".

🧹 Nitpick comments (3)

modelopt/torch/opt/plugins/transformers.py (2)

80-83: Missing error handling for YAML file operations.

If the config file doesn't exist or contains invalid YAML, the error messages won't be user-friendly. Consider wrapping in try/except with clearer error context.

🛡️ Proposed improvement

-            with open(config_path) as f:
-                config = yaml.safe_load(f)
-            if config:
-                self.set_defaults(**config)
+            try:
+                with open(config_path) as f:
+                    config = yaml.safe_load(f)
+            except FileNotFoundError:
+                raise ValueError(f"Config file not found: {config_path}")
+            except yaml.YAMLError as e:
+                raise ValueError(f"Invalid YAML in config file {config_path}: {e}")
+            if config:
+                self.set_defaults(**config)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 80 - 83, Wrap the
config file read and YAML parsing around a try/except to provide clear,
user-friendly errors: catch FileNotFoundError to indicate the provided
config_path is missing, catch yaml.YAMLError to indicate the file contains
invalid YAML, and catch a generic Exception for any other I/O issues; in the try
block keep the existing open(config_path) -> yaml.safe_load flow and call
self.set_defaults(**config) when config is truthy, and in each except produce a
clear message or re-raise a new exception that mentions config_path and the
original error to aid debugging.

78-83: YAML import inside function body — consider moving to module level.

The import yaml statement is inside the conditional block. While this works, it's unconventional and may cause slight latency on first use. Moving it to the top-level imports would be cleaner.

♻️ Suggested refactor

 import dataclasses
 import sys
 import types
 from contextlib import contextmanager
 from pathlib import Path
+import yaml

 import torch
 import transformers

Then remove the import yaml from line 78.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 78 - 83, The local
"import yaml" inside the block that reads config_path should be moved to the
module top-level and the inline import removed; update the module imports to
include "import yaml" and then keep the body as: open(config_path),
yaml.safe_load, and if config: self.set_defaults(**config) so that yaml is only
imported once at module import time and the inline import is deleted.

tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py (1)

36-120: Consider adding edge case tests for error handling.

The current tests cover happy paths well. Consider adding tests for:

--config with non-existent file path
--config with invalid YAML syntax
--config as last argument without a value

💡 Example additional tests

def test_config_file_not_found(self, tmp_path):
    parser = ModelOptArgParser((_ModelArgs, _TrainArgs))
    with pytest.raises(FileNotFoundError):
        parser.parse_args_into_dataclasses(
            args=["--config", str(tmp_path / "nonexistent.yaml")]
        )

def test_config_missing_value(self):
    parser = ModelOptArgParser((_ModelArgs, _TrainArgs))
    with pytest.raises((IndexError, ValueError)):
        parser.parse_args_into_dataclasses(args=["--config"])

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py` around lines 36 -
120, Add edge-case unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py covering --config error
handling: create a test function (e.g., test_config_file_not_found) that
constructs ModelOptArgParser((_ModelArgs, _TrainArgs)) and calls
parse_args_into_dataclasses with args=["--config",
str(tmp_path/"nonexistent.yaml")] asserting FileNotFoundError; add a test (e.g.,
test_config_invalid_yaml) that writes malformed YAML to a tmp file and asserts
parse_args_into_dataclasses raises yaml.YAMLError or ValueError; add a test
(e.g., test_config_missing_value) that calls parse_args_into_dataclasses with
args=["--config"] and asserts an IndexError or ValueError is raised. Use the
existing ModelOptArgParser and parse_args_into_dataclasses symbols to locate
where to add these tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Around line 28-37: The generated TrainingArguments docs omit parameters whose
names are filtered out even when their defaults differ from the HF base; update
the doc generation to surface overridden defaults: in the function
_generate_docs locate the logic that filters out fields by name and change it to
also compare each field's default to the corresponding attribute on
transformers.TrainingArguments (e.g., learning_rate, eval_strategy, do_eval) and
include any field whose default value differs, or alternatively add an explicit
"Overridden defaults" section to ARGUMENTS.md listing each overridden argument
and its non-HF default so users can see values like learning_rate=1e-4.
- Around line 9-11: The generated docs drop union/Optional info (e.g. `str |
None` becoming `str`) because _format_type currently simplifies annotations;
update _format_type so it preserves union/Optional annotations (use
typing.get_origin/get_args or equivalent) and format Union types by joining
inner types with " | " (so a source annotation `str | None` yields `str |
None`), then ensure _generate_docs uses this updated _format_type when rendering
argument types like quant_cfg and other optional fields.

In `@examples/llm_qat/main.py`:
- Around line 64-69: ModelArguments.model_name_or_path lacks metadata helper
text; add a metadata={"help": "..."} to that field so the generated ARGUMENTS.md
includes a description. Edit the ModelArguments class and update the
model_name_or_path declaration to include a clear help string (e.g., "The name
or path of the pretrained model to use.") in the metadata dict, mirroring the
style used for teacher_model.

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 116-122: The docstring claims "additional/overridden arguments"
but the current filter (own_fields = [f for f in dataclasses.fields(dc) if
f.name not in hf_training_fields]) only keeps truly new fields; update the
behavior by either (A) changing the text emitted in the lines.append call to
"Only additional arguments are shown below" to remove "overridden", or (B)
change the filtering logic to detect overrides: lookup the corresponding field
in the HuggingFace TrainingArguments (hf_training_fields) and compare default
values (and types if desired) with the dataclass field from
dataclasses.fields(dc), and include the field if the default differs (treating
missing defaults as different); modify the code around is_hf_subclass /
own_fields and the lines.append text accordingly so overridden fields are
included when defaults differ.

---

Outside diff comments:
In `@examples/llm_qat/main.py`:
- Around line 77-79: Update the help string that currently reads "Maximum
sequence length. Sequences will be right padded (and possibly truncated)." to
use a hyphenated adjective: change it to "Maximum sequence length. Sequences
will be right-padded (and possibly truncated)." Locate the string in
examples/llm_qat/main.py where the argument's "help" value for "Maximum sequence
length" is defined (the dict entry with key "help" or the parser.add_argument
call providing that help text) and replace the phrase "right padded" with
"right-padded".

---

Nitpick comments:
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 80-83: Wrap the config file read and YAML parsing around a
try/except to provide clear, user-friendly errors: catch FileNotFoundError to
indicate the provided config_path is missing, catch yaml.YAMLError to indicate
the file contains invalid YAML, and catch a generic Exception for any other I/O
issues; in the try block keep the existing open(config_path) -> yaml.safe_load
flow and call self.set_defaults(**config) when config is truthy, and in each
except produce a clear message or re-raise a new exception that mentions
config_path and the original error to aid debugging.
- Around line 78-83: The local "import yaml" inside the block that reads
config_path should be moved to the module top-level and the inline import
removed; update the module imports to include "import yaml" and then keep the
body as: open(config_path), yaml.safe_load, and if config:
self.set_defaults(**config) so that yaml is only imported once at module import
time and the inline import is deleted.

In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py`:
- Around line 36-120: Add edge-case unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py covering --config error
handling: create a test function (e.g., test_config_file_not_found) that
constructs ModelOptArgParser((_ModelArgs, _TrainArgs)) and calls
parse_args_into_dataclasses with args=["--config",
str(tmp_path/"nonexistent.yaml")] asserting FileNotFoundError; add a test (e.g.,
test_config_invalid_yaml) that writes malformed YAML to a tmp file and asserts
parse_args_into_dataclasses raises yaml.YAMLError or ValueError; add a test
(e.g., test_config_missing_value) that calls parse_args_into_dataclasses with
args=["--config"] and asserts an IndexError or ValueError is raised. Use the
existing ModelOptArgParser and parse_args_into_dataclasses symbols to locate
where to add these tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bf32b98f-0804-4249-87e1-ce58d13d15e2

📥 Commits

Reviewing files that changed from the base of the PR and between e764ad9 and 66e02a7.

📒 Files selected for processing (14)

.pre-commit-config.yaml
examples/llm_qad/README.md
examples/llm_qat/ARGUMENTS.md
examples/llm_qat/README.md
examples/llm_qat/configs/finetune.yaml
examples/llm_qat/configs/ptq_eval.yaml
examples/llm_qat/configs/qad_nvfp4.yaml
examples/llm_qat/configs/qat_nvfp4.yaml
examples/llm_qat/launch.sh
examples/llm_qat/main.py
modelopt/torch/opt/plugins/transformers.py
modelopt/torch/quantization/plugins/transformers_trainer.py
tests/examples/llm_qat/test_llm_qat.py
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py

💤 Files with no reviewable changes (1)

examples/llm_qat/launch.sh

✅ Files skipped from review due to trivial changes (5)

examples/llm_qad/README.md
examples/llm_qat/configs/qat_nvfp4.yaml
examples/llm_qat/configs/ptq_eval.yaml
examples/llm_qat/configs/finetune.yaml
examples/llm_qat/configs/qad_nvfp4.yaml

🚧 Files skipped from review as they are similar to previous changes (2)

.pre-commit-config.yaml
tests/examples/llm_qat/test_llm_qat.py

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

examples/llm_qat/ARGUMENTS.md (1)
10-10: ⚠️ Potential issue | 🟡 Minor

Minor docs grammar: use “right-padded.”

Line 10 still says “right padded”. Since this is auto-generated, update the originating help string so regenerated docs use “right-padded”.

As per coding guidelines, **/*.{py,cpp,h,hpp,md}: All code (Python, C++, Markdown, etc.) must adhere to coding standards as checked by pre-commit hooks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` at line 10, Update the help string for the CLI
option that defines --model_max_length (the add_argument call for
'--model_max_length' / the "model_max_length" help text) to use the
grammatically correct phrase "right-padded." instead of "right padded" so
regenerated docs reflect the change; locate the add_argument or option
definition that sets the default/help for model_max_length and change the help
string to include "right-padded." and re-run doc generation/pre-commit.

🧹 Nitpick comments (4)

examples/llm_qat/train.py (2)
137-141: Same trust_remote_code concern applies to teacher model loading.

The teacher model loading should also use the configurable trust_remote_code parameter for consistency.
♻️ Suggested fix
         teacher_model = transformers.AutoModelForCausalLM.from_pretrained(
             model_args.teacher_model,
             cache_dir=training_args.cache_dir,
             torch_dtype=torch.bfloat16,
+            trust_remote_code=model_args.trust_remote_code,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/train.py` around lines 137 - 141, The teacher model is being
loaded via transformers.AutoModelForCausalLM.from_pretrained without using the
configurable trust_remote_code flag; update the teacher_model loading call (the
call that assigns to teacher_model and uses model_args.teacher_model) to pass
the same trust_remote_code parameter used for the student/model loading (i.e.,
forward the configured trust_remote_code boolean into from_pretrained) so
teacher model remote code is handled consistently.
96-104: Consider exposing trust_remote_code as a configurable argument.

The model and tokenizer loading accepts user-provided model_name_or_path but doesn't expose trust_remote_code as a configurable parameter. While defaulting to False is correct for security, some models require trust_remote_code=True to function. Per coding guidelines, this should be caller-configurable rather than implicitly relying on the default.
♻️ Suggested approach

Add trust_remote_code: bool = False to ModelArguments in arguments.py, then pass it to the model and tokenizer loading calls:
     model = transformers.AutoModelForCausalLM.from_pretrained(
         model_args.model_name_or_path,
         cache_dir=training_args.cache_dir,
         torch_dtype=torch.bfloat16,
+        trust_remote_code=model_args.trust_remote_code,
     )
     ...
     tokenizer = transformers.AutoTokenizer.from_pretrained(
-        model_args.model_name_or_path, model_max_length=model_args.model_max_length
+        model_args.model_name_or_path,
+        model_max_length=model_args.model_max_length,
+        trust_remote_code=model_args.trust_remote_code,
     )
As per coding guidelines, "Do not hardcode trust_remote_code=True when loading Hugging Face Transformers models. Let the caller decide via a parameter; default to False."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/train.py` around lines 96 - 104, The loader calls in
train.py (transformers.AutoModelForCausalLM.from_pretrained and
transformers.AutoTokenizer.from_pretrained) currently don't expose
trust_remote_code; add a new boolean field trust_remote_code: bool = False to
the ModelArguments dataclass (arguments.py) and then pass
model_args.trust_remote_code into both from_pretrained calls (e.g.,
trust_remote_code=model_args.trust_remote_code) so callers can opt-in while
retaining a safe default.
examples/llm_qat/utils.py (1)
43-43: Consider moving import to module level for clarity.

The runtime import inside the function works but is unconventional. If there's no circular import concern, moving it to the module level would improve readability and enable better static analysis.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/utils.py` at line 43, The file currently does a runtime
import of build_blend_dataset and load_blend_config from dataset_utils inside a
function; move the import statement for build_blend_dataset and
load_blend_config to the top of examples/llm_qat/utils.py (module-level) and
delete the in-function import so callers use the module-level imports instead,
ensuring there are no circular-import issues before changing.
examples/llm_qat/dataset_utils.py (1)
721-723: Consider making trust_remote_code explicit for defense-in-depth.

While trust_remote_code defaults to False in HuggingFace Transformers, the coding guidelines recommend making this explicit or caller-configurable. For a CLI tool that accepts arbitrary tokenizer paths, consider adding an explicit trust_remote_code=False or exposing it as a CLI flag.
♻️ Suggested improvement
     tokenizer = transformers.AutoTokenizer.from_pretrained(
-        args.tokenizer, model_max_length=args.max_length
+        args.tokenizer, model_max_length=args.max_length, trust_remote_code=False
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 721 - 723, The AutoTokenizer
instantiation uses transformers.AutoTokenizer.from_pretrained without explicitly
setting trust_remote_code, which should be made explicit for defense-in-depth;
update the call in dataset_utils.py where tokenizer is created
(transformers.AutoTokenizer.from_pretrained, referencing args.tokenizer and
args.max_length) to pass trust_remote_code=False or expose a new CLI flag (e.g.,
args.trust_remote_code) and pass that value to from_pretrained so callers can
opt in explicitly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_qat/llama_factory/launch_llamafactory.sh`:
- Line 143: The --use_fsdp2 flag is parsed and shown in help but has no effect
because default config selection no longer reads USE_FSDP2; either remove the
flag from the help and argument parsing or make it effective by wiring it into
config selection and validation. Fix by updating the argument parsing block that
sets USE_FSDP2 (the case branch handling "--use_fsdp2") and the help echo line
to either delete that option, or if you keep it, validate the provided value
(true/false), set the USE_FSDP2 variable accordingly, and use USE_FSDP2 when
choosing the default config (where default config selection currently ignores
USE_FSDP2); if keeping the flag also emit a clear warning/error when an
unsupported value is provided.

In `@examples/llm_qat/quantize.py`:
- Around line 89-93: The calibration forward_loop currently runs forward passes
with gradients enabled and without evaluation mode; wrap the loop in
torch.inference_mode() and call model.eval() before iterating to disable
autograd and dropout/batchnorm updates (e.g., in forward_loop use model.eval()
and with torch.inference_mode(): for batch in calib_dataloader ...); also
consider restoring the original training state after calibration if the caller
expects the model's training flag unchanged.

In `@examples/llm_qat/README.md`:
- Around line 58-67: Update the README examples to use flag-style boolean
arguments to match ARGUMENTS.md: remove explicit True/False values and use the
flags `--distill` and `--lora` where currently written as `--distill True` or
`--lora True` (for example update the train invocation block containing
`--distill True` and any other occurrences such as the `--lora True` example
around line 225) so the CLI examples consistently use `--distill` and `--lora`
flag syntax.

---

Duplicate comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Line 10: Update the help string for the CLI option that defines
--model_max_length (the add_argument call for '--model_max_length' / the
"model_max_length" help text) to use the grammatically correct phrase
"right-padded." instead of "right padded" so regenerated docs reflect the
change; locate the add_argument or option definition that sets the default/help
for model_max_length and change the help string to include "right-padded." and
re-run doc generation/pre-commit.

---

Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 721-723: The AutoTokenizer instantiation uses
transformers.AutoTokenizer.from_pretrained without explicitly setting
trust_remote_code, which should be made explicit for defense-in-depth; update
the call in dataset_utils.py where tokenizer is created
(transformers.AutoTokenizer.from_pretrained, referencing args.tokenizer and
args.max_length) to pass trust_remote_code=False or expose a new CLI flag (e.g.,
args.trust_remote_code) and pass that value to from_pretrained so callers can
opt in explicitly.

In `@examples/llm_qat/train.py`:
- Around line 137-141: The teacher model is being loaded via
transformers.AutoModelForCausalLM.from_pretrained without using the configurable
trust_remote_code flag; update the teacher_model loading call (the call that
assigns to teacher_model and uses model_args.teacher_model) to pass the same
trust_remote_code parameter used for the student/model loading (i.e., forward
the configured trust_remote_code boolean into from_pretrained) so teacher model
remote code is handled consistently.
- Around line 96-104: The loader calls in train.py
(transformers.AutoModelForCausalLM.from_pretrained and
transformers.AutoTokenizer.from_pretrained) currently don't expose
trust_remote_code; add a new boolean field trust_remote_code: bool = False to
the ModelArguments dataclass (arguments.py) and then pass
model_args.trust_remote_code into both from_pretrained calls (e.g.,
trust_remote_code=model_args.trust_remote_code) so callers can opt-in while
retaining a safe default.

In `@examples/llm_qat/utils.py`:
- Line 43: The file currently does a runtime import of build_blend_dataset and
load_blend_config from dataset_utils inside a function; move the import
statement for build_blend_dataset and load_blend_config to the top of
examples/llm_qat/utils.py (module-level) and delete the in-function import so
callers use the module-level imports instead, ensuring there are no
circular-import issues before changing.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4e285767-bcf3-4de5-8296-f9658a97092c

📥 Commits

Reviewing files that changed from the base of the PR and between 66e02a7 and be7ba74.

📒 Files selected for processing (33)

.pre-commit-config.yaml
examples/llm_qad/README.md
examples/llm_qat/ARGUMENTS.md
examples/llm_qat/README.md
examples/llm_qat/accelerate_config/fsdp1.yaml
examples/llm_qat/arguments.py
examples/llm_qat/configs/accelerate/ddp.yaml
examples/llm_qat/configs/accelerate/deepspeed.yaml
examples/llm_qat/configs/accelerate/fsdp2.yaml
examples/llm_qat/configs/dataset/README.md
examples/llm_qat/configs/dataset/blend.yaml
examples/llm_qat/configs/dataset/blend_example.yaml
examples/llm_qat/configs/dataset/blend_test.yaml
examples/llm_qat/configs/training/finetune.yaml
examples/llm_qat/configs/training/ptq_eval.yaml
examples/llm_qat/configs/training/qad_nvfp4.yaml
examples/llm_qat/configs/training/qat_nvfp4.yaml
examples/llm_qat/dataset_utils.py
examples/llm_qat/launch.sh
examples/llm_qat/llama_factory/README.md
examples/llm_qat/llama_factory/launch_llamafactory.sh
examples/llm_qat/llama_factory/llama_config.yaml
examples/llm_qat/llama_factory/llama_factory.py
examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
examples/llm_qat/quantize.py
examples/llm_qat/simple_qat_train.py
examples/llm_qat/train.py
examples/llm_qat/utils.py
examples/vllm_serve/README.md
modelopt/torch/opt/plugins/transformers.py
modelopt/torch/quantization/plugins/transformers_trainer.py
tests/examples/llm_qat/test_llm_qat.py
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py

💤 Files with no reviewable changes (2)

examples/llm_qat/launch.sh
examples/llm_qat/accelerate_config/fsdp1.yaml

✅ Files skipped from review due to trivial changes (14)

examples/llm_qad/README.md
examples/vllm_serve/README.md
examples/llm_qat/llama_factory/llama_config.yaml
examples/llm_qat/configs/training/ptq_eval.yaml
examples/llm_qat/configs/training/finetune.yaml
examples/llm_qat/configs/training/qat_nvfp4.yaml
examples/llm_qat/configs/training/qad_nvfp4.yaml
examples/llm_qat/configs/dataset/blend_test.yaml
examples/llm_qat/configs/dataset/blend.yaml
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
examples/llm_qat/configs/dataset/README.md
examples/llm_qat/configs/dataset/blend_example.yaml
.pre-commit-config.yaml
examples/llm_qat/arguments.py

🚧 Files skipped from review as they are similar to previous changes (2)

modelopt/torch/quantization/plugins/transformers_trainer.py
modelopt/torch/opt/plugins/transformers.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

examples/llm_qat/dataset_utils.py (1)
669-671: Expose trust_remote_code as a CLI parameter.

While trust_remote_code defaults to False (safe), the coding guidelines recommend letting callers configure this parameter. For CLI tools, users should be able to opt-in when loading tokenizers that require custom code.
♻️ Suggested improvement
     parser.add_argument("--tokenizer", required=True, help="Tokenizer name or path")
     parser.add_argument("--max_length", type=int, default=4096, help="Max sequence length")
+    parser.add_argument(
+        "--trust_remote_code",
+        action="store_true",
+        default=False,
+        help="Allow executing custom code from the tokenizer repository (use with caution)",
+    )
     args = parser.parse_args()

     config = load_blend_config(args.config)

     tokenizer = transformers.AutoTokenizer.from_pretrained(
-        args.tokenizer, model_max_length=args.max_length
+        args.tokenizer,
+        model_max_length=args.max_length,
+        trust_remote_code=args.trust_remote_code,
     )
As per coding guidelines: "Do not hardcode trust_remote_code=True when loading Hugging Face Transformers models. Let the caller decide via a parameter; default to False."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 669 - 671, Add a CLI flag to
expose Hugging Face's trust_remote_code instead of hardcoding it: update the
argument parser to accept something like --trust-remote-code (default False) and
pass that flag into the tokenizer loading call
(transformers.AutoTokenizer.from_pretrained) using args.trust_remote_code, so
the tokenizer creation (tokenizer =
transformers.AutoTokenizer.from_pretrained(args.tokenizer,
model_max_length=args.max_length, trust_remote_code=args.trust_remote_code))
respects the caller's choice.
tests/examples/llm_qat/test_llm_qat.py (2)
82-82: Normalize tiny_llama_path before appending it to extra_cmd_args.

_run_quantize() is typed as taking list[str], and the other filesystem arguments in these calls are already wrapped with str(...). Converting tiny_llama_path here keeps the argv contract consistent and avoids surprises if the fixture returns a Path.
🧩 Proposed fix
-            "--model_name_or_path", tiny_llama_path,
+            "--model_name_or_path", str(tiny_llama_path),
Apply the same change in all three _run_quantize() call sites.
Also applies to: 105-105, 130-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_qat/test_llm_qat.py` at line 82, The test passes
tiny_llama_path directly into the argv list which can be a Path object; update
all three _run_quantize(...) call sites to convert tiny_llama_path to a string
(e.g., str(tiny_llama_path)) when adding it to extra_cmd_args so the list[str]
contract for _run_quantize and consistency with other filesystem args is
preserved; locate the calls that pass "--model_name_or_path", tiny_llama_path
and replace tiny_llama_path with its stringified form.
35-35: Avoid relying on PATH for the Python and Accelerate entry points.

Using bare python and accelerate can pick a different environment than the one running pytest, which makes these integration tests brittle in CI and local virtualenvs. If run_example_command() does not already normalize the launcher, prefer invoking both steps from the active interpreter.

Also applies to: 52-54
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_qat/test_llm_qat.py` at line 35, The test currently
invokes the example scripts via the bare strings "python" and "accelerate" which
can resolve to the wrong interpreter; update the test to launch both steps with
the active Python interpreter instead: replace occurrences of "python" with the
running interpreter (use sys.executable) and replace "accelerate" with a module
invocation via the interpreter (i.e. sys.executable with "-m accelerate" or
equivalent) when calling run_example_command(); update both the quantize step
(strings "python", "quantize.py") and the train/accelerate step (strings
"accelerate", "launch" / related args for "train.py") so the commands use the
active interpreter rather than relying on PATH.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 224-231: The current broad except Exception in the tokenization
fallback hides unexpected errors; change it to catch only expected
tokenization-related errors (e.g., ValueError, UnicodeEncodeError, and any
specific tokenizer error class your tokenizer exposes) around the tokenizer
invocation, keep the logger.warning and the fallback pad/label construction
using pad_id, max_length, and IGNORE_TOKEN_ID, and for any other exceptions
re-raise them so they surface (i.e., replace "except Exception as e" with a
narrowed except tuple and add a bare raise in an outer except to propagate
unexpected errors).

---

Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 669-671: Add a CLI flag to expose Hugging Face's trust_remote_code
instead of hardcoding it: update the argument parser to accept something like
--trust-remote-code (default False) and pass that flag into the tokenizer
loading call (transformers.AutoTokenizer.from_pretrained) using
args.trust_remote_code, so the tokenizer creation (tokenizer =
transformers.AutoTokenizer.from_pretrained(args.tokenizer,
model_max_length=args.max_length, trust_remote_code=args.trust_remote_code))
respects the caller's choice.

In `@tests/examples/llm_qat/test_llm_qat.py`:
- Line 82: The test passes tiny_llama_path directly into the argv list which can
be a Path object; update all three _run_quantize(...) call sites to convert
tiny_llama_path to a string (e.g., str(tiny_llama_path)) when adding it to
extra_cmd_args so the list[str] contract for _run_quantize and consistency with
other filesystem args is preserved; locate the calls that pass
"--model_name_or_path", tiny_llama_path and replace tiny_llama_path with its
stringified form.
- Line 35: The test currently invokes the example scripts via the bare strings
"python" and "accelerate" which can resolve to the wrong interpreter; update the
test to launch both steps with the active Python interpreter instead: replace
occurrences of "python" with the running interpreter (use sys.executable) and
replace "accelerate" with a module invocation via the interpreter (i.e.
sys.executable with "-m accelerate" or equivalent) when calling
run_example_command(); update both the quantize step (strings "python",
"quantize.py") and the train/accelerate step (strings "accelerate", "launch" /
related args for "train.py") so the commands use the active interpreter rather
than relying on PATH.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7d1f6438-abe5-4f9c-94c4-93bd1e217408

📥 Commits

Reviewing files that changed from the base of the PR and between be7ba74 and 7b581c7.

📒 Files selected for processing (33)

.pre-commit-config.yaml
examples/llm_qad/README.md
examples/llm_qat/ARGUMENTS.md
examples/llm_qat/README.md
examples/llm_qat/accelerate_config/fsdp1.yaml
examples/llm_qat/arguments.py
examples/llm_qat/configs/accelerate/ddp.yaml
examples/llm_qat/configs/accelerate/deepspeed.yaml
examples/llm_qat/configs/accelerate/fsdp2.yaml
examples/llm_qat/configs/dataset/README.md
examples/llm_qat/configs/dataset/blend.yaml
examples/llm_qat/configs/dataset/blend_example.yaml
examples/llm_qat/configs/dataset/blend_test.yaml
examples/llm_qat/configs/train/finetune.yaml
examples/llm_qat/configs/train/ptq_eval.yaml
examples/llm_qat/configs/train/qad_nvfp4.yaml
examples/llm_qat/configs/train/qat_nvfp4.yaml
examples/llm_qat/dataset_utils.py
examples/llm_qat/launch.sh
examples/llm_qat/llama_factory/README.md
examples/llm_qat/llama_factory/launch_llamafactory.sh
examples/llm_qat/llama_factory/llama_config.yaml
examples/llm_qat/llama_factory/llama_factory.py
examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
examples/llm_qat/quantize.py
examples/llm_qat/simple_qat_train.py
examples/llm_qat/train.py
examples/llm_qat/utils.py
examples/vllm_serve/README.md
modelopt/torch/opt/plugins/transformers.py
modelopt/torch/quantization/plugins/transformers_trainer.py
tests/examples/llm_qat/test_llm_qat.py
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py

💤 Files with no reviewable changes (2)

examples/llm_qat/accelerate_config/fsdp1.yaml
examples/llm_qat/launch.sh

✅ Files skipped from review due to trivial changes (13)

examples/llm_qat/configs/train/ptq_eval.yaml
examples/vllm_serve/README.md
examples/llm_qat/llama_factory/llama_config.yaml
examples/llm_qat/configs/train/finetune.yaml
examples/llm_qad/README.md
examples/llm_qat/configs/train/qat_nvfp4.yaml
.pre-commit-config.yaml
examples/llm_qat/configs/dataset/blend.yaml
examples/llm_qat/configs/train/qad_nvfp4.yaml
examples/llm_qat/configs/dataset/blend_test.yaml
examples/llm_qat/configs/dataset/blend_example.yaml
examples/llm_qat/llama_factory/README.md
examples/llm_qat/configs/dataset/README.md

🚧 Files skipped from review as they are similar to previous changes (5)

examples/llm_qat/llama_factory/llama_factory.py
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
examples/llm_qat/utils.py
examples/llm_qat/arguments.py
examples/llm_qat/train.py

coderabbitai

♻️ Duplicate comments (5)

examples/llm_qat/ARGUMENTS.md (2)

10-10: ⚠️ Potential issue | 🟡 Minor

Grammar: "right padded" should be hyphenated.

Since this file is auto-generated, fix the source help text in the model_max_length field metadata in examples/llm_qat/arguments.py to use "right-padded".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` at line 10, The generated docs show "right
padded" unhyphenated; update the source help text for the model_max_length
argument in examples/llm_qat/arguments.py (the metadata for the model_max_length
argument/flag) to use "right-padded" instead of "right padded" so the
auto-generated ARGUMENTS.md will render the hyphenated form; edit the
help/message string in the model_max_length field definition where its
help/description is set.

11-11: ⚠️ Potential issue | 🟡 Minor

Type display drops Optional/Union annotation.

The --teacher_model field has default None but type shows as `str` instead of `str | None`. This stems from _format_type in the generator not preserving union types.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` at line 11, The rendered arg type drops
Optional/Union (so `--teacher_model` shows `str` instead of `str | None`)
because _format_type in the generator flattens or ignores union/Optional
metadata; update _format_type to detect typing.Union/typing.Optional (use
typing.get_origin and typing.get_args) and reconstruct the type string by
joining inner argument type names with " | " (ensuring NoneType is rendered as
"None"), then return that composed union string; locate the generator function
named _format_type and modify its branch that handles non-primitive/annotated
types to handle get_origin==Union and format get_args accordingly.

modelopt/torch/opt/plugins/transformers.py (2)

73-85: ⚠️ Potential issue | 🟡 Minor

Missing bounds check for --config argument.

If --config is passed as the last argument without a path, args[idx + 1] will raise an IndexError. Add validation before accessing the next element.

🛡️ Proposed fix

         # --config <yaml_file>: load YAML as defaults, CLI args override
         if "--config" in args:
             idx = args.index("--config")
+            if idx + 1 >= len(args):
+                raise ValueError("--config requires a path argument")
             config_path = args[idx + 1]
             args = args[:idx] + args[idx + 2 :]  # strip --config <path> from argv

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 73 - 85, The code
accesses args[idx + 1] without checking bounds when parsing the "--config" flag;
add a guard after finding idx to ensure idx + 1 < len(args) and if not raise a
clear exception (e.g., ValueError("Missing path after --config")) or call
argparse/error handling, so you don't get IndexError. Update the block that
computes config_path and strips args (references: args, idx, config_path,
yaml.safe_load, self.set_defaults, and super().parse_args_into_dataclasses) to
validate the presence of the path before opening the file and only proceed to
open/configure if the check passes.

116-122: ⚠️ Potential issue | 🟡 Minor

Documentation claims "overridden" but filter excludes overridden fields.

The docstring says "additional/overridden arguments" but the filter at line 117 excludes any field whose name exists in the base TrainingArguments. Fields that override defaults won't appear in generated docs.

Either update the text to say "Only additional arguments" or adjust the logic to include fields with changed defaults.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 116 - 122, The doc
claims "additional/overridden arguments" but the current filter (own_fields = [f
for f in dataclasses.fields(dc) if f.name not in hf_training_fields]) drops any
field that exists on HuggingFace TrainingArguments, so overridden fields are
omitted; fix by either (A) updating the text to "Only additional arguments are
shown below" or (B) include overridden fields by changing the filter to keep
fields whose names are in hf_training_fields only when their defaults differ
from the base TrainingArguments defaults (e.g., build a map of base_fields =
{f.name: f for f in dataclasses.fields(transformers.TrainingArguments)} and
include f if f.name not in hf_training_fields OR base_fields.get(f.name) has a
different default/default_factory/value than f), and then update lines and the
comment accordingly (refer to is_hf_subclass, dataclasses.fields(dc),
hf_training_fields, and the lines list).

examples/llm_qat/dataset_utils.py (1)

224-231: ⚠️ Potential issue | 🟡 Minor

Broad exception handling may mask underlying issues.

Catching all Exception types could hide unexpected errors like MemoryError or tokenizer misconfiguration. Consider narrowing to expected tokenization failures.

♻️ Suggested refinement

-        except Exception as e:
+        except (ValueError, TypeError, KeyError) as e:
             logger.warning("Failed to tokenize sample: %s. Skipping.", e)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 224 - 231, The current
blanket except Exception in the tokenization fallback (around
tokenizer.pad_token_id, pad_id and IGNORE_TOKEN_ID) is too broad; restrict it to
expected tokenization errors (e.g., ValueError, TypeError, UnicodeEncodeError
and any tokenizer-specific error class from your tokenizer library such as
TokenizerError) so you don't swallow system-level errors like MemoryError, and
keep the same fallback return. Also include the exception class/name in the
logger message to aid debugging (e.g., log type(e) and message) while leaving
the pad_id/labels fallback logic unchanged.

🧹 Nitpick comments (5)

examples/llm_qat/train.py (1)

134-141: Consider raising ValueError instead of bare assert for better error handling.

The assert will be stripped in optimized mode (python -O). For user-facing validation, prefer an explicit check with ValueError.

♻️ Proposed fix

     if training_args.distill:
-        assert model_args.teacher_model is not None, "Teacher model is required for distillation."
+        if model_args.teacher_model is None:
+            raise ValueError("--teacher_model is required when --distill is enabled.")
 
         teacher_model = transformers.AutoModelForCausalLM.from_pretrained(

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/train.py` around lines 134 - 141, Replace the bare assert
that checks for the teacher model when training_args.distill is true with an
explicit runtime check that raises ValueError if model_args.teacher_model is
None; update the logic around training_args.distill and model_args.teacher_model
(before calling transformers.AutoModelForCausalLM.from_pretrained) to raise
ValueError("Teacher model is required for distillation.") so the error is always
raised in user-facing validation and not stripped in optimized mode, then
proceed to load teacher_model as before.

examples/llm_qat/dataset_utils.py (1)

150-156: Specify explicit encoding when opening YAML config file.

Use encoding="utf-8" for consistent cross-platform behavior.

♻️ Proposed fix

 def load_blend_config(config_path: str) -> BlendConfig:
     """Parse a dataset blend YAML file into a :class:`BlendConfig`."""
-    with open(config_path) as f:
+    with open(config_path, encoding="utf-8") as f:
         raw = yaml.safe_load(f)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 150 - 156, The function
load_blend_config currently opens the YAML file without an explicit encoding
which can cause cross-platform inconsistencies; update the open call in
load_blend_config to specify encoding="utf-8" when opening config_path so
yaml.safe_load reads a UTF-8 stream consistently (keep the rest of the logic
that constructs DatasetSourceConfig and returns BlendConfig unchanged).

modelopt/torch/opt/plugins/transformers.py (2)

80-81: Specify explicit encoding when opening files.

Use encoding="utf-8" for consistent cross-platform behavior when reading YAML config files.

♻️ Proposed fix

-            with open(config_path) as f:
+            with open(config_path, encoding="utf-8") as f:
                 config = yaml.safe_load(f)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 80 - 81, The file
opens the YAML config with open(config_path) which can be platform-dependent;
update the open call that reads the config (the with open(...) as f: block where
config = yaml.safe_load(f) is executed) to explicitly specify encoding="utf-8"
so the YAML is read consistently across platforms, i.e. change the with open
usage to include encoding="utf-8".

149-149: Specify encoding when writing generated docs.

For consistency with the read path suggested above, use explicit UTF-8 encoding when writing the generated markdown file.

♻️ Proposed fix

-        Path(output_path).write_text("\n".join(lines) + "\n")
+        Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` at line 149, The write to disk
uses Path(output_path).write_text("\n".join(lines) + "\n") without an explicit
encoding; change the call to pass encoding="utf-8" so the generated markdown is
written using UTF-8 (i.e. update the Path(...).write_text invocation that writes
the joined lines from the lines variable for output_path to include
encoding="utf-8").

tests/examples/llm_qat/test_llm_qat.py (1)

43-68: --fsdp_transformer_layer_cls_to_wrap may be ignored for non-FSDP backends.

This argument is FSDP-specific but is passed to all backends including ddp and deepspeed. While HfArgumentParser typically ignores unknown/irrelevant args, consider conditionally adding it only for FSDP backends for clarity.

♻️ Proposed fix

 def _run_train(extra_cmd_args: list[str], backend: str = "fsdp2"):
     config_file = BACKEND_CONFIGS[backend]
     gradient_args = (
         ["--gradient_checkpointing", "True"]
         if backend in GRADIENT_CHECKPOINTING_BACKENDS
         else []
     )
+    fsdp_args = (
+        ["--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer"]
+        if backend.startswith("fsdp")
+        else []
+    )
     run_example_command(
         [
             "accelerate", "launch",
             "--config-file", config_file,
             "train.py",
             "--dataset_config", "configs/dataset/blend_test.yaml",
-            "--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer",
             "--num_train_epochs", "0.3",
             "--learning_rate", "1e-5",
             "--save_steps", "5",
             "--eval_steps", "5",
             "--train_samples", "256",
             "--eval_samples", "64",
+            *fsdp_args,
             *gradient_args,
             *extra_cmd_args,
         ],

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_qat/test_llm_qat.py` around lines 43 - 68, The test helper
_run_train currently always passes the FSDP-only flag
"--fsdp_transformer_layer_cls_to_wrap" to run_example_command, which can be
confusing for non-FSDP backends; change _run_train to append that flag only when
the selected backend is an FSDP backend (e.g., check backend membership in an
FSDP_BACKENDS set or reuse GRADIENT_CHECKPOINTING_BACKENDS if it denotes FSDP
backends) so that run_example_command is invoked without FSDP-specific args for
ddp/deepspeed backends; update references in _run_train where the argument list
is built (alongside BACKEND_CONFIGS, GRADIENT_CHECKPOINTING_BACKENDS, and
run_example_command) to conditionally include the
"--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer" pair.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Line 10: The generated docs show "right padded" unhyphenated; update the
source help text for the model_max_length argument in
examples/llm_qat/arguments.py (the metadata for the model_max_length
argument/flag) to use "right-padded" instead of "right padded" so the
auto-generated ARGUMENTS.md will render the hyphenated form; edit the
help/message string in the model_max_length field definition where its
help/description is set.
- Line 11: The rendered arg type drops Optional/Union (so `--teacher_model`
shows `str` instead of `str | None`) because _format_type in the generator
flattens or ignores union/Optional metadata; update _format_type to detect
typing.Union/typing.Optional (use typing.get_origin and typing.get_args) and
reconstruct the type string by joining inner argument type names with " | "
(ensuring NoneType is rendered as "None"), then return that composed union
string; locate the generator function named _format_type and modify its branch
that handles non-primitive/annotated types to handle get_origin==Union and
format get_args accordingly.

In `@examples/llm_qat/dataset_utils.py`:
- Around line 224-231: The current blanket except Exception in the tokenization
fallback (around tokenizer.pad_token_id, pad_id and IGNORE_TOKEN_ID) is too
broad; restrict it to expected tokenization errors (e.g., ValueError, TypeError,
UnicodeEncodeError and any tokenizer-specific error class from your tokenizer
library such as TokenizerError) so you don't swallow system-level errors like
MemoryError, and keep the same fallback return. Also include the exception
class/name in the logger message to aid debugging (e.g., log type(e) and
message) while leaving the pad_id/labels fallback logic unchanged.

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 73-85: The code accesses args[idx + 1] without checking bounds
when parsing the "--config" flag; add a guard after finding idx to ensure idx +
1 < len(args) and if not raise a clear exception (e.g., ValueError("Missing path
after --config")) or call argparse/error handling, so you don't get IndexError.
Update the block that computes config_path and strips args (references: args,
idx, config_path, yaml.safe_load, self.set_defaults, and
super().parse_args_into_dataclasses) to validate the presence of the path before
opening the file and only proceed to open/configure if the check passes.
- Around line 116-122: The doc claims "additional/overridden arguments" but the
current filter (own_fields = [f for f in dataclasses.fields(dc) if f.name not in
hf_training_fields]) drops any field that exists on HuggingFace
TrainingArguments, so overridden fields are omitted; fix by either (A) updating
the text to "Only additional arguments are shown below" or (B) include
overridden fields by changing the filter to keep fields whose names are in
hf_training_fields only when their defaults differ from the base
TrainingArguments defaults (e.g., build a map of base_fields = {f.name: f for f
in dataclasses.fields(transformers.TrainingArguments)} and include f if f.name
not in hf_training_fields OR base_fields.get(f.name) has a different
default/default_factory/value than f), and then update lines and the comment
accordingly (refer to is_hf_subclass, dataclasses.fields(dc),
hf_training_fields, and the lines list).

---

Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 150-156: The function load_blend_config currently opens the YAML
file without an explicit encoding which can cause cross-platform
inconsistencies; update the open call in load_blend_config to specify
encoding="utf-8" when opening config_path so yaml.safe_load reads a UTF-8 stream
consistently (keep the rest of the logic that constructs DatasetSourceConfig and
returns BlendConfig unchanged).

In `@examples/llm_qat/train.py`:
- Around line 134-141: Replace the bare assert that checks for the teacher model
when training_args.distill is true with an explicit runtime check that raises
ValueError if model_args.teacher_model is None; update the logic around
training_args.distill and model_args.teacher_model (before calling
transformers.AutoModelForCausalLM.from_pretrained) to raise ValueError("Teacher
model is required for distillation.") so the error is always raised in
user-facing validation and not stripped in optimized mode, then proceed to load
teacher_model as before.

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 80-81: The file opens the YAML config with open(config_path) which
can be platform-dependent; update the open call that reads the config (the with
open(...) as f: block where config = yaml.safe_load(f) is executed) to
explicitly specify encoding="utf-8" so the YAML is read consistently across
platforms, i.e. change the with open usage to include encoding="utf-8".
- Line 149: The write to disk uses Path(output_path).write_text("\n".join(lines)
+ "\n") without an explicit encoding; change the call to pass encoding="utf-8"
so the generated markdown is written using UTF-8 (i.e. update the
Path(...).write_text invocation that writes the joined lines from the lines
variable for output_path to include encoding="utf-8").

In `@tests/examples/llm_qat/test_llm_qat.py`:
- Around line 43-68: The test helper _run_train currently always passes the
FSDP-only flag "--fsdp_transformer_layer_cls_to_wrap" to run_example_command,
which can be confusing for non-FSDP backends; change _run_train to append that
flag only when the selected backend is an FSDP backend (e.g., check backend
membership in an FSDP_BACKENDS set or reuse GRADIENT_CHECKPOINTING_BACKENDS if
it denotes FSDP backends) so that run_example_command is invoked without
FSDP-specific args for ddp/deepspeed backends; update references in _run_train
where the argument list is built (alongside BACKEND_CONFIGS,
GRADIENT_CHECKPOINTING_BACKENDS, and run_example_command) to conditionally
include the "--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer" pair.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c9bac8a-7059-4aa8-aa2f-adab7aefd0d7

📥 Commits

Reviewing files that changed from the base of the PR and between 7b581c7 and 3b4863c.

📒 Files selected for processing (34)

.pre-commit-config.yaml
examples/llm_qad/README.md
examples/llm_qat/ARGUMENTS.md
examples/llm_qat/README.md
examples/llm_qat/accelerate_config/fsdp1.yaml
examples/llm_qat/arguments.py
examples/llm_qat/configs/accelerate/ddp.yaml
examples/llm_qat/configs/accelerate/deepspeed.yaml
examples/llm_qat/configs/accelerate/fsdp2.yaml
examples/llm_qat/configs/dataset/README.md
examples/llm_qat/configs/dataset/blend.yaml
examples/llm_qat/configs/dataset/blend_example.yaml
examples/llm_qat/configs/dataset/blend_test.yaml
examples/llm_qat/configs/train/finetune.yaml
examples/llm_qat/configs/train/ptq_eval.yaml
examples/llm_qat/configs/train/qad_nvfp4.yaml
examples/llm_qat/configs/train/qat_nvfp4.yaml
examples/llm_qat/dataset_utils.py
examples/llm_qat/launch.sh
examples/llm_qat/llama_factory/README.md
examples/llm_qat/llama_factory/launch_llamafactory.sh
examples/llm_qat/llama_factory/llama_config.yaml
examples/llm_qat/llama_factory/llama_factory.py
examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
examples/llm_qat/quantize.py
examples/llm_qat/simple_qat_train.py
examples/llm_qat/train.py
examples/llm_qat/utils.py
examples/vllm_serve/README.md
modelopt/torch/opt/plugins/transformers.py
modelopt/torch/quantization/plugins/transformers_trainer.py
modelopt_recipes/general/ptq/int4_blockwise_weight_only.yml
tests/examples/llm_qat/test_llm_qat.py
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py

💤 Files with no reviewable changes (2)

examples/llm_qat/accelerate_config/fsdp1.yaml
examples/llm_qat/launch.sh

✅ Files skipped from review due to trivial changes (15)

examples/llm_qat/configs/train/ptq_eval.yaml
examples/vllm_serve/README.md
examples/llm_qad/README.md
examples/llm_qat/llama_factory/llama_config.yaml
examples/llm_qat/configs/dataset/blend_test.yaml
examples/llm_qat/configs/train/finetune.yaml
modelopt_recipes/general/ptq/int4_blockwise_weight_only.yml
examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
examples/llm_qat/configs/train/qad_nvfp4.yaml
examples/llm_qat/llama_factory/launch_llamafactory.sh
examples/llm_qat/configs/dataset/blend.yaml
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
examples/llm_qat/configs/dataset/blend_example.yaml
examples/llm_qat/configs/dataset/README.md
examples/llm_qat/arguments.py

🚧 Files skipped from review as they are similar to previous changes (7)

examples/llm_qat/configs/train/qat_nvfp4.yaml
examples/llm_qat/llama_factory/README.md
examples/llm_qat/simple_qat_train.py
examples/llm_qat/llama_factory/llama_factory.py
.pre-commit-config.yaml
modelopt/torch/quantization/plugins/transformers_trainer.py
examples/llm_qat/quantize.py

realAsma · 2026-06-02T16:40:26Z

/claude review

claude

ModelOpt-focused review summary

Reviewed structural and ModelOpt-specific concerns (mode/state composability, export/recipe wiring, backward-compat) — leaving style/security pattern-matching to CodeRabbit. Most prior critical findings from earlier rounds (recipe.ptq_cfg → recipe.quantize, simple_qat_train.py import, --config IndexError, ChatML boundary fix in the fallback) appear resolved.

Remaining items found this round:

IMPORTANT

examples/llm_qat/arguments.py:31 — model_name_or_path default is still meta-llama/Llama-2-7b-hf, but the new README leads end-to-end commands with Qwen/Qwen3-8B. Users who run quantize.py without --model_name_or_path will hit a gated-repo error rather than the documented Qwen flow. Either align the default with the README or make the field required.

SUGGESTIONS

modelopt/torch/opt/plugins/transformers.py:227-236 — _all_modelopt_fields() collects fields from every imported ModelOptHFArguments subclass, so a typo'd YAML key that collides with a field of an unrelated subclass loaded elsewhere in the process (e.g. quant_cfg, criterion) is silently accepted and dropped. Scoping to self.dataclass_types would tighten validation without losing the cross-subclass typo-detection intent.
modelopt/torch/distill/plugins/huggingface.py:52-65 — to_distill_kwargs validates criterion against a hardcoded set then unconditionally constructs LMLogitsLoss(). As more criteria are added, validation and construction will drift apart. A small {name: cls} dispatch keeps them in sync.
examples/llm_qat/dataset_utils.py:217-236 — The ChatML fallback mask excludes <|im_end|> from labels, while native HF {% generation %} masks for Qwen-family templates typically include it. The same dataset can produce different labels across tokenizer paths. Worth aligning, or adding a comment explaining the intentional divergence.

No CRITICAL findings, but with one IMPORTANT outstanding I'm leaving this as a comment review rather than approving. The structural refactor (YAML configs + ModelOptArgParser + recipe-driven quantization) looks solid; the items above are mostly about defaults and consistency.

cjluo-nv

Bot review — DM the bot to share feedback.

Large refactor (42 files, +2895/-1028) that has been through several review rounds. Most prior critical items (recipe.ptq_cfg→quantize, broken import, bare asserts→ValueError, ChatML boundary token, --config IndexError, broad except narrowing, FSDP1 removal docs) appear resolved. Recommending human sign-off on the following before approval:

💬 Author replied previously: "train.py now warns if --recipe is passed" — current arguments.py actually drops QuantizationArguments from TRAINING_ARG_TYPES entirely, so --recipe would raise an argparse "unrecognized argument" error rather than warn. Functionally fine (no silent discard) but the resolution differs from what was stated; a human should confirm this is the intended UX.
💬 Claude bot flagged: ModelArguments.model_name_or_path default is still meta-llama/Llama-2-7b-hf while the new README and example commands all use Qwen/Qwen3-8B — author hasn't replied to this. Users running quantize.py without --model_name_or_path will hit a gated-repo error rather than the documented Qwen flow.
💬 h-guo18 raised the design question: examples/llm_qat/dataset_utils.py (829 lines) overlaps in purpose with modelopt/torch/utils/plugins/transformers_dataset.py (LanguageDataCollator, chat-template + assistant-mask handling) and examples/dataset/. Author acknowledged this and deferred unification as a follow-up, which is reasonable for a refactor of this size, but a human reviewer with architectural context should sign off on adding a second in-repo system for chat-template tokenization rather than extending the existing one.
_all_modelopt_fields() walks every imported ModelOptHFArguments subclass globally to validate YAML keys, so a typo that collides with a field on an unrelated subclass (e.g. criterion from DistillArguments while running quantize.py) is silently accepted. Author defended this as intentional ("safer default"); worth a second opinion since it weakens typo detection.
New QuantizationArguments (now an ModelOptHFArguments subclass, with new recipe field) is also imported by examples/gpt-oss/sft.py via TrlParser — confirm that example still parses correctly with the added recipe field and the auto-@dataclass from __init_subclass__ (no test exercises this path in the PR).
Outstanding test plan checkbox: pytest tests/examples/llm_qat/test_llm_qat.py full GPU/backend suite is unchecked in the PR description.

realAsma · 2026-06-02T17:16:50Z

🤖 Bot comment.

Re: #1172 (review)

Thanks for the consolidated pass. Status on the flagged items: (1) model_name_or_path default now aligned to Qwen/Qwen3-8B to match the README. (2) --recipe: confirmed train.py drops QuantizationArguments from TRAINING_ARG_TYPES so passing --recipe there raises an argparse 'unrecognized argument' rather than warning — intentional UX; --recipe belongs to quantize.py/simple_qat_train.py. (3) dataset_utils.py vs transformers_dataset.py overlap: acknowledged, unification tracked as a follow-up for this [1/N] series. (4) _all_modelopt_fields global walk: intentional for cross-entrypoint config reuse (replied on the inline thread). (5) gpt-oss/sft.py: the new recipe field has default=None and parses cleanly under TrlParser(return_remaining_strings=True); will add a smoke test. (6) full-GPU test_llm_qat suite: will run and check the box before merge.

realAsma · 2026-06-02T18:13:53Z

/claude review

claude

ModelOpt-focused review summary

Re-reviewed structural and ModelOpt-specific concerns (mode/state composability, public-API surface, recipe wiring, backward-compat) — leaving style and security pattern-matching to CodeRabbit. Spot-checked previously raised items and they look resolved:

recipe.quantize (not recipe.ptq_cfg) is consistent across transformers_trainer.py, quantize.py, and simple_qat_train.py.
simple_qat_train.py now imports build_blend_dataset / load_blend_config from dataset_utils.py.
model_name_or_path default is now Qwen/Qwen3-8B, aligned with README.
ChatML mask no longer includes the trailing <|im_end|>; the divergence from native generation-tag masking is now called out in a comment.
train.py drops QuantizationArguments from TRAINING_ARG_TYPES, so passing --recipe to train.py raises an argparse error rather than silently being dropped (intended UX per author).
New CPU smoke test exercises gpt-oss/sft.py with the merged QuantizationArguments (incl. the new recipe field).

Remaining items found this round:

IMPORTANT

modelopt/torch/quantization/plugins/transformers_trainer.py — QuantizationArgumentsWithConfig (previously a public class re-exported via from .transformers_trainer import * in plugins/__init__.py) is silently removed. Functionally the merged QuantizationArguments.quant_cfg: str | QuantizeConfig | None covers the same case, but external imports of the old name will now hit ImportError, and the CHANGELOG entry only describes the example-side refactor. Recommend either keeping a one-line alias as a deprecation shim or calling out the rename in CHANGELOG so existing users aren't surprised.

SUGGESTIONS

examples/llm_qat/dataset_utils.py:_chatml_assistant_mask — skip_newline only clears on the next <|im_start|> or on a literal newline immediately after role. If an assistant turn's role isn't followed by \n (non-standard template), skip_newline carries through the whole turn and the first \n inside the real assistant content gets dropped from labels. Standard ChatML always emits the role-trailing newline so this doesn't bite in practice, but the assumption isn't documented and the comment at line 211 only mentions the trailing <|im_end|> divergence. A one-line fix (consume skip_newline on the first post-role token regardless) closes the edge case.
examples/llm_qat/quantize.py docstring uses meta-llama/Meta-Llama-3-8B while the rest of the example (README, arguments.py default, train configs) leads with Qwen/Qwen3-8B. Minor consistency.

No CRITICAL findings. With one outstanding IMPORTANT (compat shim/changelog for QuantizationArgumentsWithConfig), I'm leaving a non-blocking comment review rather than approve.

Refactor the examples/llm_qat example to use YAML configs driven by a shared ModelOptArgParser (HfArgumentParser wrapper), introduce ModelOptHFArguments dataclasses, a reusable QuantizationArguments dataclass, and refactored dataset/tokenization utilities. - Align ModelArguments.model_name_or_path default with the README (Qwen/Qwen3-8B) and document the intentional ChatML assistant-mask exclusion of trailing <|im_end|> and the post-header newline. - Add a CPU-only tiny gpt-oss SFT e2e test (test_gpt_oss_sft_toy) plus a tiny_gpt_oss_path fixture, validating that sft.py's TrlParser parses the shared QuantizationArguments incl. the new recipe field; move the release marker onto the heavy GPU pipeline test so the toy test runs in regular CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>

…tionArgumentsWithConfig alias - quantize.py docstring usage example: --model_name_or_path Llama-3-8B -> Qwen/Qwen3-8B and --output_dir -> qwen3-8b-quantized, matching the README and config templates. - Re-add backwards-compat alias QuantizationArgumentsWithConfig = QuantizationArguments so external imports of the pre-refactor public name keep working; document the deprecation in CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>

shengliangxu

Overall changes are scoped to the example. LGTM.

In _chatml_assistant_mask, consume skip_newline on the first post-role token regardless of whether it is a newline, so a malformed template whose role header is not immediately followed by a newline can no longer drop a later content newline from the labels. Behavior is unchanged for standard ChatML (role always followed by a newline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>

Add a 'Arguments by Script' table to the generated ARGUMENTS.md via a new docs_header_extra hook on ModelOptArgParser, clarifying which argument groups each of quantize.py/train.py reads and that mismatched-group keys are silently ignored. Link ARGUMENTS.md from the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma requested review from a team as code owners April 2, 2026 23:32

realAsma requested review from Edwardf0t1 and kevalmorabia97 April 2, 2026 23:32

coderabbitai Bot reviewed Apr 2, 2026

View reviewed changes

Comment thread .pre-commit-config.yaml

Comment thread examples/llm_qat/ARGUMENTS.md Outdated

Comment thread modelopt/torch/opt/plugins/transformers.py

realAsma force-pushed the asma/new-qat-1 branch from e764ad9 to 66e02a7 Compare April 3, 2026 12:53

realAsma requested a review from a team as a code owner April 3, 2026 12:53

coderabbitai Bot reviewed Apr 3, 2026

View reviewed changes

Comment thread examples/llm_qat/ARGUMENTS.md Outdated

Comment thread examples/llm_qat/ARGUMENTS.md Outdated

Comment thread examples/llm_qat/main.py Outdated

Comment thread modelopt/torch/opt/plugins/transformers.py

realAsma force-pushed the asma/new-qat-1 branch from 66e02a7 to be7ba74 Compare April 5, 2026 21:29

realAsma requested a review from a team as a code owner April 5, 2026 21:29

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

Comment thread examples/llm_qat/llama_factory/launch_llamafactory.sh

Comment thread examples/llm_qat/quantize.py

Comment thread examples/llm_qat/README.md Outdated

realAsma force-pushed the asma/new-qat-1 branch from be7ba74 to 7b581c7 Compare April 6, 2026 12:43

coderabbitai Bot reviewed Apr 6, 2026

View reviewed changes

Comment thread examples/llm_qat/dataset_utils.py Outdated

realAsma force-pushed the asma/new-qat-1 branch from 7b581c7 to 3b4863c Compare April 6, 2026 13:34

realAsma requested a review from a team as a code owner April 6, 2026 13:34

realAsma requested a review from h-guo18 April 6, 2026 13:34

coderabbitai Bot reviewed Apr 6, 2026

View reviewed changes

realAsma force-pushed the asma/new-qat-1 branch from 3b4863c to 7f82fe2 Compare April 6, 2026 19:40

realAsma requested a review from a team as a code owner April 6, 2026 19:40

realAsma force-pushed the asma/new-qat-1 branch 6 times, most recently from bfc343c to b3d46c7 Compare April 8, 2026 17:07

realAsma requested a review from h-guo18 June 2, 2026 16:34

realAsma force-pushed the asma/new-qat-1 branch from 500c04f to de30e03 Compare June 2, 2026 16:40