[1/N] Refactor llm_qat example: YAML configs + ModelOptArgParser#1172
[1/N] Refactor llm_qat example: YAML configs + ModelOptArgParser#1172realAsma wants to merge 4 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughSplits the monolithic llm_qat launcher into modular Changes
Sequence Diagram(s)mermaid mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py (1)
70-79: Consider adding error-handling tests.The empty YAML test is good. Consider adding tests for:
- Non-existent
--configfile path (should raise appropriate error)- Malformed YAML syntax (should raise parse error, not crash)
Example test for missing config file
def test_missing_config_file(self, tmp_path): parser = ModelOptArgParser((_ModelArgs, _TrainArgs)) with pytest.raises(FileNotFoundError): parser.parse_args_into_dataclasses( args=["--config", str(tmp_path / "nonexistent.yaml")] )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py` around lines 70 - 79, Add two new unit tests to tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py that exercise error handling in ModelOptArgParser.parse_args_into_dataclasses: one test (e.g., test_missing_config_file) should call parser.parse_args_into_dataclasses with args=["--config", str(tmp_path/"nonexistent.yaml")] and assert it raises FileNotFoundError using pytest.raises; the other test (e.g., test_malformed_yaml_config) should write malformed YAML to a temp file and assert parser.parse_args_into_dataclasses raises the YAML parsing exception (yaml.YAMLError or the parser-specific parse error) using pytest.raises; reference ModelOptArgParser, parse_args_into_dataclasses, and use tmp_path/pytest.raises to locate where to add these tests.examples/llm_qat/main.py (1)
197-198: Consider validatingquant_cfgagainst known configurations.Using
getattr(mtq, quant_args.quant_cfg)with a user-provided string could potentially access unintended module attributes. While this is a CLI argument (not untrusted network input), adding validation against known configs would be more defensive.🛡️ Optional: Add validation against known configs
if quant_args.quant_cfg is not None: + if not hasattr(mtq, quant_args.quant_cfg) or not quant_args.quant_cfg.endswith("_CFG"): + raise ValueError( + f"Unknown quant_cfg: {quant_args.quant_cfg}. " + "See modelopt.torch.quantization.config for available options." + ) quant_args.quant_cfg = getattr(mtq, quant_args.quant_cfg)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/main.py` around lines 197 - 198, The current line uses getattr(mtq, quant_args.quant_cfg) directly which may resolve unintended attributes; validate quant_args.quant_cfg against an explicit whitelist of allowed config names before calling getattr. Update the logic around quant_args.quant_cfg: define the set/list of permitted config names (e.g., known keys in mtq), check that quant_args.quant_cfg is in that list, and only then call getattr(mtq, quant_args.quant_cfg); if not allowed, raise a clear error or exit with a helpful message. Ensure you reference the same symbol names (quant_args.quant_cfg and mtq) so the check is applied right where the getattr is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.pre-commit-config.yaml:
- Around line 130-142: The pre-commit hook runs examples/llm_qat/main.py but
heavy module-level imports (torch, transformers, modelopt modules) run before
the argument parser (parse_args_into_dataclasses()) and train(), causing
slowness and CUDA failures; fix by lazy-loading those heavy imports (move
torch/transformers/modelopt imports inside train() or into helper functions
called only when not generating docs) or add a tiny dedicated script that
imports only the argument dataclasses for the --generate_docs flow, or check
sys.argv for "--generate_docs" at module import time and skip heavy imports;
reference the examples/llm_qat/main.py entrypoint, the train() function, and
parse_args_into_dataclasses() when making the change.
In `@examples/llm_qat/ARGUMENTS.md`:
- Line 35: Update the metadata help string for
TrainingArguments.model_max_length in examples/llm_qat/main.py to use the
hyphenated form "right-padded" instead of "right padded"; locate the
TrainingArguments definition or the argument registration for model_max_length
(symbol: TrainingArguments.model_max_length or the argument name
"--model_max_length") and modify its help/metadata string accordingly so the
generated ARGUMENTS.md shows "right-padded".
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 73-85: The "--config" handling may access args[idx + 1] when
"--config" is last and raise IndexError; update the block that detects
"--config" to check that idx + 1 < len(args) before reading args[idx + 1], and
if missing raise a clear error (e.g., raise ValueError or use the argparse error
mechanism) with a message like "Missing value for --config" so callers get a
friendly error instead of an IndexError; ensure the rest of the flow (setting
config_path, loading YAML, and calling self.set_defaults(**config)) only runs
when a valid path is present.
---
Nitpick comments:
In `@examples/llm_qat/main.py`:
- Around line 197-198: The current line uses getattr(mtq, quant_args.quant_cfg)
directly which may resolve unintended attributes; validate quant_args.quant_cfg
against an explicit whitelist of allowed config names before calling getattr.
Update the logic around quant_args.quant_cfg: define the set/list of permitted
config names (e.g., known keys in mtq), check that quant_args.quant_cfg is in
that list, and only then call getattr(mtq, quant_args.quant_cfg); if not
allowed, raise a clear error or exit with a helpful message. Ensure you
reference the same symbol names (quant_args.quant_cfg and mtq) so the check is
applied right where the getattr is used.
In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py`:
- Around line 70-79: Add two new unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py that exercise error
handling in ModelOptArgParser.parse_args_into_dataclasses: one test (e.g.,
test_missing_config_file) should call parser.parse_args_into_dataclasses with
args=["--config", str(tmp_path/"nonexistent.yaml")] and assert it raises
FileNotFoundError using pytest.raises; the other test (e.g.,
test_malformed_yaml_config) should write malformed YAML to a temp file and
assert parser.parse_args_into_dataclasses raises the YAML parsing exception
(yaml.YAMLError or the parser-specific parse error) using pytest.raises;
reference ModelOptArgParser, parse_args_into_dataclasses, and use
tmp_path/pytest.raises to locate where to add these tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9730ae6e-e2ad-4302-b952-ad714e48536f
📒 Files selected for processing (13)
.pre-commit-config.yamlexamples/llm_qat/ARGUMENTS.mdexamples/llm_qat/README.mdexamples/llm_qat/configs/finetune.yamlexamples/llm_qat/configs/ptq_eval.yamlexamples/llm_qat/configs/qad_nvfp4.yamlexamples/llm_qat/configs/qat_nvfp4.yamlexamples/llm_qat/launch.shexamples/llm_qat/main.pymodelopt/torch/opt/plugins/transformers.pymodelopt/torch/quantization/plugins/transformers_trainer.pytests/examples/llm_qat/test_llm_qat.pytests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (1)
- examples/llm_qat/launch.sh
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1172 +/- ##
===========================================
- Coverage 77.41% 67.38% -10.03%
===========================================
Files 480 481 +1
Lines 52506 52904 +398
===========================================
- Hits 40645 35647 -4998
- Misses 11861 17257 +5396
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/llm_qat/main.py (1)
77-79:⚠️ Potential issue | 🟡 MinorGrammar: "right padded" should be "right-padded".
This is the source of the grammar issue in the generated
ARGUMENTS.md.📝 Proposed fix
model_max_length: int = field( default=4096, metadata={ "help": ( - "Maximum sequence length. Sequences will be right padded (and possibly truncated)." + "Maximum sequence length. Sequences will be right-padded (and possibly truncated)." ) }, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/main.py` around lines 77 - 79, Update the help string that currently reads "Maximum sequence length. Sequences will be right padded (and possibly truncated)." to use a hyphenated adjective: change it to "Maximum sequence length. Sequences will be right-padded (and possibly truncated)." Locate the string in examples/llm_qat/main.py where the argument's "help" value for "Maximum sequence length" is defined (the dict entry with key "help" or the parser.add_argument call providing that help text) and replace the phrase "right padded" with "right-padded".
🧹 Nitpick comments (3)
modelopt/torch/opt/plugins/transformers.py (2)
80-83: Missing error handling for YAML file operations.If the config file doesn't exist or contains invalid YAML, the error messages won't be user-friendly. Consider wrapping in try/except with clearer error context.
🛡️ Proposed improvement
- with open(config_path) as f: - config = yaml.safe_load(f) - if config: - self.set_defaults(**config) + try: + with open(config_path) as f: + config = yaml.safe_load(f) + except FileNotFoundError: + raise ValueError(f"Config file not found: {config_path}") + except yaml.YAMLError as e: + raise ValueError(f"Invalid YAML in config file {config_path}: {e}") + if config: + self.set_defaults(**config)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` around lines 80 - 83, Wrap the config file read and YAML parsing around a try/except to provide clear, user-friendly errors: catch FileNotFoundError to indicate the provided config_path is missing, catch yaml.YAMLError to indicate the file contains invalid YAML, and catch a generic Exception for any other I/O issues; in the try block keep the existing open(config_path) -> yaml.safe_load flow and call self.set_defaults(**config) when config is truthy, and in each except produce a clear message or re-raise a new exception that mentions config_path and the original error to aid debugging.
78-83: YAML import inside function body — consider moving to module level.The
import yamlstatement is inside the conditional block. While this works, it's unconventional and may cause slight latency on first use. Moving it to the top-level imports would be cleaner.♻️ Suggested refactor
import dataclasses import sys import types from contextlib import contextmanager from pathlib import Path +import yaml import torch import transformersThen remove the
import yamlfrom line 78.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` around lines 78 - 83, The local "import yaml" inside the block that reads config_path should be moved to the module top-level and the inline import removed; update the module imports to include "import yaml" and then keep the body as: open(config_path), yaml.safe_load, and if config: self.set_defaults(**config) so that yaml is only imported once at module import time and the inline import is deleted.tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py (1)
36-120: Consider adding edge case tests for error handling.The current tests cover happy paths well. Consider adding tests for:
--configwith non-existent file path--configwith invalid YAML syntax--configas last argument without a value💡 Example additional tests
def test_config_file_not_found(self, tmp_path): parser = ModelOptArgParser((_ModelArgs, _TrainArgs)) with pytest.raises(FileNotFoundError): parser.parse_args_into_dataclasses( args=["--config", str(tmp_path / "nonexistent.yaml")] ) def test_config_missing_value(self): parser = ModelOptArgParser((_ModelArgs, _TrainArgs)) with pytest.raises((IndexError, ValueError)): parser.parse_args_into_dataclasses(args=["--config"])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py` around lines 36 - 120, Add edge-case unit tests to tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py covering --config error handling: create a test function (e.g., test_config_file_not_found) that constructs ModelOptArgParser((_ModelArgs, _TrainArgs)) and calls parse_args_into_dataclasses with args=["--config", str(tmp_path/"nonexistent.yaml")] asserting FileNotFoundError; add a test (e.g., test_config_invalid_yaml) that writes malformed YAML to a tmp file and asserts parse_args_into_dataclasses raises yaml.YAMLError or ValueError; add a test (e.g., test_config_missing_value) that calls parse_args_into_dataclasses with args=["--config"] and asserts an IndexError or ValueError is raised. Use the existing ModelOptArgParser and parse_args_into_dataclasses symbols to locate where to add these tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Around line 28-37: The generated TrainingArguments docs omit parameters whose
names are filtered out even when their defaults differ from the HF base; update
the doc generation to surface overridden defaults: in the function
_generate_docs locate the logic that filters out fields by name and change it to
also compare each field's default to the corresponding attribute on
transformers.TrainingArguments (e.g., learning_rate, eval_strategy, do_eval) and
include any field whose default value differs, or alternatively add an explicit
"Overridden defaults" section to ARGUMENTS.md listing each overridden argument
and its non-HF default so users can see values like learning_rate=1e-4.
- Around line 9-11: The generated docs drop union/Optional info (e.g. `str |
None` becoming `str`) because _format_type currently simplifies annotations;
update _format_type so it preserves union/Optional annotations (use
typing.get_origin/get_args or equivalent) and format Union types by joining
inner types with " | " (so a source annotation `str | None` yields `str |
None`), then ensure _generate_docs uses this updated _format_type when rendering
argument types like quant_cfg and other optional fields.
In `@examples/llm_qat/main.py`:
- Around line 64-69: ModelArguments.model_name_or_path lacks metadata helper
text; add a metadata={"help": "..."} to that field so the generated ARGUMENTS.md
includes a description. Edit the ModelArguments class and update the
model_name_or_path declaration to include a clear help string (e.g., "The name
or path of the pretrained model to use.") in the metadata dict, mirroring the
style used for teacher_model.
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 116-122: The docstring claims "additional/overridden arguments"
but the current filter (own_fields = [f for f in dataclasses.fields(dc) if
f.name not in hf_training_fields]) only keeps truly new fields; update the
behavior by either (A) changing the text emitted in the lines.append call to
"Only additional arguments are shown below" to remove "overridden", or (B)
change the filtering logic to detect overrides: lookup the corresponding field
in the HuggingFace TrainingArguments (hf_training_fields) and compare default
values (and types if desired) with the dataclass field from
dataclasses.fields(dc), and include the field if the default differs (treating
missing defaults as different); modify the code around is_hf_subclass /
own_fields and the lines.append text accordingly so overridden fields are
included when defaults differ.
---
Outside diff comments:
In `@examples/llm_qat/main.py`:
- Around line 77-79: Update the help string that currently reads "Maximum
sequence length. Sequences will be right padded (and possibly truncated)." to
use a hyphenated adjective: change it to "Maximum sequence length. Sequences
will be right-padded (and possibly truncated)." Locate the string in
examples/llm_qat/main.py where the argument's "help" value for "Maximum sequence
length" is defined (the dict entry with key "help" or the parser.add_argument
call providing that help text) and replace the phrase "right padded" with
"right-padded".
---
Nitpick comments:
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 80-83: Wrap the config file read and YAML parsing around a
try/except to provide clear, user-friendly errors: catch FileNotFoundError to
indicate the provided config_path is missing, catch yaml.YAMLError to indicate
the file contains invalid YAML, and catch a generic Exception for any other I/O
issues; in the try block keep the existing open(config_path) -> yaml.safe_load
flow and call self.set_defaults(**config) when config is truthy, and in each
except produce a clear message or re-raise a new exception that mentions
config_path and the original error to aid debugging.
- Around line 78-83: The local "import yaml" inside the block that reads
config_path should be moved to the module top-level and the inline import
removed; update the module imports to include "import yaml" and then keep the
body as: open(config_path), yaml.safe_load, and if config:
self.set_defaults(**config) so that yaml is only imported once at module import
time and the inline import is deleted.
In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py`:
- Around line 36-120: Add edge-case unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py covering --config error
handling: create a test function (e.g., test_config_file_not_found) that
constructs ModelOptArgParser((_ModelArgs, _TrainArgs)) and calls
parse_args_into_dataclasses with args=["--config",
str(tmp_path/"nonexistent.yaml")] asserting FileNotFoundError; add a test (e.g.,
test_config_invalid_yaml) that writes malformed YAML to a tmp file and asserts
parse_args_into_dataclasses raises yaml.YAMLError or ValueError; add a test
(e.g., test_config_missing_value) that calls parse_args_into_dataclasses with
args=["--config"] and asserts an IndexError or ValueError is raised. Use the
existing ModelOptArgParser and parse_args_into_dataclasses symbols to locate
where to add these tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bf32b98f-0804-4249-87e1-ce58d13d15e2
📒 Files selected for processing (14)
.pre-commit-config.yamlexamples/llm_qad/README.mdexamples/llm_qat/ARGUMENTS.mdexamples/llm_qat/README.mdexamples/llm_qat/configs/finetune.yamlexamples/llm_qat/configs/ptq_eval.yamlexamples/llm_qat/configs/qad_nvfp4.yamlexamples/llm_qat/configs/qat_nvfp4.yamlexamples/llm_qat/launch.shexamples/llm_qat/main.pymodelopt/torch/opt/plugins/transformers.pymodelopt/torch/quantization/plugins/transformers_trainer.pytests/examples/llm_qat/test_llm_qat.pytests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (1)
- examples/llm_qat/launch.sh
✅ Files skipped from review due to trivial changes (5)
- examples/llm_qad/README.md
- examples/llm_qat/configs/qat_nvfp4.yaml
- examples/llm_qat/configs/ptq_eval.yaml
- examples/llm_qat/configs/finetune.yaml
- examples/llm_qat/configs/qad_nvfp4.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- .pre-commit-config.yaml
- tests/examples/llm_qat/test_llm_qat.py
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
examples/llm_qat/ARGUMENTS.md (1)
10-10:⚠️ Potential issue | 🟡 MinorMinor docs grammar: use “right-padded.”
Line 10 still says “right padded”. Since this is auto-generated, update the originating help string so regenerated docs use “right-padded”.
As per coding guidelines,
**/*.{py,cpp,h,hpp,md}: All code (Python, C++, Markdown, etc.) must adhere to coding standards as checked by pre-commit hooks.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/ARGUMENTS.md` at line 10, Update the help string for the CLI option that defines --model_max_length (the add_argument call for '--model_max_length' / the "model_max_length" help text) to use the grammatically correct phrase "right-padded." instead of "right padded" so regenerated docs reflect the change; locate the add_argument or option definition that sets the default/help for model_max_length and change the help string to include "right-padded." and re-run doc generation/pre-commit.
🧹 Nitpick comments (4)
examples/llm_qat/train.py (2)
137-141: Sametrust_remote_codeconcern applies to teacher model loading.The teacher model loading should also use the configurable
trust_remote_codeparameter for consistency.♻️ Suggested fix
teacher_model = transformers.AutoModelForCausalLM.from_pretrained( model_args.teacher_model, cache_dir=training_args.cache_dir, torch_dtype=torch.bfloat16, + trust_remote_code=model_args.trust_remote_code, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/train.py` around lines 137 - 141, The teacher model is being loaded via transformers.AutoModelForCausalLM.from_pretrained without using the configurable trust_remote_code flag; update the teacher_model loading call (the call that assigns to teacher_model and uses model_args.teacher_model) to pass the same trust_remote_code parameter used for the student/model loading (i.e., forward the configured trust_remote_code boolean into from_pretrained) so teacher model remote code is handled consistently.
96-104: Consider exposingtrust_remote_codeas a configurable argument.The model and tokenizer loading accepts user-provided
model_name_or_pathbut doesn't exposetrust_remote_codeas a configurable parameter. While defaulting toFalseis correct for security, some models requiretrust_remote_code=Trueto function. Per coding guidelines, this should be caller-configurable rather than implicitly relying on the default.♻️ Suggested approach
Add
trust_remote_code: bool = FalsetoModelArgumentsinarguments.py, then pass it to the model and tokenizer loading calls:model = transformers.AutoModelForCausalLM.from_pretrained( model_args.model_name_or_path, cache_dir=training_args.cache_dir, torch_dtype=torch.bfloat16, + trust_remote_code=model_args.trust_remote_code, ) ... tokenizer = transformers.AutoTokenizer.from_pretrained( - model_args.model_name_or_path, model_max_length=model_args.model_max_length + model_args.model_name_or_path, + model_max_length=model_args.model_max_length, + trust_remote_code=model_args.trust_remote_code, )As per coding guidelines, "Do not hardcode
trust_remote_code=Truewhen loading Hugging Face Transformers models. Let the caller decide via a parameter; default toFalse."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/train.py` around lines 96 - 104, The loader calls in train.py (transformers.AutoModelForCausalLM.from_pretrained and transformers.AutoTokenizer.from_pretrained) currently don't expose trust_remote_code; add a new boolean field trust_remote_code: bool = False to the ModelArguments dataclass (arguments.py) and then pass model_args.trust_remote_code into both from_pretrained calls (e.g., trust_remote_code=model_args.trust_remote_code) so callers can opt-in while retaining a safe default.examples/llm_qat/utils.py (1)
43-43: Consider moving import to module level for clarity.The runtime import inside the function works but is unconventional. If there's no circular import concern, moving it to the module level would improve readability and enable better static analysis.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/utils.py` at line 43, The file currently does a runtime import of build_blend_dataset and load_blend_config from dataset_utils inside a function; move the import statement for build_blend_dataset and load_blend_config to the top of examples/llm_qat/utils.py (module-level) and delete the in-function import so callers use the module-level imports instead, ensuring there are no circular-import issues before changing.examples/llm_qat/dataset_utils.py (1)
721-723: Consider makingtrust_remote_codeexplicit for defense-in-depth.While
trust_remote_codedefaults toFalsein HuggingFace Transformers, the coding guidelines recommend making this explicit or caller-configurable. For a CLI tool that accepts arbitrary tokenizer paths, consider adding an explicittrust_remote_code=Falseor exposing it as a CLI flag.♻️ Suggested improvement
tokenizer = transformers.AutoTokenizer.from_pretrained( - args.tokenizer, model_max_length=args.max_length + args.tokenizer, model_max_length=args.max_length, trust_remote_code=False )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/dataset_utils.py` around lines 721 - 723, The AutoTokenizer instantiation uses transformers.AutoTokenizer.from_pretrained without explicitly setting trust_remote_code, which should be made explicit for defense-in-depth; update the call in dataset_utils.py where tokenizer is created (transformers.AutoTokenizer.from_pretrained, referencing args.tokenizer and args.max_length) to pass trust_remote_code=False or expose a new CLI flag (e.g., args.trust_remote_code) and pass that value to from_pretrained so callers can opt in explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_qat/llama_factory/launch_llamafactory.sh`:
- Line 143: The --use_fsdp2 flag is parsed and shown in help but has no effect
because default config selection no longer reads USE_FSDP2; either remove the
flag from the help and argument parsing or make it effective by wiring it into
config selection and validation. Fix by updating the argument parsing block that
sets USE_FSDP2 (the case branch handling "--use_fsdp2") and the help echo line
to either delete that option, or if you keep it, validate the provided value
(true/false), set the USE_FSDP2 variable accordingly, and use USE_FSDP2 when
choosing the default config (where default config selection currently ignores
USE_FSDP2); if keeping the flag also emit a clear warning/error when an
unsupported value is provided.
In `@examples/llm_qat/quantize.py`:
- Around line 89-93: The calibration forward_loop currently runs forward passes
with gradients enabled and without evaluation mode; wrap the loop in
torch.inference_mode() and call model.eval() before iterating to disable
autograd and dropout/batchnorm updates (e.g., in forward_loop use model.eval()
and with torch.inference_mode(): for batch in calib_dataloader ...); also
consider restoring the original training state after calibration if the caller
expects the model's training flag unchanged.
In `@examples/llm_qat/README.md`:
- Around line 58-67: Update the README examples to use flag-style boolean
arguments to match ARGUMENTS.md: remove explicit True/False values and use the
flags `--distill` and `--lora` where currently written as `--distill True` or
`--lora True` (for example update the train invocation block containing
`--distill True` and any other occurrences such as the `--lora True` example
around line 225) so the CLI examples consistently use `--distill` and `--lora`
flag syntax.
---
Duplicate comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Line 10: Update the help string for the CLI option that defines
--model_max_length (the add_argument call for '--model_max_length' / the
"model_max_length" help text) to use the grammatically correct phrase
"right-padded." instead of "right padded" so regenerated docs reflect the
change; locate the add_argument or option definition that sets the default/help
for model_max_length and change the help string to include "right-padded." and
re-run doc generation/pre-commit.
---
Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 721-723: The AutoTokenizer instantiation uses
transformers.AutoTokenizer.from_pretrained without explicitly setting
trust_remote_code, which should be made explicit for defense-in-depth; update
the call in dataset_utils.py where tokenizer is created
(transformers.AutoTokenizer.from_pretrained, referencing args.tokenizer and
args.max_length) to pass trust_remote_code=False or expose a new CLI flag (e.g.,
args.trust_remote_code) and pass that value to from_pretrained so callers can
opt in explicitly.
In `@examples/llm_qat/train.py`:
- Around line 137-141: The teacher model is being loaded via
transformers.AutoModelForCausalLM.from_pretrained without using the configurable
trust_remote_code flag; update the teacher_model loading call (the call that
assigns to teacher_model and uses model_args.teacher_model) to pass the same
trust_remote_code parameter used for the student/model loading (i.e., forward
the configured trust_remote_code boolean into from_pretrained) so teacher model
remote code is handled consistently.
- Around line 96-104: The loader calls in train.py
(transformers.AutoModelForCausalLM.from_pretrained and
transformers.AutoTokenizer.from_pretrained) currently don't expose
trust_remote_code; add a new boolean field trust_remote_code: bool = False to
the ModelArguments dataclass (arguments.py) and then pass
model_args.trust_remote_code into both from_pretrained calls (e.g.,
trust_remote_code=model_args.trust_remote_code) so callers can opt-in while
retaining a safe default.
In `@examples/llm_qat/utils.py`:
- Line 43: The file currently does a runtime import of build_blend_dataset and
load_blend_config from dataset_utils inside a function; move the import
statement for build_blend_dataset and load_blend_config to the top of
examples/llm_qat/utils.py (module-level) and delete the in-function import so
callers use the module-level imports instead, ensuring there are no
circular-import issues before changing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4e285767-bcf3-4de5-8296-f9658a97092c
📒 Files selected for processing (33)
.pre-commit-config.yamlexamples/llm_qad/README.mdexamples/llm_qat/ARGUMENTS.mdexamples/llm_qat/README.mdexamples/llm_qat/accelerate_config/fsdp1.yamlexamples/llm_qat/arguments.pyexamples/llm_qat/configs/accelerate/ddp.yamlexamples/llm_qat/configs/accelerate/deepspeed.yamlexamples/llm_qat/configs/accelerate/fsdp2.yamlexamples/llm_qat/configs/dataset/README.mdexamples/llm_qat/configs/dataset/blend.yamlexamples/llm_qat/configs/dataset/blend_example.yamlexamples/llm_qat/configs/dataset/blend_test.yamlexamples/llm_qat/configs/training/finetune.yamlexamples/llm_qat/configs/training/ptq_eval.yamlexamples/llm_qat/configs/training/qad_nvfp4.yamlexamples/llm_qat/configs/training/qat_nvfp4.yamlexamples/llm_qat/dataset_utils.pyexamples/llm_qat/launch.shexamples/llm_qat/llama_factory/README.mdexamples/llm_qat/llama_factory/launch_llamafactory.shexamples/llm_qat/llama_factory/llama_config.yamlexamples/llm_qat/llama_factory/llama_factory.pyexamples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynbexamples/llm_qat/quantize.pyexamples/llm_qat/simple_qat_train.pyexamples/llm_qat/train.pyexamples/llm_qat/utils.pyexamples/vllm_serve/README.mdmodelopt/torch/opt/plugins/transformers.pymodelopt/torch/quantization/plugins/transformers_trainer.pytests/examples/llm_qat/test_llm_qat.pytests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (2)
- examples/llm_qat/launch.sh
- examples/llm_qat/accelerate_config/fsdp1.yaml
✅ Files skipped from review due to trivial changes (14)
- examples/llm_qad/README.md
- examples/vllm_serve/README.md
- examples/llm_qat/llama_factory/llama_config.yaml
- examples/llm_qat/configs/training/ptq_eval.yaml
- examples/llm_qat/configs/training/finetune.yaml
- examples/llm_qat/configs/training/qat_nvfp4.yaml
- examples/llm_qat/configs/training/qad_nvfp4.yaml
- examples/llm_qat/configs/dataset/blend_test.yaml
- examples/llm_qat/configs/dataset/blend.yaml
- tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
- examples/llm_qat/configs/dataset/README.md
- examples/llm_qat/configs/dataset/blend_example.yaml
- .pre-commit-config.yaml
- examples/llm_qat/arguments.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/quantization/plugins/transformers_trainer.py
- modelopt/torch/opt/plugins/transformers.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
examples/llm_qat/dataset_utils.py (1)
669-671: Exposetrust_remote_codeas a CLI parameter.While
trust_remote_codedefaults toFalse(safe), the coding guidelines recommend letting callers configure this parameter. For CLI tools, users should be able to opt-in when loading tokenizers that require custom code.♻️ Suggested improvement
parser.add_argument("--tokenizer", required=True, help="Tokenizer name or path") parser.add_argument("--max_length", type=int, default=4096, help="Max sequence length") + parser.add_argument( + "--trust_remote_code", + action="store_true", + default=False, + help="Allow executing custom code from the tokenizer repository (use with caution)", + ) args = parser.parse_args() config = load_blend_config(args.config) tokenizer = transformers.AutoTokenizer.from_pretrained( - args.tokenizer, model_max_length=args.max_length + args.tokenizer, + model_max_length=args.max_length, + trust_remote_code=args.trust_remote_code, )As per coding guidelines: "Do not hardcode
trust_remote_code=Truewhen loading Hugging Face Transformers models. Let the caller decide via a parameter; default toFalse."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/dataset_utils.py` around lines 669 - 671, Add a CLI flag to expose Hugging Face's trust_remote_code instead of hardcoding it: update the argument parser to accept something like --trust-remote-code (default False) and pass that flag into the tokenizer loading call (transformers.AutoTokenizer.from_pretrained) using args.trust_remote_code, so the tokenizer creation (tokenizer = transformers.AutoTokenizer.from_pretrained(args.tokenizer, model_max_length=args.max_length, trust_remote_code=args.trust_remote_code)) respects the caller's choice.tests/examples/llm_qat/test_llm_qat.py (2)
82-82: Normalizetiny_llama_pathbefore appending it toextra_cmd_args.
_run_quantize()is typed as takinglist[str], and the other filesystem arguments in these calls are already wrapped withstr(...). Convertingtiny_llama_pathhere keeps the argv contract consistent and avoids surprises if the fixture returns aPath.🧩 Proposed fix
- "--model_name_or_path", tiny_llama_path, + "--model_name_or_path", str(tiny_llama_path),Apply the same change in all three
_run_quantize()call sites.Also applies to: 105-105, 130-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/llm_qat/test_llm_qat.py` at line 82, The test passes tiny_llama_path directly into the argv list which can be a Path object; update all three _run_quantize(...) call sites to convert tiny_llama_path to a string (e.g., str(tiny_llama_path)) when adding it to extra_cmd_args so the list[str] contract for _run_quantize and consistency with other filesystem args is preserved; locate the calls that pass "--model_name_or_path", tiny_llama_path and replace tiny_llama_path with its stringified form.
35-35: Avoid relying onPATHfor the Python and Accelerate entry points.Using bare
pythonandacceleratecan pick a different environment than the one running pytest, which makes these integration tests brittle in CI and local virtualenvs. Ifrun_example_command()does not already normalize the launcher, prefer invoking both steps from the active interpreter.Also applies to: 52-54
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/llm_qat/test_llm_qat.py` at line 35, The test currently invokes the example scripts via the bare strings "python" and "accelerate" which can resolve to the wrong interpreter; update the test to launch both steps with the active Python interpreter instead: replace occurrences of "python" with the running interpreter (use sys.executable) and replace "accelerate" with a module invocation via the interpreter (i.e. sys.executable with "-m accelerate" or equivalent) when calling run_example_command(); update both the quantize step (strings "python", "quantize.py") and the train/accelerate step (strings "accelerate", "launch" / related args for "train.py") so the commands use the active interpreter rather than relying on PATH.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 224-231: The current broad except Exception in the tokenization
fallback hides unexpected errors; change it to catch only expected
tokenization-related errors (e.g., ValueError, UnicodeEncodeError, and any
specific tokenizer error class your tokenizer exposes) around the tokenizer
invocation, keep the logger.warning and the fallback pad/label construction
using pad_id, max_length, and IGNORE_TOKEN_ID, and for any other exceptions
re-raise them so they surface (i.e., replace "except Exception as e" with a
narrowed except tuple and add a bare raise in an outer except to propagate
unexpected errors).
---
Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 669-671: Add a CLI flag to expose Hugging Face's trust_remote_code
instead of hardcoding it: update the argument parser to accept something like
--trust-remote-code (default False) and pass that flag into the tokenizer
loading call (transformers.AutoTokenizer.from_pretrained) using
args.trust_remote_code, so the tokenizer creation (tokenizer =
transformers.AutoTokenizer.from_pretrained(args.tokenizer,
model_max_length=args.max_length, trust_remote_code=args.trust_remote_code))
respects the caller's choice.
In `@tests/examples/llm_qat/test_llm_qat.py`:
- Line 82: The test passes tiny_llama_path directly into the argv list which can
be a Path object; update all three _run_quantize(...) call sites to convert
tiny_llama_path to a string (e.g., str(tiny_llama_path)) when adding it to
extra_cmd_args so the list[str] contract for _run_quantize and consistency with
other filesystem args is preserved; locate the calls that pass
"--model_name_or_path", tiny_llama_path and replace tiny_llama_path with its
stringified form.
- Line 35: The test currently invokes the example scripts via the bare strings
"python" and "accelerate" which can resolve to the wrong interpreter; update the
test to launch both steps with the active Python interpreter instead: replace
occurrences of "python" with the running interpreter (use sys.executable) and
replace "accelerate" with a module invocation via the interpreter (i.e.
sys.executable with "-m accelerate" or equivalent) when calling
run_example_command(); update both the quantize step (strings "python",
"quantize.py") and the train/accelerate step (strings "accelerate", "launch" /
related args for "train.py") so the commands use the active interpreter rather
than relying on PATH.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7d1f6438-abe5-4f9c-94c4-93bd1e217408
📒 Files selected for processing (33)
.pre-commit-config.yamlexamples/llm_qad/README.mdexamples/llm_qat/ARGUMENTS.mdexamples/llm_qat/README.mdexamples/llm_qat/accelerate_config/fsdp1.yamlexamples/llm_qat/arguments.pyexamples/llm_qat/configs/accelerate/ddp.yamlexamples/llm_qat/configs/accelerate/deepspeed.yamlexamples/llm_qat/configs/accelerate/fsdp2.yamlexamples/llm_qat/configs/dataset/README.mdexamples/llm_qat/configs/dataset/blend.yamlexamples/llm_qat/configs/dataset/blend_example.yamlexamples/llm_qat/configs/dataset/blend_test.yamlexamples/llm_qat/configs/train/finetune.yamlexamples/llm_qat/configs/train/ptq_eval.yamlexamples/llm_qat/configs/train/qad_nvfp4.yamlexamples/llm_qat/configs/train/qat_nvfp4.yamlexamples/llm_qat/dataset_utils.pyexamples/llm_qat/launch.shexamples/llm_qat/llama_factory/README.mdexamples/llm_qat/llama_factory/launch_llamafactory.shexamples/llm_qat/llama_factory/llama_config.yamlexamples/llm_qat/llama_factory/llama_factory.pyexamples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynbexamples/llm_qat/quantize.pyexamples/llm_qat/simple_qat_train.pyexamples/llm_qat/train.pyexamples/llm_qat/utils.pyexamples/vllm_serve/README.mdmodelopt/torch/opt/plugins/transformers.pymodelopt/torch/quantization/plugins/transformers_trainer.pytests/examples/llm_qat/test_llm_qat.pytests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (2)
- examples/llm_qat/accelerate_config/fsdp1.yaml
- examples/llm_qat/launch.sh
✅ Files skipped from review due to trivial changes (13)
- examples/llm_qat/configs/train/ptq_eval.yaml
- examples/vllm_serve/README.md
- examples/llm_qat/llama_factory/llama_config.yaml
- examples/llm_qat/configs/train/finetune.yaml
- examples/llm_qad/README.md
- examples/llm_qat/configs/train/qat_nvfp4.yaml
- .pre-commit-config.yaml
- examples/llm_qat/configs/dataset/blend.yaml
- examples/llm_qat/configs/train/qad_nvfp4.yaml
- examples/llm_qat/configs/dataset/blend_test.yaml
- examples/llm_qat/configs/dataset/blend_example.yaml
- examples/llm_qat/llama_factory/README.md
- examples/llm_qat/configs/dataset/README.md
🚧 Files skipped from review as they are similar to previous changes (5)
- examples/llm_qat/llama_factory/llama_factory.py
- tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
- examples/llm_qat/utils.py
- examples/llm_qat/arguments.py
- examples/llm_qat/train.py
There was a problem hiding this comment.
♻️ Duplicate comments (5)
examples/llm_qat/ARGUMENTS.md (2)
10-10:⚠️ Potential issue | 🟡 MinorGrammar: "right padded" should be hyphenated.
Since this file is auto-generated, fix the source help text in the
model_max_lengthfield metadata inexamples/llm_qat/arguments.pyto use "right-padded".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/ARGUMENTS.md` at line 10, The generated docs show "right padded" unhyphenated; update the source help text for the model_max_length argument in examples/llm_qat/arguments.py (the metadata for the model_max_length argument/flag) to use "right-padded" instead of "right padded" so the auto-generated ARGUMENTS.md will render the hyphenated form; edit the help/message string in the model_max_length field definition where its help/description is set.
11-11:⚠️ Potential issue | 🟡 MinorType display drops Optional/Union annotation.
The
--teacher_modelfield has defaultNonebut type shows as`str`instead of`str | None`. This stems from_format_typein the generator not preserving union types.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/ARGUMENTS.md` at line 11, The rendered arg type drops Optional/Union (so `--teacher_model` shows `str` instead of `str | None`) because _format_type in the generator flattens or ignores union/Optional metadata; update _format_type to detect typing.Union/typing.Optional (use typing.get_origin and typing.get_args) and reconstruct the type string by joining inner argument type names with " | " (ensuring NoneType is rendered as "None"), then return that composed union string; locate the generator function named _format_type and modify its branch that handles non-primitive/annotated types to handle get_origin==Union and format get_args accordingly.modelopt/torch/opt/plugins/transformers.py (2)
73-85:⚠️ Potential issue | 🟡 MinorMissing bounds check for
--configargument.If
--configis passed as the last argument without a path,args[idx + 1]will raise anIndexError. Add validation before accessing the next element.🛡️ Proposed fix
# --config <yaml_file>: load YAML as defaults, CLI args override if "--config" in args: idx = args.index("--config") + if idx + 1 >= len(args): + raise ValueError("--config requires a path argument") config_path = args[idx + 1] args = args[:idx] + args[idx + 2 :] # strip --config <path> from argv🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` around lines 73 - 85, The code accesses args[idx + 1] without checking bounds when parsing the "--config" flag; add a guard after finding idx to ensure idx + 1 < len(args) and if not raise a clear exception (e.g., ValueError("Missing path after --config")) or call argparse/error handling, so you don't get IndexError. Update the block that computes config_path and strips args (references: args, idx, config_path, yaml.safe_load, self.set_defaults, and super().parse_args_into_dataclasses) to validate the presence of the path before opening the file and only proceed to open/configure if the check passes.
116-122:⚠️ Potential issue | 🟡 MinorDocumentation claims "overridden" but filter excludes overridden fields.
The docstring says "additional/overridden arguments" but the filter at line 117 excludes any field whose name exists in the base
TrainingArguments. Fields that override defaults won't appear in generated docs.Either update the text to say "Only additional arguments" or adjust the logic to include fields with changed defaults.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` around lines 116 - 122, The doc claims "additional/overridden arguments" but the current filter (own_fields = [f for f in dataclasses.fields(dc) if f.name not in hf_training_fields]) drops any field that exists on HuggingFace TrainingArguments, so overridden fields are omitted; fix by either (A) updating the text to "Only additional arguments are shown below" or (B) include overridden fields by changing the filter to keep fields whose names are in hf_training_fields only when their defaults differ from the base TrainingArguments defaults (e.g., build a map of base_fields = {f.name: f for f in dataclasses.fields(transformers.TrainingArguments)} and include f if f.name not in hf_training_fields OR base_fields.get(f.name) has a different default/default_factory/value than f), and then update lines and the comment accordingly (refer to is_hf_subclass, dataclasses.fields(dc), hf_training_fields, and the lines list).examples/llm_qat/dataset_utils.py (1)
224-231:⚠️ Potential issue | 🟡 MinorBroad exception handling may mask underlying issues.
Catching all
Exceptiontypes could hide unexpected errors likeMemoryErroror tokenizer misconfiguration. Consider narrowing to expected tokenization failures.♻️ Suggested refinement
- except Exception as e: + except (ValueError, TypeError, KeyError) as e: logger.warning("Failed to tokenize sample: %s. Skipping.", e)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/dataset_utils.py` around lines 224 - 231, The current blanket except Exception in the tokenization fallback (around tokenizer.pad_token_id, pad_id and IGNORE_TOKEN_ID) is too broad; restrict it to expected tokenization errors (e.g., ValueError, TypeError, UnicodeEncodeError and any tokenizer-specific error class from your tokenizer library such as TokenizerError) so you don't swallow system-level errors like MemoryError, and keep the same fallback return. Also include the exception class/name in the logger message to aid debugging (e.g., log type(e) and message) while leaving the pad_id/labels fallback logic unchanged.
🧹 Nitpick comments (5)
examples/llm_qat/train.py (1)
134-141: Consider raisingValueErrorinstead of bareassertfor better error handling.The
assertwill be stripped in optimized mode (python -O). For user-facing validation, prefer an explicit check withValueError.♻️ Proposed fix
if training_args.distill: - assert model_args.teacher_model is not None, "Teacher model is required for distillation." + if model_args.teacher_model is None: + raise ValueError("--teacher_model is required when --distill is enabled.") teacher_model = transformers.AutoModelForCausalLM.from_pretrained(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/train.py` around lines 134 - 141, Replace the bare assert that checks for the teacher model when training_args.distill is true with an explicit runtime check that raises ValueError if model_args.teacher_model is None; update the logic around training_args.distill and model_args.teacher_model (before calling transformers.AutoModelForCausalLM.from_pretrained) to raise ValueError("Teacher model is required for distillation.") so the error is always raised in user-facing validation and not stripped in optimized mode, then proceed to load teacher_model as before.examples/llm_qat/dataset_utils.py (1)
150-156: Specify explicit encoding when opening YAML config file.Use
encoding="utf-8"for consistent cross-platform behavior.♻️ Proposed fix
def load_blend_config(config_path: str) -> BlendConfig: """Parse a dataset blend YAML file into a :class:`BlendConfig`.""" - with open(config_path) as f: + with open(config_path, encoding="utf-8") as f: raw = yaml.safe_load(f)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_qat/dataset_utils.py` around lines 150 - 156, The function load_blend_config currently opens the YAML file without an explicit encoding which can cause cross-platform inconsistencies; update the open call in load_blend_config to specify encoding="utf-8" when opening config_path so yaml.safe_load reads a UTF-8 stream consistently (keep the rest of the logic that constructs DatasetSourceConfig and returns BlendConfig unchanged).modelopt/torch/opt/plugins/transformers.py (2)
80-81: Specify explicit encoding when opening files.Use
encoding="utf-8"for consistent cross-platform behavior when reading YAML config files.♻️ Proposed fix
- with open(config_path) as f: + with open(config_path, encoding="utf-8") as f: config = yaml.safe_load(f)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` around lines 80 - 81, The file opens the YAML config with open(config_path) which can be platform-dependent; update the open call that reads the config (the with open(...) as f: block where config = yaml.safe_load(f) is executed) to explicitly specify encoding="utf-8" so the YAML is read consistently across platforms, i.e. change the with open usage to include encoding="utf-8".
149-149: Specify encoding when writing generated docs.For consistency with the read path suggested above, use explicit UTF-8 encoding when writing the generated markdown file.
♻️ Proposed fix
- Path(output_path).write_text("\n".join(lines) + "\n") + Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` at line 149, The write to disk uses Path(output_path).write_text("\n".join(lines) + "\n") without an explicit encoding; change the call to pass encoding="utf-8" so the generated markdown is written using UTF-8 (i.e. update the Path(...).write_text invocation that writes the joined lines from the lines variable for output_path to include encoding="utf-8").tests/examples/llm_qat/test_llm_qat.py (1)
43-68:--fsdp_transformer_layer_cls_to_wrapmay be ignored for non-FSDP backends.This argument is FSDP-specific but is passed to all backends including
ddpanddeepspeed. While HfArgumentParser typically ignores unknown/irrelevant args, consider conditionally adding it only for FSDP backends for clarity.♻️ Proposed fix
def _run_train(extra_cmd_args: list[str], backend: str = "fsdp2"): config_file = BACKEND_CONFIGS[backend] gradient_args = ( ["--gradient_checkpointing", "True"] if backend in GRADIENT_CHECKPOINTING_BACKENDS else [] ) + fsdp_args = ( + ["--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer"] + if backend.startswith("fsdp") + else [] + ) run_example_command( [ "accelerate", "launch", "--config-file", config_file, "train.py", "--dataset_config", "configs/dataset/blend_test.yaml", - "--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer", "--num_train_epochs", "0.3", "--learning_rate", "1e-5", "--save_steps", "5", "--eval_steps", "5", "--train_samples", "256", "--eval_samples", "64", + *fsdp_args, *gradient_args, *extra_cmd_args, ],🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/llm_qat/test_llm_qat.py` around lines 43 - 68, The test helper _run_train currently always passes the FSDP-only flag "--fsdp_transformer_layer_cls_to_wrap" to run_example_command, which can be confusing for non-FSDP backends; change _run_train to append that flag only when the selected backend is an FSDP backend (e.g., check backend membership in an FSDP_BACKENDS set or reuse GRADIENT_CHECKPOINTING_BACKENDS if it denotes FSDP backends) so that run_example_command is invoked without FSDP-specific args for ddp/deepspeed backends; update references in _run_train where the argument list is built (alongside BACKEND_CONFIGS, GRADIENT_CHECKPOINTING_BACKENDS, and run_example_command) to conditionally include the "--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer" pair.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Line 10: The generated docs show "right padded" unhyphenated; update the
source help text for the model_max_length argument in
examples/llm_qat/arguments.py (the metadata for the model_max_length
argument/flag) to use "right-padded" instead of "right padded" so the
auto-generated ARGUMENTS.md will render the hyphenated form; edit the
help/message string in the model_max_length field definition where its
help/description is set.
- Line 11: The rendered arg type drops Optional/Union (so `--teacher_model`
shows `str` instead of `str | None`) because _format_type in the generator
flattens or ignores union/Optional metadata; update _format_type to detect
typing.Union/typing.Optional (use typing.get_origin and typing.get_args) and
reconstruct the type string by joining inner argument type names with " | "
(ensuring NoneType is rendered as "None"), then return that composed union
string; locate the generator function named _format_type and modify its branch
that handles non-primitive/annotated types to handle get_origin==Union and
format get_args accordingly.
In `@examples/llm_qat/dataset_utils.py`:
- Around line 224-231: The current blanket except Exception in the tokenization
fallback (around tokenizer.pad_token_id, pad_id and IGNORE_TOKEN_ID) is too
broad; restrict it to expected tokenization errors (e.g., ValueError, TypeError,
UnicodeEncodeError and any tokenizer-specific error class from your tokenizer
library such as TokenizerError) so you don't swallow system-level errors like
MemoryError, and keep the same fallback return. Also include the exception
class/name in the logger message to aid debugging (e.g., log type(e) and
message) while leaving the pad_id/labels fallback logic unchanged.
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 73-85: The code accesses args[idx + 1] without checking bounds
when parsing the "--config" flag; add a guard after finding idx to ensure idx +
1 < len(args) and if not raise a clear exception (e.g., ValueError("Missing path
after --config")) or call argparse/error handling, so you don't get IndexError.
Update the block that computes config_path and strips args (references: args,
idx, config_path, yaml.safe_load, self.set_defaults, and
super().parse_args_into_dataclasses) to validate the presence of the path before
opening the file and only proceed to open/configure if the check passes.
- Around line 116-122: The doc claims "additional/overridden arguments" but the
current filter (own_fields = [f for f in dataclasses.fields(dc) if f.name not in
hf_training_fields]) drops any field that exists on HuggingFace
TrainingArguments, so overridden fields are omitted; fix by either (A) updating
the text to "Only additional arguments are shown below" or (B) include
overridden fields by changing the filter to keep fields whose names are in
hf_training_fields only when their defaults differ from the base
TrainingArguments defaults (e.g., build a map of base_fields = {f.name: f for f
in dataclasses.fields(transformers.TrainingArguments)} and include f if f.name
not in hf_training_fields OR base_fields.get(f.name) has a different
default/default_factory/value than f), and then update lines and the comment
accordingly (refer to is_hf_subclass, dataclasses.fields(dc),
hf_training_fields, and the lines list).
---
Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 150-156: The function load_blend_config currently opens the YAML
file without an explicit encoding which can cause cross-platform
inconsistencies; update the open call in load_blend_config to specify
encoding="utf-8" when opening config_path so yaml.safe_load reads a UTF-8 stream
consistently (keep the rest of the logic that constructs DatasetSourceConfig and
returns BlendConfig unchanged).
In `@examples/llm_qat/train.py`:
- Around line 134-141: Replace the bare assert that checks for the teacher model
when training_args.distill is true with an explicit runtime check that raises
ValueError if model_args.teacher_model is None; update the logic around
training_args.distill and model_args.teacher_model (before calling
transformers.AutoModelForCausalLM.from_pretrained) to raise ValueError("Teacher
model is required for distillation.") so the error is always raised in
user-facing validation and not stripped in optimized mode, then proceed to load
teacher_model as before.
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 80-81: The file opens the YAML config with open(config_path) which
can be platform-dependent; update the open call that reads the config (the with
open(...) as f: block where config = yaml.safe_load(f) is executed) to
explicitly specify encoding="utf-8" so the YAML is read consistently across
platforms, i.e. change the with open usage to include encoding="utf-8".
- Line 149: The write to disk uses Path(output_path).write_text("\n".join(lines)
+ "\n") without an explicit encoding; change the call to pass encoding="utf-8"
so the generated markdown is written using UTF-8 (i.e. update the
Path(...).write_text invocation that writes the joined lines from the lines
variable for output_path to include encoding="utf-8").
In `@tests/examples/llm_qat/test_llm_qat.py`:
- Around line 43-68: The test helper _run_train currently always passes the
FSDP-only flag "--fsdp_transformer_layer_cls_to_wrap" to run_example_command,
which can be confusing for non-FSDP backends; change _run_train to append that
flag only when the selected backend is an FSDP backend (e.g., check backend
membership in an FSDP_BACKENDS set or reuse GRADIENT_CHECKPOINTING_BACKENDS if
it denotes FSDP backends) so that run_example_command is invoked without
FSDP-specific args for ddp/deepspeed backends; update references in _run_train
where the argument list is built (alongside BACKEND_CONFIGS,
GRADIENT_CHECKPOINTING_BACKENDS, and run_example_command) to conditionally
include the "--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer" pair.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3c9bac8a-7059-4aa8-aa2f-adab7aefd0d7
📒 Files selected for processing (34)
.pre-commit-config.yamlexamples/llm_qad/README.mdexamples/llm_qat/ARGUMENTS.mdexamples/llm_qat/README.mdexamples/llm_qat/accelerate_config/fsdp1.yamlexamples/llm_qat/arguments.pyexamples/llm_qat/configs/accelerate/ddp.yamlexamples/llm_qat/configs/accelerate/deepspeed.yamlexamples/llm_qat/configs/accelerate/fsdp2.yamlexamples/llm_qat/configs/dataset/README.mdexamples/llm_qat/configs/dataset/blend.yamlexamples/llm_qat/configs/dataset/blend_example.yamlexamples/llm_qat/configs/dataset/blend_test.yamlexamples/llm_qat/configs/train/finetune.yamlexamples/llm_qat/configs/train/ptq_eval.yamlexamples/llm_qat/configs/train/qad_nvfp4.yamlexamples/llm_qat/configs/train/qat_nvfp4.yamlexamples/llm_qat/dataset_utils.pyexamples/llm_qat/launch.shexamples/llm_qat/llama_factory/README.mdexamples/llm_qat/llama_factory/launch_llamafactory.shexamples/llm_qat/llama_factory/llama_config.yamlexamples/llm_qat/llama_factory/llama_factory.pyexamples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynbexamples/llm_qat/quantize.pyexamples/llm_qat/simple_qat_train.pyexamples/llm_qat/train.pyexamples/llm_qat/utils.pyexamples/vllm_serve/README.mdmodelopt/torch/opt/plugins/transformers.pymodelopt/torch/quantization/plugins/transformers_trainer.pymodelopt_recipes/general/ptq/int4_blockwise_weight_only.ymltests/examples/llm_qat/test_llm_qat.pytests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (2)
- examples/llm_qat/accelerate_config/fsdp1.yaml
- examples/llm_qat/launch.sh
✅ Files skipped from review due to trivial changes (15)
- examples/llm_qat/configs/train/ptq_eval.yaml
- examples/vllm_serve/README.md
- examples/llm_qad/README.md
- examples/llm_qat/llama_factory/llama_config.yaml
- examples/llm_qat/configs/dataset/blend_test.yaml
- examples/llm_qat/configs/train/finetune.yaml
- modelopt_recipes/general/ptq/int4_blockwise_weight_only.yml
- examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
- examples/llm_qat/configs/train/qad_nvfp4.yaml
- examples/llm_qat/llama_factory/launch_llamafactory.sh
- examples/llm_qat/configs/dataset/blend.yaml
- tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
- examples/llm_qat/configs/dataset/blend_example.yaml
- examples/llm_qat/configs/dataset/README.md
- examples/llm_qat/arguments.py
🚧 Files skipped from review as they are similar to previous changes (7)
- examples/llm_qat/configs/train/qat_nvfp4.yaml
- examples/llm_qat/llama_factory/README.md
- examples/llm_qat/simple_qat_train.py
- examples/llm_qat/llama_factory/llama_factory.py
- .pre-commit-config.yaml
- modelopt/torch/quantization/plugins/transformers_trainer.py
- examples/llm_qat/quantize.py
bfc343c to
b3d46c7
Compare
|
/claude review |
There was a problem hiding this comment.
ModelOpt-focused review summary
Reviewed structural and ModelOpt-specific concerns (mode/state composability, export/recipe wiring, backward-compat) — leaving style/security pattern-matching to CodeRabbit. Most prior critical findings from earlier rounds (recipe.ptq_cfg → recipe.quantize, simple_qat_train.py import, --config IndexError, ChatML boundary fix in the fallback) appear resolved.
Remaining items found this round:
IMPORTANT
examples/llm_qat/arguments.py:31—model_name_or_pathdefault is stillmeta-llama/Llama-2-7b-hf, but the new README leads end-to-end commands withQwen/Qwen3-8B. Users who runquantize.pywithout--model_name_or_pathwill hit a gated-repo error rather than the documented Qwen flow. Either align the default with the README or make the field required.
SUGGESTIONS
modelopt/torch/opt/plugins/transformers.py:227-236—_all_modelopt_fields()collects fields from every importedModelOptHFArgumentssubclass, so a typo'd YAML key that collides with a field of an unrelated subclass loaded elsewhere in the process (e.g.quant_cfg,criterion) is silently accepted and dropped. Scoping toself.dataclass_typeswould tighten validation without losing the cross-subclass typo-detection intent.modelopt/torch/distill/plugins/huggingface.py:52-65—to_distill_kwargsvalidatescriterionagainst a hardcoded set then unconditionally constructsLMLogitsLoss(). As more criteria are added, validation and construction will drift apart. A small{name: cls}dispatch keeps them in sync.examples/llm_qat/dataset_utils.py:217-236— The ChatML fallback mask excludes<|im_end|>from labels, while native HF{% generation %}masks for Qwen-family templates typically include it. The same dataset can produce different labels across tokenizer paths. Worth aligning, or adding a comment explaining the intentional divergence.
No CRITICAL findings, but with one IMPORTANT outstanding I'm leaving this as a comment review rather than approving. The structural refactor (YAML configs + ModelOptArgParser + recipe-driven quantization) looks solid; the items above are mostly about defaults and consistency.
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Large refactor (42 files, +2895/-1028) that has been through several review rounds. Most prior critical items (recipe.ptq_cfg→quantize, broken import, bare asserts→ValueError, ChatML boundary token, --config IndexError, broad except narrowing, FSDP1 removal docs) appear resolved. Recommending human sign-off on the following before approval:
- 💬 Author replied previously: "train.py now warns if --recipe is passed" — current
arguments.pyactually dropsQuantizationArgumentsfromTRAINING_ARG_TYPESentirely, so--recipewould raise an argparse "unrecognized argument" error rather than warn. Functionally fine (no silent discard) but the resolution differs from what was stated; a human should confirm this is the intended UX. - 💬 Claude bot flagged:
ModelArguments.model_name_or_pathdefault is stillmeta-llama/Llama-2-7b-hfwhile the new README and example commands all useQwen/Qwen3-8B— author hasn't replied to this. Users runningquantize.pywithout--model_name_or_pathwill hit a gated-repo error rather than the documented Qwen flow. - 💬 h-guo18 raised the design question:
examples/llm_qat/dataset_utils.py(829 lines) overlaps in purpose withmodelopt/torch/utils/plugins/transformers_dataset.py(LanguageDataCollator, chat-template + assistant-mask handling) andexamples/dataset/. Author acknowledged this and deferred unification as a follow-up, which is reasonable for a refactor of this size, but a human reviewer with architectural context should sign off on adding a second in-repo system for chat-template tokenization rather than extending the existing one. _all_modelopt_fields()walks every importedModelOptHFArgumentssubclass globally to validate YAML keys, so a typo that collides with a field on an unrelated subclass (e.g.criterionfromDistillArgumentswhile runningquantize.py) is silently accepted. Author defended this as intentional ("safer default"); worth a second opinion since it weakens typo detection.- New
QuantizationArguments(now anModelOptHFArgumentssubclass, with newrecipefield) is also imported byexamples/gpt-oss/sft.pyviaTrlParser— confirm that example still parses correctly with the addedrecipefield and the auto-@dataclassfrom__init_subclass__(no test exercises this path in the PR). - Outstanding test plan checkbox:
pytest tests/examples/llm_qat/test_llm_qat.pyfull GPU/backend suite is unchecked in the PR description.
Re: #1172 (review) Thanks for the consolidated pass. Status on the flagged items: (1) model_name_or_path default now aligned to Qwen/Qwen3-8B to match the README. (2) --recipe: confirmed train.py drops QuantizationArguments from TRAINING_ARG_TYPES so passing --recipe there raises an argparse 'unrecognized argument' rather than warning — intentional UX; --recipe belongs to quantize.py/simple_qat_train.py. (3) dataset_utils.py vs transformers_dataset.py overlap: acknowledged, unification tracked as a follow-up for this [1/N] series. (4) _all_modelopt_fields global walk: intentional for cross-entrypoint config reuse (replied on the inline thread). (5) gpt-oss/sft.py: the new recipe field has default=None and parses cleanly under TrlParser(return_remaining_strings=True); will add a smoke test. (6) full-GPU test_llm_qat suite: will run and check the box before merge. |
|
/claude review |
There was a problem hiding this comment.
ModelOpt-focused review summary
Re-reviewed structural and ModelOpt-specific concerns (mode/state composability, public-API surface, recipe wiring, backward-compat) — leaving style and security pattern-matching to CodeRabbit. Spot-checked previously raised items and they look resolved:
recipe.quantize(notrecipe.ptq_cfg) is consistent acrosstransformers_trainer.py,quantize.py, andsimple_qat_train.py.simple_qat_train.pynow importsbuild_blend_dataset/load_blend_configfromdataset_utils.py.model_name_or_pathdefault is nowQwen/Qwen3-8B, aligned with README.- ChatML mask no longer includes the trailing
<|im_end|>; the divergence from native generation-tag masking is now called out in a comment. train.pydropsQuantizationArgumentsfromTRAINING_ARG_TYPES, so passing--recipetotrain.pyraises an argparse error rather than silently being dropped (intended UX per author).- New CPU smoke test exercises
gpt-oss/sft.pywith the mergedQuantizationArguments(incl. the newrecipefield).
Remaining items found this round:
IMPORTANT
modelopt/torch/quantization/plugins/transformers_trainer.py—QuantizationArgumentsWithConfig(previously a public class re-exported viafrom .transformers_trainer import *inplugins/__init__.py) is silently removed. Functionally the mergedQuantizationArguments.quant_cfg: str | QuantizeConfig | Nonecovers the same case, but external imports of the old name will now hitImportError, and the CHANGELOG entry only describes the example-side refactor. Recommend either keeping a one-line alias as a deprecation shim or calling out the rename in CHANGELOG so existing users aren't surprised.
SUGGESTIONS
-
examples/llm_qat/dataset_utils.py:_chatml_assistant_mask—skip_newlineonly clears on the next<|im_start|>or on a literal newline immediately after role. If an assistant turn's role isn't followed by\n(non-standard template),skip_newlinecarries through the whole turn and the first\ninside the real assistant content gets dropped from labels. Standard ChatML always emits the role-trailing newline so this doesn't bite in practice, but the assumption isn't documented and the comment at line 211 only mentions the trailing<|im_end|>divergence. A one-line fix (consumeskip_newlineon the first post-role token regardless) closes the edge case. -
examples/llm_qat/quantize.pydocstring usesmeta-llama/Meta-Llama-3-8Bwhile the rest of the example (README,arguments.pydefault, train configs) leads withQwen/Qwen3-8B. Minor consistency.
No CRITICAL findings. With one outstanding IMPORTANT (compat shim/changelog for QuantizationArgumentsWithConfig), I'm leaving a non-blocking comment review rather than approve.
1ecf721 to
f6c8ec9
Compare
Refactor the examples/llm_qat example to use YAML configs driven by a shared ModelOptArgParser (HfArgumentParser wrapper), introduce ModelOptHFArguments dataclasses, a reusable QuantizationArguments dataclass, and refactored dataset/tokenization utilities. - Align ModelArguments.model_name_or_path default with the README (Qwen/Qwen3-8B) and document the intentional ChatML assistant-mask exclusion of trailing <|im_end|> and the post-header newline. - Add a CPU-only tiny gpt-oss SFT e2e test (test_gpt_oss_sft_toy) plus a tiny_gpt_oss_path fixture, validating that sft.py's TrlParser parses the shared QuantizationArguments incl. the new recipe field; move the release marker onto the heavy GPU pipeline test so the toy test runs in regular CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>
…tionArgumentsWithConfig alias - quantize.py docstring usage example: --model_name_or_path Llama-3-8B -> Qwen/Qwen3-8B and --output_dir -> qwen3-8b-quantized, matching the README and config templates. - Re-add backwards-compat alias QuantizationArgumentsWithConfig = QuantizationArguments so external imports of the pre-refactor public name keep working; document the deprecation in CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>
shengliangxu
left a comment
There was a problem hiding this comment.
Overall changes are scoped to the example. LGTM.
In _chatml_assistant_mask, consume skip_newline on the first post-role token regardless of whether it is a newline, so a malformed template whose role header is not immediately followed by a newline can no longer drop a later content newline from the labels. Behavior is unchanged for standard ChatML (role always followed by a newline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>
Add a 'Arguments by Script' table to the generated ARGUMENTS.md via a new docs_header_extra hook on ModelOptArgParser, clarifying which argument groups each of quantize.py/train.py reads and that mismatched-group keys are silently ignored. Link ARGUMENTS.md from the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: realAsma <akuriparambi@nvidia.com>
What does this PR do?
Type of change: new example
Refactors
examples/llm_qatfrom a monolithic launch/script flow into a modular, config-driven Hugging Face QAT/QAD workflow. The high-level user flow is now:Highlights:
examples/llm_qat/launch.sh+examples/llm_qat/main.pypath with separatequantize.pyandtrain.pyentrypoints.ModelOptArgParser, including--config <yaml>defaults, CLI overrides, and generatedexamples/llm_qat/ARGUMENTS.md.examples/llm_qat/configs/for training modes, dataset blends, and Accelerate backends.dataset_utils.pyfor weighted multi-source dataset blending, Hugging Face streaming, local dataset loading, distributed rank-aware loading, tokenization caching, pre-tokenization, chat templating, and assistant-token label masking.DistillArguments,QADTrainer, and teacher-model distillation kwargs.QuantizationArgumentsto prefer recipe paths via--recipe, while keeping legacy--quant_cfgavailable with deprecation warnings for in-trainer quantization.QATTrainerhandling for pre-quantized checkpoints, FSDP2 TensorQuantizer buffers, and recipe-resolved PTQ configs.general/ptq/int4_blockwise_weight_onlyrecipe and refreshes docs for NVFP4, FP8, INT4, FSDP2, DDP, DeepSpeed, QLoRA, and LLaMA-Factory integration.Usage
From the repo root:
Dataset blends can be pre-tokenized and cached before training:
cd examples/llm_qat python dataset_utils.py \ --dataset_config configs/dataset/blend.yaml \ --model_name_or_path Qwen/Qwen3-8BTesting
Focused coverage added or updated:
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.pytests/examples/llm_qat/test_dataset_tokenization.pytests/examples/llm_qat/test_assistant_mask.pytests/examples/llm_qat/test_llm_qat.pyRecorded validation:
pytest tests/unit/torch/opt/plugins/test_modelopt_arg_parser.pypytest tests/examples/llm_qat/test_llm_qat.py::test_dataset_utils_pretokenizepre-commit run --all-filespytest tests/examples/llm_qat/test_llm_qat.pyfull GPU/backend suiteBefore your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).examples/llm_qatCLI/file layout (main.py,launch.sh, and FSDP1 config are removed); libraryquant_cfgusage remains available but is deprecated for this workflow.CONTRIBUTING.md: Yes.train.py/simple_qat_train.pyretain the upstream Alpaca attribution where applicable;examples/llm_qat/requirements.txtswitchestensorboardXtotensorboard.Additional Information
This PR intentionally changes the
llm_qatexample surface. Existing users should move fromexamples/llm_qat/main.pyandexamples/llm_qat/launch.shtoexamples/llm_qat/quantize.py,examples/llm_qat/train.py, and the YAML configs underexamples/llm_qat/configs/.