Add Alpamayo-1 example#1594
Conversation
📝 WalkthroughWalkthroughThis PR adds an Alpamayo-R1 quantization example: a README plus a new CLI script that implements PTQ (FP8/NVFP4), AutoQuantize per-layer search with a flow-matching MSE calibration, calibration data loading/loop, a teacher-forced forward patch, HF checkpointing integration, and changelog entries. ChangesAlpamayo-R1 Quantization Example
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 3
🧹 Nitpick comments (1)
examples/alpamayo/quantize.py (1)
655-681: ⚡ Quick winGuard checkpoint writes to rank 0 for distributed-safe execution.
Current save/write operations are unconditional; in multi-rank runs, each rank may write the same files concurrently.
🛡️ Proposed fix
+ is_rank0 = ( + not torch.distributed.is_available() + or not torch.distributed.is_initialized() + or torch.distributed.get_rank() == 0 + ) @@ - os.makedirs(args.output_dir, exist_ok=True) - print(f"Saving quantized checkpoint to {args.output_dir!r} ...") + if is_rank0: + os.makedirs(args.output_dir, exist_ok=True) + print(f"Saving quantized checkpoint to {args.output_dir!r} ...") @@ - with torch.inference_mode(): - model.save_pretrained(args.output_dir) - processor.save_pretrained(args.output_dir) - model.config.save_pretrained(args.output_dir) + if is_rank0: + with torch.inference_mode(): + model.save_pretrained(args.output_dir) + processor.save_pretrained(args.output_dir) + model.config.save_pretrained(args.output_dir) @@ - with torch.inference_mode(): - model.save_pretrained(args.output_dir) - - processor.save_pretrained(args.output_dir) - model.config.save_pretrained(args.output_dir) - - quant_cfg = get_quant_config(model) - with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f: - json.dump(quant_cfg, f) + if is_rank0: + with torch.inference_mode(): + model.save_pretrained(args.output_dir) + processor.save_pretrained(args.output_dir) + model.config.save_pretrained(args.output_dir) + quant_cfg = get_quant_config(model) + with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f: + json.dump(quant_cfg, f) + + if torch.distributed.is_available() and torch.distributed.is_initialized(): + torch.distributed.barrier()As per coding guidelines "Develop with distributed processing in mind ... guarding shared side effects such as file writes ... against race conditions between ranks."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` around lines 655 - 681, The saves under both the real_quant and else branches (mtq.compress + model.save_pretrained, processor.save_pretrained, model.config.save_pretrained, and writing hf_quant_config.json from get_quant_config) must be guarded so only the main process/rank 0 performs file writes in distributed runs; detect distributed mode via torch.distributed.is_initialized() and torch.distributed.get_rank() (or the project’s existing is_main_process helper) and wrap all save_pretrained calls and the json file write in a conditional that only runs on rank 0 to avoid concurrent writes to args.output_dir.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/alpamayo/quantize.py`:
- Around line 346-355: The function read_clip_ids_from_parquet currently indexes
cols_lower["key"] which can KeyError; update it to search cols_lower for common
column names (e.g., "key", "clip_id", "clipid", "id", "filename") and use the
first match if present, otherwise fall back to using df.index.astype(str). After
obtaining the list, convert to strings and deduplicate while preserving first
occurrence order (use a seen set/ordered iteration); reference
read_clip_ids_from_parquet and the cols_lower variable to locate the change.
- Around line 27-55: Add an explicit __all__ list at module top to define the
public API; create a single-line assignment __all__ = [...] after the imports
that enumerates the module's public helpers and re-exports (e.g.,
"load_physical_aiavdataset", "AlpamayoR1", "to_special_token",
"create_forward_loop", "get_dataset_dataloader", "export_hf_checkpoint",
"get_quant_config", "mtq", "patch_pretrained_methods", "logger" — adjust to the
actual names this module exposes) and keep it updated when symbols change so
star-import behavior is explicit.
- Around line 500-512: The prints in the hot loop (inside forward_step's
tensor-stat print and loss_func) call .item() on tensors every iteration,
causing unwanted CUDA synchronizations; wrap these debug prints so they only run
when args.debug (and optionally only on the main rank) is true, and compute
.item() values only inside that conditional; update the debug logging in
forward_step (the v_pred/v_target print) and in loss_func (loss print) to be
gated by args.debug and a rank check (e.g., only on rank 0 or when not
distributed) so normal runs avoid per-batch .item() calls and GPU syncs.
---
Nitpick comments:
In `@examples/alpamayo/quantize.py`:
- Around line 655-681: The saves under both the real_quant and else branches
(mtq.compress + model.save_pretrained, processor.save_pretrained,
model.config.save_pretrained, and writing hf_quant_config.json from
get_quant_config) must be guarded so only the main process/rank 0 performs file
writes in distributed runs; detect distributed mode via
torch.distributed.is_initialized() and torch.distributed.get_rank() (or the
project’s existing is_main_process helper) and wrap all save_pretrained calls
and the json file write in a conditional that only runs on rank 0 to avoid
concurrent writes to args.output_dir.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 81978cc3-c66c-4b24-8be8-fe303abf7d0d
⛔ Files ignored due to path filters (1)
examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquetis excluded by!**/*.parquet
📒 Files selected for processing (2)
examples/alpamayo/README.mdexamples/alpamayo/quantize.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1594 +/- ##
==========================================
- Coverage 77.41% 75.45% -1.96%
==========================================
Files 480 480
Lines 52499 52552 +53
==========================================
- Hits 40642 39654 -988
- Misses 11857 12898 +1041
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
New ~700-line example. A few real concerns:
-
Likely bug — backslash-escaped special tokens. In
create_message, the f-strings use<\|traj_history_start\|>,<\|traj_history\|>,<\|traj_history_end\|>,<\|cot_start\|>. In Python\|is not a recognized escape sequence, so the literal string contains a backslash before each pipe. Qwen-family special tokens are<|...|>without backslashes — passing the backslash form toapply_chat_templatewill not match the tokenizer's added special tokens and these will tokenize as ordinary text, silently corrupting calibration. Looks like a copy-paste from a markdown-escaped source. (Bonus: these will start emittingSyntaxWarning: invalid escape sequenceon newer Python.) -
Reaches into private API instead of using the public one.
enable_huggingface_checkpointing_patch()re-implementsmto.enable_huggingface_checkpointing()by importing the private_LIBRARY_CLASSES_FOR_PATCHING,_PATCHED_CLASSES, andpatch_pretrained_methodsfrommodelopt.torch.opt.plugins.huggingface, just to skip the_from_configpatch. If the only goal is to skip_from_config, please make that an option on the public API (or document why this divergence is needed) instead of duplicating the loop with private symbols. Also, the file does this at import time as a top-level side effect, which is surprising. -
Monkey-patching
AlpamayoR1.teacher_forced_flow_loss_forwardat module import time is acknowledged in a docstring but it's a strong coupling: the body is a verbatim copy of an internal training-fork method, and any drift innvlabs/alpamayo(e.g.action_in_projsignature,expert_non_causal_attention,action_spaceAPI) will silently produce wrong calibration. At minimum: assert that the publicAlpamayoR1exposes the methods/attrs this body relies on, and pin a known-good Alpamayo SHA in the README setup section instead ofgit cloneofmain. -
read_clip_ids_from_parquetdoesn't match its docstring. The docstring says it "tries common column names; falls back to index if needed", but the implementation hard-codescols_lower["key"]and willKeyErroron any parquet without a literalkeycolumn. Either implement the documented fallback or simplify the docstring. -
Binary parquet checked into the repo. A
.parquetcalibration set is being added toexamples/alpamayo/. The repo doesn't appear to use git-LFS (.gitattributesis missing). Even if this file is small, distributing calibration data through the example tree is unusual for this repo — most examples download from HF on demand. Please confirm the file size is small and that there are no licensing/redistribution restrictions on the AV calibration clips list, or fetch it at runtime instead. -
No tests / no CHANGELOG entry. The PR template's test and changelog checkboxes are still unfilled. New examples in this repo (
vlm_ptq, etc.) typically don't get unit tests, but at least a CHANGELOG entry under "New Features" is expected. -
Minor:
args.parquetdefault is the bare filename, joined toscript_dir; if the user passes an absolute path it works (pathlib handles that), but a plain relative path in the user's CWD will be silently resolved against the script's directory, which is surprising. Consider documenting this or usingPath(args.parquet)as-is when absolute/exists, else fall back toscript_dir / args.parquet.
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
6cc978e to
0a9c5ca
Compare
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/alpamayo/quantize.py (1)
257-257: 💤 Low valueConsider moving the patch call into
main().Executing
patch_teacher_forced_flow_loss_forward()at module import is a side effect that runs even if users only import a helper function. Moving this call intomain()keeps imports side-effect-free and aligns with the principle applied to the HF checkpointing setup.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` at line 257, The top-level call to patch_teacher_forced_flow_loss_forward() causes a side effect at import; remove that call from module scope and invoke patch_teacher_forced_flow_loss_forward() from inside main() (e.g., at the start of the main() function before any training or model code runs) so imports remain side-effect-free; reference the existing patch_teacher_forced_flow_loss_forward function and ensure the patch is applied once (idempotently) before any code that relies on the patched behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/torch/opt/plugins/huggingface.py`:
- Around line 171-176: Validate the skip_methods input at the function boundary:
compute the set of valid method names by iterating _LIBRARY_CLASSES_FOR_PATCHING
and collecting the first element of each entry in the patch_methods lists, then
if any name in skip_methods is not in that valid set raise an error (e.g.
ValueError) instead of silently ignoring it; keep the existing logic that
converts skip_methods to a set and then continue to call
patch_pretrained_methods(cls, [m for m in patch_methods if m[0] not in
skip_methods]) for each class, while referencing skip_methods,
_LIBRARY_CLASSES_FOR_PATCHING, patch_pretrained_methods, and _PATCHED_CLASSES to
locate and update the code.
---
Nitpick comments:
In `@examples/alpamayo/quantize.py`:
- Line 257: The top-level call to patch_teacher_forced_flow_loss_forward()
causes a side effect at import; remove that call from module scope and invoke
patch_teacher_forced_flow_loss_forward() from inside main() (e.g., at the start
of the main() function before any training or model code runs) so imports remain
side-effect-free; reference the existing patch_teacher_forced_flow_loss_forward
function and ensure the patch is applied once (idempotently) before any code
that relies on the patched behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: eecdd970-10bf-41e4-a1ca-8f334dd1dacd
⛔ Files ignored due to path filters (1)
examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquetis excluded by!**/*.parquet
📒 Files selected for processing (4)
CHANGELOG.rstexamples/alpamayo/README.mdexamples/alpamayo/quantize.pymodelopt/torch/opt/plugins/huggingface.py
✅ Files skipped from review due to trivial changes (2)
- examples/alpamayo/README.md
- CHANGELOG.rst
| skip_methods = set(skip_methods or ()) | ||
| for name, (classes, methods_list) in _LIBRARY_CLASSES_FOR_PATCHING.items(): | ||
| for cls, patch_methods in zip(classes, methods_list): | ||
| if cls in _PATCHED_CLASSES: | ||
| continue | ||
| patch_pretrained_methods(cls, patch_methods) | ||
| patch_pretrained_methods(cls, [m for m in patch_methods if m[0] not in skip_methods]) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Validate skip_methods instead of silently ignoring unknown names.
A typo here leaves the conflicting hook patched and turns this escape hatch into a silent no-op. Please reject unknown entries up front so callers fail fast.
Proposed fix
def enable_huggingface_checkpointing(skip_methods: set[str] | None = None):
@@
- skip_methods = set(skip_methods or ())
+ skip_methods = set(skip_methods or ())
+ known_methods = {
+ method_name
+ for _, (_, methods_list) in _LIBRARY_CLASSES_FOR_PATCHING.items()
+ for patch_methods in methods_list
+ for method_name, _ in patch_methods
+ }
+ unknown_methods = skip_methods - known_methods
+ if unknown_methods:
+ raise ValueError(f"Unknown skip_methods: {sorted(unknown_methods)}")
+
for name, (classes, methods_list) in _LIBRARY_CLASSES_FOR_PATCHING.items():
for cls, patch_methods in zip(classes, methods_list):
if cls in _PATCHED_CLASSES:
continue
patch_pretrained_methods(cls, [m for m in patch_methods if m[0] not in skip_methods])As per coding guidelines, "Validate external input once at the interface boundary; internal code can trust those checks and avoid redundant assertions".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/torch/opt/plugins/huggingface.py` around lines 171 - 176, Validate
the skip_methods input at the function boundary: compute the set of valid method
names by iterating _LIBRARY_CLASSES_FOR_PATCHING and collecting the first
element of each entry in the patch_methods lists, then if any name in
skip_methods is not in that valid set raise an error (e.g. ValueError) instead
of silently ignoring it; keep the existing logic that converts skip_methods to a
set and then continue to call patch_pretrained_methods(cls, [m for m in
patch_methods if m[0] not in skip_methods]) for each class, while referencing
skip_methods, _LIBRARY_CLASSES_FOR_PATCHING, patch_pretrained_methods, and
_PATCHED_CLASSES to locate and update the code.
What does this PR do?
Type of change: ? New example
Adds example for Alpamayo-1 quantization with ModelOpt (FP8, NVFP4, AutoQuant)
Usage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information
Summary by CodeRabbit
Documentation
New Features