Skip to content

feat: Layerwise calibration: nested config + QDQ-from-prev-layer flag + checkpoint I/O knobs#1571

Open
Fridah-nv wants to merge 6 commits into
mainfrom
fridah/layerwise-config
Open

feat: Layerwise calibration: nested config + QDQ-from-prev-layer flag + checkpoint I/O knobs#1571
Fridah-nv wants to merge 6 commits into
mainfrom
fridah/layerwise-config

Conversation

@Fridah-nv
Copy link
Copy Markdown
Contributor

@Fridah-nv Fridah-nv commented May 29, 2026

What does this PR do?

Type of change: new feature

Groups all layerwise-calibration options under a nested LayerwiseConfig and adds three new behavior knobs to it. All changes are backward compatible.

1. Nested layerwise config

QuantizeAlgorithmConfig.layerwise changes from bool to a Pydantic submodel:

class LayerwiseConfig(ModeloptBaseConfig):
    enable: bool = False
    get_qdq_activations_from_prev_layer: bool = False
    checkpoint_dir: str | None = None
    save_every: int = 1
    save_quantizers_only: bool = False

Backward compatibility:

  • layerwise: True/False still accepted (emits DeprecationWarning).
  • Flat layerwise_checkpoint_dir silently migrated into layerwise.checkpoint_dir.
  • Legacy use_sequential alias preserved (and resolved during flat-key migration so it can't be dropped).
  • Conflicting flat+nested checkpoint_dir values raise.
  • All 7 shipped PTQ recipes (modelopt_recipes/general/ptq/*.yaml, huggingface/qwen3_5*/ptq/*.yaml) migrated to the
    canonical nested shape — no semantic change.

2. get_qdq_activations_from_prev_layer — correct GPTQ vs max-calib semantics

Controls what layer N's calibration sees:

  • True (GPTQ default): activations carry the quantize-dequantize error of layers 0..N-1 — GPTQ's Hessian-compensation
    goal.
  • False (max/mse/local_hessian default): full-precision activations, matching the non-layerwise pass exactly.

The False branch wraps the next-layer input capture forward with the existing set_quantizer_by_cfg_context deny-all idiom
({"quantizer_name": "*", "enable": False}).

GPTQ's per-algorithm default is enforced via @model_validator(mode="after") that reads LayerwiseConfig.model_fields_set
works for every input shape (empty constructor, bool, partial dict, full dict) and lets explicit user values override.

3. save_every — gate the large activation-cache writes

save_every: int = 1 (ge=1). With N > 1, the per-layer next_inputs.pt (cached activation tensors, the largest checkpoint
artifact for most models) is only written for the boundary layer of each N-layer window. Per-layer
weight/quantizer/output_meta files are still written every layer (resume needs them to replay skip layers correctly).
Interrupting mid-window re-calibrates that window on resume.

4. save_quantizers_only — algorithm-aware weight-blob skipping

save_quantizers_only: bool = False. When True, skip weights.pt entirely and persist just the per-quantizer state_dict
slice (carries _amax) to a new quantizer_buffers.pt. On resume, full_restore reloads only the quantizer slice and
trusts that algorithm semantics didn't mutate layer.weight.

Safety is enforced by a whitelist: _supports_save_quantizers_only: ClassVar[bool] = False on QuantizeAlgorithmConfig,
overridden to True only on MaxCalibConfig, MseCalibConfig, LocalHessianCalibConfig (audited — these only touch
_amax). Weight-mutating algorithms (GPTQ folds Hessian updates, AWQ/SmoothQuant fold pre-quant scales) reject the flag at
config-construction time so in-place weight updates can't be silently lost on resume.

Usage

import modelopt.torch.quantization as mtq

# GPTQ — `get_qdq_activations_from_prev_layer` defaults to True (Hessian semantics).
# save_every reduces activation-cache I/O.
mtq.quantize(
    model,
    {
        "quant_cfg": [...],
        "algorithm": {
            "method": "gptq",
            "layerwise": {
                "enable": True,
                "checkpoint_dir": "/path/to/ckpts",
                "save_every": 4,
            },
        },
    },
    forward_loop=forward_loop,
)

# Max-calibration — `get_qdq_activations_from_prev_layer` defaults to False (FP from prior layers).
# save_quantizers_only skips the weights blob since max only updates _amax.
mtq.quantize(
    model,
    {
        "quant_cfg": [...],
        "algorithm": {
            "method": "max",
            "layerwise": {
                "enable": True,
                "checkpoint_dir": "/path/to/ckpts",
                "save_quantizers_only": True,
            },
        },
    },
    forward_loop=forward_loop,
)

Testing

New / updated unit tests in tests/unit/torch/quantization/:

  • test_config_validation.pyTestLayerwiseNestedConfig covers nested-form acceptance, bool-form
    DeprecationWarning, flat layerwise_checkpoint_dir migration, conflicting flat+nested checkpoint_dir, use_sequential
    alias survival under migration, per-algorithm qdq defaults (parametrized Max/GPTQ), save_every ge=1 validation, and the
    save_quantizers_only whitelist — parametrized rejection on [GPTQ, AWQLite, SmoothQuant] and acceptance on [Max, Mse, LocalHessian].

  • test_layerwise_calibrate.py

    • test_layerwise_no_qdq_matches_sequential_amax — behavioral equivalence: layerwise + qdq=False produces the same
      per-quantizer _amax as the non-layerwise (sequential) max-calibration flow (verified via torch.testing.assert_close).
    • test_layerwise_save_every_writes_next_inputs_only_at_window_boundaries — window-save layout (all layer dirs present,
      next_inputs.pt only at boundaries).
    • test_layerwise_save_quantizers_only_resume_matches_one_shot_amax — end-to-end resume: full run → manifest rewound →
      fresh model resumes → final _amax matches the one-shot baseline; also pins the on-disk shape (no weights.pt,
      quantizer_buffers.pt present).

End-to-end correctness verification

Ran 4 PTQ jobs on Qwen3-8B with NVFP4 W4A16 quant_cfg and --calib_size 16, one GPU
each on 4 GPUs

Run Algorithm Layerwise config Purpose
A GPTQ enable=true, get_qdq_activations_from_prev_layer=true, save_every=5 New nested form + the newsave_every knob
B MSE enable=true, get_qdq_activations_from_prev_layer=false, save_quantizers_only=true New nested form + the new save_quantizers_only knob
C GPTQ Legacy flat form: layerwise: true, layerwise_checkpoint_dir: ... Backward-compat baseline for A (exercises the migration validator)
D MSE enable=false Non-layerwise baseline for B

Across two pairwise comparisons (A vs C ; B vs D), all 905 tensors in the exported HF checkpoints are bit-identical with hf_quant_config.json matching exactly — confirming the new layerwise knobs preserve correctness and the flat-form backward-compat path is intact.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines
and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best
Practices
(e.g.
avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ — layerwise: True/False still accepted with a DeprecationWarning; flat
    layerwise_checkpoint_dir silently migrated; use_sequential alias preserved. The two new knobs default to no-op behavior
    (save_every=1, save_quantizers_only=False).
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
    — no new dependencies.
  • Did you write any new necessary tests?: ✅ — see Testing section.
  • Did you update Changelog?: ✅
    ready for review.
  • Did you get Claude approval on this PR?: ✅ — /claude review consulted iteratively; review findings (GPTQ default
    survival, save_quantizers_only whitelist scope, docstring accuracy, dead layer param) addressed in-PR.

Additional Information

Notes on design choices that came out of internal review:

  • GPTQ's qdq=True default uses a model_validator(mode="after") + model_fields_set check rather than a default_factory
    — a default_factory is only fired when the field is absent, so any user-supplied dict (the natural way to enable
    layerwise) would silently lose the GPTQ default.
  • save_quantizers_only is enforced as a whitelist (_supports_save_quantizers_only) rather than a per-algorithm blacklist,
    which keeps future weight-mutating algorithms safe by default.
  • set_quantizer_by_cfg_context is reused for the qdq-disable scope instead of a bespoke helper, keeping model_calib.py
    aligned with the existing "deny-all" idiom documented at conversion.py:240.
  • Pre-validation recipe helpers in examples/llm_ptq/example_utils.py (needs_checkpoint_path_update /
    resolve_checkpoint_dir) accept both flat and nested shapes since they run before Pydantic validation;
    resolve_checkpoint_dir now also returns the resolved path so the caller doesn't re-derive it.

Summary by CodeRabbit

  • New Features
    • Layerwise calibration now uses a nested config with options: enable, get_qdq_activations_from_prev_layer, checkpoint_dir, save_every, and save_quantizers_only; checkpointing supports quantizer-only saves and windowed commits.
  • Deprecations
    • Legacy flat-style layerwise keys remain supported with deprecation warnings and automatic migration to the nested form.
  • Examples/UX
    • Tools auto-resolve legacy vs nested checkpoint locations and report the resolved path.
  • Tests
    • Expanded coverage for nested configs, checkpoint/resume semantics, windowed saves, and validation.

Fridah-nv added 3 commits May 29, 2026 21:50
…layer

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv requested review from a team as code owners May 29, 2026 23:02
@Fridah-nv Fridah-nv requested a review from cjluo-nv May 29, 2026 23:02
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR restructures layerwise quantization calibration config from flat boolean and separate checkpoint-dir into a nested LayerwiseConfig with options enable, get_qdq_activations_from_prev_layer, checkpoint_dir, save_every, and save_quantizers_only. It adds migration/validation, wires the options through calibration and checkpointing, updates examples/recipes, and expands tests.

Changes

Layerwise Configuration Restructuring

Layer / File(s) Summary
LayerwiseConfig schema, validation, and legacy migration
modelopt/torch/quantization/config.py, tests/unit/torch/quantization/test_config_validation.py
New LayerwiseConfig model replaces flat layerwise: bool + layerwise_checkpoint_dir with nested enable/checkpoint_dir/save_every/save_quantizers_only/get_qdq_activations_from_prev_layer. _coerce_layerwise_input normalizes legacy boolean/alias inputs (emits DeprecationWarning) and migrates flat checkpoint_dir into nested form. Per-algorithm _supports_save_quantizers_only flags control quantizer-only checkpoint support; GPTQ defaults get_qdq_activations_from_prev_layer=True when unset. Tests cover parsing, migration, conflicts, and algorithm-specific rules.
Config consumption in calibration pipeline
modelopt/torch/quantization/mode.py, modelopt/torch/quantization/model_calib.py
wrapped_calib_func and layerwise_calibrate extract nested layerwise parameters (enable, checkpoint_dir, get_qdq_activations_from_prev_layer, save_every, save_quantizers_only). Pipeline uses a conditional context to control QDQ activation capture for next-layer inputs and initializes _CheckpointState with the new options. Error strings and GPTQ docstring updated to reference layerwise.enable.
Checkpoint persistence with windowing and quantizer-only support
modelopt/torch/quantization/utils/layerwise_calib.py, tests/unit/torch/quantization/test_layerwise_calibrate.py
Manifest now stores save_every and save_quantizers_only. Adds _save_layer_files to write per-layer qstate/output_meta plus either weights.pt or quantizer_buffers.pt. _CheckpointState accepts save_every and save_quantizers_only, tracks _last_saved_layer for window commits, and from_folder validates/propagates these keys. save(...) snapshots per layer but only commits manifest and next_inputs.pt at window boundaries or final layer. Tests verify windowing, quantizer-only layout, resume, and crash-recovery semantics.
User-facing integration: examples, recipes, and HF script
examples/llm_ptq/example_utils.py, examples/llm_ptq/hf_ptq.py, modelopt_recipes/general/ptq/*.yaml, modelopt_recipes/huggingface/*/*.yaml, tests/unit/torch/quantization/test_layerwise_calibrate.py
_layerwise_checkpoint_dir_location() detects flat vs nested shapes; resolve_checkpoint_dir() returns (quant_cfg, resolved_path) and writes the resolved path back in the correct shape. HF PTQ example reads layerwise.enable and prints the resolved path. Multiple PTQ recipes updated to nested layerwise: { enable: bool, checkpoint_dir: ... }. End-to-end tests validate qdq-from-prev behavior and checkpoint resume correctness.
Changelog documentation
CHANGELOG.rst
Documents introduction of nested LayerwiseConfig with new checkpoint and QDQ activation options, and deprecation of legacy boolean and flat checkpoint_dir fields.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Config as QuantizationConfig
  participant Pipeline as CalibPipeline
  participant Checkpoint as CheckpointState

  User->>Config: provide legacy bool or nested layerwise config
  Config->>Config: coerce legacy boolean/alias → nested LayerwiseConfig (emit DeprecationWarning)
  Config->>Config: migrate layerwise_checkpoint_dir → layerwise.checkpoint_dir (if present)
  Config->>Pipeline: return normalized LayerwiseConfig

  Pipeline->>Pipeline: extract enable, checkpoint_dir, save_every, save_quantizers_only, qdq_from_prev
  alt layerwise.enable == True
    Pipeline->>Checkpoint: _CheckpointState.from_folder(checkpoint_dir, num_layers, save_every, save_quantizers_only)
    loop per layer
      Pipeline->>Pipeline: calibrate layer
      alt qdq_from_prev == False
        Pipeline->>Pipeline: capture next_inputs before calib_func (disable quantizers under current layer)
      else qdq_from_prev == True
        Pipeline->>Pipeline: capture next_inputs after calib_func (allow QDQ effects)
      end
      alt layer_idx % save_every == 0 OR final layer
        Pipeline->>Checkpoint: save(layer_idx, model, layers, next_inputs?)
        Checkpoint->>Checkpoint: write qstate, (weights.pt OR quantizer_buffers.pt), output_meta
        Checkpoint->>Checkpoint: commit manifest.json and next_inputs.pt for window boundary or final
      else
        Pipeline->>Checkpoint: write per-layer files without committing manifest
      end
    end
  else
    Pipeline->>Pipeline: run sequential calibration
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • sugunav14
  • cjluo-nv

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Security Anti-Patterns ❌ Error Two torch.load calls in layerwise_calib.py (lines 690-691 and 699-700) use weights_only=False without inline comments justifying why it is safe, violating SECURITY.md requirement #1. Add inline comments to lines 690 and 699 explaining why weights_only=False is safe (files are internally generated by _save_layer, not user-supplied), matching the pattern used for other torch.load calls in the file.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely summarizes the main changes: introducing a nested configuration for layerwise calibration with new options (QDQ-from-prev-layer flag and checkpoint I/O knobs).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fridah/layerwise-config

Comment @coderabbitai help to get the list of available commands and usage tips.

@Fridah-nv Fridah-nv requested review from realAsma and sugunav14 May 29, 2026 23:03
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv
Copy link
Copy Markdown
Contributor Author

/claude review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1571/

Built to branch gh-pages at 2026-06-01 15:43 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copy link
Copy Markdown
Collaborator

@cjluo-nv cjluo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

The nested LayerwiseConfig refactor, get_qdq_activations_from_prev_layer flag, and save_quantizers_only whitelist look well-designed and the back-compat shims (bool coercion + flat→nested migration with the use_sequential rescue) are thorough. Tests around config validation and the qdq=False-matches-sequential-amax behavioral check are good.

However, save_every > 1 looks broken: by the time _CheckpointState.save() is invoked for the boundary layer, the earlier layers in the window have already been replaced by _SkipLayer instances in transformer_layers (via _set_layer_states(layer_idx+1) swapping layer_idx-1 to a dummy during cache_outputs_for_next_layer_calib). Reading layers[i].state_dict() / quantizer_state(layers[i]) / layers[i]._layerwise_calib.output_meta on a _SkipLayer returns empty/None state because _original is attached via object.__setattr__ and is not a registered submodule. The new test_layerwise_save_every_writes_next_inputs_only_at_window_boundaries only checks file existence, so the silent corruption isn't caught — there is no resume test for save_every > 1. See inline comment for details and the suggested fix (snapshot to memory immediately after each layer is calibrated, then flush at boundaries; or save real layers eagerly and only window-gate next_inputs.pt).

Minor: changelog says "TODO before marking ready for review" in the PR body but the changelog entry is already added — please confirm the description is up to date before merge.

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py Outdated
Comment thread modelopt/torch/quantization/utils/layerwise_calib.py
Comment thread modelopt/torch/quantization/config.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 153-160: Add an explicit __all__ declaration near the top of the
module that lists the new public API symbols (e.g. "LayerwiseConfig" and any
other public config classes, functions or constants defined in this file and the
other block referenced around 636-691); place it after the imports and ensure it
enumerates every symbol intended for export so star-imports are safe and stable,
and update corresponding package __init__.py re-exports if needed to re-export
these names.
- Around line 706-708: _coerce_layerwise_input currently returns
value.model_dump() for LayerwiseConfig which expands defaults and loses the
original model_fields_set causing GPTQ's _gptq_qdq_default to miss injecting
defaults; change _coerce_layerwise_input to return the LayerwiseConfig instance
(value) unchanged when isinstance(value, LayerwiseConfig) so downstream parsing
preserves model_fields_set (e.g., self.layerwise.model_fields_set) and
GPTQCalibConfig._gptq_qdq_default can correctly detect which fields the user
actually set.

In `@modelopt/torch/quantization/utils/layerwise_calib.py`:
- Around line 729-761: The loop that saves windowed checkpoints is capturing
_SkipLayer placeholders rather than the underlying calibrated layer because
cache_outputs_for_next_layer_calib() can replace entries in layers with
_SkipLayer before save; update the save loop in the function that iterates
layers[i] to detect instances of _SkipLayer and use layer = layer._original (or
otherwise snapshot the original layer list before cache outputs mutate it) so
_save_layer receives the real calibrated layer, its quantizer state
(quantizer_state(layer)), and correct output_meta rather than the placeholder.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e32cbbac-64bc-4625-80b8-93a1790c3d62

📥 Commits

Reviewing files that changed from the base of the PR and between 40a4dd3 and dbca1a5.

📒 Files selected for processing (16)
  • CHANGELOG.rst
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/mode.py
  • modelopt/torch/quantization/model_calib.py
  • modelopt/torch/quantization/utils/layerwise_calib.py
  • modelopt_recipes/general/ptq/nvfp4_default-kv_none-gptq.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-kv_fp8.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-kv_fp8_layerwise.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-kv_fp8_cast.yaml
  • modelopt_recipes/huggingface/qwen3_5/ptq/w4a16_nvfp4-fp8_attn-kv_fp8_cast.yaml
  • modelopt_recipes/huggingface/qwen3_5_moe/ptq/w4a16_nvfp4-fp8_attn-kv_fp8_cast.yaml
  • tests/unit/torch/quantization/test_config_validation.py
  • tests/unit/torch/quantization/test_layerwise_calibrate.py

Comment thread modelopt/torch/quantization/config.py
Comment thread modelopt/torch/quantization/config.py
Comment thread modelopt/torch/quantization/utils/layerwise_calib.py Outdated
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review — 1 CRITICAL, 0 IMPORTANT, 2 SUGGESTIONs.

Most impactful finding (inline on modelopt/torch/quantization/utils/layerwise_calib.py):

_save_layer calls _write_manifest(...) at the end of every individual call, but during a save_every > 1 window save the non-boundary layers in the window are written without a next_inputs.pt. So if the process is killed (or torch.save raises) between the first and last _save_layer call within a window, the manifest advances to a layer that has no next_inputs.pt. On resume, _CheckpointState.setup_resume then raises FileNotFoundError("Cannot resume: next_inputs.pt missing for layer N") and the checkpoint is permanently broken — there's no fallback that re-calibrates the partial window. The PR description's claim that "interrupting mid-window re-calibrates that window on resume" is the intended invariant, but the implementation doesn't enforce it.

The fix is small: only advance the manifest after the entire window has been flushed (drop the _write_manifest call from _save_layer and call it once at the end of _CheckpointState.save() with last_completed_layer = layer_idx). A regression test that monkeypatches _write_manifest to raise mid-window would lock this in. With save_every=1 (the default) the bug isn't reachable, so the existing tests don't catch it — this only bites users who opt into the new knob.

Other:

  • SUGGESTION: legacy flat layerwise_checkpoint_dir is migrated silently in _migrate_layerwise_checkpoint_dir. The bool-form layerwise emits a DeprecationWarning; consider symmetric treatment so the migration window for the flat key is signaled.
  • SUGGESTION: resolve_checkpoint_dir in examples/llm_ptq/example_utils.py only updates whichever shape it picks — if the user happens to specify both flat and nested with the same value, the rewrite produces a Pydantic conflict whose error message points at the resolver's output, not at the user's redundancy. Edge case but worth a guard.

Risk: Low–medium. The default path (save_every=1, no save_quantizers_only) is well-tested and the algorithm-correctness pieces (qdq-from-prev semantics, GPTQ default via model_fields_set, save_quantizers_only whitelist) look right and are covered by the new tests. The save_every>1 path has the resume-corruption bug above, which only opt-in users will hit.

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py
Comment thread modelopt/torch/quantization/config.py
Comment thread examples/llm_ptq/example_utils.py
Copy link
Copy Markdown
Collaborator

@cjluo-nv cjluo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Re-review: the previously flagged critical save_every > 1 corruption bug is still present in modelopt/torch/quantization/utils/layerwise_calib.py. The new test only checks file existence, not file contents, so the corruption remains silent. Other previous comments (manifest update during partial-window crash, _coerce_layerwise_input model_dump() losing model_fields_set) are also unaddressed. Marking comment rather than nudge because the critical bug from the previous review is unresolved.

Status of previous comments:

  • (critical, UNRESOLVED) save_every > 1 reads from _SkipLayer stand-ins → silent on-disk corruption. Still no resume test for save_every > 1.
  • (critical, UNRESOLVED) Manifest is advanced per-_save_layer call, so a mid-window crash leaves manifest=K with no next_inputs.pt for layer K, breaking resume permanently.
  • (major, UNRESOLVED) _coerce_layerwise_input calls .model_dump() on a LayerwiseConfig instance, which expands defaults and breaks _gptq_qdq_default's reliance on model_fields_set.
  • (minor) Cross-version save_every / save_quantizers_only mismatch on resume — still no manifest-side guard.
  • (minor) __all__ declaration in config.py — still missing.
  • (minor) Symmetric DeprecationWarning for flat layerwise_checkpoint_dir migration — still silent.
  • (minor) resolve_checkpoint_dir flat+nested redundancy — still no guard.

Resolved/no-longer-flagging:

  • Changelog entry is present and the "TODO before marking ready" note appears to have been dropped from the PR body.

Design-review pass: the PR refactors an existing in-repo subsystem (layerwise calibration + checkpoint format) rather than introducing a new one, so the design protocol's "is there an existing tool that already does this?" question doesn't apply — the existing tool is what's being refactored. The nested-config + whitelist + per-algorithm default approach is reasonable; correctness, not design, is the issue here.


Additional comments (outside the PR diff):

  • modelopt/torch/quantization/utils/layerwise_calib.py:522 — > Bot comment.

Critical (carried over, still unresolved): _save_layer calls _write_manifest(checkpoint_dir, idx, num_layers) at the end of every individual call, but during a save_every > 1 window save the non-boundary layers in the window are written without a next_inputs.pt. If the process is killed (or torch.save raises) between the first _save_layer and the boundary _save_layer within a window, the manifest advances to a layer that has no next_inputs.pt. On resume, setup_resume then raises FileNotFoundError("Cannot resume: next_inputs.pt missing for layer N") and the checkpoint is permanently broken — the PR description's claim that "interrupting mid-window re-calibrates that window on resume" is not what the code does.

Fix: drop the _write_manifest call from _save_layer and call it once at the end of _CheckpointState.save() (with last_completed_layer = layer_idx) so the manifest only advances after the boundary next_inputs.pt is on disk. A regression test that monkeypatches _write_manifest to raise mid-window would lock this in.

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py Outdated
Comment thread modelopt/torch/quantization/config.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 99.19355% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.91%. Comparing base (eb5ed2d) to head (af1bb4f).
⚠️ Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/model_calib.py 95.45% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1571      +/-   ##
==========================================
+ Coverage   76.66%   76.91%   +0.25%     
==========================================
  Files         478      478              
  Lines       52376    53213     +837     
==========================================
+ Hits        40156    40931     +775     
- Misses      12220    12282      +62     
Flag Coverage Δ
examples 41.49% <42.10%> (+8.62%) ⬆️
gpu 59.52% <91.22%> (-0.55%) ⬇️
regression 15.20% <35.96%> (+0.01%) ⬆️
unit 53.67% <99.15%> (+0.18%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv
Copy link
Copy Markdown
Contributor Author

/claude review

Copy link
Copy Markdown
Collaborator

@cjluo-nv cjluo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Re-review: all previously-flagged critical issues are resolved with regression tests.

  • save_every > 1 _SkipLayer corruption (critical, was unresolved) — Fixed. _CheckpointState.save() now reads from layers[layer_idx] (the just-calibrated layer) and is called every iteration before the next-layer capture forward swaps prior layers to _SkipLayer. Per-layer files (weights/quantizer state/output_meta) are written eagerly; only next_inputs.pt + manifest are deferred to window boundaries.
  • Mid-window crash advancing manifest (critical, was unresolved) — Fixed. _save_layer_files no longer writes the manifest; _CheckpointState.save() writes it once at boundary/final, so a torch.save crash mid-window leaves the manifest pointing at the previous boundary. Covered by test_layerwise_save_every_mid_window_crash_recovers_at_prev_boundary (monkeypatches torch.save to crash on layer 2 mid-window, asserts manifest stays at layer 1).
  • _coerce_layerwise_input .model_dump() expanding defaults (critical/major, was unresolved) — Fixed. Now uses model_dump(exclude_unset=True) so LayerwiseConfig instance inputs preserve model_fields_set, and GPTQ's _gptq_qdq_default correctly injects True for fields the user didn't explicitly set. Test parametrization extended to cover LayerwiseConfig(enable=True) + LayerwiseConfig(enable=True, get_qdq_activations_from_prev_layer=False).
  • Save_every>1 resume parity test (was missing) — Added. test_layerwise_checkpoint_resume_matches_one_shot_amax parametrized over (save_every=2, save_quantizers_only=False) and (save_every=1, save_quantizers_only=True) rewinds the manifest and asserts amax equivalence with a one-shot baseline — i.e. it would fail under the previous corruption.
  • Cross-version save_every/save_quantizers_only mismatch (minor) — Addressed. Manifest now persists both keys; from_folder raises on any mismatch. Covered by test_layerwise_checkpoint_mismatch_save_every_raises.
  • Symmetric DeprecationWarning for flat layerwise_checkpoint_dir (minor) — Addressed; warning emitted in _migrate_layerwise_checkpoint_dir.
  • resolve_checkpoint_dir flat+nested redundancy (minor) — Addressed; resolver writes back to both shapes when both present, keeping the dict internally consistent.
  • __all__ declaration (minor) — Author marked out of scope; acceptable.

Design-review pass: this PR refactors an existing in-repo subsystem (layerwise calibration + checkpoint format) rather than introducing a new one, so the "is there an existing tool that already does this?" question doesn't apply — the existing tool is what's being refactored. The nested-config + whitelist + per-algorithm default approach is reasonable.

Complex PR: spans 8 directories (≥ 5). Looping in a human for approval.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review passed — no blocking issues found. LGTM

Comment on lines +1831 to +1844
# When qdq_from_prev is False, temporarily disable every quantizer
# under the just-calibrated layer so the next layer sees full-precision
# activations (matches non-layerwise calibration semantics).
capture_ctx: AbstractContextManager = (
nullcontext()
if qdq_from_prev
else set_quantizer_by_cfg_context(
layer, [{"quantizer_name": "*", "enable": False}]
)
)
with capture_ctx:
next_inputs = input_getter.cache_outputs_for_next_layer_calib(
layer, forward_loop
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. even if we disable quantizers, some algorithms such as GPTQ modify the weights.

can we do input caching before calling calib_func if qdq_from_prev is false?

Copy link
Copy Markdown
Contributor Author

@Fridah-nv Fridah-nv Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a real gap. Updated the code.

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Copy link
Copy Markdown
Collaborator

@cjluo-nv cjluo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Re-review: all four previously-flagged critical issues are resolved with meaningful regression tests.

  • 💬 save_every > 1 _SkipLayer corruption (critical, was unresolved) — Fixed. _CheckpointState.save() now reads from layers[layer_idx] (the just-calibrated layer) before the next iteration's capture forward swaps prior layers to _SkipLayer. Per-layer files (weights/quantizer state/output_meta) are written eagerly; only next_inputs.pt + manifest are deferred to window boundaries. Covered by the parametrized test_layerwise_checkpoint_resume_matches_one_shot_amax(save_every=2) parity test against a one-shot baseline.
  • 💬 Mid-window crash advancing manifest (critical, was unresolved) — Fixed. _save_layer_files no longer writes the manifest; _CheckpointState.save() writes it once at boundary/final, so a torch.save crash mid-window leaves the manifest pointing at the previous boundary. Covered by test_layerwise_save_every_mid_window_crash_recovers_at_prev_boundary (monkeypatches torch.save to crash on layer 2 mid-window, asserts manifest stays at layer 1).
  • 💬 _coerce_layerwise_input .model_dump() expanding defaults (critical, was unresolved) — Fixed. Now uses model_dump(exclude_unset=True) so LayerwiseConfig instance inputs preserve model_fields_set, and GPTQ's _gptq_qdq_default correctly injects True for fields the user didn't explicitly set. Test parametrization extended to cover LayerwiseConfig(enable=True) and LayerwiseConfig(enable=True, get_qdq_activations_from_prev_layer=False).
  • 💬 qdq_from_prev=False weight-mutation leak (was raised by realAsma) — Fixed. Next-layer input capture moved to before calib_func when qdq_from_prev=False, so weight-mutating algorithms (GPTQ/AWQ/SmoothQuant folds) can't leak into downstream captures. Covered by test_layerwise_no_qdq_captures_inputs_before_calib_func_mutates_weights (destructive vs identity calib_func produce identical captured inputs).
  • 💬 Cross-version save_every / save_quantizers_only mismatch (minor) — Addressed. Manifest now persists both keys; from_folder raises on any mismatch. Covered by test_layerwise_checkpoint_mismatch_save_every_raises.
  • 💬 Symmetric DeprecationWarning for flat layerwise_checkpoint_dir (minor) — Addressed; warning emitted in _migrate_layerwise_checkpoint_dir.
  • 💬 resolve_checkpoint_dir flat+nested redundancy (minor) — Addressed; resolver writes the resolved path back to both shapes when both are present so the migration validator sees consistent values.
  • 💬 __all__ declaration in config.py (minor) — Author marked out of scope; acceptable.

Design-review pass: this PR refactors an existing in-repo subsystem (layerwise calibration + checkpoint format) rather than introducing a new one, so the "is there an existing tool that already does this?" question doesn't apply — the existing tool is what's being refactored. The nested-config + whitelist + per-algorithm default approach is reasonable.

Looping in a human for final sign-off because (a) complexity gate fired (16 files, +853/-135, 8 directories), and (b) author's reported end-to-end verification (Qwen3-8B, 4 PTQ runs, 905 tensors bit-identical across legacy-vs-nested and nested-vs-baseline pairs) is a strong correctness signal but lives outside CI — a human reviewer with the runtime context should confirm the bit-equivalence claim is sufficient.

@realAsma
Copy link
Copy Markdown
Contributor

realAsma commented Jun 1, 2026

@Fridah-nv can you take a look at #1592?

I have a improvement for layerwise (Avoid write back with FSDP2 / CPU/Disk offload unless needed)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants