Skip to content

Refactor local_hessian onto shared MSE flow + fused-MoE expert support#1578

Open
Fridah-nv wants to merge 4 commits into
mainfrom
fridah/local-hessian-refactor
Open

Refactor local_hessian onto shared MSE flow + fused-MoE expert support#1578
Fridah-nv wants to merge 4 commits into
mainfrom
fridah/local-hessian-refactor

Conversation

@Fridah-nv
Copy link
Copy Markdown
Contributor

@Fridah-nv Fridah-nv commented Jun 1, 2026

What does this PR do?

Type of change: Bug fix + new feature (fused-MoE coverage)

Dusted off and refactored local_hessian_calibrate to align with the new MSE calibration flow, fix latent
drift, extend coverage to fused-MoE experts, and decouple module-specific handling behind a clean extension point.

Core changes (modelopt/torch/quantization/model_calib.py):

  • Extracted a shared _mse_calibrate_weights helper now used by both mse_calibrate and local_hessian_calibrate
  • Replaced the monolithic LocalHessianHelper + bespoke per-weight loop with a small _LocalHessianAccumulator (lazy fp32 buffer, freed after building the error func), removing ~200 lines of duplicated scale-search logic and all manual cuda.synchronize/empty_cache bookkeeping. The XᵀX GEMM accumulates in fp32 to avoid bf16/fp16 precision loss.
  • Removed dead NVFP4-static promotion (now handled inside max_calibrate).
  • Fused-MoE expert support: per-expert Hessians captured from each expert's routed activations, keyed by
    id(weight_quantizer) so dense and per-expert paths share one calibration loop. Never-routed experts / non-eager kernels / cin not divisible by block_size / registered backends fall back to plain MSE (with an eager, module-named warning).

Decoupling (zero module-type-specific code in model_calib.py):

  • Added QuantModule.register_calibration_input_hooks(callback) — the activation-side counterpart to iter_weights_for_calibration. Base default is a no-op; QuantLinearConvBase pairs the weight quantizer with the
    forward input (linear only), and _QuantFusedExperts (in plugins/huggingface.py) owns the per-expert pairing via _current_expert_idx. Any future module type gains local-Hessian support by implementing this one method.

Usage

# Add a code snippet demonstrating how to use this

Testing

  • new tests/unit/torch/quantization/test_local_hessian.py (accumulator math/shape/dtype, dense end-to-end, backend-skip, block-size guard) and a per-expert MoE test in test_fused_experts.py.
  • Behavior-preserving check: refactored branch produces bit-identical dense weight scales to main (216/216 tensors) on Qwen3-8B with the fp32 accumulation neutralized to isolate the structural change.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅ / ❌ / N/A
  • Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Hessian-weighted calibration option for improved weight-quantizer refinement.
    • Activation-side calibration hooks enabling per-expert calibration in fused-expert models.
  • Behavior

    • Custom error-function usage is skipped with a warning for certain FP8 sweep backends.
    • User-facing warnings for block-size / alignment mismatches and for missing capture flows.
  • Tests

    • New unit tests covering local-Hessian calibration, per-expert capture, error-function wiring, and related warnings.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 1, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8e2735e2-b1b1-4f71-afed-29d0fa09720e

📥 Commits

Reviewing files that changed from the base of the PR and between 63ff148 and 9a6bc61.

📒 Files selected for processing (1)
  • tests/unit/torch/quantization/test_local_hessian.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/torch/quantization/test_local_hessian.py

📝 Walkthrough

Walkthrough

Refactors local-Hessian-weighted MSE calibration: adds activation-capture hooks, threads per-quantizer error functions into MSE calibrators, implements per-block Hessian accumulation during forward capture, and runs shared MSE refinement; includes fused-experts hook wiring and unit tests.

Changes

Local Hessian Calibration Enhancement

Layer / File(s) Summary
Calibration hook registration API
modelopt/torch/quantization/nn/modules/quant_module.py
QuantModule.register_calibration_input_hooks provides the base extension point for pairing weight quantizers with activations; QuantLinearConvBase overrides to register a forward pre-hook for 2-D weight quantizers when enabled.
Fused experts calibration hook implementation
modelopt/torch/quantization/plugins/huggingface.py
_QuantFusedExperts.register_calibration_input_hooks registers forward pre-hooks on shared per-projection input quantizers to pair the routed expert's per-expert weight quantizer with the current input activation using _current_expert_idx.
MSE calibrator error-function plumbing
modelopt/torch/quantization/model_calib.py
Imports distributed warn helpers and removes legacy reduce_amax; _make_weight_mse_calibrator gains an error_func parameter and threads it into NVFP4/ MSE calibrators while guarding FP8 sweep backends from unsupported overrides.
Shared MSE loop with per-quantizer error dispatch
modelopt/torch/quantization/model_calib.py
Refactors mse_calibrate to use _mse_calibrate_weights which queries an error_func_for per TensorQuantizer and passes it into _make_weight_mse_calibrator.
Hessian accumulator and error-fn builder
modelopt/torch/quantization/model_calib.py
Adds _LocalHessianAccumulator to capture per-block XᵀX during forward hooks and to construct Hessian-weighted error_funcs; adds warnings for CIN divisibility and block-size misalignment and documents debug caching.
Local Hessian calibration orchestration
modelopt/torch/quantization/model_calib.py
local_hessian_calibrate now silences eligible weight quantizers, registers accumulators via the hook API, runs the forward capture to populate per-block Hessians, restores state, optionally caches debug accumulators, builds per-quantizer error functions, and calls the shared _mse_calibrate_weights loop with those functions.
Comprehensive tests
tests/unit/torch/quantization/test_local_hessian.py, tests/unit/torch/quantization/plugins/test_fused_experts.py
Adds unit tests for _LocalHessianAccumulator accumulation and error-fn correctness, integration tests for local_hessian_calibrate behavior and warnings, activation-capture extension-point tests, error-func wiring tests, and fused-experts per-expert calibration refinement test.

Sequence Diagram

sequenceDiagram
  participant Calibration as local_hessian_calibrate
  participant Model as Model
  participant Quantizer as WeightQuantizer
  participant Accumulator as _LocalHessianAccumulator
  participant ForwardLoop as forward_loop
  participant MSE as _mse_calibrate_weights

  Calibration->>Quantizer: silence enabled weight quantizers
  Calibration->>Model: register hooks via register_calibration_input_hooks
  Accumulator->>Accumulator: init per-quantizer accumulators
  Calibration->>ForwardLoop: execute forward pass
  ForwardLoop->>Model: forward(input)
  Model->>Accumulator: invoke hook (weight_quantizer, weight, input)
  Accumulator->>Accumulator: accumulate XᵀX per block
  Calibration->>Quantizer: restore weight quantizers
  Calibration->>Accumulator: build error_func from Hessians
  Calibration->>MSE: _mse_calibrate_weights(error_func_for)
  MSE->>MSE: refine amax using Hessian-weighted loss
Loading

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: refactoring local_hessian calibration onto the shared MSE flow and adding fused-MoE expert support.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No SECURITY.md violations found. No unsafe torch.load, numpy.load, hardcoded trust_remote_code in code, eval/exec on untrusted input, or nosec comments in the 5 PR files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fridah/local-hessian-refactor

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1578/

Built to branch gh-pages at 2026-06-01 21:40 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 93.02326% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.81%. Comparing base (40a4dd3) to head (9a6bc61).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/model_calib.py 94.05% 6 Missing ⚠️
modelopt/torch/quantization/plugins/huggingface.py 88.23% 2 Missing ⚠️
...lopt/torch/quantization/nn/modules/quant_module.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1578      +/-   ##
==========================================
- Coverage   77.33%   76.81%   -0.53%     
==========================================
  Files         478      479       +1     
  Lines       52421    52754     +333     
==========================================
- Hits        40541    40524      -17     
- Misses      11880    12230     +350     
Flag Coverage Δ
examples 41.67% <12.40%> (+0.86%) ⬆️
gpu 59.76% <74.41%> (-0.62%) ⬇️
regression 15.20% <12.40%> (+0.09%) ⬆️
unit 53.85% <93.02%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv marked this pull request as ready for review June 1, 2026 20:45
@Fridah-nv Fridah-nv requested a review from a team as a code owner June 1, 2026 20:45
@Fridah-nv Fridah-nv requested a review from realAsma June 1, 2026 20:45
@Fridah-nv
Copy link
Copy Markdown
Contributor Author

/claude review

Copy link
Copy Markdown
Collaborator

@cjluo-nv cjluo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Solid refactor that consolidates local_hessian_calibrate onto the shared MSE flow and extends Hessian capture to HF fused-MoE experts via a new register_calibration_input_hooks extension point on QuantModule. Code is well-structured, tests are meaningful (accumulator math, dense end-to-end, conv fallback, block-size mismatch, registered-backend skip, per-expert MoE capture + refinement), and bit-identical preservation on Qwen3-8B is claimed.

Design check passed: the new extension point is a small activation-side counterpart to the existing iter_weights_for_calibration, not a parallel subsystem; this is the right shape rather than introducing a new registry/loader. The PR explicitly documents fallback paths (never-routed experts, non-eager kernels, cin % block_size != 0, registered backends) and makes them warn rather than silently miscalibrate.

Nudging (not approving) for two reasons that warrant a human signoff:

  1. Distributed correctness gap is acknowledged but not fixed. The new code emits a warn_rank_0 and a TODO that "the per-block Hessian is not synced across TP/DP ranks ... refined amaxes can diverge under tensor/data parallelism. Treat local_hessian as single-rank for now." The previous (LocalHessianHelper) implementation had the same gap (also TODO'd), so this is no regression — but the change still ships a feature whose multi-rank behavior is known-divergent. A maintainer should confirm this is acceptable for the intended use cases (and whether local_hessian_calibrate should hard-error rather than warn under TP/DP > 1, as max_calibrate does for NVFP4-static + TP).

  2. Bit-equivalence claim is human-verified outside CI. The PR body says the refactor produces "bit-identical dense weight scales to main (216/216 tensors) on Qwen3-8B with the fp32 accumulation neutralized to isolate the structural change." This is exactly the kind of claim that needs human confirmation since it can't be re-run in CI; please have the original reviewer/runner sign off that the comparison was done with fp32 accumulation neutralized (i.e. bf16 GEMM as in the old code) and that the deliberate fp32-accumulation switch in the new path is the only intended numerical change.

Minor (non-blocking) observations:

  • from modelopt.torch.utils.distributed import ... is split across three statements; the two new ones (is_initialized as dist_is_initialized, size as dist_size) could be folded into the existing line.
  • _warn_local_hessian_fallback is only called when captures is non-empty, which means a cin % block_size != 0 warning is suppressed for non-pairable modules (where it wouldn't matter anyway). Consider a brief comment so a future reader doesn't think the warning is missing.
  • SequentialQuantizer weight quantizers are silently skipped (matches mse_calibrate), but this is invisible to users running local_hessian on a SequentialQuantizer config — they'll see no fallback warning since register_calibration_input_hooks does pair them. Worth a one-line comment in local_hessian_calibrate or a sibling warning path.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/torch/quantization/model_calib.py (1)

497-520: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fallback weights are skipped instead of getting plain MSE.

With fp8_scale_sweep=True, this helper returns None for every unregistered backend and every non-static-NVFP4 quantizer. In local_hessian_calibrate(), that is also the path used for "no Hessian available" fallbacks, so those weights stay at max calibration instead of receiving the documented plain-MSE refinement.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/model_calib.py` around lines 497 - 520, The
fp8_scale_sweep branch currently returns None for unregistered backends and
non-static NVFP4 quantizers, causing those weights to skip plain-MSE refinement;
change that fallback to return the same plain-MSE calibrator used by
local_hessian_calibrate instead of None. Specifically, in the block after
_uses_modelopt_fp8_weight_scales(...) where it now returns None, instantiate and
return the plain MSE calibrator (the same class/factory local_hessian_calibrate
uses for "no Hessian available") with the same parameters you pass to
NVFP4MSECalibrator (initial_amax, axis, quant_func, error_func and global_amax
from weight_quantizer if needed) so unregistered/non-static backends receive
plain-MSE calibration rather than None.
🧹 Nitpick comments (1)
modelopt/torch/quantization/model_calib.py (1)

503-509: ⚡ Quick win

Use warn_rank_0 for this backend-skip warning.

This can fire once per quantizer on every rank, so warnings.warn will get noisy in distributed calibration.

Suggested fix
-                warnings.warn(
+                warn_rank_0(
                     f"local_hessian: backend '{backend}' does not support a custom error "
                     "function; skipping Hessian-weighted calibration for this quantizer."
                 )

As per coding guidelines, "Develop with distributed processing in mind by using print_rank_0 or warn_rank_0 to avoid noisy logs and guarding shared side effects such as file writes or shared state updates against race conditions between ranks".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/model_calib.py` around lines 503 - 509, The
warning for skipping Hessian refinement should use the distributed-safe helper
warn_rank_0 instead of warnings.warn to avoid noisy warnings on every rank; in
the block that checks error_func (the local_hessian/backend skip branch) replace
warnings.warn(...) with warn_rank_0(...) (and add an import for warn_rank_0 if
missing) so the message about "backend '{backend}' does not support a custom
error function; skipping Hessian-weighted calibration for this quantizer." only
prints on rank 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 657-665: build_error_func currently clears self.hessian_per_block
before model._local_hessian_accumulators can be populated, so debug mode can't
inspect the accumulated Hessian; fix by preserving the raw accumulator until
after any debug assignment: when building hessian = self.hessian_per_block /
self.num_samples, first, if debug inspection is enabled (check
model._local_hessian_accumulators or a debug flag), assign or copy
self.hessian_per_block (or hessian) into model._local_hessian_accumulators, then
only set self.hessian_per_block = None; alternatively, conditionally only clear
self.hessian_per_block when debug is false—apply the same change at the other
clearing site referenced (the similar clear around lines 820-821).
- Around line 763-797: During the local Hessian capture pass we only silence
weight quantizers; also collect and temporarily disable activation/input
quantizers so QuantInputBase.forward() does not quantize activations during
forward_loop. Add a silenced_input_quantizers list (analogous to
silenced_weight_quantizers), and inside the loop over name_to_module when you
register captures check modules that expose an input quantizer (e.g.,
module.input_quantizer or instances of QuantInputBase) and if that quantizer is
a TensorQuantizer (or has is_enabled and _if_quant) append it to
silenced_input_quantizers and disable it; then in the finally block re-enable
each quantizer in silenced_input_quantizers (and still re-enable
silenced_weight_quantizers and remove handles) so the forward pass uses
full-precision activations for ΣXᵀX collection.

In `@modelopt/torch/quantization/nn/modules/quant_module.py`:
- Around line 262-272: The hook is currently registered for any enabled
weight_quantizer (including SequentialQuantizer) which contaminates captures;
change register_calibration_input_hooks to only register when
self.weight_quantizer is an instance of TensorQuantizer (in addition to existing
checks on weight presence/dim and is_enabled). Update the condition in
register_calibration_input_hooks to check isinstance(self.weight_quantizer,
TensorQuantizer) so only TensorQuantizer-backed weights get the forward-pre
hook; keep the _pre_hook and registration logic unchanged.

In `@tests/unit/torch/quantization/plugins/test_fused_experts.py`:
- Around line 662-664: The test currently imports modelopt.torch.quantization as
mtq and local_hessian_calibrate inside the test function; move these imports to
the module top so import errors surface at collection time—add top-level imports
for mtq (modelopt.torch.quantization) and local_hessian_calibrate
(modelopt.torch.quantization.model_calib) near other test imports and remove the
in-function import statements in test_fused_experts.py so the test uses the
module-scope mtq and local_hessian_calibrate symbols.

---

Outside diff comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 497-520: The fp8_scale_sweep branch currently returns None for
unregistered backends and non-static NVFP4 quantizers, causing those weights to
skip plain-MSE refinement; change that fallback to return the same plain-MSE
calibrator used by local_hessian_calibrate instead of None. Specifically, in the
block after _uses_modelopt_fp8_weight_scales(...) where it now returns None,
instantiate and return the plain MSE calibrator (the same class/factory
local_hessian_calibrate uses for "no Hessian available") with the same
parameters you pass to NVFP4MSECalibrator (initial_amax, axis, quant_func,
error_func and global_amax from weight_quantizer if needed) so
unregistered/non-static backends receive plain-MSE calibration rather than None.

---

Nitpick comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 503-509: The warning for skipping Hessian refinement should use
the distributed-safe helper warn_rank_0 instead of warnings.warn to avoid noisy
warnings on every rank; in the block that checks error_func (the
local_hessian/backend skip branch) replace warnings.warn(...) with
warn_rank_0(...) (and add an import for warn_rank_0 if missing) so the message
about "backend '{backend}' does not support a custom error function; skipping
Hessian-weighted calibration for this quantizer." only prints on rank 0.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 967322de-f723-4696-8e51-a3a3a8baf41c

📥 Commits

Reviewing files that changed from the base of the PR and between 552ab90 and 0bfe1d4.

📒 Files selected for processing (5)
  • modelopt/torch/quantization/model_calib.py
  • modelopt/torch/quantization/nn/modules/quant_module.py
  • modelopt/torch/quantization/plugins/huggingface.py
  • tests/unit/torch/quantization/plugins/test_fused_experts.py
  • tests/unit/torch/quantization/test_local_hessian.py

Comment thread modelopt/torch/quantization/model_calib.py Outdated
Comment thread modelopt/torch/quantization/model_calib.py Outdated
Comment thread modelopt/torch/quantization/nn/modules/quant_module.py
Comment thread tests/unit/torch/quantization/plugins/test_fused_experts.py Outdated
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review summary

Findings: CRITICAL: 0, IMPORTANT: 1, SUGGESTION: 1

The refactor is structurally clean — the new _LocalHessianAccumulator, the shared _mse_calibrate_weights helper, and the register_calibration_input_hooks extension point all match well with the existing MSE flow, and the per-expert capture via _current_expert_idx is a nice fit for the fused-MoE plugin. Tests cover the accumulator math, the dense end-to-end path, the fallback warnings, and the per-expert MoE path.

Most impactful finding

[IMPORTANT Compatibility] During the activation-capture forward, SequentialQuantizer-wrapped weight quantizers are no longer silenced. The isinstance(weight_quantizer, TensorQuantizer) filter on the silencing list excludes them (SequentialQuantizer is an nn.Sequential), whereas the previous implementation disabled them via the delegated SequentialQuantizer.disable()/enable(). For layers that use SequentialQuantizer for weights (e.g. INT4 weights + FP8 scale, or downgraded restore paths), the Hessian-capture forward now propagates quantized weights, subtly corrupting the captured Hessian for downstream TensorQuantizer layers. The PR's bit-identical Qwen3-8B check uses pure TensorQuantizer paths and wouldn't catch this. See inline comment at model_calib.py:776–786 for a suggested fix (iterate SequentialQuantizer leaves when building the silencing list).

Nit

[SUGGESTION] With debug=True, model._local_hessian_accumulators retains accumulators whose hessian_per_block was already freed by build_error_func. The docstring's "retain … for inspection" implies the Hessian tensor is inspectable. Either snapshot before build_error_func, or tighten the docstring.

Risk assessment

Low–medium. The common path (TensorQuantizer-only weight quantizers, e.g. INT8 / FP8 / static NVFP4) is preserved bit-identically per the author's own check. The SequentialQuantizer regression affects only INT4-with-FP8-scale-style configurations and only manifests as a small Hessian-quality drift for downstream TensorQuantizer refinements. Worth fixing before merge but doesn't break the headline use case.

Comment thread modelopt/torch/quantization/model_calib.py
Comment thread modelopt/torch/quantization/model_calib.py
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/torch/quantization/test_local_hessian.py`:
- Line 184: Move the in-test import of SequentialQuantizer to the module top:
remove the inline "from modelopt.torch.quantization.nn import
SequentialQuantizer" inside the test and add it to the file's top-level imports
so import errors surface at collection time; keep the inline import only if
there is a documented circular/optional-dependency reason and add a comment
explaining that. Ensure the test references the top-level SequentialQuantizer
symbol and run tests to confirm no circular import was introduced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b1029f88-ec3a-452d-8f44-3f0fd96d0a83

📥 Commits

Reviewing files that changed from the base of the PR and between 0bfe1d4 and 63ff148.

📒 Files selected for processing (4)
  • modelopt/torch/quantization/model_calib.py
  • modelopt/torch/quantization/nn/modules/quant_module.py
  • tests/unit/torch/quantization/plugins/test_fused_experts.py
  • tests/unit/torch/quantization/test_local_hessian.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/torch/quantization/plugins/test_fused_experts.py
  • modelopt/torch/quantization/model_calib.py

Comment thread tests/unit/torch/quantization/test_local_hessian.py Outdated
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@cjluo-nv
Copy link
Copy Markdown
Collaborator

cjluo-nv commented Jun 1, 2026

Can we test doing this on a model and runs some simple evals and compare that with the base MSE without hessian?

@Fridah-nv
Copy link
Copy Markdown
Contributor Author

/claude review

@Fridah-nv
Copy link
Copy Markdown
Contributor Author

Can we test doing this on a model and runs some simple evals and compare that with the base MSE without hessian?

I plan to do a larger scale experiments after this PR lands, I'd like to merge it first to make sure basic correctness and code quality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants