Refactor local_hessian onto shared MSE flow + fused-MoE expert support by Fridah-nv · Pull Request #1578 · NVIDIA/Model-Optimizer

Fridah-nv · 2026-06-01T15:44:16Z

What does this PR do?

Type of change: Bug fix + new feature (fused-MoE coverage)

Dusted off and refactored local_hessian_calibrate to align with the new MSE calibration flow, fix latent
drift, extend coverage to fused-MoE experts, and decouple module-specific handling behind a clean extension point.

Core changes (modelopt/torch/quantization/model_calib.py):

Extracted a shared _mse_calibrate_weights helper now used by both mse_calibrate and local_hessian_calibrate
Replaced the monolithic LocalHessianHelper + bespoke per-weight loop with a small _LocalHessianAccumulator (lazy fp32 buffer, freed after building the error func), removing ~200 lines of duplicated scale-search logic and all manual cuda.synchronize/empty_cache bookkeeping. The XᵀX GEMM accumulates in fp32 to avoid bf16/fp16 precision loss.
Removed dead NVFP4-static promotion (now handled inside max_calibrate).
Fused-MoE expert support: per-expert Hessians captured from each expert's routed activations, keyed by
id(weight_quantizer) so dense and per-expert paths share one calibration loop. Never-routed experts / non-eager kernels / cin not divisible by block_size / registered backends fall back to plain MSE (with an eager, module-named warning).

Decoupling (zero module-type-specific code in model_calib.py):

Added QuantModule.register_calibration_input_hooks(callback) — the activation-side counterpart to iter_weights_for_calibration. Base default is a no-op; QuantLinearConvBase pairs the weight quantizer with the
forward input (linear only), and _QuantFusedExperts (in plugins/huggingface.py) owns the per-expert pairing via _current_expert_idx. Any future module type gains local-Hessian support by implementing this one method.

Usage

# Add a code snippet demonstrating how to use this

Testing

new tests/unit/torch/quantization/test_local_hessian.py (accumulator math/shape/dtype, dense end-to-end, backend-skip, block-size guard) and a per-expert MoE test in test_fused_experts.py.
Behavior-preserving check: refactored branch produces bit-identical dense weight scales to main (216/216 tensors) on Qwen3-8B with the fp32 accumulation neutralized to isolate the structural change.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅
Did you write any new necessary tests?: ✅
Did you update Changelog?: ✅ / ❌ / N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

New Features
- Hessian-weighted calibration option for improved weight-quantizer refinement.
- Activation-side calibration hooks enabling per-expert calibration in fused-expert models.
Behavior
- Custom error-function usage is skipped with a warning for certain FP8 sweep backends.
- User-facing warnings for block-size / alignment mismatches and for missing capture flows.
Tests
- New unit tests covering local-Hessian calibration, per-expert capture, error-function wiring, and related warnings.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

copy-pr-bot · 2026-06-01T15:44:21Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-01T15:44:25Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8e2735e2-b1b1-4f71-afed-29d0fa09720e

📥 Commits

Reviewing files that changed from the base of the PR and between 63ff148 and 9a6bc61.

📒 Files selected for processing (1)

tests/unit/torch/quantization/test_local_hessian.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit/torch/quantization/test_local_hessian.py

📝 Walkthrough

Walkthrough

Refactors local-Hessian-weighted MSE calibration: adds activation-capture hooks, threads per-quantizer error functions into MSE calibrators, implements per-block Hessian accumulation during forward capture, and runs shared MSE refinement; includes fused-experts hook wiring and unit tests.

Changes

Local Hessian Calibration Enhancement

Layer / File(s)	Summary
Calibration hook registration API `modelopt/torch/quantization/nn/modules/quant_module.py`	`QuantModule.register_calibration_input_hooks` provides the base extension point for pairing weight quantizers with activations; `QuantLinearConvBase` overrides to register a forward pre-hook for 2-D weight quantizers when enabled.
Fused experts calibration hook implementation `modelopt/torch/quantization/plugins/huggingface.py`	`_QuantFusedExperts.register_calibration_input_hooks` registers forward pre-hooks on shared per-projection input quantizers to pair the routed expert's per-expert weight quantizer with the current input activation using `_current_expert_idx`.
MSE calibrator error-function plumbing `modelopt/torch/quantization/model_calib.py`	Imports distributed warn helpers and removes legacy `reduce_amax`; `_make_weight_mse_calibrator` gains an `error_func` parameter and threads it into NVFP4/ MSE calibrators while guarding FP8 sweep backends from unsupported overrides.
Shared MSE loop with per-quantizer error dispatch `modelopt/torch/quantization/model_calib.py`	Refactors `mse_calibrate` to use `_mse_calibrate_weights` which queries an `error_func_for` per `TensorQuantizer` and passes it into `_make_weight_mse_calibrator`.
Hessian accumulator and error-fn builder `modelopt/torch/quantization/model_calib.py`	Adds `_LocalHessianAccumulator` to capture per-block XᵀX during forward hooks and to construct Hessian-weighted `error_func`s; adds warnings for CIN divisibility and block-size misalignment and documents debug caching.
Local Hessian calibration orchestration `modelopt/torch/quantization/model_calib.py`	`local_hessian_calibrate` now silences eligible weight quantizers, registers accumulators via the hook API, runs the forward capture to populate per-block Hessians, restores state, optionally caches debug accumulators, builds per-quantizer error functions, and calls the shared `_mse_calibrate_weights` loop with those functions.
Comprehensive tests `tests/unit/torch/quantization/test_local_hessian.py`, `tests/unit/torch/quantization/plugins/test_fused_experts.py`	Adds unit tests for `_LocalHessianAccumulator` accumulation and error-fn correctness, integration tests for `local_hessian_calibrate` behavior and warnings, activation-capture extension-point tests, error-func wiring tests, and fused-experts per-expert calibration refinement test.

Sequence Diagram

sequenceDiagram
  participant Calibration as local_hessian_calibrate
  participant Model as Model
  participant Quantizer as WeightQuantizer
  participant Accumulator as _LocalHessianAccumulator
  participant ForwardLoop as forward_loop
  participant MSE as _mse_calibrate_weights

  Calibration->>Quantizer: silence enabled weight quantizers
  Calibration->>Model: register hooks via register_calibration_input_hooks
  Accumulator->>Accumulator: init per-quantizer accumulators
  Calibration->>ForwardLoop: execute forward pass
  ForwardLoop->>Model: forward(input)
  Model->>Accumulator: invoke hook (weight_quantizer, weight, input)
  Accumulator->>Accumulator: accumulate XᵀX per block
  Calibration->>Quantizer: restore weight quantizers
  Calibration->>Accumulator: build error_func from Hessians
  Calibration->>MSE: _mse_calibrate_weights(error_func_for)
  MSE->>MSE: refine amax using Hessian-weighted loss

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the two main changes: refactoring local_hessian calibration onto the shared MSE flow and adding fused-MoE expert support.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No SECURITY.md violations found. No unsafe torch.load, numpy.load, hardcoded trust_remote_code in code, eval/exec on untrusted input, or nosec comments in the 5 PR files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fridah/local-hessian-refactor

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-01T15:48:13Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1578/
Built to branch `gh-pages` at 2026-06-01 21:40 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-01T15:57:27Z

Codecov Report

❌ Patch coverage is 93.02326% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.81%. Comparing base (40a4dd3) to head (9a6bc61).
⚠️ Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/quantization/model_calib.py	94.05%	6 Missing ⚠️
modelopt/torch/quantization/plugins/huggingface.py	88.23%	2 Missing ⚠️
...lopt/torch/quantization/nn/modules/quant_module.py	90.90%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1578      +/-   ##
==========================================
- Coverage   77.33%   76.81%   -0.53%     
==========================================
  Files         478      479       +1     
  Lines       52421    52754     +333     
==========================================
- Hits        40541    40524      -17     
- Misses      11880    12230     +350

Flag	Coverage Δ
examples	`41.67% <12.40%> (+0.86%)`	⬆️
gpu	`59.76% <74.41%> (-0.62%)`	⬇️
regression	`15.20% <12.40%> (+0.09%)`	⬆️
unit	`53.85% <93.02%> (+0.24%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Fridah-nv · 2026-06-01T20:45:47Z

/claude review

cjluo-nv

Bot review — DM the bot to share feedback.

Solid refactor that consolidates local_hessian_calibrate onto the shared MSE flow and extends Hessian capture to HF fused-MoE experts via a new register_calibration_input_hooks extension point on QuantModule. Code is well-structured, tests are meaningful (accumulator math, dense end-to-end, conv fallback, block-size mismatch, registered-backend skip, per-expert MoE capture + refinement), and bit-identical preservation on Qwen3-8B is claimed.

Design check passed: the new extension point is a small activation-side counterpart to the existing iter_weights_for_calibration, not a parallel subsystem; this is the right shape rather than introducing a new registry/loader. The PR explicitly documents fallback paths (never-routed experts, non-eager kernels, cin % block_size != 0, registered backends) and makes them warn rather than silently miscalibrate.

Nudging (not approving) for two reasons that warrant a human signoff:

Distributed correctness gap is acknowledged but not fixed. The new code emits a warn_rank_0 and a TODO that "the per-block Hessian is not synced across TP/DP ranks ... refined amaxes can diverge under tensor/data parallelism. Treat local_hessian as single-rank for now." The previous (LocalHessianHelper) implementation had the same gap (also TODO'd), so this is no regression — but the change still ships a feature whose multi-rank behavior is known-divergent. A maintainer should confirm this is acceptable for the intended use cases (and whether local_hessian_calibrate should hard-error rather than warn under TP/DP > 1, as max_calibrate does for NVFP4-static + TP).
Bit-equivalence claim is human-verified outside CI. The PR body says the refactor produces "bit-identical dense weight scales to main (216/216 tensors) on Qwen3-8B with the fp32 accumulation neutralized to isolate the structural change." This is exactly the kind of claim that needs human confirmation since it can't be re-run in CI; please have the original reviewer/runner sign off that the comparison was done with fp32 accumulation neutralized (i.e. bf16 GEMM as in the old code) and that the deliberate fp32-accumulation switch in the new path is the only intended numerical change.

Minor (non-blocking) observations:

from modelopt.torch.utils.distributed import ... is split across three statements; the two new ones (is_initialized as dist_is_initialized, size as dist_size) could be folded into the existing line.
_warn_local_hessian_fallback is only called when captures is non-empty, which means a cin % block_size != 0 warning is suppressed for non-pairable modules (where it wouldn't matter anyway). Consider a brief comment so a future reader doesn't think the warning is missing.
SequentialQuantizer weight quantizers are silently skipped (matches mse_calibrate), but this is invisible to users running local_hessian on a SequentialQuantizer config — they'll see no fallback warning since register_calibration_input_hooks does pair them. Worth a one-line comment in local_hessian_calibrate or a sibling warning path.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

modelopt/torch/quantization/model_calib.py (1)
497-520: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fallback weights are skipped instead of getting plain MSE.

With fp8_scale_sweep=True, this helper returns None for every unregistered backend and every non-static-NVFP4 quantizer. In local_hessian_calibrate(), that is also the path used for "no Hessian available" fallbacks, so those weights stay at max calibration instead of receiving the documented plain-MSE refinement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/model_calib.py` around lines 497 - 520, The
fp8_scale_sweep branch currently returns None for unregistered backends and
non-static NVFP4 quantizers, causing those weights to skip plain-MSE refinement;
change that fallback to return the same plain-MSE calibrator used by
local_hessian_calibrate instead of None. Specifically, in the block after
_uses_modelopt_fp8_weight_scales(...) where it now returns None, instantiate and
return the plain MSE calibrator (the same class/factory local_hessian_calibrate
uses for "no Hessian available") with the same parameters you pass to
NVFP4MSECalibrator (initial_amax, axis, quant_func, error_func and global_amax
from weight_quantizer if needed) so unregistered/non-static backends receive
plain-MSE calibration rather than None.

🧹 Nitpick comments (1)

modelopt/torch/quantization/model_calib.py (1)
503-509: ⚡ Quick win

Use warn_rank_0 for this backend-skip warning.

This can fire once per quantizer on every rank, so warnings.warn will get noisy in distributed calibration.
Suggested fix
-                warnings.warn(
+                warn_rank_0(
                     f"local_hessian: backend '{backend}' does not support a custom error "
                     "function; skipping Hessian-weighted calibration for this quantizer."
                 )
As per coding guidelines, "Develop with distributed processing in mind by using print_rank_0 or warn_rank_0 to avoid noisy logs and guarding shared side effects such as file writes or shared state updates against race conditions between ranks".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/model_calib.py` around lines 503 - 509, The
warning for skipping Hessian refinement should use the distributed-safe helper
warn_rank_0 instead of warnings.warn to avoid noisy warnings on every rank; in
the block that checks error_func (the local_hessian/backend skip branch) replace
warnings.warn(...) with warn_rank_0(...) (and add an import for warn_rank_0 if
missing) so the message about "backend '{backend}' does not support a custom
error function; skipping Hessian-weighted calibration for this quantizer." only
prints on rank 0.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 657-665: build_error_func currently clears self.hessian_per_block
before model._local_hessian_accumulators can be populated, so debug mode can't
inspect the accumulated Hessian; fix by preserving the raw accumulator until
after any debug assignment: when building hessian = self.hessian_per_block /
self.num_samples, first, if debug inspection is enabled (check
model._local_hessian_accumulators or a debug flag), assign or copy
self.hessian_per_block (or hessian) into model._local_hessian_accumulators, then
only set self.hessian_per_block = None; alternatively, conditionally only clear
self.hessian_per_block when debug is false—apply the same change at the other
clearing site referenced (the similar clear around lines 820-821).
- Around line 763-797: During the local Hessian capture pass we only silence
weight quantizers; also collect and temporarily disable activation/input
quantizers so QuantInputBase.forward() does not quantize activations during
forward_loop. Add a silenced_input_quantizers list (analogous to
silenced_weight_quantizers), and inside the loop over name_to_module when you
register captures check modules that expose an input quantizer (e.g.,
module.input_quantizer or instances of QuantInputBase) and if that quantizer is
a TensorQuantizer (or has is_enabled and _if_quant) append it to
silenced_input_quantizers and disable it; then in the finally block re-enable
each quantizer in silenced_input_quantizers (and still re-enable
silenced_weight_quantizers and remove handles) so the forward pass uses
full-precision activations for ΣXᵀX collection.

In `@modelopt/torch/quantization/nn/modules/quant_module.py`:
- Around line 262-272: The hook is currently registered for any enabled
weight_quantizer (including SequentialQuantizer) which contaminates captures;
change register_calibration_input_hooks to only register when
self.weight_quantizer is an instance of TensorQuantizer (in addition to existing
checks on weight presence/dim and is_enabled). Update the condition in
register_calibration_input_hooks to check isinstance(self.weight_quantizer,
TensorQuantizer) so only TensorQuantizer-backed weights get the forward-pre
hook; keep the _pre_hook and registration logic unchanged.

In `@tests/unit/torch/quantization/plugins/test_fused_experts.py`:
- Around line 662-664: The test currently imports modelopt.torch.quantization as
mtq and local_hessian_calibrate inside the test function; move these imports to
the module top so import errors surface at collection time—add top-level imports
for mtq (modelopt.torch.quantization) and local_hessian_calibrate
(modelopt.torch.quantization.model_calib) near other test imports and remove the
in-function import statements in test_fused_experts.py so the test uses the
module-scope mtq and local_hessian_calibrate symbols.

---

Outside diff comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 497-520: The fp8_scale_sweep branch currently returns None for
unregistered backends and non-static NVFP4 quantizers, causing those weights to
skip plain-MSE refinement; change that fallback to return the same plain-MSE
calibrator used by local_hessian_calibrate instead of None. Specifically, in the
block after _uses_modelopt_fp8_weight_scales(...) where it now returns None,
instantiate and return the plain MSE calibrator (the same class/factory
local_hessian_calibrate uses for "no Hessian available") with the same
parameters you pass to NVFP4MSECalibrator (initial_amax, axis, quant_func,
error_func and global_amax from weight_quantizer if needed) so
unregistered/non-static backends receive plain-MSE calibration rather than None.

---

Nitpick comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 503-509: The warning for skipping Hessian refinement should use
the distributed-safe helper warn_rank_0 instead of warnings.warn to avoid noisy
warnings on every rank; in the block that checks error_func (the
local_hessian/backend skip branch) replace warnings.warn(...) with
warn_rank_0(...) (and add an import for warn_rank_0 if missing) so the message
about "backend '{backend}' does not support a custom error function; skipping
Hessian-weighted calibration for this quantizer." only prints on rank 0.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 967322de-f723-4696-8e51-a3a3a8baf41c

📥 Commits

Reviewing files that changed from the base of the PR and between 552ab90 and 0bfe1d4.

📒 Files selected for processing (5)

modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/nn/modules/quant_module.py
modelopt/torch/quantization/plugins/huggingface.py
tests/unit/torch/quantization/plugins/test_fused_experts.py
tests/unit/torch/quantization/test_local_hessian.py

claude

Claude review summary

Findings: CRITICAL: 0, IMPORTANT: 1, SUGGESTION: 1

The refactor is structurally clean — the new _LocalHessianAccumulator, the shared _mse_calibrate_weights helper, and the register_calibration_input_hooks extension point all match well with the existing MSE flow, and the per-expert capture via _current_expert_idx is a nice fit for the fused-MoE plugin. Tests cover the accumulator math, the dense end-to-end path, the fallback warnings, and the per-expert MoE path.

Most impactful finding

[IMPORTANT Compatibility] During the activation-capture forward, SequentialQuantizer-wrapped weight quantizers are no longer silenced. The isinstance(weight_quantizer, TensorQuantizer) filter on the silencing list excludes them (SequentialQuantizer is an nn.Sequential), whereas the previous implementation disabled them via the delegated SequentialQuantizer.disable()/enable(). For layers that use SequentialQuantizer for weights (e.g. INT4 weights + FP8 scale, or downgraded restore paths), the Hessian-capture forward now propagates quantized weights, subtly corrupting the captured Hessian for downstream TensorQuantizer layers. The PR's bit-identical Qwen3-8B check uses pure TensorQuantizer paths and wouldn't catch this. See inline comment at model_calib.py:776–786 for a suggested fix (iterate SequentialQuantizer leaves when building the silencing list).

Nit

[SUGGESTION] With debug=True, model._local_hessian_accumulators retains accumulators whose hessian_per_block was already freed by build_error_func. The docstring's "retain … for inspection" implies the Hessian tensor is inspectable. Either snapshot before build_error_func, or tighten the docstring.

Risk assessment

Low–medium. The common path (TensorQuantizer-only weight quantizers, e.g. INT8 / FP8 / static NVFP4) is preserved bit-identically per the author's own check. The SequentialQuantizer regression affects only INT4-with-FP8-scale-style configurations and only manifests as a small Hessian-quality drift for downstream TensorQuantizer refinements. Worth fixing before merge but doesn't break the headline use case.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/torch/quantization/test_local_hessian.py`:
- Line 184: Move the in-test import of SequentialQuantizer to the module top:
remove the inline "from modelopt.torch.quantization.nn import
SequentialQuantizer" inside the test and add it to the file's top-level imports
so import errors surface at collection time; keep the inline import only if
there is a documented circular/optional-dependency reason and add a comment
explaining that. Ensure the test references the top-level SequentialQuantizer
symbol and run tests to confirm no circular import was introduced.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b1029f88-ec3a-452d-8f44-3f0fd96d0a83

📥 Commits

Reviewing files that changed from the base of the PR and between 0bfe1d4 and 63ff148.

📒 Files selected for processing (4)

modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/nn/modules/quant_module.py
tests/unit/torch/quantization/plugins/test_fused_experts.py
tests/unit/torch/quantization/test_local_hessian.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/unit/torch/quantization/plugins/test_fused_experts.py
modelopt/torch/quantization/model_calib.py

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

cjluo-nv · 2026-06-01T21:39:57Z

Can we test doing this on a model and runs some simple evals and compare that with the base MSE without hessian?

Fridah-nv · 2026-06-01T21:47:10Z

/claude review

Fridah-nv · 2026-06-01T21:48:49Z

Can we test doing this on a model and runs some simple evals and compare that with the base MSE without hessian?

I plan to do a larger scale experiments after this PR lands, I'd like to merge it first to make sure basic correctness and code quality.

Refactor local_hessian onto shared MSE flow + fused-MoE expert support

6e60751

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Decouple fused-MoE local-Hessian via activation-capture hook

0bfe1d4

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Fridah-nv marked this pull request as ready for review June 1, 2026 20:45

Fridah-nv requested a review from a team as a code owner June 1, 2026 20:45

Fridah-nv requested a review from realAsma June 1, 2026 20:45

cjluo-nv reviewed Jun 1, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread modelopt/torch/quantization/model_calib.py Outdated

Comment thread modelopt/torch/quantization/model_calib.py Outdated

Comment thread modelopt/torch/quantization/nn/modules/quant_module.py

Comment thread tests/unit/torch/quantization/plugins/test_fused_experts.py Outdated

claude Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread modelopt/torch/quantization/model_calib.py

claude Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread modelopt/torch/quantization/model_calib.py

address review feedbacks

63ff148

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread tests/unit/torch/quantization/test_local_hessian.py Outdated

minor: more feedback

9a6bc61

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

coderabbitai Bot approved these changes Jun 1, 2026

View reviewed changes

Conversation

Fridah-nv commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Jun 1, 2026

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-06-01 21:40 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Fridah-nv commented Jun 1, 2026

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude review summary

Most impactful finding

Nit

Risk assessment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cjluo-nv commented Jun 1, 2026

Uh oh!

Fridah-nv commented Jun 1, 2026

Uh oh!

Fridah-nv commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fridah-nv commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

github-actions Bot commented Jun 1, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-01 21:40 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Jun 1, 2026 •

edited

Loading