
Add calib_include/exclude_modules to calibration algorithms#1043

Open
Fridah-nv wants to merge 7 commits into main from fridah/in_exclude_mod

Conversation

@Fridah-nv
Contributor

@Fridah-nv Fridah-nv commented Mar 16, 2026

What does this PR do?

Type of change: New feature

Adds calib_include_modules and calib_exclude_modules fields to QuantizeAlgorithmConfig
so users can restrict any calibration algorithm (max, mse, smoothquant, awq, …) to a
subset of the model's layers. Patterns are fnmatch wildcards matched against module names
(e.g. "*lm_head*", "*self_attn*").

Implementation:

  • New filter_calib_modules context manager in model_calib.py temporarily disables
    TensorQuantizer instances in non-matching modules. TensorQuantizer.disable() does not
    clear _amax, so excluded modules retain their pre-existing calibration state.
  • Fields live on the base QuantizeAlgorithmConfig, so filtering applies uniformly to all
    algorithms with no per-algorithm changes.
  • wrapped_calib_func in mode.py pops these fields and wraps every calibration call in
    filter_calib_modules automatically.

Interaction with "enable": false in quant_cfg:
Quantizers disabled via quant_cfg (i.e. _disabled=True) are skipped by filter_calib_modules
— they are never added to the restore list and are never re-enabled. Their disabled state is fully
preserved regardless of calib_exclude/include_modules.
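The disable/restore bookkeeping described above can be sketched with a stand-in for TensorQuantizer (FakeQuantizer and the dict-of-quantizers "model" below are illustrative stubs, not the real modelopt types):

```python
from contextlib import contextmanager
from fnmatch import fnmatch


class FakeQuantizer:
    """Stand-in for TensorQuantizer: tracks only the disabled flag and amax."""

    def __init__(self, disabled=False):
        self._disabled = disabled
        self.amax = None  # disable() must not clear this

    def disable(self):
        self._disabled = True

    def enable(self):
        self._disabled = False


@contextmanager
def filter_calib_modules(quantizers, include_modules=None, exclude_modules=None):
    """Disable quantizers whose module name fails the include/exclude patterns."""

    def should_calibrate(name):
        if include_modules is not None and not any(
            fnmatch(name, p) for p in include_modules
        ):
            return False
        if exclude_modules is not None and any(
            fnmatch(name, p) for p in exclude_modules
        ):
            return False
        return True

    restore = []
    for name, q in quantizers.items():
        # Quantizers already disabled via quant_cfg are skipped entirely,
        # so their disabled state survives the context unchanged.
        if not should_calibrate(name) and not q._disabled:
            q.disable()
            restore.append(q)
    try:
        yield
    finally:
        for q in restore:
            q.enable()
```

Entering the context disables only quantizers that were enabled and fail the filter; on exit exactly those quantizers are re-enabled, so both pre-disabled quantizers and pre-existing amax values are preserved.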

Lower-level API:
When calling calibration functions directly (outside mtq.calibrate()), wrap manually with the
context manager:

from modelopt.torch.quantization.model_calib import filter_calib_modules, mse_calibrate

with filter_calib_modules(model, exclude_modules=["*lm_head*"]):
    mse_calibrate(model, forward_loop)

Usage

    quant_cfg = copy.deepcopy(mtq.INT8_DEFAULT_CFG)
    quant_cfg["algorithm"] = {"method": "max", "calib_exclude_modules": ["*attn*"]}

    mtq.quantize(model, quant_cfg, forward_loop=forward_loop)

Testing

Added 6 new unit tests in tests/unit/torch/quantization/test_calib.py.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ❌ (pending)

Additional Information

Summary by CodeRabbit

  • New Features

    • Add module-level calibration filtering and pattern-based include/exclude options so calibration can target or skip specific modules.
    • Add a public context mechanism to scope calibration to selected modules.
  • Refactor

    • Rename exported base calibration config to "CalibrationConfig" (propagated across config/type system).
  • Tests

    • Expanded tests to cover include/exclude filtering, wildcards, and config-driven calibration behavior.

Adds calib_include_modules and calib_exclude_modules fields to
QuantizeAlgorithmConfig so users can restrict any calibration algorithm
(max, mse, smoothquant, awq, ...) to a subset of the model's layers.
Filtering is applied via the new filter_calib_modules context manager,
which temporarily disables TensorQuantizer instances in non-matching
modules while preserving their pre-existing _amax values.

Also exposes --calib_include_modules / --calib_exclude_modules CLI args
in the hf_ptq.py example and wires them through build_quant_cfg in
example_utils.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv self-assigned this Mar 16, 2026
@Fridah-nv Fridah-nv requested review from a team as code owners March 16, 2026 03:35
@Fridah-nv Fridah-nv requested a review from Edwardf0t1 March 16, 2026 03:35
@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Renamed the base calibration config to CalibrationConfig, added include_modules/exclude_modules fields for fnmatch-based module selection, implemented filter_calib_modules context manager that disables/re-enables per-module quantizers, and integrated filtering into the calibration workflow. Tests added to validate include/exclude behavior and patterns.

Changes

Cohort / File(s) Summary
Config
modelopt/torch/quantization/config.py
Renamed QuantizeAlgorithmConfig to CalibrationConfig; added public fields `calib_include_modules` / `calib_exclude_modules` (`list[str] | None`) for fnmatch-based module selection.
Calibration filtering
modelopt/torch/quantization/model_calib.py
Added filter_calib_modules(model, include_modules=None, exclude_modules=None) context manager (fnmatch-based) that disables TensorQuantizer instances for non-matching quantized linear modules and re-enables them on exit; exported via __all__.
Calibration integration / Modes
modelopt/torch/quantization/mode.py
Imported CalibrationConfig and filter_calib_modules; updated wrapped calibration function signature to accept CalibrationConfig, extracted include_modules/exclude_modules, and wrapped calibration execution with with filter_calib_modules(...); adjusted mode descriptor type hints to CalibrationConfig.
Tests
tests/unit/torch/quantization/test_calib.py, tests/unit/torch/quantization/test_mode.py
Expanded and added tests for module filtering (include/exclude, wildcard/bracket matching, no-op behavior) and updated tests to use CalibrationConfig; added helper scaffolds to assert amax buffers and calibration scope.

Sequence Diagram

sequenceDiagram
    participant User
    participant Config as CalibrationConfig
    participant Mode as wrapped_calib_func
    participant Filter as filter_calib_modules
    participant Model as QuantizedModel
    participant Quant as TensorQuantizers

    User->>Config: provide include_modules / exclude_modules
    Config-->>Mode: pass config (include/exclude)
    Mode->>Filter: enter context with patterns
    Filter->>Model: scan modules (fnmatch)
    Filter->>Quant: disable quantizers in non-matching modules
    Mode->>Model: run calibration routine (collect stats)
    rect rgba(100, 150, 200, 0.5)
        Model->>Quant: update amax/buffers for matching modules
    end
    Mode->>Filter: exit context
    Filter->>Quant: re-enable previously disabled quantizers
    Filter-->>Mode: context ends

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 76.67%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (3 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately summarizes the main change (adding include/exclude module filtering fields to calibration algorithms), which is reflected across all modified files.
  • Security Anti-Patterns ✅ Passed: no security anti-patterns detected. All files use only safe standard library imports (contextlib, fnmatch), with no dangerous operations such as torch.load without weights_only, eval/exec on untrusted input, or new external dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@codecov

codecov bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.14%. Comparing base (58417e5) to head (5a8c907).
⚠️ Report is 8 commits behind head on main.

Files with missing lines:
  • modelopt/torch/quantization/mode.py: patch 62.50%, 3 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1043      +/-   ##
==========================================
+ Coverage   70.07%   70.14%   +0.06%     
==========================================
  Files         221      221              
  Lines       25499    25572      +73     
==========================================
+ Hits        17869    17938      +69     
- Misses       7630     7634       +4     

☔ View full report in Codecov by Sentry.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/unit/torch/quantization/test_calib.py (1)

469-473: Strengthen the no-op assertion to validate actual no-op behavior.

This currently only checks that amax is present. It should compare against the baseline snapshot to verify no values changed.

💡 Proposed test tightening
     # Amaxes should be consistent with standard max calibration (not None)
     for name in amaxes_before:
         amax_after = _get_weight_amax(model, name)
-        assert amax_after is not None, f"{name} should have a valid amax after calibration"
+        assert torch.allclose(amaxes_before[name], amax_after), (
+            f"{name} changed under no-op filter context"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/quantization/test_calib.py` around lines 469 - 473, The test
currently only asserts presence of amax after calibration; strengthen it by
asserting the amax values did not change: for each name in amaxes_before, fetch
amax_after using _get_weight_amax(model, name) and assert equality (or
approximate equality if floats) against amaxes_before[name] instead of just
checking not None; update the loop that references amaxes_before and
_get_weight_amax to perform the comparison (use math.isclose or pytest.approx
for float comparisons).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 252-266: The code currently mutates quant_cfg["algorithm"] in
place (when it's a dict or converted from a string), which can leak changes into
shared presets; update the logic to create a copy of the algorithm dict before
adding calib_exclude_modules/calib_include_modules (e.g., if
isinstance(quant_cfg["algorithm"], str) set quant_cfg["algorithm"] = {"method":
...} as a new dict, and if it's a dict replace it with a shallow or deep copy
like new_alg = dict(quant_cfg["algorithm"]) or copy.deepcopy(...) and assign
quant_cfg["algorithm"] = new_alg) then add the calib keys to that copy,
referencing quant_cfg["algorithm"], calib_exclude_modules, and
calib_include_modules when applying the changes.

In `@examples/llm_ptq/hf_ptq.py`:
- Around line 1249-1258: The parsed module-pattern lists
args.calib_exclude_modules and args.calib_include_modules must drop
empty/whitespace-only entries; update the list comprehensions to filter out
items where p.strip() is empty (e.g., use [p.strip() for p in
args.calib_exclude_modules.split(",") if p.strip()] and similarly for
calib_include_modules) and keep the existing conditional that yields None when
the original arg is falsy.

In `@modelopt/torch/quantization/model_calib.py`:
- Around line 107-113: The current loop only inspects modules where
is_quantized_linear(module) is true, so TensorQuantizer instances in non-linear
modules are not disabled when _should_calibrate(name) is false; change the logic
to iterate all modules from model.named_modules(), and for any module whose name
fails _should_calibrate(name) traverse its children to find TensorQuantizer
instances and call disable() on those not already _disabled, appending them to
disabled (i.e., replace the is_quantized_linear(module) guard with a direct
check of _should_calibrate(name) and disable all TensorQuantizer children
accordingly).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 244f9095-d6a1-4b28-b47a-e77c51d699d9

📥 Commits

Reviewing files that changed from the base of the PR and between 1070d89 and 2a0feaf.

📒 Files selected for processing (6)
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/mode.py
  • modelopt/torch/quantization/model_calib.py
  • tests/unit/torch/quantization/test_calib.py

- Fix shared preset mutation in build_quant_cfg by always deep-copying
  the preset dict before modification (previously only awq path did this)
- Document linear-only filtering limitation in filter_calib_modules
  docstring and calib_include/exclude_modules field descriptions
- Filter empty strings from CLI pattern parsing in hf_ptq.py to handle
  trailing commas gracefully
- Strengthen test_filter_no_op_when_none to assert amax value equality
  rather than just non-None presence

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 251-265: The Gemma/smoothquant override replaces
quant_cfg["algorithm"] with a new dict and inadvertently drops
calib_include_modules/calib_exclude_modules; when you set the Gemma override
(the code path that assigns quant_cfg["algorithm"] = {"method": "int8_sq", ...}
or similar), merge or copy any existing
calib_include_modules/calib_exclude_modules from the previous
quant_cfg["algorithm"] (or from the local
calib_include_modules/calib_exclude_modules variables) into the new dict instead
of overwriting them — i.e., build the override dict then set
override_dict["calib_include_modules"]=calib_include_modules (if present) and
override_dict["calib_exclude_modules"]=calib_exclude_modules (if present) before
assigning back to quant_cfg["algorithm"] so the filters are preserved for
Gemma/smoothquant.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a2cf9898-e53a-4981-8c68-5cdd70010bd6

📥 Commits

Reviewing files that changed from the base of the PR and between 2a0feaf and 378d524.

📒 Files selected for processing (5)
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/model_calib.py
  • tests/unit/torch/quantization/test_calib.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt/torch/quantization/config.py
  • examples/llm_ptq/hf_ptq.py

Fridah-nv and others added 2 commits March 16, 2026 23:15
Users should set calib_include_modules / calib_exclude_modules directly
in the algorithm dict of their quantization config rather than via
dedicated CLI flags. Remove --calib_exclude_modules / --calib_include_modules
from hf_ptq.py and the corresponding parameters from build_quant_cfg.

Update test_filter_via_config_api to exercise the intended usage path:
embedding both fields in the algorithm dict and calling mtq.quantize,
covering exclude and include variants and asserting that uncalibrated
module _amax buffers are absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
calib_include/exclude_modules is a core library feature accessed via
the algorithm config dict; example scripts should not be modified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
examples/llm_ptq/example_utils.py (1)

250-251: ⚠️ Potential issue | 🟠 Major

Gemma override still drops calibration filters and other algorithm fields.

Replacing quant_cfg["algorithm"] here discards previously set keys (e.g., calib_include_modules, calib_exclude_modules, and moe_calib_experts_ratio), so filtering silently stops working on this path.

Suggested fix
-    if model_type == "gemma" and "int8_sq" in qformat:
-        quant_cfg["algorithm"] = {"method": "smoothquant", "alpha": 0.5}
+    if model_type == "gemma" and "int8_sq" in qformat:
+        if isinstance(quant_cfg.get("algorithm"), dict):
+            quant_cfg["algorithm"]["method"] = "smoothquant"
+            quant_cfg["algorithm"]["alpha"] = 0.5
+        else:
+            quant_cfg["algorithm"] = {"method": "smoothquant", "alpha": 0.5}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/example_utils.py` around lines 250 - 251, In the branch
where model_type == "gemma" and "int8_sq" in qformat, don't replace
quant_cfg["algorithm"] outright (which discards existing keys like
calib_include_modules, calib_exclude_modules, and moe_calib_experts_ratio);
instead merge the new algorithm entries into the existing dict (e.g., ensure
quant_cfg.get("algorithm", {}) is updated with {"method": "smoothquant",
"alpha": 0.5"}) so existing calibration/filtering fields are preserved while
setting/overriding only the needed algorithm keys.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fd05d10c-16cc-4a19-aaa2-7c37c90c903f

📥 Commits

Reviewing files that changed from the base of the PR and between 378d524 and 22bd94c.

📒 Files selected for processing (3)
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • tests/unit/torch/quantization/test_calib.py
💤 Files with no reviewable changes (1)
  • examples/llm_ptq/hf_ptq.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/torch/quantization/test_calib.py

),
)

calib_include_modules: list[str] | None = ModeloptField(
Contributor


Do we need the calib prefix? Is not it obvious that this is for calib include_modules?

Suggested change
calib_include_modules: list[str] | None = ModeloptField(
include_modules: list[str] | None = ModeloptField(

Contributor Author


Currently we are using QuantizeAlgorithmConfig, in this case, user might think include_modules is for modules to quantize. I agree with Shengliang that CalibrationConfig is a more accurate name. Added an update for this.

Contributor


agree, that sounds good!

Comment on lines +418 to +421
# net.4 should be untouched
assert torch.allclose(amax_net4_before, _get_weight_amax(model, "net.4")), (
"Excluded module net.4 should have unchanged amax"
)
Contributor


Are we using the same dataset for mtq.calibrate? How will amax change if we do this?
How about we don't pass forward_loop in _make_quantized_mlp() and instead pass forward_loop in mtq.calibrate() -> this way net.4 won't have an amax.

Contributor Author


_make_quantized_mlp uses max_calibrate while mtq.calibrate below uses mse. This way I can also test that a pre-existing amax is preserved. It also matches what we expect users to do (run include/exclude after an initial calibration pass), but happy to change it if you prefer the simpler version.

Contributor

@realAsma realAsma left a comment


Looks great! Left some comments - please address it.

else:
# Direct calibration (existing behavior)
func(model, forward_loop=forward_loop, **kwargs)
with filter_calib_modules(model, calib_include_modules, calib_exclude_modules):
Contributor


Just to double check: if in the future we want to run multiple algorithms in the sequential flow, for example local_hessian followed by gptq, will this work?

Contributor


Another follow up question. Do we plan on running different calibration algorithms for different layers in the future? Can this context manager be helpful in that case?

Contributor Author

@Fridah-nv Fridah-nv Mar 18, 2026


If in the future we want to run multiple algorithms in the sequential flow, for example local_hessian followed by gptq will this work?

It should work by calling mtq.calibrate() twice.

Do we plan on running different calibration algorithms for different layers in the future?

Yes, one example is that we want to run awq on layers that can fold the pre_quant_scale. We can apply the calibration algos sequentially. I don't think we can run them within one calibration loop, if that's what you are asking.
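A two-pass flow along these lines could look as follows. This is a sketch: the awq_lite/max pairing and the "*self_attn*" pattern are assumptions for illustration, and the mtq.calibrate calls are left commented since they need a real model and forward loop.

```python
# Hypothetical sequential calibration: awq-style calibration on attention
# projections, then max calibration on everything else.
awq_pass = {"method": "awq_lite", "calib_include_modules": ["*self_attn*"]}
max_pass = {"method": "max", "calib_exclude_modules": ["*self_attn*"]}

# mtq.calibrate(model, algorithm=awq_pass, forward_loop=forward_loop)
# mtq.calibrate(model, algorithm=max_pass, forward_loop=forward_loop)
```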

Contributor


I see. So for the case where we use sequential_calibrate and want to run multiple algos on one layer before moving on to the next, we are restricted to having the same list of include/exclude modules. Is my understanding correct?

Contributor Author


This is correct. We can discuss more if we want to interleave two algorithms layer-by-layer within a single pass (in the sequential calibration flow).

@cjluo-nv
Collaborator

Questions:

  1. Will there be a case that include and exclude have conflicts (a module listed in both)?
  2. If a module is not in the include list or exclude list, what's the behavior?

Comment on lines +1087 to +1108
calib_include_modules: list[str] | None = ModeloptField(
default=None,
title="Patterns of modules to include in calibration.",
description=(
"If provided, only modules whose names match at least one of the fnmatch patterns are "
"calibrated. Modules that do not match any pattern are skipped and retain their "
"pre-existing calibration state. "
"Note: filtering applies only to quantized linear modules; TensorQuantizers in "
"non-linear modules (e.g. layer norms, embeddings) are unaffected."
),
)

calib_exclude_modules: list[str] | None = ModeloptField(
default=None,
title="Patterns of modules to exclude from calibration.",
description=(
"If provided, modules whose names match at least one of the fnmatch patterns are "
"skipped during calibration and retain their pre-existing calibration state. "
"Note: filtering applies only to quantized linear modules; TensorQuantizers in "
"non-linear modules (e.g. layer norms, embeddings) are unaffected."
),
)
Contributor


If a user passes both calib_include_modules and calib_exclude_modules, the behavior is implicitly "include first, then exclude"? Do you think we need to either:

  • Document it explicitly (what happens if a module matches both?), or
  • Validate and raise an error if both are set simultaneously.

Contributor


I think the ordering does not actually matter, but we do need to document a clear semantic. These are effectively two exclude lists:

  1. exclude modules that are not in include_modules
  2. exclude modules that are in exclude_modules

Contributor Author


Exclude takes precedence: _should_calibrate checks include_modules first (if the module is not in the include list it returns False), then checks exclude_modules (if matched it returns False). So a module in both lists is excluded. Added a doc note to make this explicit. Thanks for pointing out.
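The precedence described here can be written as a small helper. This is a hypothetical mirror of the described _should_calibrate semantics, not the actual modelopt implementation:

```python
from fnmatch import fnmatch


def should_calibrate(name, include_modules=None, exclude_modules=None):
    """Hypothetical sketch of the described include/exclude precedence."""
    # Include is checked first: a module outside the include list is skipped.
    if include_modules is not None and not any(
        fnmatch(name, p) for p in include_modules
    ):
        return False
    # Exclude is checked second, so a module matching both lists is excluded.
    if exclude_modules is not None and any(
        fnmatch(name, p) for p in exclude_modules
    ):
        return False
    return True
```

With no filters set, every module is calibrated; a module matching both lists is excluded.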

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
modelopt/torch/quantization/model_calib.py (1)

91-93: ⚠️ Potential issue | 🟠 Major

Filtering still skips non-linear module quantizers.

Line 115 gates filtering behind is_quantized_linear(module), so TensorQuantizer instances attached to non-linear modules are never disabled even when their module name should be excluded. This leaves calibration partially unfiltered.

💡 Proposed fix
-    Note:
-        Only quantized linear modules (as identified by :func:`is_quantized_linear`) are filtered.
-        ``TensorQuantizer`` instances inside non-linear quantized modules (e.g. layer norms,
-        embeddings) are not disabled even if their module name matches a pattern.
+    Note:
+        Filters are evaluated against each ``TensorQuantizer`` parent module name.

-    disabled = []
-    for name, module in model.named_modules():
-        if is_quantized_linear(module) and not _should_calibrate(name):
-            for _, child in module.named_modules():
-                if isinstance(child, TensorQuantizer) and not child._disabled:
-                    child.disable()
-                    disabled.append(child)
+    disabled: list[TensorQuantizer] = []
+    for quantizer_name, quantizer in model.named_modules():
+        if not isinstance(quantizer, TensorQuantizer):
+            continue
+        parent_name = quantizer_name.rsplit(".", 1)[0] if "." in quantizer_name else ""
+        if _should_calibrate(parent_name) or quantizer._disabled:
+            continue
+        quantizer.disable()
+        disabled.append(quantizer)

Also applies to: 113-119

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 91 - 93, The current
filtering only runs when is_quantized_linear(module) is true, so TensorQuantizer
instances inside non-linear modules are never disabled; remove that gate and
apply the name-pattern exclusion to all modules that contain TensorQuantizer
instances. Locate the block in model_calib.py where is_quantized_linear(module)
is checked and instead iterate over module._tensor_quantizers (or however
TensorQuantizer instances are accessed) for every module, and if the module name
matches the exclusion pattern disable each TensorQuantizer (e.g., call the
existing disable/enable API or set the same flag used for linear modules); make
the same change for the duplicate logic region covering the other occurrence
noted (lines referenced in the review).


Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@realAsma
Contributor

Will there be a case that include and exclude have conflicts (a module listed in both)?

This is a good concern. Should we make these mutually exclusive -> that is, only one can be specified at a time?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/torch/quantization/test_mode.py (1)

46-47: Annotate config_class as returning the class, not an instance.

Line 46 returns TestConfig itself, so the annotation should be type[CalibrationConfig], not CalibrationConfig. That keeps the test aligned with BaseCalibrateModeDescriptor.config_class and avoids noisy mypy failures.

🛠️ Proposed typing fix
-        def config_class(self) -> CalibrationConfig:
+        def config_class(self) -> type[CalibrationConfig]:
             return TestConfig
As per coding guidelines, "Use mypy for type checking on Python code (configured in `pyproject.toml`)".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1048-1049: Existing public base class name QuantizeAlgorithmConfig
was renamed to ModeloptBaseConfig and that breaks imports; restore a deprecated
compatibility alias by defining QuantizeAlgorithmConfig = ModeloptBaseConfig in
the module and emit a DeprecationWarning using the warnings module so users get
a notice (keep the alias in modelopt.torch.quantization.config alongside the new
ModeloptBaseConfig); ensure the alias is added near the class definition so
future imports work while guiding users to migrate.



Comment on lines +249 to +262
with filter_calib_modules(model, include_modules, exclude_modules):
    if sequential:
        if forward_loop is None:
            raise ValueError("forward_loop is required for calibration but got None.")
        assert method in ["max"], (
            f"Sequential calibration currently only supports max calibration, got {method}"
        )
        # Wrap with sequential processing
        sequential_calibrate(
            model,
            forward_loop=forward_loop,
            calib_func=func,
            **kwargs,
        )

⚠️ Potential issue | 🟠 Major

Keep module filtering out of sequential activation collection.

Line 249 wraps the entire sequential_calibrate(...) call. sequential_calibrate() first recomputes each layer's inputs from the full model before invoking calib_func(layer, ...), so excluded modules are also disabled during those recomputation passes. That means later included layers can be calibrated against activations that no longer reflect the already-calibrated earlier layers. Please scope the filter to the per-layer calibration call after inputs are captured, not to the whole sequential wrapper.
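One way to implement this suggestion is to wrap `calib_func` so the filter is entered only around each per-layer calibration call, leaving the input-recomputation passes unfiltered. This is a sketch under the review's assumptions: the no-op `filter_calib_modules` below is a stand-in for the real context manager in `model_calib.py`, instrumented only to show when the filter is active.

```python
import functools
from contextlib import contextmanager

calls = []  # records the order of filter entry/exit vs. calibration


@contextmanager
def filter_calib_modules(model, include_modules, exclude_modules):
    """Stand-in for the real context manager in model_calib.py."""
    calls.append("enter")
    try:
        yield
    finally:
        calls.append("exit")


def scoped_calib_func(calib_func, model, include_modules, exclude_modules):
    """Apply module filtering only around the per-layer calibration call.

    sequential_calibrate would receive this wrapper as calib_func, so the
    filter is never active while layer inputs are being recomputed.
    """

    @functools.wraps(calib_func)
    def wrapped(layer, *args, **kwargs):
        with filter_calib_modules(model, include_modules, exclude_modules):
            return calib_func(layer, *args, **kwargs)

    return wrapped
```

With this shape, `sequential_calibrate(model, forward_loop=..., calib_func=scoped_calib_func(func, model, include_modules, exclude_modules))` would keep excluded modules enabled during activation collection.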


Contributor

@meenchen meenchen left a comment


LGTM

@Fridah-nv
Contributor Author

  • Will there be a case that include and exclude have conflicts (a module listed in both)?

A module in both lists is excluded. I added a line in the config doc-string to state it explicitly.

  • If a module is not in the include list or exclude list, what's the behavior?

If a module is not in either of the lists, it keeps its previous enable state from quantization.

@Fridah-nv
Contributor Author

Will there be a case that include and exclude have conflicts (a module listed in both)?

This is a good concern. Should we make these mutually exclusive -> that is, only one can be specified at a time?

I'd keep them independent. Currently exclude takes precedence if both lists include a module. One use case is include_modules=["mlp"], exclude_modules=["mlpgate*"] -- include a broad pattern, then carve out exceptions. (Though I don't have a very complicated use case in mind yet.) Wdyt?

@shengliangxu
Contributor

shengliangxu commented Mar 18, 2026

Will there be a case that include and exclude have conflicts (a module listed in both)?

This is a good concern. Should we make these mutually exclusive -> that is, only one can be specified at a time?

I'd keep them independent. Currently exclude takes precedence if both lists include a module. One use case is include_modules=["mlp"], exclude_modules=["_mlp_gate*"] -- include a broad pattern, then carve out exceptions. (Though I don't have a very complicated use case in mind yet.) Wdyt?

I would like to reiterate that there is really no precedence here:

These two options actually define two exclude lists:

  • exclude the modules that are not in include_modules
  • exclude the modules that are in exclude_modules

So the order in which they are applied does not matter; the result is the exclusion of the union of the two.
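The union semantics can be made concrete with a small `fnmatch` sketch. The helper `calibrated_modules` is hypothetical, written just to show that applying the two filters in either order gives the same set:

```python
import fnmatch


def calibrated_modules(names, include_modules=None, exclude_modules=None):
    """Return the names selected for calibration.

    Order-independent semantics: names matching include (or all names when
    include is None) minus names matching exclude.
    """

    def matches(name, patterns):
        return any(fnmatch.fnmatch(name, p) for p in patterns)

    included = {n for n in names if include_modules is None or matches(n, include_modules)}
    excluded = {n for n in names if exclude_modules and matches(n, exclude_modules)}
    return included - excluded
```

A name matching both lists ends up in both sets, so the subtraction drops it: there is no precedence, just exclusion of the union of the two filtered-out sets.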

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>