GPTQ official#853

Draft
sugunav14 wants to merge 33 commits into main from svelury/gptq-official

Conversation

@sugunav14
Contributor

@sugunav14 sugunav14 commented Feb 4, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

```sh
python hf_ptq.py \
    --pyt_ckpt_path Qwen/Qwen3-8B \
    --qformat nvfp4_gptq \
    --kv_cache_qformat none \
    --dataset cnn_dailymail \
    --batch_size 32 \
    --calib_seq 512 \
    --calib_size 512 \
    --export_path exported_model
```
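
For readers unfamiliar with what the `nvfp4_gptq` format implies, the core GPTQ weight update can be sketched as below. This is an illustrative, self-contained sketch of the classic column-by-column GPTQ compensation step, not this PR's implementation; `gptq_sketch` and `rtn_int4` are hypothetical names, and a plain symmetric int4 grid stands in for NVFP4.

```python
import torch

def rtn_int4(col, scale):
    # round-to-nearest onto a symmetric int4 grid [-8, 7]
    return torch.clamp(torch.round(col / scale), -8, 7) * scale

def gptq_sketch(W, X, damp=0.01):
    """W: (out, in) weight; X: (n_samples, in) calibration inputs.

    Quantize one input column at a time, then fold the rounding error
    into the not-yet-quantized columns via the inverse Hessian of the
    layer inputs (the GPTQ update rule).
    """
    H = X.T @ X / X.shape[0]                         # Hessian proxy: input second moment
    H = H + damp * H.diagonal().mean() * torch.eye(H.shape[0])
    Hinv = torch.linalg.inv(H)
    W = W.clone()
    Q = torch.zeros_like(W)
    scale = W.abs().amax(dim=1, keepdim=True) / 7    # per-row symmetric scale
    for j in range(W.shape[1]):
        q = rtn_int4(W[:, j:j + 1], scale).squeeze(1)
        Q[:, j] = q
        err = (W[:, j] - q) / Hinv[j, j]
        # compensate remaining columns so later quantization absorbs the error
        W[:, j + 1:] -= err[:, None] * Hinv[j, j + 1:][None, :]
    return Q, scale
```

The error-feedback loop is what distinguishes GPTQ from plain round-to-nearest: each column's rounding error is pushed into columns that have not been quantized yet.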

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@copy-pr-bot

copy-pr-bot bot commented Feb 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f84821ad-a3b1-451c-ad6f-e60227d966a5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 25.18519% with 101 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.39%. Comparing base (e024097) to head (8ebd924).
⚠️ Report is 4 commits behind head on main.

Files with missing lines:

| File | Patch % | Missing lines |
| --- | --- | --- |
| modelopt/torch/quantization/model_calib.py | 11.86% | 52 ⚠️ |
| modelopt/torch/quantization/utils.py | 19.14% | 38 ⚠️ |
| modelopt/torch/utils/network.py | 9.09% | 10 ⚠️ |
| modelopt/torch/quantization/mode.py | 90.90% | 1 ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #853      +/-   ##
==========================================
- Coverage   73.73%   73.39%   -0.35%
==========================================
  Files         196      196
  Lines       20412    20583     +171
==========================================
+ Hits        15050    15106      +56
- Misses       5362     5477     +115
```

☔ View full report in Codecov by Sentry.



@torch.no_grad()
def sequential_calibrate(
Contributor Author


@Fridah-nv reference for sequential calibration
cc @realAsma
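
For context, the sequential calibration pattern referenced above can be sketched roughly as follows. The signature and the `calibrate_block` helper are assumptions for illustration, not the PR's actual API: the idea is that each transformer block is calibrated on activations produced by the already-quantized blocks before it, so quantization error is accounted for as it propagates.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def sequential_calibrate(blocks, calib_inputs, calibrate_block):
    """blocks: ordered list of modules; calib_inputs: list of input tensors
    to the first block; calibrate_block(block, inputs) calibrates/quantizes
    the block in place using the captured inputs."""
    inputs = calib_inputs
    for block in blocks:
        calibrate_block(block, inputs)        # e.g. run GPTQ on this block
        # propagate: the next block is calibrated on the outputs of the
        # now-quantized block, not on full-precision activations
        inputs = [block(x) for x in inputs]
    return blocks
```

Compared with one-shot calibration (a single forward pass through the unmodified model), this layer-by-layer schedule matches how GPTQ is usually described in the literature.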

@sugunav14 force-pushed the svelury/gptq-official branch 2 times, most recently from b5f6073 to a8540d4 (February 24, 2026 00:17)
@sugunav14 force-pushed the svelury/gptq-official branch 4 times, most recently from 566c9ae to bdbeb03 (March 6, 2026 00:56)
Fridah-nv and others added 20 commits March 18, 2026 23:31
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
…ntizer, NVFP4MSECalibrator (#849)

**Type of change:** ?

**Overview:** ?

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

Release notes (auto-generated by coderabbit.ai):

* **New Features**
  * Added NVFP4StaticQuantizer for improved 4-bit quantization with enhanced precision control
  * Introduced NVFP4MSECalibrator with flexible candidate generation for calibration optimization
* **Improvements**
  * Optimized GPU kernels for better performance on Hopper+ GPUs
  * Extended Triton support to broader GPU compatibility
  * Enhanced backward compatibility for restoring previously quantized models
* **Tests**
  * Added comprehensive test coverage for new quantizers and calibration methods


---------

Signed-off-by: realAsma <akuriparambi@nvidia.com>
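
As a rough illustration of what an MSE calibrator in the spirit of the `NVFP4MSECalibrator` above does (the candidate generation and the `mse_amax` function are assumptions for illustration, using a plain symmetric int grid rather than NVFP4): try several clipping thresholds and keep the one that minimizes quantization mean-squared error.

```python
import torch

def mse_amax(x, num_bits=4, candidates=None):
    """Search over candidate clipping thresholds (amax values) and return
    the one whose fake-quantization MSE against x is smallest."""
    absmax = x.abs().max()
    if candidates is None:
        # shrink absmax in 5% steps; the unclipped absmax is candidate 0
        candidates = [absmax * (1.0 - 0.05 * i) for i in range(10)]
    qmax = 2 ** (num_bits - 1) - 1               # symmetric signed range
    best_amax, best_err = absmax, float("inf")
    for amax in candidates:
        scale = amax / qmax
        xq = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
        err = torch.mean((x - xq) ** 2).item()
        if err < best_err:
            best_err, best_amax = err, amax
    return best_amax
```

For bell-shaped weight distributions, clipping a few outliers usually lowers overall MSE, which is why a search like this tends to beat plain absmax calibration.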
…FP4QTensor

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 force-pushed the svelury/gptq-official branch from 8c4ff0e to c311950 (March 18, 2026 23:51)
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 force-pushed the svelury/gptq-official branch from 0aa088c to 3f2d7c0 (March 20, 2026 00:34)


3 participants