GPTQ official#853

Draft
sugunav14 wants to merge 33 commits into main from svelury/gptq-official

Conversation

@sugunav14
Contributor

@sugunav14 sugunav14 commented Feb 4, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

```sh
python hf_ptq.py \
    --pyt_ckpt_path Qwen/Qwen3-8B \
    --qformat nvfp4_gptq \
    --kv_cache_qformat none \
    --dataset cnn_dailymail \
    --batch_size 32 \
    --calib_seq 512 \
    --calib_size 512 \
    --export_path exported_model
```
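
For readers unfamiliar with what the `nvfp4_gptq` format implies, the core GPTQ weight update can be sketched as below. This is an illustrative, self-contained sketch of the classic column-by-column GPTQ compensation step, not this PR's implementation; `gptq_sketch` and `rtn_int4` are hypothetical names, and a plain symmetric int4 grid stands in for NVFP4.

```python
import torch

def rtn_int4(col, scale):
    # round-to-nearest onto a symmetric int4 grid [-8, 7]
    return torch.clamp(torch.round(col / scale), -8, 7) * scale

def gptq_sketch(W, X, damp=0.01):
    """W: (out, in) weight; X: (n_samples, in) calibration inputs.

    Quantize one input column at a time, then fold the rounding error
    into the not-yet-quantized columns via the inverse Hessian of the
    layer inputs (the GPTQ update rule).
    """
    H = X.T @ X / X.shape[0]                         # Hessian proxy: input second moment
    H = H + damp * H.diagonal().mean() * torch.eye(H.shape[0])
    Hinv = torch.linalg.inv(H)
    W = W.clone()
    Q = torch.zeros_like(W)
    scale = W.abs().amax(dim=1, keepdim=True) / 7    # per-row symmetric scale
    for j in range(W.shape[1]):
        q = rtn_int4(W[:, j:j + 1], scale).squeeze(1)
        Q[:, j] = q
        err = (W[:, j] - q) / Hinv[j, j]
        # compensate remaining columns so later quantization absorbs the error
        W[:, j + 1:] -= err[:, None] * Hinv[j, j + 1:][None, :]
    return Q, scale
```

The error-feedback loop is what distinguishes GPTQ from plain round-to-nearest: each column's rounding error is pushed into columns that have not been quantized yet.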

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@copy-pr-bot

copy-pr-bot bot commented Feb 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f84821ad-a3b1-451c-ad6f-e60227d966a5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 25.18519% with 101 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.39%. Comparing base (e024097) to head (8ebd924).
⚠️ Report is 4 commits behind head on main.

Files with missing lines:

| File | Patch % | Missing lines |
| --- | --- | --- |
| modelopt/torch/quantization/model_calib.py | 11.86% | 52 ⚠️ |
| modelopt/torch/quantization/utils.py | 19.14% | 38 ⚠️ |
| modelopt/torch/utils/network.py | 9.09% | 10 ⚠️ |
| modelopt/torch/quantization/mode.py | 90.90% | 1 ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #853      +/-   ##
==========================================
- Coverage   73.73%   73.39%   -0.35%
==========================================
  Files         196      196
  Lines       20412    20583     +171
==========================================
+ Hits        15050    15106      +56
- Misses       5362     5477     +115
```

☔ View full report in Codecov by Sentry.



@torch.no_grad()
def sequential_calibrate(
Contributor Author


@Fridah-nv reference for sequential calibration
cc @realAsma
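
For context, the sequential calibration pattern referenced above can be sketched roughly as follows. The signature and the `calibrate_block` helper are assumptions for illustration, not the PR's actual API: the idea is that each transformer block is calibrated on activations produced by the already-quantized blocks before it, so quantization error is accounted for as it propagates.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def sequential_calibrate(blocks, calib_inputs, calibrate_block):
    """blocks: ordered list of modules; calib_inputs: list of input tensors
    to the first block; calibrate_block(block, inputs) calibrates/quantizes
    the block in place using the captured inputs."""
    inputs = calib_inputs
    for block in blocks:
        calibrate_block(block, inputs)        # e.g. run GPTQ on this block
        # propagate: the next block is calibrated on the outputs of the
        # now-quantized block, not on full-precision activations
        inputs = [block(x) for x in inputs]
    return blocks
```

Compared with one-shot calibration (a single forward pass through the unmodified model), this layer-by-layer schedule matches how GPTQ is usually described in the literature.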

@sugunav14 force-pushed the svelury/gptq-official branch 2 times, most recently from b5f6073 to a8540d4 (February 24, 2026 00:17)
@sugunav14 force-pushed the svelury/gptq-official branch 4 times, most recently from 566c9ae to bdbeb03 (March 6, 2026 00:56)
Fridah-nv and others added 20 commits March 18, 2026 23:31
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
…ntizer, NVFP4MSECalibrator (#849)

**Type of change:** ?

**Overview:** ?

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

Release notes (auto-generated by coderabbit.ai):

* **New Features**
  * Added NVFP4StaticQuantizer for improved 4-bit quantization with enhanced precision control
  * Introduced NVFP4MSECalibrator with flexible candidate generation for calibration optimization
* **Improvements**
  * Optimized GPU kernels for better performance on Hopper+ GPUs
  * Extended Triton support to broader GPU compatibility
  * Enhanced backward compatibility for restoring previously quantized models
* **Tests**
  * Added comprehensive test coverage for new quantizers and calibration methods


---------

Signed-off-by: realAsma <akuriparambi@nvidia.com>
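
As a rough illustration of what an MSE calibrator in the spirit of the `NVFP4MSECalibrator` above does (the candidate generation and the `mse_amax` function are assumptions for illustration, using a plain symmetric int grid rather than NVFP4): try several clipping thresholds and keep the one that minimizes quantization mean-squared error.

```python
import torch

def mse_amax(x, num_bits=4, candidates=None):
    """Search over candidate clipping thresholds (amax values) and return
    the one whose fake-quantization MSE against x is smallest."""
    absmax = x.abs().max()
    if candidates is None:
        # shrink absmax in 5% steps; the unclipped absmax is candidate 0
        candidates = [absmax * (1.0 - 0.05 * i) for i in range(10)]
    qmax = 2 ** (num_bits - 1) - 1               # symmetric signed range
    best_amax, best_err = absmax, float("inf")
    for amax in candidates:
        scale = amax / qmax
        xq = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
        err = torch.mean((x - xq) ** 2).item()
        if err < best_err:
            best_err, best_amax = err, amax
    return best_amax
```

For bell-shaped weight distributions, clipping a few outliers usually lowers overall MSE, which is why a search like this tends to beat plain absmax calibration.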
…FP4QTensor

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 force-pushed the svelury/gptq-official branch from 8c4ff0e to c311950 (March 18, 2026 23:51)
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 force-pushed the svelury/gptq-official branch from 0aa088c to 3f2d7c0 (March 20, 2026 00:34)


3 participants