
Fridah/kinjal/vllm modelopt reload#1068

Draft
Fridah-nv wants to merge 1 commit into kinjal/vllm_modelopt_reload from fridah/kinjal/vllm_modelopt_reload

Conversation


Fridah-nv (Contributor) commented Mar 18, 2026

What does this PR do?

Type of change: ?

Enable vLLM fakequant export in `examples/llm_ptq`. Example command:

```bash
python ../llm_ptq/hf_ptq.py \
  --pyt_ckpt_path <MODEL_PATH> \
  --qformat nvfp4 \
  --calib_size 512 \
  --export_path <EXPORT_DIR> \
  --vllm_fakequant_export \
  --trust_remote_code
```
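For context on what the exported artifact is, here is a minimal sketch of saving and reloading a torch-serialized state file. Only the file name `vllm_fq_modelopt_state.pth` comes from this PR; the dictionary contents below are hypothetical stand-ins, and the `weights_only=True` reload follows the PR template's security guidance rather than the actual loader in `vllm_serve_fakequant.py`:

```python
import os
import tempfile

import torch

# Hypothetical stand-in for the exported fakequant state; the real
# contents of vllm_fq_modelopt_state.pth are defined by hf_ptq.py.
state = {"quant_cfg": "nvfp4", "amax": torch.tensor([0.5, 1.0])}

with tempfile.TemporaryDirectory() as export_dir:
    path = os.path.join(export_dir, "vllm_fq_modelopt_state.pth")
    torch.save(state, path)

    # Reload with weights_only=True, per the template's warning against
    # torch.load(..., weights_only=False) on untrusted files.
    restored = torch.load(path, weights_only=True)

print(sorted(restored))  # ['amax', 'quant_cfg']
```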

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

copy-pr-bot bot commented Mar 18, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


coderabbitai bot commented Mar 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 758471f1-4c68-4561-ae86-6e93d047977d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.


@Fridah-nv Fridah-nv changed the base branch from main to kinjal/vllm_modelopt_reload March 18, 2026 21:39
@realAsma realAsma requested a review from kinjalpatel27 March 19, 2026 13:34
Comment on lines +76 to +86

Alternatively, the dedicated `hf_ptq_export.py` script (**deprecated** — use `hf_ptq.py` with `--vllm_fakequant_export` instead) can be used for a simpler interface:

```bash
python hf_ptq_export.py \
--pyt_ckpt_path <MODEL_PATH> \
--quant_cfg NVFP4_DEFAULT_CFG \
--export_path <EXPORT_DIR> \
--trust_remote_code
```


We should remove hf_ptq_export.py; in my understanding, examples/llm_ptq/hf_ptq.py should be sufficient. Is that correct, @kinjalpatel27?


Yes, it's sufficient. We added hf_ptq_export.py to avoid overcrowding the hf_ptq.py example.

Comment on lines +1025 to +1031
```python
parser.add_argument(
    "--vllm_fakequant_export",
    default=False,
    action="store_true",
    help="Export as vLLM fake-quant checkpoint (produces vllm_fq_modelopt_state.pth "
    "for use with vllm_serve_fakequant.py).",
)
```

Move this to the end of the argument list.


Also, let's add a line to the hf_ptq.py README about this argument, pointing to the vllm_serve README's export section.
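The flag quoted in the hunk above is a standard argparse `store_true` option. A self-contained sketch of its behavior (the bare parser here is hypothetical; only the flag definition comes from the diff):

```python
import argparse

# Minimal parser carrying just the new flag from the hunk above.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--vllm_fakequant_export",
    default=False,
    action="store_true",
    help="Export as vLLM fake-quant checkpoint.",
)

# store_true takes no value: omitted -> False (the default), passed -> True.
off = parser.parse_args([]).vllm_fakequant_export
on = parser.parse_args(["--vllm_fakequant_export"]).vllm_fakequant_export
print(off, on)  # False True
```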

Comment on lines +76 to +86

Alternatively, the dedicated `hf_ptq_export.py` script (**deprecated** — use `hf_ptq.py` with `--vllm_fakequant_export` instead) can be used for a simpler interface:

```bash
python hf_ptq_export.py \
--pyt_ckpt_path <MODEL_PATH> \
--quant_cfg NVFP4_DEFAULT_CFG \
--export_path <EXPORT_DIR> \
--trust_remote_code
```


Suggested change
Alternatively, the dedicated `hf_ptq_export.py` script (**deprecated** — use `hf_ptq.py` with `--vllm_fakequant_export` instead) can be used for a simpler interface:
```bash
python hf_ptq_export.py \
--pyt_ckpt_path <MODEL_PATH> \
--quant_cfg NVFP4_DEFAULT_CFG \
--export_path <EXPORT_DIR> \
--trust_remote_code
```

