Fridah/kinjal/vllm modelopt reload #1068
Fridah-nv wants to merge 1 commit into kinjal/vllm_modelopt_reload
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Alternatively, the dedicated `hf_ptq_export.py` script (**deprecated**: use `hf_ptq.py` with `--vllm_fakequant_export` instead) can be used for a simpler interface:

```bash
python hf_ptq_export.py \
    --pyt_ckpt_path <MODEL_PATH> \
    --quant_cfg NVFP4_DEFAULT_CFG \
    --export_path <EXPORT_DIR> \
    --trust_remote_code
```

We should remove `hf_ptq_export.py`; in my understanding, `examples/llm_ptq/hf_ptq.py` should be sufficient. Is that correct, @kinjalpatel27?

Yes, it's sufficient. We added `hf_ptq_export.py` to avoid overcrowding the `hf_ptq.py` example.

    parser.add_argument(
        "--vllm_fakequant_export",
        default=False,
        action="store_true",
        help="Export as vLLM fake-quant checkpoint (produces vllm_fq_modelopt_state.pth "
        "for use with vllm_serve_fakequant.py).",
    )

Move this to the end of the arguments.

Also, let's add a line to the `hf_ptq.py` README about this argument, pointing to the export section of the vllm_serve README.
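
For illustration, here is a minimal sketch of what registering the flag last could look like. Only the `--vllm_fakequant_export` definition is taken from the diff above. The `build_parser` helper and the two other arguments are stand-ins assumed for this example, not the actual argument list of `hf_ptq.py`. Registration order does not change parsing; it mainly affects the order in which options appear in `--help`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser; only --vllm_fakequant_export mirrors the PR diff."""
    parser = argparse.ArgumentParser(description="Illustrative PTQ example parser")

    # Assumed stand-ins for the script's existing arguments (hypothetical here).
    parser.add_argument("--pyt_ckpt_path", required=True, help="Path to the model checkpoint.")
    parser.add_argument("--export_path", default="exported_model", help="Output directory.")

    # Registered last, as requested in the review comment.
    parser.add_argument(
        "--vllm_fakequant_export",
        default=False,
        action="store_true",
        help="Export as vLLM fake-quant checkpoint (produces vllm_fq_modelopt_state.pth "
        "for use with vllm_serve_fakequant.py).",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real command line.
    args = build_parser().parse_args(["--pyt_ckpt_path", "model_dir", "--vllm_fakequant_export"])
    print(args.vllm_fakequant_export)
```

Because the flag uses `action="store_true"` with `default=False`, omitting it leaves the existing export path unchanged, so placing it at the end should not affect current invocations.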
What does this PR do?
Type of change: ?
Enable vllm fakequant export in examples/llm_ptq: example command:

Usage
# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information