Conversation
Signed-off-by: Kai Xu <kaix@nvidia.com>
Codecov Report — ✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #1011 | +/- |
|---|---|---|---|
| Coverage | 72.12% | 70.27% | -1.86% |
| Files | 209 | 227 | +18 |
| Lines | 23628 | 25857 | +2229 |
| Hits | 17042 | 18170 | +1128 |
| Misses | 6586 | 7687 | +1101 |
Signed-off-by: Meng Xin <mxin@nvidia.com>
Added a separate `ptq` skill; it needs further tuning. Claude Opus can follow the skill, but Sonnet needs more guidance.
Copy nel-assistant skill as local evaluation skill so we can extend it to support optimized model evaluation requirements. Update modelopt orchestrator to reference the evaluation skill. Signed-off-by: Kai Xu <kaix@nvidia.com>
Add deployment skill (vLLM, SGLang, TRT-LLM serving) and update modelopt orchestrator to support three pipelines:
- PTQ only
- PTQ + Deploy (serve as API endpoint)
- PTQ + Evaluate (accuracy benchmark)

Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Thanks. The skills are still at an early stage, so it’d be great to get more people using them and giving feedback. Testing across a broader set of models and optimization recipes will help us iterate quickly and make the workflows more robust.
What does this PR do?
Type of change: ?
Adds a Claude Code skill suite for interactive model optimization with ModelOpt. The skills guide users through an end-to-end workflow: optimize the model with ModelOpt APIs, deploy it on vLLM and benchmark speed, and evaluate accuracy with NeMo Evaluator (nel).
Usage
Invoke the skill in Claude Code:
/ptq
Say which model you want to quantize and with what quantization spec, e.g. `nvfp4 mlp only`.
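As a rough illustration of how a free-form spec like `nvfp4 mlp only` could be split into a quantization format and a layer-scope hint, here is a minimal sketch. The parser, the `KNOWN_FORMATS` set, and the returned dict shape are all hypothetical; this is not the skill's actual implementation.

```python
# Hypothetical sketch: map a free-form quantization spec string
# (e.g. "nvfp4 mlp only") to a format name plus scope hints.
# Names here are illustrative, not the skill's real parser.

KNOWN_FORMATS = {"nvfp4", "fp8", "int8", "int4_awq"}

def parse_spec(spec: str) -> dict:
    """Split a spec string into a quantization format and scope hints."""
    tokens = spec.lower().split()
    fmt = next((t for t in tokens if t in KNOWN_FORMATS), None)
    # Remaining tokens (minus filler words) are treated as scope hints,
    # e.g. restricting quantization to MLP layers.
    scope = [t for t in tokens if t != fmt and t not in {"only", "layers", "with"}]
    return {"format": fmt, "scope": scope}

print(parse_spec("nvfp4 mlp only"))  # {'format': 'nvfp4', 'scope': ['mlp']}
```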
Slack Bot Setup
1. Create the Slack App
Go to api.slack.com/apps:
- Create an app named `modelopt-bot`
- Enable Socket Mode and generate an app-level token with the `connections:write` scope → save as `SLACK_APP_TOKEN`
- Add the bot token scopes `app_mentions:read`, `chat:write`, `im:history`, `im:read`
- Subscribe to the `app_mention` and `message.im` bot events
- Install the app and save the bot token as `SLACK_BOT_TOKEN`

2. Set up environment
cd slack-bot
pip install -r requirements.txt
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
export SKILLS_CWD="/path/to/modelopt_agent"
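A quick sanity check on these variables before launching can save a confusing startup failure. The sketch below validates the three variables exported above; the `check_env` helper is illustrative and not part of the actual bot.

```python
# Minimal sketch of validating the environment the bot expects.
# Variable names match the setup step above; the helper itself
# is an illustration, not code from the repository.
import os

REQUIRED = {
    "SLACK_BOT_TOKEN": "xoxb-",   # bot token from the app install
    "SLACK_APP_TOKEN": "xapp-",   # app-level token for Socket Mode
}

def check_env(env: dict) -> list:
    """Return a list of problems; an empty list means the env looks usable."""
    problems = []
    for name, prefix in REQUIRED.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} should start with '{prefix}'")
    if not env.get("SKILLS_CWD"):
        problems.append("SKILLS_CWD is not set")
    return problems

print(check_env({"SLACK_BOT_TOKEN": "xoxb-abc",
                 "SLACK_APP_TOKEN": "xapp-abc",
                 "SKILLS_CWD": "/tmp"}))  # []
```

Calling `check_env(dict(os.environ))` at the top of a launch script reports all problems at once instead of failing on the first missing token.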
3. Run the bot
python bot.py
Expected output:
Starting ModelOpt Slack Bot...
Skills directory: /path/to/modelopt_agent
Found skills: ptq, deployment, evaluation, modelopt
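The `Found skills: ...` line suggests the bot discovers skills by scanning `SKILLS_CWD`. A plausible sketch of such discovery is below; the assumption that each skill is a subdirectory containing a `SKILL.md` file follows the Claude Code skill convention but is not confirmed by this PR.

```python
# Hypothetical sketch of skill discovery: scan the skills directory
# for subfolders containing a SKILL.md file. The SKILL.md layout is
# an assumption, not taken from this repository.
from pathlib import Path
import tempfile

def find_skills(skills_dir: str) -> list:
    """Return sorted names of subdirectories that contain SKILL.md."""
    root = Path(skills_dir)
    return sorted(p.name for p in root.iterdir()
                  if p.is_dir() and (p / "SKILL.md").exists())

# Usage with a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    for name in ["ptq", "deployment"]:
        skill = Path(d) / name
        skill.mkdir()
        (skill / "SKILL.md").write_text("# skill\n")
    print(find_skills(d))  # ['deployment', 'ptq']
```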
4. Test in Slack
- `hello`
- `@modelopt quantize Qwen3-0.6B with fp8`
- `@modelopt deploy ./qwen3-0.6b-fp8`
- `@modelopt evaluate my model on mmlu`

Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information