Add agent skills for modelopt#1011

Draft
kaix-nv wants to merge 16 commits into main from kaix/modelopt_agent

Conversation

Contributor

@kaix-nv kaix-nv commented Mar 9, 2026

What does this PR do?

Type of change: ?

Adds a Claude Code skill suite for interactive model optimization with ModelOpt. The skills guide users through an end-to-end workflow: optimize the model with ModelOpt APIs, deploy it on vLLM and benchmark speed, then evaluate accuracy with NeMo Evaluator (nel).

Usage

Invoke the skill in Claude Code:

/ptq

Then tell it which model to quantize and which quantization spec to use, e.g. nvfp4 mlp only.

Slack Bot Setup

1. Create the Slack App

Go to api.slack.com/apps:

  1. Create New App → "From scratch" → name it, e.g., modelopt-bot
  2. Settings → Socket Mode → Enable → Generate token with connections:write → save as SLACK_APP_TOKEN
  3. OAuth & Permissions → Bot Token Scopes → Add:
    • app_mentions:read
    • chat:write
    • im:history, im:read
  4. Events → Event Subscriptions → Enable → Subscribe to bot events:
    • app_mention
    • message.im
  5. Install App → Request to Workspace Install → Copy Bot User OAuth Token as SLACK_BOT_TOKEN
  6. App Home → Messages Tab → Check "Allow users to send Slash commands and messages from the messages tab"

2. Set up environment

cd slack-bot
pip install -r requirements.txt

export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
export SKILLS_CWD="/path/to/modelopt_agent"
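
A quick pre-flight check can catch a missing or swapped token before the bot starts. The sketch below is illustrative only and is not part of bot.py: the helper name check_bot_env is hypothetical, and the xoxb-/xapp- prefixes are taken from the placeholders in the export lines above.

```python
import os
from pathlib import Path

# Token prefixes follow the placeholders shown in the export lines above.
REQUIRED_PREFIXES = {
    "SLACK_BOT_TOKEN": "xoxb-",
    "SLACK_APP_TOKEN": "xapp-",
}


def check_bot_env(env=None):
    """Return a list of problems; an empty list means the environment looks usable.

    Hypothetical helper for illustration; bot.py may validate differently.
    """
    env = os.environ if env is None else env
    problems = []
    for name, prefix in REQUIRED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} should start with {prefix!r}")
    skills_cwd = env.get("SKILLS_CWD", "")
    if not skills_cwd:
        problems.append("SKILLS_CWD is not set")
    elif not Path(skills_cwd).is_dir():
        problems.append(f"SKILLS_CWD is not a directory: {skills_cwd}")
    return problems


if __name__ == "__main__":
    for problem in check_bot_env():
        print(f"[env] {problem}")
```

Running it in the same shell as the exports prints one line per problem and nothing when all three variables look sane.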

3. Run the bot

python bot.py

Expected output:
Starting ModelOpt Slack Bot...
Skills directory: /path/to/modelopt_agent
Found skills: ptq, deployment, evaluation, modelopt
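
For readers wondering how a "Found skills" line like this could be produced, here is a minimal sketch. It assumes each skill is an immediate subdirectory of SKILLS_CWD that contains a SKILL.md file; that layout and the function name find_skills are assumptions for illustration, and the actual discovery logic in bot.py may differ (the sketch sorts names, so its ordering will not match the output above).

```python
from pathlib import Path


def find_skills(skills_root):
    """List skill names under skills_root.

    Assumption: a skill is any immediate subdirectory containing a SKILL.md file.
    """
    root = Path(skills_root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )


if __name__ == "__main__":
    import os

    skills_dir = os.environ.get("SKILLS_CWD", ".")
    print(f"Skills directory: {skills_dir}")
    print("Found skills: " + ", ".join(find_skills(skills_dir)))
```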

4. Test in Slack

  • DM the bot: hello
  • @modelopt quantize Qwen3-0.6B with fp8
  • @modelopt deploy ./qwen3-0.6b-fp8
  • @modelopt evaluate my model on mmlu
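
When the bot is mentioned, Slack delivers the message text with the mention encoded as <@USERID>, so a handler typically strips that token before passing the request on. A small sketch of that step, with a hypothetical helper name (extract_prompt) that is not taken from bot.py:

```python
import re

# Slack encodes a user mention in event text as <@USERID>, optionally with a
# "|label" suffix in older payloads, e.g. "<@U012ABCDEF>" or "<@U012ABCDEF|modelopt>".
MENTION = re.compile(r"<@[A-Z0-9]+(?:\|[^>]+)?>")


def extract_prompt(event_text):
    """Drop user mentions and collapse whitespace, leaving the request text."""
    without_mentions = MENTION.sub(" ", event_text)
    return " ".join(without_mentions.split())
```

A DM such as hello contains no mention and passes through unchanged, while a channel message like "<@U012ABCDEF> quantize Qwen3-0.6B with fp8" is reduced to the request itself.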

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Signed-off-by: Kai Xu <kaix@nvidia.com>

copy-pr-bot bot commented Mar 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kaix-nv kaix-nv requested a review from mxinO March 9, 2026 23:30
Contributor

coderabbitai bot commented Mar 9, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5f5a10d4-22ed-4db1-8a68-e6350f5d5278

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.27%. Comparing base (a4fde49) to head (6770524).
⚠️ Report is 54 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1011      +/-   ##
==========================================
- Coverage   72.12%   70.27%   -1.86%     
==========================================
  Files         209      227      +18     
  Lines       23628    25857    +2229     
==========================================
+ Hits        17042    18170    +1128     
- Misses       6586     7687    +1101     
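
The percentages in this diff follow from hits divided by lines, and the same arithmetic shows why overall coverage drops: only about half of the lines that differ between base and head are hit. A small sketch using the figures from the table above (the function name is illustrative, and the last digit can differ from the report depending on how Codecov rounds):

```python
def coverage_percent(hits, lines):
    """Line coverage as a percentage of coverable lines."""
    return 100.0 * hits / lines


# Figures copied from the coverage diff above.
base_hits, base_lines = 17042, 23628
head_hits, head_lines = 18170, 25857

base_cov = coverage_percent(base_hits, base_lines)
head_cov = coverage_percent(head_hits, head_lines)

# Coverage of only the lines that differ between base and head.
added_cov = coverage_percent(head_hits - base_hits, head_lines - base_lines)

print(f"base {base_cov:.2f}%  head {head_cov:.2f}%  added lines {added_cov:.2f}%")
```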

mxinO added 2 commits March 9, 2026 19:38
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

mxinO commented Mar 10, 2026

Added a separate ptq skill; it needs further tuning. Claude Opus can follow the skill, but Sonnet needs more guidance.

kaix-nv added 2 commits March 10, 2026 15:59
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from 18eb9c2 to 6968ad6 on March 11, 2026 00:47
Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

Edwardf0t1 commented Mar 12, 2026

@kaix-nv @mxinO This is a great starting point to use agent skills for modelopt workflows 👍 We should test it with various models and optimization recipes to polish the skills.

@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from bd2d3da to 4f61bad on March 12, 2026 23:13
Copy nel-assistant skill as local evaluation skill so we can extend it
to support optimized model evaluation requirements. Update modelopt
orchestrator to reference the evaluation skill.

Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from 4f61bad to 28928a1 on March 12, 2026 23:17
Add deployment skill (vLLM, SGLang, TRT-LLM serving) and update
modelopt orchestrator to support three pipelines:
- PTQ only
- PTQ + Deploy (serve as API endpoint)
- PTQ + Evaluate (accuracy benchmark)

Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from 3a320f6 to 5c46798 on March 13, 2026 02:03
mxinO added 2 commits March 12, 2026 21:05
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor Author

kaix-nv commented Mar 13, 2026

@kaix-nv @mxinO This is a great starting point to use agent skills for modelopt workflows 👍 We should test it with various models and optimization recipes to polish the skills.

Thanks. The skills are still at an early stage, so it’d be great to get more people using them and giving feedback. Testing across a broader set of models and optimization recipes will help us iterate quickly and make the workflows more robust.


copy-pr-bot bot commented Mar 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mxinO mxinO force-pushed the kaix/modelopt_agent branch from 4d6d126 to c658f4b on March 19, 2026 09:06
Signed-off-by: Meng Xin <mxin@nvidia.com>
@mxinO mxinO force-pushed the kaix/modelopt_agent branch from c658f4b to 1f58896 on March 19, 2026 09:09
kaix-nv added 2 commits March 19, 2026 10:22
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from 1f58896 to 1eb9c85 on March 19, 2026 17:24
Signed-off-by: Meng Xin <mxin@nvidia.com>
@mxinO mxinO force-pushed the kaix/modelopt_agent branch from 1eb9c85 to 6770524 on March 20, 2026 00:06

3 participants