
Add Megatron Inference model backend#1001

Draft
tdene wants to merge 4 commits into NVIDIA-NeMo:main from tdene:tene/megatron-inference


Conversation


@tdene tdene commented Apr 3, 2026

Megatron Inference is an inference backend used for RL. We have been using it through Megatron RL and NeMo RL.

Because its schema is near-identical to vLLM's, we have been doing RL through Gym by pretending it is vLLM.

However, Megatron Inference includes 3 extra fields that track per-token staleness and are important to RL. We have been monkey-patching these into Gym, which is not a long-term solution.

This PR establishes official support for Megatron Inference. Currently that consists of a mixin with the 3 extra fields. Megatron Inference may evolve further in the future.
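The shape of that mixin can be sketched as follows. This is a hedged illustration only: the real Gym classes are pydantic models, only the `policy_epoch` field is visible in this PR's diff, and the base-class fields here are placeholder assumptions, written with stdlib dataclasses so the sketch stays self-contained.

```python
from dataclasses import dataclass, field

# Illustrative stand-in for Gym's TokenIDLogProbMixin (the real class
# is a pydantic model; these field names are assumptions).
@dataclass
class TokenIDLogProbMixin:
    token_ids: list = field(default_factory=list)
    logprobs: list = field(default_factory=list)

# The PR adds a subclass carrying per-token staleness bookkeeping.
# Only policy_epoch appears in the visible diff; the full set of
# three extra fields is not shown here.
@dataclass
class MegatronTokenIDLogProbMixin(TokenIDLogProbMixin):
    # One (start_epoch, end_epoch) pair per token, per sequence.
    policy_epoch: list = field(default_factory=list)

resp = MegatronTokenIDLogProbMixin(
    token_ids=[101, 102],
    logprobs=[-0.3, -1.2],
    policy_epoch=[[(0, 1), (0, 1)]],
)
```

Because the Megatron variant subclasses the vLLM-style mixin, anything that accepts the base schema still accepts the Megatron one; the extra fields simply ride along.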

@tdene tdene force-pushed the tene/megatron-inference branch from 3ec25b5 to 525abdf on April 3, 2026 17:25
@cmunley1
Contributor

cmunley1 commented Apr 4, 2026

could you share some more context on this?

@tdene
Author

tdene commented Apr 4, 2026

> could you share some more context on this?

Absolutely, sorry about that! I will also edit the main body of the PR.

Megatron Inference largely follows the same schema as vLLM, but it has 3 extra fields that track per-token staleness. So while it is possible to use NeMo Gym + Megatron Inference by pretending it is just vLLM, doing so drops those 3 extra fields.

For Megatron RL, those 3 fields are important today. We could just monkey-patch them into the vLLM schema inside Gym, but Megatron Inference may gain even more functionality in the future, so the right thing to do is to officially add a new inference class with its own mixin.

@tdene tdene requested a review from a team as a code owner April 5, 2026 20:44
@tdene tdene force-pushed the tene/megatron-inference branch 2 times, most recently from 0ff5ca9 to 41d239d on April 5, 2026 21:50
Comment thread responses_api_models/megatron_inference/tests/test_app.py
policy_model:
  responses_api_models:
    megatron_inference:
      entrypoint: app.py
Contributor


shouldn't this have more fields, like in https://github.com/NVIDIA-NeMo/Gym/blob/main/responses_api_models/vllm_model/configs/vllm_model.yaml

and potentially a megatron_inference_for_training.yaml (I guess the default here is probably for training)

Author


Resolved; I'm following the vllm convention now with configs/megatron_inference.yaml and configs/megatron_inference_for_training.yaml.



MEGATRON_RESPONSES_TO_TRAIN = {
    base: type(f"Megatron{train.__name__}", (base, MegatronTokenIDLogProbMixin), {})
Contributor


should (base, MegatronTokenIDLogProbMixin) be (train, MegatronTokenIDLogProbMixin) instead? I guess it doesn't matter at the moment, but it seems like it should be.
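The reviewer's question can be checked with a minimal, self-contained sketch of the three-argument `type()` pattern quoted above. All class names and the `RESPONSES_TO_TRAIN` mapping below are toy stand-ins for the real Gym classes, not their actual definitions; the point is only that with `(train, mixin)` as the bases, the generated class inherits the training-specific fields as well as the Megatron ones:

```python
# Toy stand-ins for the real Gym response classes (hypothetical bodies).
class NeMoGymResponse:
    pass

class NeMoGymResponseForTraining(NeMoGymResponse):
    pass

class MegatronTokenIDLogProbMixin:
    pass

# Assumed shape of the base -> training-variant mapping.
RESPONSES_TO_TRAIN = {NeMoGymResponse: NeMoGymResponseForTraining}

# Using (train, mixin) as the bases, per the review suggestion, so the
# generated class is a subclass of the *training* variant, not just the base.
MEGATRON_RESPONSES_TO_TRAIN = {
    base: type(f"Megatron{train.__name__}", (train, MegatronTokenIDLogProbMixin), {})
    for base, train in RESPONSES_TO_TRAIN.items()
}

cls = MEGATRON_RESPONSES_TO_TRAIN[NeMoGymResponse]
```

With `(base, mixin)` instead, the generated class would satisfy `issubclass(cls, base)` but not `issubclass(cls, train)`, which is why the distinction matters once the training variant carries fields of its own.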

Author


Addressed!



class MegatronTokenIDLogProbMixin(TokenIDLogProbMixin):
    policy_epoch: list[list[tuple[int, int]]]
Contributor


https://github.com/NVIDIA-NeMo/Gym/blob/main/nemo_gym/openai_utils.py#L101-L104

TokenIDLogProbMixin uses typing.List while this uses lowercase built-ins. Probably fine, but maybe better to be consistent; what do you think?

Author


Addressed.

Thank you for the review by the way!

@tdene tdene force-pushed the tene/megatron-inference branch 2 times, most recently from babe604 to e4ee97c on April 6, 2026 07:57
@tdene
Author

tdene commented Apr 6, 2026

@cmunley1 please take a look at my last commit:

class TokenIDLogProbMixin(BaseModel, extra="allow"):

It's hacky, but I could find no other way to handle it.

The MegatronTokenIDLogProbMixin subclass works on the client side. But on the server side, we do SimpleAgentVerifyRequest -> BaseVerifyRequest -> response: NeMoGymResponse -> output: List[NeMoGymResponseOutputItem], and NeMoGymResponseOutputItem is hardcoded inside openai_utils.py.

I see no way to add our MegatronResponseOutputMessageForTraining into NeMoGymResponseOutputItem, other than replicating 100 lines of SimpleAgent code just to change one line inside it. So the extra data we are adding via the mixin gets dropped on the server side.

What would you suggest? How should this problem be solved?
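The server-side dropping described above comes down to how strictly the response payload is re-parsed. Here is a toy plain-Python stand-in for the difference between a strict model and pydantic's `extra="allow"` behavior, which the commit above applies to the base mixin. The real Gym models are pydantic; the class and field names below are illustrative only:

```python
# Toy stand-in for strict vs extra="allow" parsing (not the real Gym code).
class StrictItem:
    FIELDS = ("role", "content")

    def __init__(self, **data):
        # A strict model keeps only its declared fields, silently dropping
        # anything else -- this is how Megatron-only fields get lost when
        # the server re-parses into the hardcoded output item type.
        for name in self.FIELDS:
            setattr(self, name, data.get(name))

class AllowExtraItem(StrictItem):
    def __init__(self, **data):
        super().__init__(**data)
        # extra="allow" behavior: unknown fields are retained on the instance.
        for name, value in data.items():
            if name not in self.FIELDS:
                setattr(self, name, value)

payload = {"role": "assistant", "content": "hi", "policy_epoch": [(0, 3)]}
strict = StrictItem(**payload)
loose = AllowExtraItem(**payload)
```

The trade-off is the one called "hacky" above: the extra fields survive the round trip, but they bypass validation, since the strict model never declared them.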

@tdene tdene force-pushed the tene/megatron-inference branch 3 times, most recently from 61e29a9 to f7c246f on April 6, 2026 19:17
@tdene tdene marked this pull request as draft April 6, 2026 20:14
@copy-pr-bot

copy-pr-bot bot commented Apr 6, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@tdene tdene force-pushed the tene/megatron-inference branch 3 times, most recently from 57061c5 to 02f5796 on April 7, 2026 09:48
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
@tdene tdene force-pushed the tene/megatron-inference branch from 02f5796 to 202cc12 on April 7, 2026 09:55
tdene added 3 commits April 7, 2026 05:03
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
@tdene tdene force-pushed the tene/megatron-inference branch from 202cc12 to 91a585a on April 7, 2026 10:05