
Add Megatron Inference model backend#1001

Draft
tdene wants to merge 4 commits into NVIDIA-NeMo:main from tdene:tene/megatron-inference


Conversation


@tdene tdene commented Apr 3, 2026

Megatron Inference is an inference backend used for RL. We have been using it through Megatron RL and NeMo RL.

Because its schema is near-identical to vLLM's, we have been doing RL through Gym by pretending it is vLLM.

However, Megatron Inference includes 3 extra fields that track per-token staleness and are important to RL. We have been monkey-patching these into Gym, which is not a long-term solution.

This PR establishes official support for Megatron Inference. Currently that consists of a mixin with the 3 extra fields. Megatron Inference may evolve further in the future.
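The shape of that mixin can be sketched as follows. This is a hedged illustration only: the real Gym classes are pydantic models, only the `policy_epoch` field is visible in this PR's diff, and the base-class fields here are placeholder assumptions, written with stdlib dataclasses so the sketch stays self-contained.

```python
from dataclasses import dataclass, field

# Illustrative stand-in for Gym's TokenIDLogProbMixin (the real class
# is a pydantic model; these field names are assumptions).
@dataclass
class TokenIDLogProbMixin:
    token_ids: list = field(default_factory=list)
    logprobs: list = field(default_factory=list)

# The PR adds a subclass carrying per-token staleness bookkeeping.
# Only policy_epoch appears in the visible diff; the full set of
# three extra fields is not shown here.
@dataclass
class MegatronTokenIDLogProbMixin(TokenIDLogProbMixin):
    # One (start_epoch, end_epoch) pair per token, per sequence.
    policy_epoch: list = field(default_factory=list)

resp = MegatronTokenIDLogProbMixin(
    token_ids=[101, 102],
    logprobs=[-0.3, -1.2],
    policy_epoch=[[(0, 1), (0, 1)]],
)
```

Because the Megatron variant subclasses the vLLM-style mixin, anything that accepts the base schema still accepts the Megatron one; the extra fields simply ride along.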

@tdene tdene force-pushed the tene/megatron-inference branch from 3ec25b5 to 525abdf on April 3, 2026 17:25
@cmunley1
Contributor

cmunley1 commented Apr 4, 2026

could you share some more context on this?

@tdene
Author

tdene commented Apr 4, 2026

> could you share some more context on this?

Absolutely, sorry about that! I will also edit the main body of the PR.

Megatron Inference largely follows the same schema as vLLM, but it has 3 extra fields that track per-token staleness. So while it is possible to use NeMo Gym + Megatron Inference by pretending it is just vLLM, doing so drops those 3 extra fields.

For Megatron RL, those 3 fields are important today. We could just monkey-patch them into the vLLM schema inside Gym, but Megatron Inference may gain even more functionality in the future, so the right thing to do is to officially add a new inference class with its own mixin.

@tdene tdene requested a review from a team as a code owner April 5, 2026 20:44
@tdene tdene force-pushed the tene/megatron-inference branch 2 times, most recently from 0ff5ca9 to 41d239d on April 5, 2026 21:50
Comment thread responses_api_models/megatron_inference/tests/test_app.py
policy_model:
  responses_api_models:
    megatron_inference:
      entrypoint: app.py
Contributor


shouldn't this have more fields, like in https://github.com/NVIDIA-NeMo/Gym/blob/main/responses_api_models/vllm_model/configs/vllm_model.yaml

and potentially a megatron_inference_for_training.yaml (I guess the default here is probably for training)

Author


Resolved; I'm following the vllm convention now with configs/megatron_inference.yaml and configs/megatron_inference_for_training.yaml.



MEGATRON_RESPONSES_TO_TRAIN = {
    base: type(f"Megatron{train.__name__}", (base, MegatronTokenIDLogProbMixin), {})
Contributor


should (base, MegatronTokenIDLogProbMixin) be (train, MegatronTokenIDLogProbMixin) instead? I guess it doesn't matter at the moment, but it seems like it should be.
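The reviewer's question can be checked with a minimal, self-contained sketch of the three-argument `type()` pattern quoted above. All class names and the `RESPONSES_TO_TRAIN` mapping below are toy stand-ins for the real Gym classes, not their actual definitions; the point is only that with `(train, mixin)` as the bases, the generated class inherits the training-specific fields as well as the Megatron ones:

```python
# Toy stand-ins for the real Gym response classes (hypothetical bodies).
class NeMoGymResponse:
    pass

class NeMoGymResponseForTraining(NeMoGymResponse):
    pass

class MegatronTokenIDLogProbMixin:
    pass

# Assumed shape of the base -> training-variant mapping.
RESPONSES_TO_TRAIN = {NeMoGymResponse: NeMoGymResponseForTraining}

# Using (train, mixin) as the bases, per the review suggestion, so the
# generated class is a subclass of the *training* variant, not just the base.
MEGATRON_RESPONSES_TO_TRAIN = {
    base: type(f"Megatron{train.__name__}", (train, MegatronTokenIDLogProbMixin), {})
    for base, train in RESPONSES_TO_TRAIN.items()
}

cls = MEGATRON_RESPONSES_TO_TRAIN[NeMoGymResponse]
```

With `(base, mixin)` instead, the generated class would satisfy `issubclass(cls, base)` but not `issubclass(cls, train)`, which is why the distinction matters once the training variant carries fields of its own.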

Author


Addressed!



class MegatronTokenIDLogProbMixin(TokenIDLogProbMixin):
    policy_epoch: list[list[tuple[int, int]]]
Contributor


https://github.com/NVIDIA-NeMo/Gym/blob/main/nemo_gym/openai_utils.py#L101-L104

TokenIDLogProbMixin uses typing.List while this uses lowercase built-ins. Probably fine, but maybe better to be consistent; what do you think?

Author


Addressed.

Thank you for the review by the way!

@tdene tdene force-pushed the tene/megatron-inference branch 2 times, most recently from babe604 to e4ee97c on April 6, 2026 07:57
@tdene
Author

tdene commented Apr 6, 2026

@cmunley1 please take a look at my last commit:

class TokenIDLogProbMixin(BaseModel, extra="allow"):

It's hacky, but I could find no other way to handle it.

The MegatronTokenIDLogProbMixin subclass works on the client side. But on the server side, we do SimpleAgentVerifyRequest -> BaseVerifyRequest -> response: NeMoGymResponse -> output: List[NeMoGymResponseOutputItem], and NeMoGymResponseOutputItem is hardcoded inside openai_utils.py.

I see no way to add our MegatronResponseOutputMessageForTraining into NeMoGymResponseOutputItem, other than replicating 100 lines of SimpleAgent code just to change one line inside it. So the extra data we are adding via the mixin gets dropped on the server side.

What would you suggest? How should this problem be solved?
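The server-side dropping described above comes down to how strictly the response payload is re-parsed. Here is a toy plain-Python stand-in for the difference between a strict model and pydantic's `extra="allow"` behavior, which the commit above applies to the base mixin. The real Gym models are pydantic; the class and field names below are illustrative only:

```python
# Toy stand-in for strict vs extra="allow" parsing (not the real Gym code).
class StrictItem:
    FIELDS = ("role", "content")

    def __init__(self, **data):
        # A strict model keeps only its declared fields, silently dropping
        # anything else -- this is how Megatron-only fields get lost when
        # the server re-parses into the hardcoded output item type.
        for name in self.FIELDS:
            setattr(self, name, data.get(name))

class AllowExtraItem(StrictItem):
    def __init__(self, **data):
        super().__init__(**data)
        # extra="allow" behavior: unknown fields are retained on the instance.
        for name, value in data.items():
            if name not in self.FIELDS:
                setattr(self, name, value)

payload = {"role": "assistant", "content": "hi", "policy_epoch": [(0, 3)]}
strict = StrictItem(**payload)
loose = AllowExtraItem(**payload)
```

The trade-off is the one called "hacky" above: the extra fields survive the round trip, but they bypass validation, since the strict model never declared them.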

@tdene tdene force-pushed the tene/megatron-inference branch 3 times, most recently from 61e29a9 to f7c246f on April 6, 2026 19:17
@tdene tdene marked this pull request as draft April 6, 2026 20:14
@copy-pr-bot

copy-pr-bot bot commented Apr 6, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@tdene tdene force-pushed the tene/megatron-inference branch 3 times, most recently from 57061c5 to 02f5796 on April 7, 2026 09:48
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
@tdene tdene force-pushed the tene/megatron-inference branch from 02f5796 to 202cc12 on April 7, 2026 09:55
tdene added 3 commits April 7, 2026 05:03
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
@tdene tdene force-pushed the tene/megatron-inference branch from 202cc12 to 91a585a on April 7, 2026 10:05