Add Megatron Inference model backend#1001
Conversation
Force-pushed 3ec25b5 to 525abdf
could you share some more context on this?
Absolutely, sorry about that! I will also edit the main body of the PR. Megatron Inference largely follows the same schema as vLLM, but it has 3 extra fields that are used to track per-token staleness. So while it is possible to use NeMo Gym + Megatron Inference by pretending it is just vLLM, doing so drops those 3 extra fields, and for Megatron RL those fields are important today. We could just monkey-patch them into the vLLM schema inside Gym, but Megatron Inference may gain even more functionality in the future, so the right thing to do is to officially add a new inference class with its own mixin.
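To illustrate the trade-off described above, here is a minimal sketch with stand-in classes (these are not Gym's actual classes or field semantics): monkey-patching injects an attribute into an existing schema at runtime, while a dedicated mixin makes the extra fields an explicit part of a new type.

```python
import typing


class VLLMResponse:
    """Stand-in for the vLLM response schema (not the real Gym class)."""
    token_ids: list[int]


# Option 1: monkey-patch an extra annotation onto the existing class.
# This works, but is invisible to type checkers and easy to lose track of.
VLLMResponse.__annotations__["policy_epoch"] = list[list[tuple[int, int]]]


# Option 2: a dedicated mixin, as this PR proposes, so the extra field
# is a declared part of a new class rather than a runtime patch.
class MegatronStalenessMixin:
    policy_epoch: list[list[tuple[int, int]]]


class MegatronResponse(VLLMResponse, MegatronStalenessMixin):
    pass


# get_type_hints merges annotations along the MRO, so the mixin's field
# shows up alongside the inherited ones.
hints = typing.get_type_hints(MegatronResponse)
print(sorted(hints))  # ['policy_epoch', 'token_ids']
```

The mixin route also leaves room for Megatron Inference to grow: future fields land in one declared class instead of scattered patches.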
Force-pushed 0ff5ca9 to 41d239d
```yaml
policy_model:
  responses_api_models:
    megatron_inference:
      entrypoint: app.py
```
Shouldn't this have more fields, like in https://github.com/NVIDIA-NeMo/Gym/blob/main/responses_api_models/vllm_model/configs/vllm_model.yaml?
And potentially a megatron_inference_for_training.yaml (I guess the default here is probably for training).
Resolved; I'm following the vllm convention now with configs/megatron_inference.yaml and configs/megatron_inference_for_training.yaml.
```python
MEGATRON_RESPONSES_TO_TRAIN = {
    base: type(f"Megatron{train.__name__}", (base, MegatronTokenIDLogProbMixin), {})
```
Should (base, MegatronTokenIDLogProbMixin) be (train, MegatronTokenIDLogProbMixin) instead? I guess it doesn't matter at the moment, but it seems like it might.
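For context on the pattern being reviewed: the three-argument form `type(name, bases, namespace)` builds a class dynamically, and the order of the `bases` tuple determines what the new class inherits and its MRO, which is why the choice of `base` vs. `train` matters. A minimal sketch with stand-in classes (not the actual Gym types):

```python
class Base:
    """Stand-in for a response base class."""


class Mixin:
    """Stand-in for MegatronTokenIDLogProbMixin."""
    extra_field: int = 0


# type(name, bases, namespace) creates a new class at runtime.
# The resulting class subclasses everything in the bases tuple,
# so swapping Base for another class changes what is inherited.
Combined = type(f"Megatron{Base.__name__}", (Base, Mixin), {})

print(Combined.__name__)                 # MegatronBase
print([c.__name__ for c in Combined.__mro__])
```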
```python
class MegatronTokenIDLogProbMixin(TokenIDLogProbMixin):
    policy_epoch: list[list[tuple[int, int]]]
```
https://github.com/NVIDIA-NeMo/Gym/blob/main/nemo_gym/openai_utils.py#L101-L104
TokenIDLogProbMixin uses typing.List while this uses the lowercase built-ins. Probably fine, but it may be better to be consistent; what do you think?
Addressed.
Thank you for the review by the way!
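For readers following the style point above: since Python 3.9 (PEP 585), the lowercase built-in generics and their `typing` counterparts are interchangeable in annotations; both normalize to the same origin type, so the choice is purely stylistic. A quick check:

```python
from typing import List, get_origin

# typing.List[int] and list[int] describe the same runtime origin,
# so mixing them works; consistency is only a readability concern.
print(get_origin(List[int]))  # <class 'list'>
print(get_origin(list[int]))  # <class 'list'>
```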
Force-pushed babe604 to e4ee97c
@cmunley1 please take a look at my last commit: it's hacky, but I could find no other way to handle it. The … I see no way to add our … What would you suggest: how should this problem be solved?
Force-pushed 61e29a9 to f7c246f
Force-pushed 57061c5 to 02f5796
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Force-pushed 02f5796 to 202cc12
Force-pushed 202cc12 to 91a585a
Megatron Inference is an inference backend used for RL; we have been using it through Megatron RL and NeMo RL.
Because its schema is near-identical to vLLM's, we have been doing RL through Gym by pretending it is vLLM.
However, Megatron Inference includes 3 extra fields that track per-token staleness and are important to RL. We have been monkey-patching these into Gym, which is not a long-term solution.
This PR establishes official support for Megatron Inference. Currently that consists of a mixin with the 3 extra fields; Megatron Inference may evolve further in the future.
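As a rough illustration of what such a mixin can look like, here is a sketch using stand-in dataclasses; `policy_epoch` is the one field visible in this PR's diff, while the base-class shape and the meaning of the tuple entries are assumptions, not the actual Gym implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TokenIDLogProbMixin:
    """Stand-in for nemo_gym.openai_utils.TokenIDLogProbMixin."""
    token_ids: list[int] = field(default_factory=list)


@dataclass
class MegatronTokenIDLogProbMixin(TokenIDLogProbMixin):
    # Per-token staleness tracking. The field name and annotation match
    # the PR diff; what each (int, int) tuple encodes is an assumption.
    policy_epoch: list[list[tuple[int, int]]] = field(default_factory=list)


resp = MegatronTokenIDLogProbMixin(
    token_ids=[101, 102],
    policy_epoch=[[(0, 1), (0, 2)]],
)
print(len(resp.policy_epoch[0]))  # 2
```

Because the extra fields live in a subclass of the existing mixin, code that only knows about `TokenIDLogProbMixin` keeps working, while Megatron-aware code can read `policy_epoch`.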