Merge mbridge distillation for any_model #1036
danielkorzekwa wants to merge 10 commits into dkorzekwa/anymodel_tutorial
Conversation
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Pretty much everything in this PR seems like we should instead merge to M-Bridge. Are we confident enough to upstream these changes?
We are not confident, e.g., we would need to talk to the mbridge/megatron-lm people first and align with their plans for heterogeneous support. Let's think about it once puzzletron is in main.
We also have to add support for gpt-oss and mamba, so it is not the best time to merge this into mcore.
The nemo:26.04 container code freeze is in 2 weeks. Let's make sure we raise a PR for the required changes to M-Bridge before then, so we can see what can and cannot be upstreamed.
Unlikely we'll have time for it in the next 2 weeks.
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
"--master-addr",
"127.0.0.1",  # Explicitly set master address
"--master-port",
str(get_free_port()),  # Pass port directly to torchrun to avoid conflicts
Is this necessary? I've never had the need to manually set these.
Needed: on an interactive cluster node, the default port is sometimes already in use.
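The PR does not show how `get_free_port()` is implemented, but a common sketch for such a helper is to bind a socket to port 0 and let the OS pick an unused ephemeral port (this is an illustrative implementation, not necessarily the one in the test):

```python
import socket


def get_free_port() -> int:
    """Return a TCP port that is currently free on localhost.

    Binding to port 0 asks the OS to choose an unused ephemeral port.
    There is a small race window between releasing the socket and
    torchrun binding the port, but in practice this avoids collisions
    with the default port on shared interactive nodes.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 -> OS picks a free port
        return s.getsockname()[1]
```

The returned port can then be passed to torchrun via `--master-port`, as done in the test above.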
"--hf-export-path",
str(hf_export_dir),
"--hf-model",
"meta-llama/Llama-3.1-8B-Instruct",
Can we change these argparse arguments to underscore format as well, so we have consistency with the other arguments?
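For illustration, the requested rename would only change the CLI spelling of the flags; argparse converts dashes to underscores when deriving attribute names, so the Python-side access is identical either way (a minimal sketch, not the actual test code):

```python
import argparse

parser = argparse.ArgumentParser()
# Underscore-style flags, matching the naming convention of the
# script's other arguments as requested in the review.
parser.add_argument("--hf_export_path", type=str, required=True)
parser.add_argument("--hf_model", type=str,
                    default="meta-llama/Llama-3.1-8B-Instruct")

args = parser.parse_args(["--hf_export_path", "/tmp/export"])
# Whether the flag is spelled --hf-export-path or --hf_export_path,
# the parsed attribute is args.hf_export_path in both cases.
print(args.hf_export_path)
```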
What does this PR do?
Merge anymodel mbridge distillation