Add test coverage for Muon muon_lr/adam_lr overrides #8047
Review comment on README.md (Evaluation Results table header):
-| Optimizer | Learning Rate | adam_lr (for Muon) | MBPP | MBPP+ | MMLU | GSM8K |
+| Optimizer | muon_lr | adam_lr | MBPP | MBPP+ | MMLU | GSM8K |
It looks like changing the title this way is not consistent with the table content below, i.e. the AdamW learning rate is not the Muon learning rate. Does this file have to be modified?
I removed the README table change and updated the PR to keep it focused on the test coverage for the existing muon_lr / adam_lr behavior.
Hi @sowndappan5, thanks for your new test cases. Can you address the comments in README.md and also fix the DCO tests by signing off your commits? Thanks!
Thanks, I’ve addressed the README feedback and pushed the update. I’m fixing the DCO issue now by signing off the commits.
Signed-off-by: Sowndappan S <147894621+sowndappan5@users.noreply.github.com>
I’ve addressed the README feedback and force-pushed signed-off commits to fix the DCO issue. The PR is now updated and pending review/CI.
Head branch was pushed to by a user without write access
…date README contributors section
Signed-off-by: Sowndappan S <147894621+sowndappan5@users.noreply.github.com>
Summary
Add coverage for separate learning rate overrides in the Muon optimizer path and fix the related Muon blog documentation.
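A minimal sketch of the group split and learning-rate resolution this PR covers, written in plain Python so it runs without DeepSpeed or PyTorch. It is not DeepSpeed's implementation: `Param` and `build_param_groups` are hypothetical names, and the rule that parameters with two or more dimensions go to the Muon group is an assumed simplification (real setups often route embeddings and output heads to Adam as well). The override rules follow the intended behavior described in the Background section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter; only its shape matters here."""
    name: str
    shape: tuple


def build_param_groups(params, lr: float,
                       muon_lr: Optional[float] = None,
                       adam_lr: Optional[float] = None):
    """Split params into a Muon group and an Adam group with per-group learning rates.

    Assumption for illustration: params with 2+ dimensions use Muon,
    everything else (biases, norm scales) uses Adam.
    """
    muon_params = [p for p in params if len(p.shape) >= 2]
    adam_params = [p for p in params if len(p.shape) < 2]
    groups = []
    if muon_params:
        # muon_lr overrides lr for the Muon group; lr is the fallback.
        groups.append({"params": muon_params, "use_muon": True,
                       "lr": muon_lr if muon_lr is not None else lr})
    if adam_params:
        # adam_lr overrides lr for the Adam group; lr is the fallback.
        groups.append({"params": adam_params, "use_muon": False,
                       "lr": adam_lr if adam_lr is not None else lr})
    return groups


# Usage: only muon_lr is overridden, so the Adam group falls back to lr.
params = [Param("w", (4, 4)), Param("b", (4,))]
groups = build_param_groups(params, lr=1e-3, muon_lr=0.02)
print([(g["use_muon"], g["lr"]) for g in groups])  # [(True, 0.02), (False, 0.001)]
```

Checking `is not None` rather than truthiness means an explicit override of `0.0` is still honored instead of silently falling back to `lr`.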
Background
Muon parameters and non-Muon parameters are automatically split into separate optimizer groups. The intended behavior is:
- `muon_lr` applies to Muon parameter groups
- `adam_lr` applies to Adam parameter groups
- `lr` remains the fallback for both groups when overrides are not provided

Changes
- `lr` fallback behavior
- `muon_lr` / `adam_lr` override behavior
- `muon_lr` and `adam_lr` correctly

Validation
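The two behaviors targeted by the new tests can be expressed as plain assertions. This is a hypothetical sketch, not the contents of test_muon_partial_training.py: `resolve_group_lrs` is an illustrative helper, and the test names only mimic the `-k learning_rate_overrides` filter used in the command that follows.

```python
from typing import Optional


def resolve_group_lrs(lr: float,
                      muon_lr: Optional[float] = None,
                      adam_lr: Optional[float] = None) -> dict:
    """Hypothetical helper: per-group learning rates, with lr as the fallback."""
    return {
        "muon": lr if muon_lr is None else muon_lr,
        "adam": lr if adam_lr is None else adam_lr,
    }


def test_learning_rate_overrides_fallback():
    # No overrides given: both groups fall back to lr.
    assert resolve_group_lrs(1e-3) == {"muon": 1e-3, "adam": 1e-3}


def test_learning_rate_overrides_applied():
    # Each override changes only its own group.
    assert resolve_group_lrs(1e-3, muon_lr=0.02) == {"muon": 0.02, "adam": 1e-3}
    assert resolve_group_lrs(1e-3, adam_lr=3e-4) == {"muon": 1e-3, "adam": 3e-4}
    assert resolve_group_lrs(1e-3, muon_lr=0.02, adam_lr=3e-4) == {"muon": 0.02, "adam": 3e-4}
```

Testing each override on its own, as well as both together, catches a bug where one override leaks into the other group.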
Ran:
python -m pytest DeepSpeed/tests/unit/ops/muon/test_muon_partial_training.py -k learning_rate_overrides -q -rs

Result: