15 changes: 13 additions & 2 deletions CHANGELOG.rst
@@ -1,7 +1,18 @@
NVIDIA Model Optimizer Changelog
================================

- 0.43 (2026-03-xx)
+ 0.44 (2026-05-xx)
^^^^^^^^^^^^^^^^^

**New Features**

- Support the full Transformer Engine spec for Minitron pruning (``mcore_minitron``); a custom ModelOpt spec is no longer needed. This does not change how the pruning workflow is used, but it makes pruning slightly faster and may produce a slightly different pruned model due to different kernels and numerics.

**Bug Fixes**

- Fix Minitron pruning (``mcore_minitron``) for MoE models. Previously, importance estimation hooks were incorrectly registered for MoE modules, causing the NAS step to hang.
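The MoE fix above concerns activation-importance hooks. As a hedged, much-simplified sketch in plain Python (illustrative only, not ModelOpt's internals), the bug class is: a hook registered on a module that never runs a forward pass (e.g. an MoE expert handled elsewhere) produces no activation record, so any step that waits for one record per hooked module hangs. The fix is to register hooks only on modules that will actually run:

```python
# Hypothetical names throughout; this only illustrates the failure mode.
class Layer:
    def __init__(self, name: str, active: bool = True):
        self.name = name
        self.active = active  # whether forward() is ever called for this layer
        self.hook = None

    def forward(self) -> None:
        if self.hook is not None:
            self.hook(self.name)


def estimate_importance(layers: list[Layer]) -> dict[str, int]:
    records: dict[str, int] = {}

    def hook(name: str) -> None:
        records[name] = records.get(name, 0) + 1  # toy importance score

    hooked = []
    for layer in layers:
        if layer.active:  # the fix: skip modules that never run
            layer.hook = hook
            hooked.append(layer.name)

    for layer in layers:
        if layer.active:
            layer.forward()

    # Every hooked layer produced a record, so a step that waits on
    # "one record per hooked module" completes instead of hanging.
    assert set(records) == set(hooked)
    return records


records = estimate_importance([Layer("mlp"), Layer("expert_3", active=False)])
```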

0.43 (2026-04-09)
^^^^^^^^^^^^^^^^^

**Bug Fixes**
@@ -39,7 +50,7 @@ NVIDIA Model Optimizer Changelog
- Migrated project metadata from ``setup.py`` to a fully declarative ``pyproject.toml``.
- Enable experimental Python 3.13 wheel support and unit tests in CI/CD.

- 0.42 (2026-02-xx)
+ 0.42 (2026-03-10)
^^^^^^^^^^^^^^^^^

**Bug Fixes**
6 changes: 3 additions & 3 deletions examples/megatron_bridge/README.md
@@ -16,17 +16,17 @@ This directory contains examples of using Model Optimizer with [NeMo Megatron-Br

## Pre-Requisites

- Running these examples requires many additional dependencies to be installed (e.g., Megatron-Bridge, Megatron-core, etc.), hence we strongly recommend directly using the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.02`) which has all the dependencies installed.
+ Running these examples requires many additional dependencies to be installed (e.g., Megatron-Bridge, Megatron-core, etc.), hence we strongly recommend directly using the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.02.01`) which has all the dependencies installed.

- To get the latest ModelOpt features and examples scripts, mount your Model-Optimizer repo to the container.
+ To get the ModelOpt example scripts, mount your Model-Optimizer repo to the container as follows:

```bash
export MODELOPT_DIR=${PWD}/Model-Optimizer # or set to your local Model-Optimizer repository path if you have cloned it
if [ ! -d "${MODELOPT_DIR}" ]; then
git clone https://github.com/NVIDIA/Model-Optimizer.git ${MODELOPT_DIR}
fi

- export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.02
+ export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.02.01
docker run \
--gpus all \
--shm-size=16GB \
8 changes: 5 additions & 3 deletions examples/megatron_bridge/prune_minitron.py
@@ -240,8 +240,9 @@ def main(args: argparse.Namespace):
"seq_length": args.seq_length,
},
init_model_parallel=True,
+ moe_grouped_gemm=False,
)
- print_rank_0(f"\nPruning {unwrapped_model=}")
+ print_rank_0(f"\nPruning model (showing PP rank0): {unwrapped_model}")
print_rank_0(
f"Original model params: {num2hrb(mtp.mcore_minitron.get_mcore_param_count(unwrapped_model))}"
)
@@ -264,10 +265,11 @@
}
if args.prune_target_params is not None:
# Restrict search space to a smaller set of candidates
+ # Allow more choices for MoE FFN as they are generally smaller
+ # NOTE: You can reduce the divisors and increase config['top_k'] to potentially find a better model.
ss_config = mtp.mcore_minitron.get_mcore_minitron_config(
hidden_size_divisor=256,
- ffn_hidden_size_divisor=512,
+ ffn_hidden_size_divisor=256 if (provider.num_moe_experts or 0) > 0 else 512,
mamba_head_dim_divisor=8,
num_moe_experts_divisor=8,
num_layers_divisor=2,
@@ -317,7 +319,7 @@ def score_func_mmlu(m):
else "hybrid_layer_pattern"
)
setattr(provider, hybrid_key, getattr(unwrapped_model, hybrid_key))
- print_rank_0(f"\nPruned {unwrapped_model=}")
+ print_rank_0(f"\nPruned model (showing PP rank0): {unwrapped_model}")
print_rank_0(
f"Pruned model params: {num2hrb(mtp.mcore_minitron.get_mcore_param_count(unwrapped_model))}"
)
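The divisor settings in the diff above bound the Minitron search space: each prunable dimension is only considered at multiples of its divisor. A hedged, standalone sketch in plain Python (illustrative only, not the ModelOpt API; the sizes are hypothetical) of why lowering `ffn_hidden_size_divisor` from 512 to 256 for MoE models enlarges the candidate set:

```python
def candidate_sizes(original: int, divisor: int) -> list[int]:
    """Candidate pruned widths: positive multiples of `divisor` up to `original`."""
    return list(range(divisor, original + 1, divisor))


# Dense FFN: a coarse divisor keeps the search space small.
dense = candidate_sizes(14336, 512)  # 28 candidates

# MoE expert FFNs are generally smaller, so at the same width a halved
# divisor doubles the number of choices the NAS step can explore.
moe = candidate_sizes(2048, 256)  # 8 candidates vs. 4 with divisor 512
```

Fewer, coarser candidates make the search faster; finer candidates (or a larger `config['top_k']`, as the NOTE in the diff suggests) may find a better pruned model at higher search cost.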
1 change: 1 addition & 0 deletions examples/pruning/README.md
@@ -64,6 +64,7 @@ bridge, provider, model, unwrapped_model, tokenizer = load_mbridge_model_from_hf
"pipeline_dtype": torch.bfloat16,
"seq_length": 4096,
},
+ moe_grouped_gemm=False,
)

# Set up the forward loop to run on 1024 train samples