update transformers version by nithinraok · Pull Request #15365 · NVIDIA-NeMo/NeMo

nithinraok · 2026-02-06T11:26:18Z

What does this PR do?

Update transformers version

Collection: All

Changelog

Changes Summary

transformers: Unpinned from ~=4.57.0 — now allows any version (no upper bound)
protobuf: Upgraded from ~=5.29.5 to >=6.33
datasets: Added minimum version >=3.2.0
fsspec: Relaxed from ==2024.12.0 to >=2024.12.0
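To make the specifier changes above concrete, here is a minimal sketch of what `~=` (compatible release) and `>=` mean under PEP 440. It handles only plain numeric release versions, and the helper names (`parse_release`, `at_least`, `compatible_release`) are made up for illustration; a real project would more likely rely on the `packaging` library.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric release such as '4.57.0' into (4, 57, 0)."""
    return tuple(int(part) for part in version.split("."))


def _pad(release: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Zero-pad a release tuple so that (6, 33) compares equal to (6, 33, 0)."""
    return release + (0,) * max(0, width - len(release))


def at_least(candidate: str, minimum: str) -> bool:
    """Semantics of `>=minimum` for plain numeric releases."""
    cand, low = parse_release(candidate), parse_release(minimum)
    width = max(len(cand), len(low))
    return _pad(cand, width) >= _pad(low, width)


def compatible_release(candidate: str, spec: str) -> bool:
    """Semantics of `~=spec`: at least `spec`, and equal to `spec` in every
    component except the last one (so `~=4.57.0` means `>=4.57.0, ==4.57.*`)."""
    prefix = parse_release(spec)[:-1]
    cand = _pad(parse_release(candidate), len(prefix))
    return at_least(candidate, spec) and cand[: len(prefix)] == prefix


# The old transformers pin accepted patch releases only.
print(compatible_release("4.57.3", "4.57.0"))  # True
print(compatible_release("5.0.0", "4.57.0"))   # False
# The new protobuf and fsspec constraints are lower bounds only.
print(at_least("6.33.0", "6.33"))              # True
print(at_least("2025.1.0", "2024.12.0"))       # True
```

Under the old `~=4.57.0` pin, a 5.x release of transformers could never be installed, which is why the tokenizer changes listed below only become relevant after the unpinning.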

Core / Common

HuggingFace Hub model filter (nemo/core/classes/mixins/hf_io_mixin.py): Updated get_hf_model_filter() to use the filter list parameter instead of the deprecated library, language, task, and tags kwargs (aligns with huggingface_hub API changes)
AutoTokenizer (nemo/collections/common/tokenizers/huggingface/auto_tokenizer.py): Added fallback logic for vocab_file — in transformers >= 5.0, from_pretrained may ignore the vocab_file kwarg, so the tokenizer now detects vocab size mismatch and re-loads from the vocab file directly
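The vocab_file fallback described above could look roughly like the sketch below. This is not the code from auto_tokenizer.py: `load_with_vocab_fallback`, `count_vocab_entries`, and `StubTokenizer` are hypothetical names, the two loader callables stand in for whatever the real tokenizer wrapper calls, and the one-token-per-line vocab format is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StubTokenizer:
    """Stand-in for a loaded tokenizer; only the vocab size matters here."""
    vocab_size: int


def count_vocab_entries(vocab_file: str) -> int:
    """Count non-empty lines, assuming a one-token-per-line vocab file."""
    with open(vocab_file, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def load_with_vocab_fallback(
    load_pretrained: Callable[[str], StubTokenizer],
    load_from_vocab_file: Callable[[str], StubTokenizer],
    vocab_file: str,
) -> StubTokenizer:
    """Load normally, then reload from the vocab file if the sizes disagree.

    A size mismatch is taken as a sign that the first loader silently
    ignored the vocab_file argument.
    """
    tokenizer = load_pretrained(vocab_file)
    if tokenizer.vocab_size != count_vocab_entries(vocab_file):
        tokenizer = load_from_vocab_file(vocab_file)
    return tokenizer
```

Comparing sizes is a cheap heuristic: it catches the case where a default vocabulary was loaded instead of the custom one, but it would miss a different vocabulary that happens to have the same number of entries.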

ASR

Aggregate tokenizer vocab size tests: Updated expected vocab size from 254 to 264 across four test files (test_asr_ctc_encoder_model_bpe.py, test_asr_hybrid_rnnt_ctc_model_bpe.py, test_asr_hybrid_rnnt_ctc_model_bpe_prompt.py, test_asr_rnnt_encoder_model_bpe.py) — reflects new tokenizer behavior with updated transformers
Parallel chunking test (test_asr_multitask_model_bpe.py::test_aed_parallel_chunking): Relaxed exact text match to a >95% word similarity check (timestamps=True/False use different merge algorithms that may produce slight differences at chunk boundaries). Removed hardcoded expected values for final word/offset assertions
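The PR description does not say how the >95% word similarity is computed. One plausible way to express such a check, sketched here with the standard library's `difflib.SequenceMatcher` over word lists (the function name `word_similarity` and the choice of metric are assumptions, not taken from the test file):

```python
from difflib import SequenceMatcher


def word_similarity(reference: str, hypothesis: str) -> float:
    """Return a similarity ratio in [0, 1] computed over whitespace-split words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # autojunk=False keeps frequent words from being ignored in long transcripts.
    return SequenceMatcher(None, ref_words, hyp_words, autojunk=False).ratio()


# Identical transcripts score 1.0; one differing word out of four scores 0.75.
print(word_similarity("a b c d", "a b c d"))  # 1.0
print(word_similarity("a b c d", "a b c e"))  # 0.75
```

A test in this style would assert `word_similarity(text_with_timestamps, text_without_timestamps) > 0.95`, which tolerates a few differing words at chunk boundaries while still failing on a badly broken merge.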

TTS

T5 tokenizer vocab_size fix (magpietts.py): In transformers v5+, T5Tokenizer is a fast tokenizer whose vocab_size now includes extra_id sentinel tokens (e.g. 32100 = 32000 + 100). Added logic to subtract _extra_ids so the embedding size matches legacy checkpoints
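The arithmetic behind this fix is small enough to sketch. The snippet below is an illustration only: `embedding_vocab_size` and `StubT5Tokenizer` are hypothetical, and how magpietts.py decides that the sentinels are included is not shown in this PR description, so the sketch takes that decision as an explicit flag.

```python
from dataclasses import dataclass


@dataclass
class StubT5Tokenizer:
    """Stand-in exposing the two attributes the description mentions."""
    vocab_size: int
    _extra_ids: int = 100


def embedding_vocab_size(tokenizer, sentinels_included: bool) -> int:
    """Return the vocab size to use for the embedding table.

    When the reported vocab_size already counts the extra_id sentinel
    tokens, subtract them so the result matches legacy checkpoints.
    """
    if sentinels_included:
        return tokenizer.vocab_size - tokenizer._extra_ids
    return tokenizer.vocab_size


# 32100 reported with 100 sentinels included -> 32000 embedding rows.
print(embedding_vocab_size(StubT5Tokenizer(vocab_size=32100), sentinels_included=True))  # 32000
```

Without the subtraction, an embedding table built from the reported size would have 100 more rows than the one stored in a legacy checkpoint, and loading the checkpoint weights would fail on the shape mismatch.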

SpeechLM2

test_duplex_eartts.py: Fixed CI cached path check — now checks for the specific model subdirectory (/home/TestData/nvidia--NVIDIA-Nemotron-Nano-9B-v2/) instead of the broad /home/TestData/ directory
test_salm.py: Fixed expected tokenized output — removed extra trailing space before [/INST] token (whitespace handling change in updated tokenizers)
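The cached-path fix in test_duplex_eartts.py amounts to checking for the model's own subdirectory rather than the shared cache root. A minimal sketch of that idea follows; `resolve_pretrained_source` is a hypothetical helper, and the mapping from a hub ID to a directory name by replacing "/" with "--" is an assumption inferred from the directory name quoted above.

```python
import os


def resolve_pretrained_source(hub_id: str, cache_root: str) -> str:
    """Prefer a locally cached copy of the model, else fall back to the hub ID.

    Only the model-specific subdirectory counts as a cache hit; the mere
    existence of cache_root says nothing about whether this model is cached.
    """
    cached_dir = os.path.join(cache_root, hub_id.replace("/", "--"))
    return cached_dir if os.path.isdir(cached_dir) else hub_id
```

With the broader check, any machine that had /home/TestData/ but not this particular model would have been pointed at a path that does not exist.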

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

pzelasko · 2026-03-09T18:13:01Z

            author=None,
-            library='nemo',
-            language=None,
+            filter=['nemo'],


What is this change doing? Why is it needed? Where are we still using this mixin?

The change updates the arguments to match the latest huggingface_hub version.

This mixin provides an API for fetching NeMo models, pushing them to the HF Hub, and getting the hf_model_card. It was previously used for pushing NeMo models, but we now do that manually. As far as I can see, this file is now used only in tutorials and not in nemo/collections code. IMO we can remove this file during refactoring.
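To illustrate the argument change discussed in this thread, here is a sketch of folding the legacy keyword arguments into a single filter list. `legacy_to_filter` is a hypothetical helper written for this example, not part of NeMo or huggingface_hub, and the ordering of the resulting tags is an arbitrary choice.

```python
from typing import Iterable, List, Optional


def legacy_to_filter(
    library: Optional[str] = None,
    language: Optional[str] = None,
    task: Optional[str] = None,
    tags: Optional[Iterable[str]] = None,
) -> List[str]:
    """Collect the old per-field kwargs into one list of tag strings."""
    filters: List[str] = []
    for value in (library, language, task):
        if value:
            filters.append(value)
    filters.extend(tags or [])
    return filters


# Matches the diff above: library='nemo' becomes filter=['nemo'].
print(legacy_to_filter(library="nemo"))  # ['nemo']

# The list would then be passed on, for example (requires network access):
#   from huggingface_hub import HfApi
#   models = HfApi().list_models(filter=legacy_to_filter(library="nemo"), limit=5)
```

The commented-out call is left unexecuted on purpose, since it needs the huggingface_hub package and a network connection.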

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

blisc

Looks fine from TTS

github-actions · 2026-03-11T00:12:22Z

[🤖]: Hi @nithinraok 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

pzelasko · 2026-03-11T13:50:49Z


    conversations = (
        guess_parse_cutset(cfg.inputs)
+        .map(


why was this change needed? 👀 @nithinraok

It was failing because the cut was not present, so I had to change the order.

nithinraok added the Run CICD and r2.7.0 (Cherry-pick to r2.7.0 release branch) labels Feb 6, 2026

nithinraok temporarily deployed to test February 6, 2026 11:27 — with GitHub Actions Inactive

github-actions Bot removed the Run CICD label Feb 6, 2026

nithinraok force-pushed the update_transformers branch from f5782b6 to 3b7af27 Compare February 9, 2026 19:23

nithinraok added the Run CICD label Feb 9, 2026

nithinraok had a problem deploying to test February 9, 2026 19:26 — with GitHub Actions Error

chtruong814 added Run CICD and removed Run CICD labels Feb 9, 2026

chtruong814 temporarily deployed to test February 9, 2026 19:28 — with GitHub Actions Inactive

nithinraok added Run CICD and removed Run CICD labels Feb 10, 2026

nithinraok temporarily deployed to test February 10, 2026 03:24 — with GitHub Actions Inactive

github-actions Bot removed the Run CICD label Feb 10, 2026

nithinraok added the Run CICD label Feb 10, 2026

nithinraok temporarily deployed to test February 10, 2026 14:16 — with GitHub Actions Inactive

github-actions Bot removed the Run CICD label Feb 10, 2026

nithinraok force-pushed the update_transformers branch from 99483fd to 3525883 Compare February 18, 2026 14:34

nithinraok added the Run CICD label and removed the r2.7.0 (Cherry-pick to r2.7.0 release branch) label Feb 18, 2026

nithinraok temporarily deployed to test February 18, 2026 14:55 — with GitHub Actions Inactive

github-actions Bot added the core (Changes to NeMo Core) and common labels and removed the Run CICD label Feb 18, 2026

nithinraok added the skip-linting and Run CICD labels Feb 18, 2026

nithinraok temporarily deployed to test February 19, 2026 05:03 — with GitHub Actions Inactive

github-actions Bot removed the Run CICD label Feb 19, 2026

nithinraok added 2 commits March 9, 2026 07:12

update transformers version

6f6d385

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

relax pinning

11f7410

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

nithinraok force-pushed the update_transformers branch from 05ab977 to 2e06c40 Compare March 9, 2026 18:05

github-actions Bot added the ASR label Mar 9, 2026

nithinraok requested a review from pzelasko March 9, 2026 18:06

nithinraok added the Run CICD label Mar 9, 2026

nithinraok temporarily deployed to test March 9, 2026 18:07 — with GitHub Actions Inactive

pzelasko reviewed Mar 9, 2026

pzelasko previously approved these changes Mar 9, 2026

github-actions Bot removed the Run CICD label Mar 9, 2026

update tokenizers code

6e92eeb

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

nithinraok dismissed pzelasko’s stale review via 6e92eeb March 9, 2026 20:59

github-actions Bot added the TTS label Mar 9, 2026

nithinraok requested a review from blisc March 9, 2026 21:14

nithinraok added the Run CICD label Mar 9, 2026

nithinraok temporarily deployed to test March 9, 2026 21:16 — with GitHub Actions Inactive

github-actions Bot removed the Run CICD label Mar 10, 2026

update rest of salm files

422ab58

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

nithinraok added the Run CICD label Mar 10, 2026

nithinraok temporarily deployed to test March 10, 2026 12:24 — with GitHub Actions Inactive

github-actions Bot removed the Run CICD label Mar 10, 2026

update signature for HFHub

64a2905

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

nithinraok added the Run CICD label Mar 10, 2026

nithinraok temporarily deployed to test March 10, 2026 20:36 — with GitHub Actions Inactive

blisc approved these changes Mar 10, 2026

github-actions Bot removed the Run CICD label Mar 11, 2026

nithinraok requested a review from pzelasko March 11, 2026 06:22

nithinraok merged commit 037573f into main Mar 11, 2026
127 checks passed

nithinraok deleted the update_transformers branch March 11, 2026 13:37

pzelasko reviewed Mar 11, 2026

nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request Mar 13, 2026

update transformers version (NVIDIA-NeMo#15365)

d05d7f0

nithinraok merged 10 commits into main from update_transformers

4 participants
