Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
```python
author=None,
library='nemo',
language=None,
filter=['nemo'],
```
What is this change doing? Why is it needed? Where are we still using this mixin?
The change updates the arguments to match the latest `huggingface_hub` API.
This mixin provides an API for fetching NeMo models, pushing to the HF Hub, and getting the `hf_model_card`. It was previously used for pushing NeMo models, but we now do that manually. As far as I can see, this file is only used in tutorials, not in `nemo/collections` code. IMO we can remove this file during refactoring.
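The shape of the change can be sketched as follows. `build_hf_filter` is a hypothetical helper (not NeMo code) that collapses the deprecated keyword filters into the single `filter` list that newer `huggingface_hub` versions expect:

```python
def build_hf_filter(library=None, language=None, task=None, tags=None):
    """Hypothetical sketch: collapse the deprecated library/language/task/tags
    kwargs into the single `filter` list accepted by newer huggingface_hub
    versions of list_models()."""
    filters = []
    for value in (library, language, task):
        if value is not None:
            filters.append(value)
    if tags:
        filters.extend(tags)
    return filters

# e.g. huggingface_hub.list_models(filter=build_hf_filter(library='nemo'))
```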
```python
conversations = (
    guess_parse_cutset(cfg.inputs)
    .map(
```
It was failing because the cut was not present, so I had to change the order.
What does this PR do?
Update transformers version
Collection: All
Changelog
Changes Summary
- `~=4.57.0`: now allows any version (no upper bound)
- `~=5.29.5` to `>=6.33`
- `>=3.2.0`
- `==2024.12.0` to `>=2024.12.0`

Core / Common
- `nemo/core/classes/mixins/hf_io_mixin.py`: Updated `get_hf_model_filter()` to use the new `filter` list parameter instead of the deprecated `library`, `language`, `task`, `tags` kwargs (aligns with `huggingface_hub` API changes)
- `nemo/collections/common/tokenizers/huggingface/auto_tokenizer.py`: Added fallback logic for `vocab_file`: in transformers >= 5.0, `from_pretrained` may ignore the `vocab_file` kwarg, so the tokenizer now detects a vocab size mismatch and re-loads from the vocab file directly

ASR
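The `vocab_file` fallback decision can be sketched as below. Both helper names are hypothetical, not the actual NeMo code; the assumption is that a vocab file holds one token per non-empty line:

```python
def vocab_size_from_file(vocab_file_lines):
    """Count vocabulary entries, assuming one token per non-empty line."""
    return sum(1 for line in vocab_file_lines if line.strip())

def should_reload_from_vocab_file(loaded_vocab_size, expected_vocab_size, vocab_file):
    """Return True when from_pretrained apparently ignored the vocab_file
    kwarg (possible in transformers >= 5.0) and the sizes disagree."""
    return vocab_file is not None and loaded_vocab_size != expected_vocab_size
```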
- Updated `254` to `264` across four test files (`test_asr_ctc_encoder_model_bpe.py`, `test_asr_hybrid_rnnt_ctc_model_bpe.py`, `test_asr_hybrid_rnnt_ctc_model_bpe_prompt.py`, `test_asr_rnnt_encoder_model_bpe.py`), reflecting new tokenizer behavior with updated transformers
- `test_asr_multitask_model_bpe.py::test_aed_parallel_chunking`: Relaxed the exact text match to a >95% word-similarity check (`timestamps=True/False` use different merge algorithms that may produce slight differences at chunk boundaries). Removed hardcoded expected values for the final word/offset assertions

TTS
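A >95% word-similarity check of this kind can be sketched with `difflib.SequenceMatcher` over word lists. This is an assumed implementation for illustration, not the exact test code:

```python
import difflib

def word_similarity(a: str, b: str) -> float:
    """Ratio (0.0-1.0) of matching words between two transcripts."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

def transcripts_match(a: str, b: str, threshold: float = 0.95) -> bool:
    # Tolerates slight differences at chunk boundaries instead of
    # requiring an exact string match.
    return word_similarity(a, b) > threshold
```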
- `magpietts.py`: In transformers v5+, `T5Tokenizer` is a fast tokenizer whose `vocab_size` now includes the `extra_id` sentinel tokens (e.g. 32100 = 32000 + 100). Added logic to subtract `_extra_ids` so the embedding size matches legacy checkpoints

SpeechLM2
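The embedding-size arithmetic can be illustrated as follows; `legacy_vocab_size` is a hypothetical helper name, not the function in `magpietts.py`:

```python
def legacy_vocab_size(reported_vocab_size: int, n_extra_ids: int) -> int:
    """In transformers v5+ the fast T5Tokenizer's vocab_size includes the
    <extra_id_*> sentinel tokens; subtract them so the embedding table
    matches legacy checkpoints (e.g. 32100 - 100 = 32000)."""
    return reported_vocab_size - n_extra_ids
```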
- `test_duplex_eartts.py`: Fixed the CI cached-path check; it now checks for the specific model subdirectory (`/home/TestData/nvidia--NVIDIA-Nemotron-Nano-9B-v2/`) instead of the broad `/home/TestData/` directory
- `test_salm.py`: Fixed the expected tokenized output by removing the extra trailing space before the `[/INST]` token (a whitespace-handling change in updated tokenizers)

GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information