chore(deps): update loader dependencies major (major) #194
Open
dreadnode-renovate-bot[bot] wants to merge 1 commit into
| datasource | package      | from   | to     |
| ---------- | ------------ | ------ | ------ |
| pypi       | psutil       | 6.1.1  | 7.2.2  |
| pypi       | transformers | 4.57.6 | 5.11.0 |
This PR contains the following updates:
| Package      | Change              |
| ------------ | ------------------- |
| psutil       | ==6.1.1 → ==7.2.2   |
| transformers | ==4.57.6 → ==5.11.0 |

Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
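Because both bumps cross a major version, it can help to confirm that the environment under test actually resolved to the new pins. A minimal sketch, assuming the two pins from the table above; `EXPECTED_PINS` and `check_pins` are hypothetical names used only for illustration and are not part of this repository:

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the update table in this PR (assumed to be exact "==" pins).
EXPECTED_PINS = {"psutil": "7.2.2", "transformers": "5.11.0"}


def check_pins(expected, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins(EXPECTED_PINS).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means the installed versions match the pins; the `get_version` parameter exists only so the check can be exercised without touching the real environment.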
Release Notes
giampaolo/psutil (psutil)
v7.2.2 (Compare Source)
v7.2.1 (Compare Source)
v7.2.0 (Compare Source)
v7.1.3 (Compare Source)
v7.1.2 (Compare Source)
v7.1.1 (Compare Source)
v7.1.0 (Compare Source)
v7.0.0 (Compare Source)
huggingface/transformers (transformers)
v5.11.0 (Compare Source)
Release v5.11.0
New Model additions
DiffusionGemma
DiffusionGemma is engineered to reduce the sequential bottlenecks of standard causal language models by employing an encoder-decoder architecture specifically optimized for inference speed. During inference, DiffusionGemma leverages multi-canvas sampling, where rather than generating one token at a time, the model iteratively denoises a full block of tokens using a diffusion sampler. This block-autoregressive approach facilitates text generation at higher speeds compared to traditional sequential generation methods.
Links: Documentation
DeepSeek-V3.2
DeepSeek-V3.2-Exp is an experimental model from DeepSeek-AI that introduces DeepSeek Sparse Attention (DSA), a trainable, fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios. Built on top of DeepSeek-V3.1-Terminus with a 685B-parameter Mixture-of-Experts backbone, it reduces the quadratic cost of attention over long sequences by attending only to a selected subset of past tokens while maintaining virtually identical benchmark performance. The work was extended in DeepSeek-V3.2 which pairs DSA with scalable reinforcement learning and achieves gold-medal level results on competition math and competitive programming benchmarks.
Links: Documentation | Paper
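The idea of attending only to a selected subset of past tokens can be illustrated with a small top-k sketch. This is not DeepSeek's implementation: `index_scores` is a hypothetical stand-in for whatever per-token relevance score the trained selector produces, and the function name is invented for illustration.

```python
import math


def sparse_attention(query, keys, values, index_scores, k):
    """Softmax attention restricted to the k past tokens with the highest index_scores.

    Returns (sorted selected token indices, output vector).
    """
    # Keep only the k highest-scoring past tokens.
    order = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)
    selected = order[:k]

    # Scaled dot-product logits over the selected tokens only.
    scale = math.sqrt(len(query))
    logits = [sum(q * x for q, x in zip(query, keys[i])) / scale for i in selected]

    # Numerically stable softmax.
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)

    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(dim)
    ]
    return sorted(selected), output
```

With `k` equal to the number of past tokens this reduces to ordinary dense attention; with a fixed, smaller `k` the per-query cost no longer grows with sequence length, apart from the selection step itself.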
Kernels
The `KernelConfig` API was extended to support n-to-1 module fusion and parameter transformation, simplifying how custom kernels are integrated with Transformers modules. Additional fixes include resolving a dtype mismatch in the Mamba2 CUDA kernel path for NemotronH/Zamba2, adding fine-grained fp8/fp4 Triton kernel support, and correcting the FalconMamba fast-path warning to recommend `pip install kernels` instead of `mamba-ssm`.
- .out_proj) (#46487) by @yuekaizhang in [#46487]
- `pip install kernels` in fast-path warning (#46343) by @Anai-Guo in [#46343]
Parallelization
Fixed model parallel beam search bugs in the Qwen2-VL, Qwen2.5-VL, and Qwen3-VL MoE model families, and added documentation for tensor parallelism support with continuous batching.
Bugfixes and improvements
- `pr-ci-caller.yml` (#46505) by @ydshieh in [#46505]
- `.github/workflows/pr-ci-post-dashboard-link.yml` (#46499) by @ydshieh in [#46499]
- `no_inherit_decorators` and fixup wrong RoPE related inheritances (#46440) by @Bissmella in [#46440]
- `pipeline_tutorial.md`, `pipeline_gradio.md`, `pipeline_webserver.md` and `add_new_pipeline.md`. (#46388) by @filipinescu in [#46388]
- `fast_tokenizers.md`, `custom_tokenizers.md`, `tokenizer_summary.md`, `image_processors.md` and `video_processors.md`. (#46356) by @filipinescu in [#46356]
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- `pipeline_tutorial.md`, `pipeline_gradio.md`, `pipeline_webserver.md` and `add_new_pipeline.md`. (#46388)
- `fast_tokenizers.md`, `custom_tokenizers.md`, `tokenizer_summary.md`, `image_processors.md` and `video_processors.md`. (#46356)
v5.10.2: Patch release v5.10.2 (Compare Source)
Patch release v5.10.2
There was a major bug in the conversion of CLIP-related models, which affected models such as sam3 and others. Please make sure to update 🙏
Full Changelog: huggingface/transformers@v5.10.1...v5.10.2
v5.10.1 (Compare Source)
Release v5.10.1
v5.10.0 was yanked because we published from a corrupted branch. Sorry everyone, this happens when we rush a release!!!
New Model additions
Gemma4 unified + Gemma4 MTP
Gemma 4 12B Unified is an encoder-free multimodal model with pretrained and instruction-tuned variants. Unlike standard Gemma 4, which uses dedicated encoder towers, Gemma 4 12B Unified projects raw inputs directly into the language model's embedding space through lightweight linear pipelines. This results in a simpler architecture while maintaining strong multimodal performance.
Key differences from standard Gemma 4:
- `Dense + LayerNorm` pipeline with factorized 2D positional embeddings, replacing the vision encoder.
- `RMSNorm → Linear` pipeline, replacing the mel spectrogram + Conformer encoder.
- `Gemma4UnifiedMultimodalEmbedder` (`RMSNorm → Linear`) for the final projection to text hidden space.
You can find the original Gemma 4 12B Unified checkpoints under the Gemma 4 release.
Sapiens2
Sapiens2 is a family of high-resolution vision transformers pretrained on ~1 billion curated human images, designed for human-centric computer vision tasks including pose estimation, body-part segmentation, surface normal estimation, and pointmap estimation. The models scale from 0.4B to 5B parameters and train at native 1K resolution, with hierarchical 4K variants for extended spatial reasoning. Sapiens2 achieves substantial improvements over its predecessor with +4 mAP in pose estimation, +24.3 mIoU in body-part segmentation, and 45.6% error reduction in normal estimation.
Links: Documentation | Paper
DeepSeek-OCR-2
DeepSeek-OCR-2 is an OCR-specialized vision-language model built on a distinctive architecture that combines a SAM ViT-B vision encoder with a Qwen2 hybrid attention encoder, connected through an MLP projector to a DeepSeek-V2 Mixture-of-Experts (MoE) language model. The model features a hybrid attention mechanism that applies bidirectional attention over image tokens and causal attention over query tokens, enabling efficient and accurate document understanding. It supports both plain OCR tasks and grounding capabilities with coordinate-aware output for document conversion to markdown format.
Links: Documentation
Mellum
Mellum is a code-focused Mixture-of-Experts language model developed by JetBrains. It is derived from the Qwen3-MoE architecture with per-layer-type RoPE and interleaved sliding window attention. The model has 12B total parameters with 2.5B active parameters per token, using 64 routed experts with 8 activated per token across 28 layers.
Links: Documentation
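The "64 routed experts with 8 activated per token" figure describes top-k expert routing. A minimal sketch, assuming the gate weights are a softmax over the selected experts' logits only; that normalization is an assumption for illustration, and the actual Mellum or Qwen3-MoE router may normalize differently. `route`, `NUM_EXPERTS`, and `TOP_K` are hypothetical names.

```python
import math

NUM_EXPERTS = 64  # routed experts, from the description above
TOP_K = 8         # experts activated per token, from the description above


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return {expert_index: gate_weight}."""
    chosen = sorted(
        range(len(router_logits)),
        key=lambda e: router_logits[e],
        reverse=True,
    )[:top_k]

    # Softmax over the chosen experts only (assumed normalization).
    peak = max(router_logits[e] for e in chosen)
    exps = [math.exp(router_logits[e] - peak) for e in chosen]
    total = sum(exps)
    return {e: w / total for e, w in zip(chosen, exps)}
```

Only the experts returned by `route` run for that token, which is why the active parameter count per token is much smaller than the total.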
- `Mellum` v2 code generation model (#46112) by @shadeMe in #46112
Breaking changes
The Gemma4 vision pooler now casts inputs to float32 before scaling to prevent float16 overflow (inf saturation) with large checkpoints, which may cause minor numerical differences in outputs for users running Gemma-4 vision models in float16.
Audio Language Models (ALMs) now have a dedicated base model class without a language modeling head, aligning them with the design of Vision Language Models (VLMs); users relying on the previous model class structure should update their code to use the new base model class where appropriate.
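The float16 overflow behind the Gemma4 pooler change can be reproduced in isolation. The sketch below is not the library's code: it emulates half and single precision with the standard `struct` module, and the helper names and the example value and scale are made up for illustration.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to IEEE half precision, saturating to +/-inf when out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; hardware would produce inf.
        return math.copysign(math.inf, x)


def to_float32(x: float) -> float:
    """Round x to IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def scale_in_float16(value: float, scale: float) -> float:
    """Multiply with every operand and the result held in float16."""
    return to_float16(to_float16(value) * to_float16(scale))


def scale_in_float32(value: float, scale: float) -> float:
    """Cast to float32 first, then multiply, as the release note describes."""
    return to_float32(to_float32(value) * to_float32(scale))


if __name__ == "__main__":
    # float16 tops out at 65504, so 300 * 256 = 76800 does not fit.
    print(scale_in_float16(300.0, 256.0))
    print(scale_in_float32(300.0, 256.0))
```

The float16 path saturates to infinity while the float32 path keeps a finite value, which is also why outputs can differ slightly from earlier float16-only runs.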
Parallelization
This release includes numerous bug fixes for model parallelism across multiple models (Gemma4, AltCLIP, ChineseClip, Blip-2, Whisper, Ovis2, Moshi) and parallel execution strategies, including fixes for tensor parallelism (TP), expert parallelism (EP), beam search under model parallel settings, and loss over-counting under TP/EP configurations. The continuous batching manager was also reworked for clearer control flow and improved TP race condition handling, and FSDP initialization via `from_pretrained` was introduced.
- [Revert] FSDP+Dtensor refactor related changes (#46246) by @vasqu in [#46246]
- `create_bidirectional_mask` (#46221) by @kaixuanliu in [#46221]
Cache
Fixed a regression in encoder-decoder cache initialization where the decoder config was incorrectly applied to the cross-attention cache, and resolved a `RuntimeError` caused by buffer size limits when warming up the cache on MPS devices. Additional test infrastructure improvements were made to support read-only cache environments used in CI.
- `RuntimeError` on mps (#46239) by @McPatate in [#46239]
Quantization
Added support for DeepGEMM BF16, mixed FP8/FP4, and MegaMoE quantization via a grouped linear refactor, while fixing two bugs: an FP8 MoE reverse substring issue affecting DSv4 initialization, and a BitsAndBytes 4-bit/8-bit quantization bug that silently dropped chunked tensors from one-to-many weight converters.
Bugfixes and improvements
- `contributing.md`, `modular_transformers.md`, `multimodal_processing.md`, `add_vision_processing_components.md`, `add_audio_processing_components.md`, `modeling_rules.md`, `model_output_tracing.md`, `auto_docstring.md`, `testing.md`, `pr_checks.md` and `add_new_model.md`. (#46345) by @filipinescu in [#46345]
- `weightconverter.md`, `models.md`, `custom_models.md`, `monkey_patching.md`, `fusion_mapping.md`, `how_to_hack_models.md`, `model_sharing.md` and `serialization.md`. (#46309) by @filipinescu in [#46309]
- `StaticCache` building an empty layer list when `num_kv_shared_layers == 0` (#46235) by @tengomucho in [#46235]
- [Configs] Fix layer type validation to include its mlp counterpart (#46220) by @vasqu in [#46220]
- `num_items_in_batch` over-counting for causal LM losses (#46204) by @qgallouedec in [#46204]
- `main` instead of commit SHA for now (#46241) by @ydshieh in [#46241]
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.
This PR has been generated by Mend Renovate.