chore(deps): update loader dependencies major (major) #194

Open
dreadnode-renovate-bot[bot] wants to merge 1 commit into main from renovate/major-loader-deps-major

dreadnode-renovate-bot[bot] commented Feb 24, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

| Package      | Change                | Age | Confidence |
| ------------ | --------------------- | --- | ---------- |
| psutil       | ==6.1.1 → ==7.2.2     | age | confidence |
| transformers | ==4.57.6 → ==5.11.0   | age | confidence |
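
Both pins cross a major version boundary, which is why this PR is labelled as a major update. As a reviewer aid, here is a minimal sketch of that check. It assumes plain dotted release strings (no pre-release or local suffixes), and the helper names `parse_version` and `is_major_bump` are hypothetical, not part of Renovate.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a plain dotted release string such as "7.2.2" into integers."""
    return tuple(int(part) for part in text.split("."))


def is_major_bump(old: str, new: str) -> bool:
    """Return True when the first version component increases."""
    return parse_version(new)[0] > parse_version(old)[0]


# Old and new pins copied from the update table in this PR.
UPDATES = {
    "psutil": ("6.1.1", "7.2.2"),
    "transformers": ("4.57.6", "5.11.0"),
}

major_bumps = {name: is_major_bump(old, new) for name, (old, new) in UPDATES.items()}
for name, flag in major_bumps.items():
    print(f"{name}: major bump = {flag}")
```

A major bump only signals that breaking changes are possible; the release notes below still need to be read against how the loader actually uses each package.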

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

giampaolo/psutil (psutil)

v7.2.2

Compare Source

v7.2.1

Compare Source

v7.2.0

Compare Source

v7.1.3

Compare Source

v7.1.2

Compare Source

v7.1.1

Compare Source

v7.1.0

Compare Source

v7.0.0

Compare Source

huggingface/transformers (transformers)

v5.11.0

Compare Source

Release v5.11.0

New Model additions

DiffusionGemma

DiffusionGemma is engineered to reduce the sequential bottlenecks of standard causal language models by employing an encoder-decoder architecture specifically optimized for inference speed. During inference, DiffusionGemma leverages multi-canvas sampling, where rather than generating one token at a time, the model iteratively denoises a full block of tokens using a diffusion sampler. This block-autoregressive approach facilitates text generation at higher speeds compared to traditional sequential generation methods.

Links: Documentation

DeepSeek-V3.2

DeepSeek-V3.2-Exp is an experimental model from DeepSeek-AI that introduces DeepSeek Sparse Attention (DSA), a trainable, fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios. Built on top of DeepSeek-V3.1-Terminus with a 685B-parameter Mixture-of-Experts backbone, it reduces the quadratic cost of attention over long sequences by attending only to a selected subset of past tokens while maintaining virtually identical benchmark performance. The work was extended in DeepSeek-V3.2 which pairs DSA with scalable reinforcement learning and achieves gold-medal level results on competition math and competitive programming benchmarks.

Links: Documentation | Paper

Kernels

The KernelConfig API was extended to support n-to-1 module fusion and parameter transformation, simplifying how custom kernels are integrated with Transformers modules. Additional fixes include resolving a dtype mismatch in the Mamba2 CUDA kernel path for NemotronH/Zamba2, adding fine-grained fp8/fp4 Triton kernel support, and correcting the FalconMamba fast-path warning to recommend pip install kernels instead of mamba-ssm.

Parallelization

Fixed model parallel beam search bugs in the Qwen2-VL, Qwen2.5-VL, and Qwen3-VL MoE model families, and added documentation for tensor parallelism support with continuous batching.

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

v5.10.2: Patch release v5.10.2

Compare Source

Patch release v5.10.2

A significant bug in the conversion of CLIP-related models affected models such as SAM3, among others. Please make sure to update 🙏

Full Changelog: huggingface/transformers@v5.10.1...v5.10.2

v5.10.1

Compare Source

Release v5.10.1

v5.10.0 was yanked because it was published from a corrupted branch. Sorry everyone, this happens when we rush a release!

New Model additions

Gemma4 Unified + Gemma4 MTP

Gemma 4 12B Unified is an encoder-free multimodal model with pretrained and instruction-tuned variants. Unlike standard Gemma 4, which uses dedicated encoder towers, Gemma 4 12B Unified projects raw inputs directly into the language model's embedding space through lightweight linear pipelines. This results in a simpler architecture while maintaining strong multimodal performance.

Key differences from standard Gemma 4:

  • No Vision Tower: Raw pixel patches are projected directly into LM space via a Dense + LayerNorm pipeline with factorized 2D positional embeddings, replacing the vision encoder.
  • No Audio Tower: Raw 16 kHz waveform samples are chunked into fixed-length frames and projected through a simple RMSNorm → Linear pipeline, replacing the mel spectrogram + Conformer encoder.
  • Shared Multimodal Pipeline: Both vision and audio use the same Gemma4UnifiedMultimodalEmbedder (RMSNorm → Linear) for the final projection to text hidden space.
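
To make the RMSNorm → Linear projection in the list above concrete, here is a minimal pure-Python sketch. It uses the generic textbook RMSNorm rather than Gemma's actual implementation, and the function names, the tiny dimensions, and the weights are all illustrative assumptions, not taken from the Transformers source.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide by the root mean square of x, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def linear(x, matrix):
    """Bias-free linear layer; each row of `matrix` yields one output feature."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def embed_frame(frame, norm_weight, projection):
    """Hypothetical stand-in for the RMSNorm -> Linear pipeline described above."""
    return linear(rms_norm(frame, norm_weight), projection)


# Toy input: one 2-sample "frame" projected into a 3-dimensional hidden space.
frame = [3.0, 4.0]
norm_weight = [1.0, 1.0]
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

normalized = rms_norm(frame, norm_weight)
embedding = embed_frame(frame, norm_weight, projection)
print(embedding)
```

The point of the sketch is only the shape of the computation: there is no encoder stack between the raw frame and the hidden-space vector, just a normalization and one matrix multiply.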

You can find the original Gemma 4 12B Unified checkpoints under the Gemma 4 release.

Sapiens2

Sapiens2 is a family of high-resolution vision transformers pretrained on ~1 billion curated human images, designed for human-centric computer vision tasks including pose estimation, body-part segmentation, surface normal estimation, and pointmap estimation. The models scale from 0.4B to 5B parameters and train at native 1K resolution, with hierarchical 4K variants for extended spatial reasoning. Sapiens2 achieves substantial improvements over its predecessor with +4 mAP in pose estimation, +24.3 mIoU in body-part segmentation, and 45.6% error reduction in normal estimation.

Links: Documentation | Paper

DeepSeek-OCR-2

DeepSeek-OCR-2 is an OCR-specialized vision-language model built on a distinctive architecture that combines a SAM ViT-B vision encoder with a Qwen2 hybrid attention encoder, connected through an MLP projector to a DeepSeek-V2 Mixture-of-Experts (MoE) language model. The model features a hybrid attention mechanism that applies bidirectional attention over image tokens and causal attention over query tokens, enabling efficient and accurate document understanding. It supports both plain OCR tasks and grounding capabilities with coordinate-aware output for document conversion to markdown format.

Links: Documentation
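
The hybrid attention described for DeepSeek-OCR-2 can be pictured as a boolean mask. The sketch below is an illustration only: it assumes image tokens come first in the sequence, that image tokens see all image tokens, and that query tokens see every image token plus earlier query tokens. The actual token layout and masking in the model may differ, and `hybrid_attention_mask` is a hypothetical name.

```python
def hybrid_attention_mask(num_image: int, num_query: int) -> list[list[bool]]:
    """Build mask[i][j] == True when position i may attend to position j.

    Assumed layout: positions [0, num_image) are image tokens,
    followed by num_query query tokens.
    """
    total = num_image + num_query
    mask = []
    for i in range(total):
        row = []
        for j in range(total):
            if i < num_image:
                # Image tokens: bidirectional attention within the image block.
                allowed = j < num_image
            else:
                # Query tokens: causal, so all image tokens and earlier queries.
                allowed = j <= i
            row.append(allowed)
        mask.append(row)
    return mask


mask = hybrid_attention_mask(num_image=2, num_query=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```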

Mellum

Mellum is a code-focused Mixture-of-Experts language model developed by JetBrains. It is derived from the Qwen3-MoE architecture with per-layer-type RoPE and interleaved sliding window attention. The model has 12B total parameters with 2.5B active parameters per token, using 64 routed experts with 8 activated per token across 28 layers.

Links: Documentation
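
The "64 routed experts with 8 activated per token" figure for Mellum corresponds to top-k routing. Here is a minimal sketch of that idea, assuming the router picks the 8 highest-scoring experts and renormalizes with a softmax over only the selected ones; whether Mellum normalizes this way is an assumption, and the router logits below are deterministic placeholders rather than real model outputs.

```python
import math

NUM_EXPERTS = 64  # routed experts, from the release note
TOP_K = 8         # experts activated per token, from the release note


def route_token(logits, top_k):
    """Return {expert_index: weight} for the top_k experts of one token."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in chosen)
    # Softmax restricted to the chosen experts (subtract peak for stability).
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}


# Placeholder router scores for a single token.
logits = [math.sin(i) for i in range(NUM_EXPERTS)]
weights = route_token(logits, TOP_K)
print(sorted(weights))
```

Only the selected experts run for a given token, which is how a 12B-parameter model ends up with roughly 2.5B active parameters per token.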

Breaking changes

The Gemma4 vision pooler now casts inputs to float32 before scaling to prevent float16 overflow (inf saturation) with large checkpoints, which may cause minor numerical differences in outputs for users running Gemma-4 vision models in float16.
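
For reviewers unsure why the cast matters: float16 tops out at 65504, so a scaling step can saturate to inf even when both operands are representable. The sketch below emulates the two precisions with the standard library `struct` module. The specific values and the helper names are illustrative assumptions and are not taken from the Gemma4 pooler code.

```python
import math
import struct


def round_to(fmt: str, value: float) -> float:
    """Round a Python float to the given IEEE format, saturating to inf on overflow."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def scale_in_float16(x: float, scale: float) -> float:
    """Multiply with both operands and the result held in float16 ("<e")."""
    return round_to("<e", round_to("<e", x) * round_to("<e", scale))


def scale_in_float32(x: float, scale: float) -> float:
    """Upcast the float16 input to float32 ("<f") before multiplying."""
    upcast = round_to("<f", round_to("<e", x))
    return round_to("<f", upcast * round_to("<f", scale))


x, scale = 2048.0, 48.0  # placeholder activation and scale factor
print(scale_in_float16(x, scale), scale_in_float32(x, scale))
```

Because the float32 path rounds differently from the float16 path even when nothing overflows, small numerical differences in float16 runs are the expected side effect of this fix.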

Audio Language Models (ALMs) now have a dedicated base model class without a language modeling head, aligning them with the design of Vision Language Models (VLMs); users relying on the previous model class structure should update their code to use the new base model class where appropriate.

Parallelization

This release includes numerous bug fixes for model parallelism across multiple models (Gemma4, AltCLIP, ChineseClip, Blip-2, Whisper, Ovis2, Moshi) and parallel execution strategies, including fixes for tensor parallelism (TP), expert parallelism (EP), beam search under model parallel settings, and loss over-counting under TP/EP configurations. The continuous batching manager was also reworked for clearer control flow and improved TP race condition handling, and FSDP initialization via from_pretrained was introduced.

Cache

Fixed a regression in encoder-decoder cache initialization where the decoder config was incorrectly applied to the cross-attention cache, and resolved a RuntimeError caused by buffer size limits when warming up the cache on MPS devices. Additional test infrastructure improvements were made to support read-only cache environments used in CI.

Quantization

Added support for DeepGEMM BF16, mixed FP8/FP4, and MegaMoE quantization via a grouped linear refactor, while fixing two bugs: an FP8 MoE reverse substring issue affecting DSv4 initialization, and a BitsAndBytes 4-bit/8-bit quantization bug that silently dropped chunked tensors from one-to-many weight converters.

Bugfixes and improvements

Note

PR body was truncated to here.


Configuration

📅 Schedule: (UTC)

  • Branch creation
    • At any time (no schedule defined)
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate.

dreadnode-renovate-bot[bot] added the type/digest (Dependency digest updates) label on Feb 24, 2026
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch 3 times, most recently from 07525d6 to 3ac3e72, on March 1, 2026 00:53
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from 3ac3e72 to 4daa5d1 on March 8, 2026 00:48
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch 2 times, most recently from 3e0d62f to 4b95150, on April 1, 2026 00:57
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from 4b95150 to 40a28f1 on April 8, 2026 00:52
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch 2 times, most recently from 85f7052 to c4f4579, on April 19, 2026 00:59
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from c4f4579 to 37b26b9 on April 26, 2026 01:01
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from 37b26b9 to ca4e25e on May 3, 2026 01:07
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from ca4e25e to b5496fe on May 10, 2026 01:09
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from b5496fe to a845574 on May 17, 2026 01:11
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from a845574 to f7682ea on May 24, 2026 01:12
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from f7682ea to cfc6d09 on June 7, 2026 01:19
| datasource | package      | from   | to     |
| ---------- | ------------ | ------ | ------ |
| pypi       | psutil       | 6.1.1  | 7.2.2  |
| pypi       | transformers | 4.57.6 | 5.11.0 |
dreadnode-renovate-bot[bot] force-pushed the renovate/major-loader-deps-major branch from cfc6d09 to e691441 on June 14, 2026 01:21
Labels

type/digest Dependency digest updates
