
build: update trl requirement from <=0.21.0 to <=1.1.0#625

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/trl-lte-1.1.0

Conversation


dependabot[bot] (Contributor) commented on behalf of GitHub on Apr 13, 2026

Updates the requirements on trl to permit the latest version.

Release notes

Sourced from trl's releases.

v1.1.0

Features

DistillationTrainer for efficient on-policy distillation

Read the blog post: https://huggingface.co/spaces/HuggingFaceTB/trl-distillation-trainer

[Figure: off-policy vs. on-policy distillation]

The new DistillationTrainer implements on-policy knowledge distillation as described in On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes. It extends the ideas from the GKDTrainer with three key optimizations: a generation buffer that decouples the training microbatch size from the generation batch size (up to 40x speedup), external teacher server support so the teacher doesn't need to fit on training GPUs, and binary-encoded logprob payloads that shrink transfer payloads by ~5x.

from datasets import load_dataset
from trl.experimental.distillation import DistillationConfig, DistillationTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"messages": [{"role": "user", "content": x["question"]}]},
    remove_columns=dataset.column_names,
)

trainer = DistillationTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    teacher_model="Qwen/Qwen2.5-7B-Instruct",
    args=DistillationConfig(
        output_dir="results/distill-qwen-gsm8k",
        lmbda=1.0,  # fully on-policy (student generates)
        beta=1.0,   # reverse KL
        teacher_model_init_kwargs={"torch_dtype": "bfloat16"},
    ),
    train_dataset=dataset,
)
trainer.train()

by @cmpatino in huggingface/trl#5407, huggingface/trl#5500 and huggingface/trl#5501
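The binary payload optimization mentioned above can be illustrated outside TRL. This is a standalone sketch: the release notes do not describe TRL's actual wire format, so the float16 encoding and the payload sizes here are assumptions used only to show why a raw binary encoding beats a text encoding for logprob transfer.

```python
import json
import numpy as np

# Per-token teacher logprobs for a 4096-token generation batch (made-up data).
rng = np.random.default_rng(0)
logprobs = rng.uniform(-20.0, 0.0, size=4096)

# Text payload: JSON-encoded list of Python floats (~20 bytes per value).
json_payload = json.dumps(logprobs.tolist()).encode()

# Binary payload: raw float16 bytes (exactly 2 bytes per value).
binary_payload = logprobs.astype(np.float16).tobytes()

print(len(json_payload), len(binary_payload))  # binary is several times smaller
```

The trade-off is precision: float16 keeps only ~3 significant decimal digits, which is generally tolerable for distillation targets but would not be for exact-match comparisons.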

Chunked LM head for memory-efficient log-prob computation in AsyncGRPOTrainer

AsyncGRPOTrainer now supports a chunked LM-head path that computes per-token log-probs and entropy via online logsumexp without materializing the full [N, V] logits tensor. Combined with completion_mask filtering to skip prompt tokens, this brings massive memory savings on long sequences — up to 44x lower peak-allocated memory on an 8192-token sequence:

chunk_lm_head_size | Peak Alloc (GB) | Reduction | Wall Time (ms)
None (baseline)    | 18.55           | 1.00x     | 808.7
4096               | 0.42            | 44.32x    | 459.0
8192               | 0.76            | 24.34x    | 393.0
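The online-logsumexp trick behind these numbers can be sketched in plain NumPy. This is illustrative only: TRL's implementation runs on torch tensors inside the trainer, and the function and variable names below are made up. The point is that the normalizer logsumexp(logits) can be accumulated one vocabulary chunk at a time, so only an [N, chunk] slice of logits ever exists in memory instead of the full [N, V] tensor.

```python
import numpy as np

def chunked_token_logprobs(hidden, lm_head, token_ids, chunk_size):
    """Per-token log-probs without materializing the full [N, V] logits."""
    n, vocab = hidden.shape[0], lm_head.shape[1]
    running_max = np.full(n, -np.inf)   # running max for numerical stability
    running_sum = np.zeros(n)           # running sum of exp(logit - running_max)
    target_logit = np.empty(n)
    for start in range(0, vocab, chunk_size):
        logits = hidden @ lm_head[:, start:start + chunk_size]  # [N, chunk]
        # Online logsumexp: rescale the old sum, then fold in this chunk.
        new_max = np.maximum(running_max, logits.max(axis=1))
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(logits - new_max[:, None]).sum(axis=1))
        running_max = new_max
        # Pick out the logit of each target token that lands in this chunk.
        hit = (token_ids >= start) & (token_ids < start + chunk_size)
        target_logit[hit] = logits[hit, token_ids[hit] - start]
    # log p(token) = logit(token) - logsumexp(all logits)
    return target_logit - (running_max + np.log(running_sum))

# Sanity check against the naive full-logits computation on toy shapes.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(16, 8))       # [N, d] final hidden states
lm_head = rng.normal(size=(8, 100))     # [d, V] LM head weights
tokens = rng.integers(0, 100, size=16)  # [N] sampled token ids

chunked = chunked_token_logprobs(hidden, lm_head, tokens, chunk_size=32)
full = hidden @ lm_head
m = full.max(axis=1)
naive = full[np.arange(16), tokens] - (m + np.log(np.exp(full - m[:, None]).sum(axis=1)))
print(np.allclose(chunked, naive))  # True
```

With a real vocabulary (e.g. V ≈ 150k) and long sequences, the [N, chunk] slice is what caps peak memory, which is why smaller chunk sizes in the table above trade a little wall time for large allocation savings.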

Enable it via the new chunk_lm_head_size option in AsyncGRPOConfig:
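The configuration snippet itself was lost in truncation; a minimal sketch of what enabling the option might look like follows. Everything here except the option name `chunk_lm_head_size` and the class name `AsyncGRPOConfig` (both named in the release notes) is hypothetical: the import path and other arguments are assumptions, not confirmed API.

from trl.experimental.async_grpo import AsyncGRPOConfig  # hypothetical import path

config = AsyncGRPOConfig(
    output_dir="results/async-grpo",  # hypothetical argument
    chunk_lm_head_size=4096,          # compute log-probs over 4096-column vocab chunks
)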


... (truncated)

Commits
  • 3179965 Release: v1.1 (#5524)
  • d6d5efc feat: add Qwen2.5 training chat template with generation markers (#5522)
  • ca995b4 Add docs and good defaults for DistillationTrainer (#5500)
  • c73c2ec Add Qwen3-VL tool calling support (#5469)
  • 9c8e191 Add GLM-4-MoE tool calling support (#5463)
  • dbd3fac feat: add Llama 3 training chat template with generation markers (#5493)
  • f2925a8 Add trackio support to DistillationTrainer (#5501)
  • d4caab8 Fix prepare_multimodal_messages not normalizing empty string content for assi...
  • b48c788 [docs] Add code example for completion_only_loss in SFT trainer docs (#5494)
  • d4e8354 Update GitHub Action to use specific version of github-script (#5491)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [trl](https://github.com/huggingface/trl) to permit the latest version.
- [Release notes](https://github.com/huggingface/trl/releases)
- [Changelog](https://github.com/huggingface/trl/blob/main/RELEASE.md)
- [Commits](huggingface/trl@v0.21.0...v1.1.0)

---
updated-dependencies:
- dependency-name: trl
  dependency-version: 1.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues

View in Codacy

TIP This summary will be updated as you push new changes.
