
build: update trl requirement from <=0.21.0 to <=1.1.0#625

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/trl-lte-1.1.0

Conversation


dependabot[bot] (Contributor) commented on behalf of GitHub on Apr 13, 2026

Updates the requirements on trl to permit the latest version.

Release notes

Sourced from trl's releases.

v1.1.0

Features

DistillationTrainer for efficient on-policy distillation

Read the blog post: https://huggingface.co/spaces/HuggingFaceTB/trl-distillation-trainer

[Figure: off-policy vs. on-policy distillation]

The new DistillationTrainer implements on-policy knowledge distillation as described in On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes. It extends the ideas from the GKDTrainer with three key optimizations: a generation buffer that decouples the training microbatch size from the generation batch size (up to 40x speedup), external teacher server support so the teacher doesn't need to fit on training GPUs, and binary-encoded logprob payloads that shrink transfer payloads by ~5x.

from datasets import load_dataset
from trl.experimental.distillation import DistillationConfig, DistillationTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"messages": [{"role": "user", "content": x["question"]}]},
    remove_columns=dataset.column_names,
)

trainer = DistillationTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    teacher_model="Qwen/Qwen2.5-7B-Instruct",
    args=DistillationConfig(
        output_dir="results/distill-qwen-gsm8k",
        lmbda=1.0,  # fully on-policy (student generates)
        beta=1.0,   # reverse KL
        teacher_model_init_kwargs={"torch_dtype": "bfloat16"},
    ),
    train_dataset=dataset,
)
trainer.train()

by @cmpatino in huggingface/trl#5407, huggingface/trl#5500 and huggingface/trl#5501
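The binary payload optimization mentioned above can be illustrated outside TRL. This is a standalone sketch: the release notes do not describe TRL's actual wire format, so the float16 encoding and the payload sizes here are assumptions used only to show why a raw binary encoding beats a text encoding for logprob transfer.

```python
import json
import numpy as np

# Per-token teacher logprobs for a 4096-token generation batch (made-up data).
rng = np.random.default_rng(0)
logprobs = rng.uniform(-20.0, 0.0, size=4096)

# Text payload: JSON-encoded list of Python floats (~20 bytes per value).
json_payload = json.dumps(logprobs.tolist()).encode()

# Binary payload: raw float16 bytes (exactly 2 bytes per value).
binary_payload = logprobs.astype(np.float16).tobytes()

print(len(json_payload), len(binary_payload))  # binary is several times smaller
```

The trade-off is precision: float16 keeps only ~3 significant decimal digits, which is generally tolerable for distillation targets but would not be for exact-match comparisons.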

Chunked LM head for memory-efficient log-prob computation in AsyncGRPOTrainer

AsyncGRPOTrainer now supports a chunked LM-head path that computes per-token log-probs and entropy via online logsumexp without materializing the full [N, V] logits tensor. Combined with completion_mask filtering to skip prompt tokens, this brings massive memory savings on long sequences — up to 44x lower peak-allocated memory on an 8192-token sequence:

chunk_lm_head_size | Peak Alloc (GB) | Reduction | Wall Time (ms)
None (baseline)    | 18.55           | 1.00x     | 808.7
4096               | 0.42            | 44.32x    | 459.0
8192               | 0.76            | 24.34x    | 393.0
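The online-logsumexp trick behind these numbers can be sketched in plain NumPy. This is illustrative only: TRL's implementation runs on torch tensors inside the trainer, and the function and variable names below are made up. The point is that the normalizer logsumexp(logits) can be accumulated one vocabulary chunk at a time, so only an [N, chunk] slice of logits ever exists in memory instead of the full [N, V] tensor.

```python
import numpy as np

def chunked_token_logprobs(hidden, lm_head, token_ids, chunk_size):
    """Per-token log-probs without materializing the full [N, V] logits."""
    n, vocab = hidden.shape[0], lm_head.shape[1]
    running_max = np.full(n, -np.inf)   # running max for numerical stability
    running_sum = np.zeros(n)           # running sum of exp(logit - running_max)
    target_logit = np.empty(n)
    for start in range(0, vocab, chunk_size):
        logits = hidden @ lm_head[:, start:start + chunk_size]  # [N, chunk]
        # Online logsumexp: rescale the old sum, then fold in this chunk.
        new_max = np.maximum(running_max, logits.max(axis=1))
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(logits - new_max[:, None]).sum(axis=1))
        running_max = new_max
        # Pick out the logit of each target token that lands in this chunk.
        hit = (token_ids >= start) & (token_ids < start + chunk_size)
        target_logit[hit] = logits[hit, token_ids[hit] - start]
    # log p(token) = logit(token) - logsumexp(all logits)
    return target_logit - (running_max + np.log(running_sum))

# Sanity check against the naive full-logits computation on toy shapes.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(16, 8))       # [N, d] final hidden states
lm_head = rng.normal(size=(8, 100))     # [d, V] LM head weights
tokens = rng.integers(0, 100, size=16)  # [N] sampled token ids

chunked = chunked_token_logprobs(hidden, lm_head, tokens, chunk_size=32)
full = hidden @ lm_head
m = full.max(axis=1)
naive = full[np.arange(16), tokens] - (m + np.log(np.exp(full - m[:, None]).sum(axis=1)))
print(np.allclose(chunked, naive))  # True
```

With a real vocabulary (e.g. V ≈ 150k) and long sequences, the [N, chunk] slice is what caps peak memory, which is why smaller chunk sizes in the table above trade a little wall time for large allocation savings.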

Enable it via the new chunk_lm_head_size option in AsyncGRPOConfig:
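The configuration snippet itself was lost in truncation; a minimal sketch of what enabling the option might look like follows. Everything here except the option name `chunk_lm_head_size` and the class name `AsyncGRPOConfig` (both named in the release notes) is hypothetical: the import path and other arguments are assumptions, not confirmed API.

from trl.experimental.async_grpo import AsyncGRPOConfig  # hypothetical import path

config = AsyncGRPOConfig(
    output_dir="results/async-grpo",  # hypothetical argument
    chunk_lm_head_size=4096,          # compute log-probs over 4096-column vocab chunks
)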


... (truncated)

Commits
  • 3179965 Release: v1.1 (#5524)
  • d6d5efc feat: add Qwen2.5 training chat template with generation markers (#5522)
  • ca995b4 Add docs and good defaults for DistillationTrainer (#5500)
  • c73c2ec Add Qwen3-VL tool calling support (#5469)
  • 9c8e191 Add GLM-4-MoE tool calling support (#5463)
  • dbd3fac feat: add Llama 3 training chat template with generation markers (#5493)
  • f2925a8 Add trackio support to DistillationTrainer (#5501)
  • d4caab8 Fix prepare_multimodal_messages not normalizing empty string content for assi...
  • b48c788 [docs] Add code example for completion_only_loss in SFT trainer docs (#5494)
  • d4e8354 Update GitHub Action to use specific version of github-script (#5491)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [trl](https://github.com/huggingface/trl) to permit the latest version.
- [Release notes](https://github.com/huggingface/trl/releases)
- [Changelog](https://github.com/huggingface/trl/blob/main/RELEASE.md)
- [Commits](huggingface/trl@v0.21.0...v1.1.0)

---
updated-dependencies:
- dependency-name: trl
  dependency-version: 1.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues

View in Codacy

TIP This summary will be updated as you push new changes.
