[ML] Randomize train/test cluster boundary assignment in RDataLoader by siliataider · Pull Request #22196 · root-project/root

siliataider · 2026-05-08T16:17:23Z

This pull request:

Previously, RDataLoader always assigned the leading fraction of each cluster to training and the trailing fraction to validation. As a result, the train/val split was identical across runs, regardless of the seed.

This PR fixes the issue by using the shuffle seed to randomly decide, per cluster, whether training takes the prefix or suffix of that cluster.
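The idea can be illustrated with a minimal Python sketch. This is not the RDataLoader or RClusterLoader code: the function `split_clusters`, its parameters, and the use of Python's `random.Random` are assumptions made only for illustration. It shows a seeded generator deciding, per cluster, whether training takes the prefix or the suffix of that cluster's entry range.

```python
import random


def split_clusters(cluster_ranges, validation_split, seed, randomize=True):
    """Split each [start, end) cluster range into train and validation ranges.

    Hypothetical helper for illustration only; names and signature are
    assumptions, not the ROOT API.
    """
    rng = random.Random(seed)  # seeded generator, so the split is reproducible
    train, val = [], []
    for start, end in cluster_ranges:
        n_entries = end - start
        n_val = int(n_entries * validation_split)
        n_train = n_entries - n_val

        # Old behaviour (randomize=False): training always takes the prefix.
        # New behaviour: a seeded coin flip picks prefix or suffix per cluster.
        train_takes_prefix = (rng.random() < 0.5) if randomize else True

        if train_takes_prefix:
            train.append((start, start + n_train))
            val.append((start + n_train, end))
        else:
            val.append((start, start + n_val))
            train.append((start + n_val, end))
    return train, val


if __name__ == "__main__":
    clusters = [(0, 100), (100, 200), (200, 260)]
    train, val = split_clusters(clusters, validation_split=0.25, seed=42)
    print("train:", train)
    print("val:  ", val)
```

With `randomize=False` the sketch reproduces the previous behaviour, where the seed has no effect on the boundaries. With `randomize=True` the same seed always yields the same split, while a different seed may flip the assignment in some clusters.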

This PR fixes #22194

github-actions · 2026-05-08T19:08:17Z

Test Results

20 files 20 suites 2d 23h 17m 56s ⏱️
3 829 tests 3 829 ✅ 0 💤 0 ❌
69 072 runs 69 072 ✅ 0 💤 0 ❌

Results for commit 87db451.

♻️ This comment has been updated with latest results.

vepadulano

Nice! See minor considerations from my side.

siliataider requested a review from vepadulano as a code owner May 8, 2026 16:17

siliataider self-assigned this May 8, 2026

siliataider added the in:ML (Everything under ROOT/ML) label May 8, 2026

siliataider force-pushed the rdataloader branch from 2c18f68 to aaf5b8f May 15, 2026 08:44

siliataider requested a review from guitargeek as a code owner May 15, 2026 10:18

siliataider force-pushed the rdataloader branch from 9aa65e9 to d293e1a May 15, 2026 10:24

vepadulano approved these changes May 15, 2026

Comment thread bindings/pyroot/pythonizations/test/ml_dataloader.py Outdated

Comment thread tree/ml/inc/ROOT/ML/RClusterLoader.hxx Outdated

siliataider force-pushed the rdataloader branch from d293e1a to ef1b198 May 15, 2026 14:34

siliataider added 2 commits May 15, 2026 16:43

[ML] Randomize train/test cluster boundary assignment in RDataLoader

64bbbd7

[ML] Add tests for random train/val splitting

87db451

siliataider force-pushed the rdataloader branch from ef1b198 to 87db451 May 15, 2026 14:43

siliataider wants to merge 2 commits into root-project:master from siliataider:rdataloader