Allow I/O of interpreted class if the underlying data format allows by vepadulano · Pull Request #22234 · root-project/root

vepadulano · 2026-05-11T16:30:16Z

RDataFrame needs the type_info of the input column types to create the column readers. It looks this up first in a map of known simple types, then through TClass. Sometimes TClass does not know about a type_info even though it is available to the interpreter. This happens, for example, with runtime-generated classes that are declared to the interpreter but do not have a dictionary: `tclass->IsLoaded()` returns false and the TClass does not expose the type_info, although the interpreter knows about it.

This commit proposes to extend the search in RDataFrame with a call to the interpreter in case the RTTI cannot be found through TClass.
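The lookup order described above (simple-type map, then TClass, then the interpreter) can be pictured as a chain of fallbacks. The following is only a self-contained sketch of that idea, not the code in this PR: it uses no ROOT API, and every name in it (`TypeInfoLookup`, `FindTypeInfo`, `LookupSimpleType`, `LookupViaDictionary`, `LookupViaInterpreter`, `ResolveColumnType`, `RuntimeDeclared`) is hypothetical. The second and third stages are mocks standing in for TClass and the interpreter.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <typeinfo>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one lookup stage: returns nullptr when the
// stage does not know the requested type.
using TypeInfoLookup = std::function<const std::type_info *(const std::string &)>;

// Try each stage in order and return the first RTTI found, or nullptr.
const std::type_info *FindTypeInfo(const std::string &typeName, const std::vector<TypeInfoLookup> &stages)
{
   for (const auto &stage : stages) {
      assert(stage && "every lookup stage must be callable");
      if (const std::type_info *ti = stage(typeName))
         return ti;
   }
   return nullptr;
}

// Stage 1 (mock): a fixed table of known simple types.
const std::type_info *LookupSimpleType(const std::string &typeName)
{
   static const std::unordered_map<std::string, const std::type_info *> known{
      {"int", &typeid(int)}, {"float", &typeid(float)}, {"double", &typeid(double)}};
   const auto it = known.find(typeName);
   return it == known.end() ? nullptr : it->second;
}

// Stands in for a class declared at runtime without a dictionary.
struct RuntimeDeclared {};

// Stage 2 (mock): a dictionary-backed lookup that does not know the class.
const std::type_info *LookupViaDictionary(const std::string &)
{
   return nullptr;
}

// Stage 3 (mock): an interpreter-backed lookup that does know the class.
const std::type_info *LookupViaInterpreter(const std::string &typeName)
{
   return typeName == "RuntimeDeclared" ? &typeid(RuntimeDeclared) : nullptr;
}

// The full chain: the interpreter is consulted only if the earlier stages fail.
const std::type_info *ResolveColumnType(const std::string &typeName)
{
   return FindTypeInfo(typeName, {LookupSimpleType, LookupViaDictionary, LookupViaInterpreter});
}
```

In this sketch, `ResolveColumnType("double")` is answered by the first stage, `ResolveColumnType("RuntimeDeclared")` only by the last one, and an unknown name yields `nullptr`, leaving it to the caller to decide how to report the failure.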

This removes a long-standing safeguard in RDataFrame that prevented users from trying to Snapshot a column if the type had no dictionary. However, both TTree and RNTuple support this to different degrees. By removing the safeguard and querying the interpreter for the RTTI, RDataFrame delegates responsibility to the underlying I/O technology.

As a side effect, this also helps the integration with awkward arrays, where a C++ type is generated at runtime in the Python application for each different awkward array layout. The awkward data source is still responsible for correctly communicating to TTree or RNTuple what they should store to disk when Snapshot is called.

FYI @ianna

Fixes #22233

github-actions · 2026-05-11T21:31:57Z

Test Results

22 files 22 suites 3d 11h 20m 3s ⏱️
3 854 tests 3 853 ✅ 0 💤 1 ❌
77 017 runs 77 016 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit e7b6af3.

♻️ This comment has been updated with latest results.

Change the implementation of the generated RDataSource class to use the newer signature that returns one column reader. Together with the changes in root-project/root#22234, this makes it possible to restore the behaviour that awkward relied upon to store arrays to disk via RDataFrame. Irrespective of these changes, the current in-memory layout of the awkward RecordArray is not fit for storage, so garbage data is silently written to disk. Fixes scikit-hep#3885

hageboeck

LGTM in general, but I agree with @pcanal's comments.

And if you could move the cleanup to the read test, we could save a source file and a test.

vepadulano · 2026-05-15T13:14:35Z

/backport to 6.40

root-project-bot · 2026-05-15T13:15:39Z

Preparing to backport PR #22234 to branch 6.40 requested by vepadulano

root-project-bot · 2026-05-15T13:16:22Z

This PR has been backported to branch 6.40: #22303

vepadulano requested a review from hageboeck May 11, 2026 16:30

vepadulano self-assigned this May 11, 2026

vepadulano requested a review from martamaja10 as a code owner May 11, 2026 16:30

vepadulano added in:Python Interface in:RDataFrame labels May 11, 2026

ferdymercury reviewed May 11, 2026

Comment thread tree/dataframe/src/RDFUtils.cxx Outdated

vepadulano force-pushed the rdf-awkward-snapshot branch from fa0c418 to 93748f1 Compare May 11, 2026 16:41

ianna mentioned this pull request May 11, 2026

refactor: Migrate to C++ fixed-width types and std:: namespace scikit-hep/awkward#4030

Draft

vepadulano force-pushed the rdf-awkward-snapshot branch 2 times, most recently from ed725fc to fb44bbe Compare May 12, 2026 14:06

vepadulano requested a review from bellenot as a code owner May 12, 2026 14:06

vepadulano changed the title ~~Search more thoroughly for the RTTI of interpreted types~~ Allow I/O of interpreted class if the underlying data format allows May 12, 2026

vepadulano requested review from jblomer and pcanal May 12, 2026 14:06

vepadulano mentioned this pull request May 13, 2026

fix: migrate to new RDataSource column reader API scikit-hep/awkward#4048

Open

pcanal reviewed May 13, 2026

Comment thread tree/dataframe/src/RDFUtils.cxx

pcanal reviewed May 13, 2026

Comment thread tree/dataframe/test/dataframe_snapshot_interpreted_class_read.cxx Outdated

hageboeck approved these changes May 14, 2026

Comment thread tree/dataframe/test/dataframe_snapshot_interpreted_class_cleanup.cxx Outdated

Comment thread tree/dataframe/test/dataframe_snapshot_interpreted_class_read.cxx Outdated

Comment thread tree/dataframe/test/dataframe_snapshot_interpreted_class_read.cxx

vepadulano force-pushed the rdf-awkward-snapshot branch from fb44bbe to e7b6af3 Compare May 15, 2026 08:53

vepadulano merged commit 33b7a3c into root-project:master May 15, 2026
29 of 31 checks passed

root-project-bot mentioned this pull request May 15, 2026

[6.40] Allow I/O of interpreted class if the underlying data format allows #22303

Merged

vepadulano merged 1 commit into root-project:master from vepadulano:rdf-awkward-snapshot
