GH-3969: Native stream() implementations for DatasetGraph #3970

Open
arne-bdt wants to merge 1 commit into apache:main from arne-bdt:GH-3969-native-stream-for-DatasetGraph

@arne-bdt arne-bdt commented Jun 2, 2026

GitHub issue resolved #3969

Pull request description:

Replaces the find()-wrapping default with native streaming across the DatasetGraph hierarchy.

  • DatasetGraphBaseFind: stream() path (streamNG / streamAny / streamQuadsInUnionGraph) mirroring find(), backed by new primitives streamInDftGraph / streamInSpecificNamedGraph / streamInAnyNamedGraphs, implemented by each dataset (in-memory, map, one, null, collection, buffering, dyadic, ordered, storage, TDB1).
  • GraphView.stream(...) routes graph-over-dataset access through DatasetGraph#stream.
  • New G stream helpers (quads2triples, triples2quads, triples2quadsDftGraph).
  • IteratorTxnTracker.forEachRemaining now checks the transaction, keeping stream bulk operations (forEach/count/collect) inside their originating transaction.
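
The default-/named-/union-graph split described in the first bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Quad` record, the two graph marker strings, the list-backed store, and the use of `null` as a wildcard are all hypothetical stand-ins and not Jena's `Node`, `Quad`, or `DatasetGraphBaseFind`. Only the method names are taken from the description above, and the exact dispatch order in the PR may differ.

```java
import java.util.List;
import java.util.stream.Stream;

// Toy model: Quad, the graph markers, and the list-backed store are hypothetical
// stand-ins for illustration; they are not Jena's types.
final class StreamDispatchSketch {
    static final String DEFAULT_GRAPH = "urn:x-sketch:default"; // hypothetical marker
    static final String UNION_GRAPH = "urn:x-sketch:union";     // hypothetical marker

    record Quad(String g, String s, String p, String o) {}

    private final List<Quad> quads;

    StreamDispatchSketch(List<Quad> quads) {
        this.quads = List.copyOf(quads);
    }

    // Entry point. A null term acts as a wildcard in this sketch.
    Stream<Quad> stream(String g, String s, String p, String o) {
        if (DEFAULT_GRAPH.equals(g)) return streamInDftGraph(s, p, o);
        if (g != null) return streamNG(g, s, p, o);
        return streamAny(s, p, o);
    }

    // Named-graph path: either the union graph or one specific graph.
    Stream<Quad> streamNG(String g, String s, String p, String o) {
        if (UNION_GRAPH.equals(g)) return streamQuadsInUnionGraph(s, p, o);
        return streamInSpecificNamedGraph(g, s, p, o);
    }

    // Wildcard graph: default graph followed by all named graphs.
    Stream<Quad> streamAny(String s, String p, String o) {
        return Stream.concat(streamInDftGraph(s, p, o), streamInAnyNamedGraphs(s, p, o));
    }

    // Union graph: named-graph matches relabelled and de-duplicated.
    Stream<Quad> streamQuadsInUnionGraph(String s, String p, String o) {
        return streamInAnyNamedGraphs(s, p, o)
                .map(q -> new Quad(UNION_GRAPH, q.s(), q.p(), q.o()))
                .distinct();
    }

    // The three primitives a concrete dataset would supply.
    Stream<Quad> streamInDftGraph(String s, String p, String o) {
        return quads.stream().filter(q -> DEFAULT_GRAPH.equals(q.g()) && matches(q, s, p, o));
    }

    Stream<Quad> streamInSpecificNamedGraph(String g, String s, String p, String o) {
        return quads.stream().filter(q -> g.equals(q.g()) && matches(q, s, p, o));
    }

    Stream<Quad> streamInAnyNamedGraphs(String s, String p, String o) {
        return quads.stream().filter(q -> !DEFAULT_GRAPH.equals(q.g()) && matches(q, s, p, o));
    }

    private static boolean matches(Quad q, String s, String p, String o) {
        return term(s, q.s()) && term(p, q.p()) && term(o, q.o());
    }

    private static boolean term(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

In this model a dataset only has to provide the three `streamIn...` primitives; the routing methods are shared.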

Compatibility: DatasetGraph.stream(g,s,p,o) remains a default method — no break for external implementors.
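
Why a default method keeps external implementors working can be shown with a minimal sketch. The `TripleSource` interface, both implementing classes, and the string-based triples are hypothetical and chosen only for illustration; they are not part of Jena. The assumption is that the interface default wraps the iterator returned by `find()`, while a native implementation overrides `stream()` directly.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical types for illustration only; not Jena's DatasetGraph.
final class DefaultStreamSketch {

    interface TripleSource {
        Iterator<String> find(String subject);

        // Default: wrap the find() iterator, so existing implementors compile unchanged.
        default Stream<String> stream(String subject) {
            Spliterator<String> sp =
                    Spliterators.spliteratorUnknownSize(find(subject), Spliterator.ORDERED);
            return StreamSupport.stream(sp, false);
        }
    }

    // An "external" implementor that only knows about find().
    static final class LegacySource implements TripleSource {
        private final List<String> triples;

        LegacySource(List<String> triples) {
            this.triples = List.copyOf(triples);
        }

        @Override
        public Iterator<String> find(String subject) {
            return triples.stream().filter(t -> t.startsWith(subject + " ")).iterator();
        }
    }

    // An implementor that streams natively instead of going through an iterator.
    static final class NativeSource implements TripleSource {
        private final List<String> triples;

        NativeSource(List<String> triples) {
            this.triples = List.copyOf(triples);
        }

        @Override
        public Iterator<String> find(String subject) {
            return stream(subject).iterator();
        }

        @Override
        public Stream<String> stream(String subject) {
            return triples.stream().filter(t -> t.startsWith(subject + " "));
        }
    }
}
```

Both classes satisfy the same interface; only the second one avoids the iterator-to-stream wrapping.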

Tests: stream()==find() parity across access patterns for the in-memory / map / one / filtered-view / storage / TDB1 datasets and GraphView, plus buffered-overlay parity tests for BufferingDatasetGraph.
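
A parity check of this kind could look like the helper below. This is a sketch, not the PR's test code: the class and method names are hypothetical, and it assumes that parity means "same elements with the same multiplicities, in any order" and that elements are non-null. The actual tests may compare differently.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper for illustration; not taken from the PR's test suite.
final class ParitySketch {

    // Count occurrences of each element, so duplicates matter but order does not.
    static <T> Map<T, Long> bag(Stream<T> elements) {
        return elements.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // True when the iterator (find side) and the stream yield the same multiset.
    static <T> boolean sameBag(Iterator<T> fromFind, Stream<T> fromStream) {
        Stream<T> findAsStream =
                StreamSupport.stream(Spliterators.spliteratorUnknownSize(fromFind, 0), false);
        return bag(findAsStream).equals(bag(fromStream));
    }
}
```

A test would call `sameBag` once per access pattern, passing the `find(...)` iterator and the `stream(...)` result for the same pattern.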

Note: I did not expose the native stream support of the StorageRDF implementations to DatasetGraphStorage. If one plans to do that, the streams would need to be transaction-isolated, as in IteratorTxnTracker.
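
The idea of keeping bulk operations inside their transaction can be sketched with a small iterator wrapper. Everything here is an assumption for illustration: the class name, the `BooleanSupplier` used as a stand-in for "is the originating transaction still active", and the choice to check once before delegating in `forEachRemaining`. It is not the code of Jena's `IteratorTxnTracker`.

```java
import java.util.Iterator;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical wrapper for illustration; not Jena's IteratorTxnTracker.
final class TxnCheckedIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final BooleanSupplier txnActive; // stand-in for a real transaction check

    TxnCheckedIterator(Iterator<T> inner, BooleanSupplier txnActive) {
        this.inner = inner;
        this.txnActive = txnActive;
    }

    private void check() {
        if (!txnActive.getAsBoolean()) {
            throw new IllegalStateException("Iterator used outside its transaction");
        }
    }

    @Override
    public boolean hasNext() {
        check();
        return inner.hasNext();
    }

    @Override
    public T next() {
        check();
        return inner.next();
    }

    // Bulk path: a delegating wrapper would skip the checks in hasNext()/next()
    // here, so the check is repeated before handing over to the inner iterator.
    @Override
    public void forEachRemaining(Consumer<? super T> action) {
        check();
        inner.forEachRemaining(action);
    }
}
```

A stream built on such an iterator would fail fast when a bulk terminal operation runs after the transaction has ended, instead of silently reading outside it.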


AI Disclaimer

The production code was written by hand, with only AI-assisted code completion.
Most of the new tests were written by an AI coding assistant.
The issue description, PR description, and commit message are AI-generated.


  • Tests are included.
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

@arne-bdt arne-bdt marked this pull request as ready for review June 3, 2026 00:06

@rvesse rvesse left a comment

LGTM though probably want @afs to sign off on this as well

Is there a reason TDB2 isn't included in this PR since you covered TDB1?

@afs afs self-requested a review June 3, 2026 10:44
@arne-bdt arne-bdt marked this pull request as draft June 3, 2026 12:19

arne-bdt commented Jun 3, 2026

> LGTM though probably want @afs to sign off on this as well
>
> Is there a reason TDB2 isn't included in this PR since you covered TDB1?

I took another look at TDB2. There is not much to do there, since it mostly builds on org.apache.jena.dboe.storage.system.* and org.apache.jena.sparql.core.*.

But you pointed me towards org.apache.jena.tdb2.store.GraphViewSwitchable, which needed to override #stream.
Thanks!

Replace the find()-wrapping default of DatasetGraph.stream(g,s,p,o) with
first-class stream support across the hierarchy. DatasetGraphBaseFind gets a
stream() path mirroring the find() default-/named-/union-graph split, backed
by streamInDftGraph / streamInSpecificNamedGraph / streamInAnyNamedGraphs
primitives implemented per dataset. GraphView streams graph-over-dataset
through DatasetGraph#stream; adds G stream helpers. The interface method stays
default, so implementors are not broken. Includes stream()==find() parity
tests and a forEachRemaining check in IteratorTxnTracker so stream bulk
operations stay inside their transaction.
@arne-bdt arne-bdt force-pushed the GH-3969-native-stream-for-DatasetGraph branch from 24a3587 to 4bc8494 Compare June 3, 2026 13:41
@arne-bdt arne-bdt marked this pull request as ready for review June 3, 2026 13:42

afs commented Jun 4, 2026

In principle, the PR is a "good thing" ™️

> LGTM though probably want @afs to sign off on this as well

I will review this PR ... there is quite a lot of reviewing at the moment, so there is a bit of a queue.
