[FnApi Java] Add support for separate named data streams to provide bundle isolation by scwhittle · Pull Request #38863 · apache/beam

scwhittle · 2026-06-09T13:08:42Z

Support is advertised to the runner via a new NAMED_DATA_STREAMS protocol capability. The runner is then free to assign bundles to named data streams however it chooses, in order to isolate the processing of bundles from one another. Instead of a single data stream from the SDK, the SDK creates one data stream per name. The benefit is isolation: today, messages for all bundles are multiplexed over one data stream as they are received, so a slow bundle can fill up buffers and block the shared stream. With separate named streams, each stream has its own gRPC flow control, so bundles on other data streams are not affected by the blocked one.
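The isolation argument above can be illustrated with a minimal sketch. It is not the Beam implementation: the class `NamedStreamBuffers` and its `offer` method are hypothetical, and a bounded queue per stream name stands in for per-stream gRPC flow control.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; names and structure are assumptions, not Beam code. */
final class NamedStreamBuffers {
  private final int capacityPerStream;
  // One bounded buffer per data stream name, created lazily on first use.
  private final Map<String, BlockingQueue<String>> buffers = new ConcurrentHashMap<>();

  NamedStreamBuffers(int capacityPerStream) {
    this.capacityPerStream = capacityPerStream;
  }

  /**
   * Tries to enqueue an element on the named stream without blocking.
   * Returns false when that stream's buffer is full (a stand-in for flow-control pushback).
   */
  boolean offer(String dataStreamId, String element) {
    return buffers
        .computeIfAbsent(dataStreamId, id -> new ArrayBlockingQueue<>(capacityPerStream))
        .offer(element);
  }
}
```

With a capacity of 2, a third `offer("slow", ...)` is rejected while `offer("fast", ...)` still succeeds. Had both bundles shared a single buffer, the backlog of the slow bundle would have rejected the fast bundle's element as well.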

While making changes:

remove some nullness suppressions
document the existing metadata headers used on fnapi streams

gemini-code-assist · 2026-06-12T13:38:10Z

Summary of Changes

This pull request introduces support for named data streams in the FnApi, enabling runners to isolate bundle processing by utilizing separate gRPC streams. This architecture prevents slow bundles from causing head-of-line blocking on shared streams by providing dedicated flow control. The changes include protocol updates, a refactoring of the outbound data aggregator to handle instruction lifecycles, and improved documentation for stream metadata.

Highlights

Protocol Changes: Added the NAMED_DATA_STREAMS protocol capability and a data_stream_id field to the FnApi to support bundle isolation.
API Refactoring: Refactored BeamFnDataOutboundAggregator to explicitly manage instruction lifecycles using prepareForInstruction and finishInstruction methods.
Code Cleanup and Documentation: Removed several nullness suppressions and added comprehensive documentation for metadata headers used in FnApi streams.
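The instruction lifecycle named in the highlights can be sketched roughly as follows. Only the method names `prepareForInstruction` and `finishInstruction` come from the summary; the class `ReusableOutboundAggregator`, its fields, and the method bodies are assumptions made for illustration.

```java
/** Hypothetical sketch of a reusable aggregator; bodies are assumptions, not Beam code. */
final class ReusableOutboundAggregator {
  private String currentInstructionId; // null between bundles
  private long bytesWrittenSinceFlush;

  /** Binds the aggregator to one bundle; fails if the previous bundle was never finished. */
  void prepareForInstruction(String instructionId) {
    if (currentInstructionId != null) {
      throw new IllegalStateException("Instruction " + currentInstructionId + " is still active");
    }
    currentInstructionId = instructionId;
  }

  /** Records an encoded element for the active bundle. */
  void accept(byte[] encodedElement) {
    if (currentInstructionId == null) {
      throw new IllegalStateException("No active instruction");
    }
    bytesWrittenSinceFlush += encodedElement.length;
  }

  /** Returns the bytes pending for this bundle and resets all per-bundle state. */
  long finishInstruction() {
    if (currentInstructionId == null) {
      throw new IllegalStateException("No active instruction");
    }
    long pending = bytesWrittenSinceFlush;
    bytesWrittenSinceFlush = 0; // must not leak into the next bundle
    currentInstructionId = null;
    return pending;
  }
}
```

Resetting the byte counter in `finishInstruction` is what lets one instance serve several bundles in sequence without the second bundle inheriting the first bundle's flush accounting.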

gemini-code-assist

Code Review

This pull request introduces support for named data streams in the Apache Beam Fn API, allowing the runner to request specific data stream IDs during bundle processing. It refactors BeamFnDataOutboundAggregator to be reusable across multiple bundles using explicit instruction lifecycle methods, and updates BeamFnDataGrpcClient to multiplex connections based on both the API service descriptor and the data stream ID. The review feedback identifies several critical issues: a potential NullPointerException in the background flushing thread of BeamFnDataOutboundAggregator if a flush occurs after an instruction finishes, the need to reset bytesWrittenSinceFlush at the end of each bundle, and the importance of normalizing dataStreamId in MultiplexerKey to prevent duplicate multiplexers. Additionally, minor documentation copy-paste errors in beam_fn_api.proto and grammatical issues in FnHarness.java should be addressed.
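The point about normalizing dataStreamId can be shown with a small sketch. The class below is hypothetical (`MultiplexerKeySketch`, not the PR's `MultiplexerKey`), it uses a plain URL string in place of the API service descriptor, and it assumes that a null and an empty stream ID both mean "the default stream".

```java
import java.util.Objects;

/** Hypothetical cache key; assumes null and "" both denote the default data stream. */
final class MultiplexerKeySketch {
  private final String url;
  private final String dataStreamId;

  MultiplexerKeySketch(String url, String dataStreamId) {
    this.url = Objects.requireNonNull(url, "url");
    // Normalize once, at construction, so equals/hashCode never see null.
    this.dataStreamId = dataStreamId == null ? "" : dataStreamId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MultiplexerKeySketch)) {
      return false;
    }
    MultiplexerKeySketch other = (MultiplexerKeySketch) o;
    return url.equals(other.url) && dataStreamId.equals(other.dataStreamId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(url, dataStreamId);
  }
}
```

Without the normalization, a cache keyed on this class could hold two entries for what is logically the same stream, one under null and one under the empty string.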

github-actions · 2026-06-12T16:32:46Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

scwhittle · 2026-06-15T08:49:11Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for named data streams in the Beam Fn API, allowing runners to request specific data stream IDs in bundle processing. Key changes include updating protobuf definitions, refactoring BeamFnDataOutboundAggregator to manage its lifecycle per instruction, and updating BeamFnDataGrpcClient to cache multiplexers using a composite key of ApiServiceDescriptor and dataStreamId. The review feedback identifies critical issues, including potential concurrent NullPointerExceptions and thread-safety violations on gRPC stream observers in BeamFnDataOutboundAggregator. Additionally, it recommends using empty JSON brackets instead of an empty string for empty pipeline options to avoid parsing errors, and normalizing dataStreamId in MultiplexerKey to prevent duplicate multiplexers and simplify the code.

gemini-code-assist · 2026-06-15T08:51:29Z

+    PipelineOptions options =
+        PipelineOptionsTranslation.fromJson(pipelineOptionsJson == null ? "" : pipelineOptionsJson);


If pipelineOptionsJson is null, passing "" (empty string) to PipelineOptionsTranslation.fromJson will likely throw a JSON parsing exception because an empty string is not valid JSON. Passing "{}" (empty JSON object) is the standard way to represent empty options and is safe to parse.

Suggested change

-    PipelineOptions options =
-        PipelineOptionsTranslation.fromJson(pipelineOptionsJson == null ? "" : pipelineOptionsJson);
+    PipelineOptions options =
+        PipelineOptionsTranslation.fromJson(pipelineOptionsJson == null ? "{}" : pipelineOptionsJson);
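One way to keep the defaulting in a single place is a small helper, sketched below. The class `PipelineOptionsJsonDefaults` and its method are hypothetical. Treating an empty string the same as null goes beyond the reviewer's suggestion and is an assumption made here.

```java
/** Hypothetical helper; not part of Beam. */
final class PipelineOptionsJsonDefaults {
  private PipelineOptionsJsonDefaults() {}

  /**
   * Returns "{}" (an empty JSON object) when no options were supplied.
   * Assumption: an empty string is treated like null, since "" is not a valid JSON document.
   */
  static String orEmptyObject(String pipelineOptionsJson) {
    return pipelineOptionsJson == null || pipelineOptionsJson.isEmpty()
        ? "{}"
        : pipelineOptionsJson;
  }
}
```

The call site would then pass `PipelineOptionsJsonDefaults.orEmptyObject(pipelineOptionsJson)` to `PipelineOptionsTranslation.fromJson`, so the parser always receives a syntactically valid JSON object.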

scwhittle · 2026-06-15T13:37:50Z

Previous snapshots had all tests passing apart from some flakes, so I'm going to assign reviewers (I just made some minor fixups).

scwhittle · 2026-06-15T13:37:53Z

assign set of reviewers

github-actions · 2026-06-15T13:39:07Z

Assigning reviewers:

R: @kennknowles for label java.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

scwhittle · 2026-06-15T13:39:47Z

-  "nullness" // TODO(https://github.com/apache/beam/issues/20497)
-})
-// The calling thread that invokes sendBufferedDataAndFinishOutboundStreams synchronizes on
+// The calling thread that invokes sendOrCollectBufferedDataAndFinishOutboundStreams synchronizes on


The locking in this class is complicated; the synchronization is only needed when there is a background flush future. With a single implementation the duplicated code is at least co-located, but we could restructure it into two separate implementations to simplify the non-background-flushing one.
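A rough sketch of why the lock matters only when a background flush exists: the flush may run on a timer thread after the calling thread has already finished the instruction, so it has to tolerate "no active instruction" instead of dereferencing cleared state. Everything below (`GuardedFlusher`, its fields, the use of strings as payloads) is hypothetical and much simpler than the real aggregator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a flush that is safe to call from a background thread. */
final class GuardedFlusher {
  private final Object lock = new Object();
  private final List<String> sent = new ArrayList<>();
  private StringBuilder buffer; // null when no instruction is active; guarded by lock

  void prepareForInstruction() {
    synchronized (lock) {
      buffer = new StringBuilder();
    }
  }

  void append(String data) {
    synchronized (lock) {
      if (buffer == null) {
        throw new IllegalStateException("No active instruction");
      }
      buffer.append(data);
    }
  }

  /** May be called by the bundle thread or by a periodic task; returns true if data was sent. */
  boolean flush() {
    synchronized (lock) {
      if (buffer == null || buffer.length() == 0) {
        return false; // a late background flush is a no-op instead of a NullPointerException
      }
      sent.add(buffer.toString());
      buffer.setLength(0);
      return true;
    }
  }

  void finishInstruction() {
    synchronized (lock) {
      flush(); // Java monitors are reentrant, so nesting on the same lock is fine
      buffer = null;
    }
  }

  List<String> sentSnapshot() {
    synchronized (lock) {
      return new ArrayList<>(sent);
    }
  }
}
```

If no background flush is ever scheduled, every method runs on the bundle thread and the `synchronized` blocks could be dropped, which is the simplification a separate non-background implementation would allow.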

github-actions Bot added java model labels Jun 9, 2026

scwhittle force-pushed the named_streams branch from 87cad22 to 574ee56 Compare June 9, 2026 13:31

scwhittle force-pushed the named_streams branch from 574ee56 to d6ee8f6 Compare June 12, 2026 11:57

scwhittle changed the title ~~[FnApi] Add support for a sending header metadata when connecting to ApiServiceDescriptor under StandardProtocols.API_SERVICE_DESCRIPTOR_HEADER_METADATA~~ [FnApi Java] Add support for separate named data streams to provide bundle isolation Jun 12, 2026

scwhittle force-pushed the named_streams branch 2 times, most recently from e3734bd to 86dfa9f Compare June 12, 2026 12:07

github-actions Bot added runners fn-execution labels Jun 12, 2026

scwhittle marked this pull request as ready for review June 12, 2026 13:37

gemini-code-assist Bot reviewed Jun 12, 2026

scwhittle force-pushed the named_streams branch from a12f159 to a7c3d51 Compare June 15, 2026 08:44

gemini-code-assist Bot reviewed Jun 15, 2026

scwhittle force-pushed the named_streams branch from a7c3d51 to c84c5b5 Compare June 15, 2026 10:29

scwhittle force-pushed the named_streams branch from c84c5b5 to 335fbde Compare June 15, 2026 13:37

github-actions Bot added the Next Action: Reviewers label Jun 15, 2026
