feat: add exclude sources clone query mode #19530

Open
jtuglu1 wants to merge 1 commit into apache:master from jtuglu1:exclude-sources-clone-query-mode

@jtuglu1
Contributor

@jtuglu1 jtuglu1 commented May 28, 2026

Description

This adds a new value to the cloneQueryMode query context parameter that sends queries only to cloned servers. Queries in this mode can return partial results, but the mode makes it possible to mirror traffic to cloned servers or load-test them.
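As a rough illustration of the intended behavior (not the code in this PR), the sketch below picks a server for one segment in a clone-only mode. The `Server` class, `describePick` method, and server names are hypothetical stand-ins and are not Druid APIs; the assumption is that a segment with no clone replica is simply left out, which is where the partial results come from.

```java
import java.util.List;

public class CloneOnlyPickSketch {
    // Hypothetical stand-in for a server that holds a segment; not a Druid class.
    static final class Server {
        final String name;
        final boolean isClone;

        Server(String name, boolean isClone) {
            this.name = name;
            this.isClone = isClone;
        }
    }

    // Returns the name of the first clone serving the segment, or "<omitted>"
    // when no clone serves it (assumed behavior: the segment is skipped).
    static String describePick(List<Server> serversForSegment) {
        return serversForSegment.stream()
            .filter(server -> server.isClone)
            .map(server -> server.name)
            .findFirst()
            .orElse("<omitted>");
    }

    public static void main(String[] args) {
        List<Server> sourceAndClone = List.of(
            new Server("historical-1", false),
            new Server("clone-1", true));
        List<Server> sourceOnly = List.of(new Server("historical-2", false));

        System.out.println(describePick(sourceAndClone)); // prints "clone-1"
        System.out.println(describePick(sourceOnly));     // prints "<omitted>"
    }
}
```

The second call shows why a load test against clones may see fewer rows than the same query in the default mode.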

Release note

Add an excludeSources option to the cloneQueryMode query context parameter.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added Area - Documentation Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 GHA labels May 28, 2026
Comment thread server/src/test/java/org/apache/druid/client/selector/ServerSelectorTest.java Dismissed
@jtuglu1 jtuglu1 force-pushed the exclude-sources-clone-query-mode branch from 5bbbd52 to f069444 Compare May 29, 2026 02:36
@jtuglu1 jtuglu1 marked this pull request as ready for review May 29, 2026 02:47
Member

@FrankChen021 FrankChen021 left a comment

| Severity | Findings |
| --- | --- |
| P0 | 0 |
| P1 | 1 |
| P2 | 0 |
| P3 | 0 |
| Total | 1 |

excludeSource queries can return cached data for segments that are not served by clone targets when the broker segment cache has entries from prior normal queries.

Reviewed 9 of 9 changed files.

Findings that could not be attached inline:

  • server/src/main/java/org/apache/druid/client/CachingClusteredClient.java:364 - [P1] excludeSource can reuse source-backed cache entries. With broker segment cache enabled, pruneSegmentsWithCachedResults runs before groupSegmentsByServer applies ServerSelector.pick(..., EXCLUDESOURCE). A prior non-excludeSource query can populate the same segment cache key for a segment that is not on any clone; the excludeSource query then removes that segment from remote querying and merges the cached rows, even though the new mode says segments missing from clones should be omitted. This defeats clone-only load tests and can turn expected partial results into full/source-backed results. Disable broker cache for EXCLUDESOURCE or make cache lookup/keying depend on clone availability/mode.
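The ordering problem described in this finding can be sketched in isolation. The example below is a simplified model and not Druid code: `answeredSegments`, its parameters, and the segment ids are hypothetical. It assumes the cache lookup runs before server selection, and it models only the first suggested fix (bypassing the broker cache in the clone-only mode).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CacheBeforePickSketch {
    // Hypothetical model: for each segment, consult the cache first, then fall
    // back to server selection. In clone-only mode, a segment that no clone
    // serves is expected to be omitted.
    static List<String> answeredSegments(
        List<String> segments,
        Set<String> cachedSegments,
        Set<String> segmentsOnClones,
        boolean cloneOnly,
        boolean bypassCacheWhenCloneOnly
    ) {
        List<String> answered = new ArrayList<>();
        for (String segment : segments) {
            boolean cacheAllowed = !(cloneOnly && bypassCacheWhenCloneOnly);
            if (cacheAllowed && cachedSegments.contains(segment)) {
                // Cache hit: the segment never reaches server selection.
                answered.add(segment + ":cache");
            } else if (!cloneOnly) {
                answered.add(segment + ":server");
            } else if (segmentsOnClones.contains(segment)) {
                answered.add(segment + ":clone");
            }
            // Otherwise: clone-only mode and no clone serves it, so it is omitted.
        }
        return answered;
    }

    public static void main(String[] args) {
        List<String> segments = List.of("s1", "s2");
        Set<String> cached = Set.of("s2");   // populated by an earlier normal query
        Set<String> onClones = Set.of("s1"); // s2 is not on any clone

        // Cache consulted first: s2 is answered from source-backed cache rows.
        System.out.println(answeredSegments(segments, cached, onClones, true, false));
        // prints "[s1:clone, s2:cache]"

        // Cache bypassed for clone-only mode: s2 is omitted as expected.
        System.out.println(answeredSegments(segments, cached, onClones, true, true));
        // prints "[s1:clone]"
    }
}
```

The other option named in the finding, making the cache key depend on the mode or on clone availability, would keep caching enabled for clone-only queries but is not modeled here.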

This is an automated review by Codex GPT-5.5
