feat: Early query cancellation based on per-segment sampling #19235
mshahid6 wants to merge 1 commit into apache:master
Conversation
log.noStackTrace().warn(e, "Query interrupted, cancelling pending results for query [%s]", query.getId());
GuavaUtils.cancelAll(true, future, futures);
if (extrapolationCancelled != null && extrapolationCancelled.get()) {
  int completed = completedSegments.get();
Check warning (Code scanning / CodeQL): Dereferenced variable may be null
  }
}
},
Execs.directExecutor()
I have some concerns over noisy-neighbor/variability that might cause this to fire more often than not. Another concern is that, since this is operating on the servicing Jetty thread, it might cause interrupts to occur on the thread blocking on .get():
I think we should be careful about scheduling async tasks on the main executor aside from the primary timeout. The main thread's job is to wait for completion of all futures, but if there are other competing tasks it needs to service, I want to make sure that:
a) Under contention we cannot possibly get an InterruptedException (to go service a callback) and bail out of processing a valid query.
b) We don't delay processing of the future group because we are busy servicing a callback for one processing future.
Do we have a way of validating this won't happen?
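One way to see the direct-executor behavior in isolation (a hypothetical, self-contained sketch using java.util.concurrent rather than Druid's Execs or Guava futures): with a same-thread executor, a callback registered before completion runs inline on whichever thread completes the future, so per-segment callbacks would piggyback on processing threads rather than a dedicated executor.

```java
import java.util.concurrent.CompletableFuture;

public final class DirectExecutorDemo {
  // With a direct (same-thread) executor, a callback registered before completion
  // runs inline on whichever thread completes the future.
  public static String callbackThreadName() {
    try {
      CompletableFuture<String> f = new CompletableFuture<>();
      // Runnable::run is a same-thread executor, analogous in spirit to Execs.directExecutor().
      CompletableFuture<String> cb =
          f.thenApplyAsync(v -> Thread.currentThread().getName(), Runnable::run);
      Thread worker = new Thread(() -> f.complete("done"), "segment-worker");
      worker.start();
      worker.join();
      return cb.get(); // callback ran on the completing thread, "segment-worker"
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```

This suggests the extrapolation callback would mostly run on segment-processing threads, but it does not by itself rule out the registering thread servicing an already-completed future.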
| int completed = completedSegments.get(); | ||
| if (completed >= samplingWindow) { | ||
| long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - queryStartNanos); | ||
| long extrapolatedMs = elapsedMs * totalSegments / completed; |
I'm also worried that this calculation is not accounting for parallelism correctly and is "overkill." It approximates the time taken to complete a query as effectively avg segment time * total segments, which is not usually the case (it assumes purely serial execution). We have 100+ threads processing segments, and queries can sometimes hit 100s-1000s of segments per historical.
For example, you could have a query that hits 300 segments, each taking 1100ms, with a total query timeout of 300s; we would preemptively kill this query even though, assuming moderate contention, it would complete in ~84s.
We should instead incorporate the parallelism of the threadpool into the calculation to avoid killing ok queries. For example, something like (sum of extrapolated segment times) / (thread pool parallelism).
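A minimal sketch of that suggestion (class and method names here are illustrative, not the PR's code): estimate the average per-segment time from the sample, multiply by the total segment count to get the sum of extrapolated segment times, and divide by the pool's parallelism before comparing against the timeout.

```java
public final class ParallelAwareExtrapolation {
  // Sum of extrapolated per-segment times divided by pool parallelism,
  // approximating wall-clock time under full thread utilization.
  public static long projectWallClockMs(long avgSegmentMs, int totalSegments, int parallelism) {
    long totalWorkMs = avgSegmentMs * totalSegments; // sum of extrapolated segment times
    return totalWorkMs / Math.max(1, parallelism);   // guard against a zero/negative pool size
  }
}
```

For the 300-segment example above, 1100ms * 300 / 4 threads projects to 82500ms (~82.5s), well under the 300s timeout, whereas the serial formula projects 330s and would cancel the query.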
Description
Added early query cancellation on historicals by sampling segment completion times and extrapolating whether the query will exceed its timeout. After a configurable number of segments complete, elapsed time is measured and used to project total query time. If the projection exceeds the remaining timeout, all pending segment futures are cancelled immediately.
The new query context parameter is perSegmentSamplingWindow (the number of segments to complete before extrapolating), with a default value of 0, i.e. disabled. It can be set per-query via the query context, or as a system default via druid.query.default.context.perSegmentSamplingWindow.
Example
After 5 segments complete, if wall-clock extrapolation suggests the query will exceed 30 seconds, it fails fast instead of waiting for the full timeout.
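The check described above can be sketched as follows (a self-contained illustration; the actual logic lives in ChainedExecutionQueryRunner, and the method name here is hypothetical):

```java
public final class EarlyCancelCheck {
  // Returns true when the wall-clock extrapolation projects the query past its timeout,
  // in which case pending segment futures would be cancelled.
  public static boolean shouldCancel(long elapsedMs, int completed, int samplingWindow,
                                     int totalSegments, long timeoutMs) {
    if (samplingWindow <= 0 || completed < samplingWindow) {
      return false; // feature disabled (default window of 0), or too few samples yet
    }
    long extrapolatedMs = elapsedMs * totalSegments / completed;
    return extrapolatedMs > timeoutMs;
  }
}
```

For instance, if the first 5 of 100 segments took 6s, the projection is 6s * 100 / 5 = 120s; with a 30s timeout the query fails fast instead of running out the clock.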
Key changed/added classes in this PR
ChainedExecutionQueryRunner
QueryContexts/QueryContext
ChainedExecutionQueryRunnerExtrapolationTest
Limitations
This PR has: