
[improve][pip] PIP-461: Add queued latency metrics for offloader executors#25322

Open
BewareMyPower wants to merge 2 commits into apache:master from BewareMyPower:bewaremypower/pip-461-tiered-storage-queued-metrics

Conversation

@BewareMyPower (Contributor)

Motivation

The default number of read threads for tiered storage is 2. When there are many concurrent read operations, many of them sit in the queue for a long time, but there is no metric for this queueing, so it is hard to tune the thread count.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions bot added the PIP and doc labels on Mar 14, 2026 (doc: Your PR contains doc changes, no matter whether the changes are in markdown or code files.)
The helper is used in these paths:

- `BlobStoreManagedLedgerOffloader.offload(...)`
- `BlobStoreManagedLedgerOffloader.streamingOffload(...)`
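As a rough, self-contained sketch of what such a helper could look like, wrapping each submitted task to record how long it waited in the executor queue before starting (all names here are illustrative, not the actual PIP-461 implementation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: wrap a Runnable so that the time between submission
// and the start of execution (the executor queue latency) is recorded.
public class QueueLatencySketch {
    static Runnable instrument(Runnable task, AtomicLong lastQueueNanos) {
        final long submittedAt = System.nanoTime();
        return () -> {
            // Queue latency = time from submission until the task starts running.
            lastQueueNanos.set(System.nanoTime() - submittedAt);
            task.run();
        };
    }

    public static void main(String[] args) throws Exception {
        AtomicLong queueNanos = new AtomicLong(-1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(instrument(() -> { }, queueNanos));
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("queued >= 0: " + (queueNanos.get() >= 0));
    }
}
```

In a real implementation the recorded value would feed a latency histogram rather than a single `AtomicLong`; the wrapper shape is the relevant part.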
Member

Maybe we can leave streamingOffload() out of scope for now, since this path does not seem to be wired into the current production offload flow yet. If you would still like to include it here, could we also note that there is an intentional delayed resubmission in BlobStoreManagedLedgerOffloader.streamingOffloadLoop() and subtract that delay from the queue-latency metric?

scheduler.chooseThread(segmentInfo)
        .schedule(() -> {
            streamingOffloadLoop(partId, dataObjectLength);
        }, 100, TimeUnit.MILLISECONDS);

Otherwise the metric may end up including this built-in 100ms delay, instead of only the actual executor queue time.
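A minimal sketch of the suggested adjustment, assuming the instrumentation knows the intended scheduling delay and subtracts it so only the excess wait counts as queue latency (names are hypothetical, not the PR's code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: when a task is intentionally scheduled with a delay
// (like the 100ms resubmission in streamingOffloadLoop), subtract that
// intended delay so the metric reflects only real executor queue time.
public class DelayAwareQueueLatency {
    public static void main(String[] args) throws Exception {
        final long intendedDelayMs = 100; // the built-in resubmission delay
        AtomicLong adjustedQueueMs = new AtomicLong(-1);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        final long submittedAt = System.nanoTime();
        scheduler.schedule(() -> {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submittedAt);
            // Only the time beyond the intended delay counts as queue latency.
            adjustedQueueMs.set(Math.max(0, elapsedMs - intendedDelayMs));
        }, intendedDelayMs, TimeUnit.MILLISECONDS);
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("non-negative: " + (adjustedQueueMs.get() >= 0));
    }
}
```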

2. `brk_ledgeroffloader_read_offload_executor_queue_latency`
- Description: Time that a blocking tiered-storage read task spends waiting in the offloader read executor queue before starting execution
- Attributes:
Member

Maybe it could be helpful to add one more attribute here, something like operation (or caller). These executors seem to handle more than just one kind of work, so separating values such as offload, delete, open, read, and close might make the queue-latency metrics easier to interpret in practice. For example, it would be easier to tell whether the latency is coming from read traffic itself or from extra work scheduled on the same executor.
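To illustrate the suggestion, here is a toy sketch that keys queue-latency aggregation by a hypothetical `operation` attribute; a plain map stands in for a real histogram backend, and the operation names are the ones proposed above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: track queue latency per operation kind (offload,
// delete, open, read, close) instead of one combined series, so it is
// possible to tell which workload is backing up the shared executor.
public class PerOperationLatency {
    private final Map<String, LongAdder> totalQueueNanos = new ConcurrentHashMap<>();

    void record(String operation, long queueNanos) {
        totalQueueNanos.computeIfAbsent(operation, k -> new LongAdder()).add(queueNanos);
    }

    public static void main(String[] args) {
        PerOperationLatency metrics = new PerOperationLatency();
        metrics.record("read", 1_000_000);
        metrics.record("delete", 5_000_000);
        metrics.record("read", 2_000_000);
        System.out.println("read total ns: " + metrics.totalQueueNanos.get("read").sum());
    }
}
```

With an OpenTelemetry-style histogram, the same idea would be expressed as an attribute on each recorded measurement rather than separate maps.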

