design: Add freshness histograms design doc #35795

Draft
antiguru wants to merge 1 commit into MaterializeInc:main from antiguru:moritz/freshness-histograms-design

Conversation

@antiguru
Member

Summary

  • Proposes replacing per-second raw wallclock lag samples with sparse exponential histograms for freshness measurement.
  • Defines a freshness model decomposing end-to-end latency into input freshness, processing delay, and reporting delay.
  • Enables sub-second precision, percentile queries (up to p99.999 per week), and reduced storage costs (~20x cheaper in steady state).
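
To illustrate how a percentile query could be answered from histogram rows rather than raw samples, here is a minimal sketch. It assumes each row is a hypothetical `(bucket_upper_bound, count)` pair and uses the nearest-rank definition; neither the row shape nor the function name comes from the design doc.

```python
import math
from fractions import Fraction


def percentile(rows, q):
    """Nearest-rank percentile over sparse (bucket_upper_bound, count) rows.

    `rows` may be unsorted and omits empty buckets. `q` is a Fraction in
    (0, 1], which keeps the rank computation exact for values like 0.99999.
    The result is a bucket upper bound, so it is accurate only up to the
    bucket width.
    """
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("histogram is empty")
    # Smallest 1-based rank such that at least q * total samples fall at or below it.
    rank = max(1, math.ceil(q * total))
    seen = 0
    for upper_bound, count in sorted(rows):
        seen += count
        if seen >= rank:
            return upper_bound
    raise AssertionError("unreachable: rank never exceeds total")


# Hypothetical window: 90 samples <= 100 ms, 9 samples <= 200 ms, 1 sample <= 1000 ms.
rows = [(100, 90), (200, 9), (1000, 1)]
p50 = percentile(rows, Fraction(50, 100))
p99 = percentile(rows, Fraction(99, 100))
p100 = percentile(rows, Fraction(1))
```

Because only cumulative counts are needed, the cost of a query depends on the number of non-empty buckets, not on the number of samples recorded in the window.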

Motivation

With per-object tick rates moving to sub-second (e.g., 100ms), the current 1s-rounded wallclock lag measurement loses meaningful precision. Raw per-second samples also don't support distributional queries and scale poorly with faster tick rates.

Key design decisions

  • Exponential bucket scheme with N "whole" bits and M "subdivision" bits, giving uniform relative resolution across scales.
  • Sparse histogram storage: only non-zero (bucket_i, count) pairs are persisted, keeping steady-state storage at ~3-5 rows per object per window.
  • Controller-side measurement initially, with the model supporting replica-side decomposition as a follow-up.
  • Controller reporting delay tracked as a Prometheus histogram (internal only) to establish measurement confidence.
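
The bucket scheme and sparse storage described above can be sketched as follows. This is one common way to build exponential buckets with linear subdivisions, not necessarily the encoding the design doc specifies: the names `bucket_index`, `bucket_lower_bound`, and `sparse_histogram` are hypothetical, and the reading of N as a cap on the number of power-of-two ranges is an assumption.

```python
from collections import Counter


def bucket_index(value: int, sub_bits: int, whole_bits: int = 6) -> int:
    """Map a non-negative integer lag (e.g. in milliseconds) to a bucket.

    Each power-of-two range is split into 2**sub_bits equal sub-buckets,
    so the relative bucket width stays roughly constant across scales.
    Assumption: the index fits in whole_bits + sub_bits bits, and larger
    values are clamped into the last bucket.
    """
    if value < 0:
        raise ValueError("lag must be non-negative")
    if value < (1 << sub_bits):
        return value  # small values get one exact bucket each
    max_index = (1 << (whole_bits + sub_bits)) - 1
    # Keep the leading bit plus the next sub_bits bits, drop the rest.
    shift = value.bit_length() - 1 - sub_bits
    mantissa = (value >> shift) - (1 << sub_bits)
    return min(((shift + 1) << sub_bits) + mantissa, max_index)


def bucket_lower_bound(index: int, sub_bits: int) -> int:
    """Smallest value that maps to `index` (inverse of bucket_index)."""
    if index < (1 << sub_bits):
        return index
    shift = (index >> sub_bits) - 1
    mantissa = index & ((1 << sub_bits) - 1)
    return ((1 << sub_bits) + mantissa) << shift


def sparse_histogram(samples, sub_bits: int):
    """Return only the non-zero (bucket_i, count) pairs, sorted by bucket."""
    counts = Counter(bucket_index(s, sub_bits) for s in samples)
    return sorted(counts.items())


# Hypothetical window of lag samples in milliseconds, mostly near 100 ms.
rows = sparse_histogram([100, 101, 103, 110, 250], sub_bits=3)
```

With three subdivision bits, the five samples collapse into three rows, which matches the intuition behind the "3-5 rows per object per window" estimate: in steady state most samples land in a handful of adjacent buckets.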

This is a design doc only — no code changes.

🤖 Generated with Claude Code

Proposes replacing per-second raw wallclock lag samples with sparse
exponential histograms, enabling sub-second freshness measurement and
percentile queries (up to p99.999) while reducing storage costs.

Defines a freshness model that decomposes end-to-end latency into
input freshness, processing delay, and reporting delay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
