Skip to content

alexesDev/height_interval

Repository files navigation

height_interval

Slices the unprocessed block-height frontier into batches, hands them out to a fleet of workers — claiming each batch the moment it's dispatched — and stays the single, compact source of truth for how complete the dataset is.

A coordinator calls PopGaps in a loop: each call carves off the next batch (newest-first, sized to a budget) and marks it claimed, so dozens of workers can pull batches concurrently without ever fetching the same height twice. As ranges land, the queue merges them into one authoritative, always-current picture of what's done and what's still missing.

Jobs it's hired to do

The queue serves a blockchain extractor that fans block fetching out across dozens of workers and must keep an authoritative, up-to-date view of which heights are done. Every method maps to a concrete job:

1. "Remember what I've already fetched — compactly."

Push(interval) inserts a processed range and merges any touching neighbors ([1,2) + [2,3) → [1,3)), so a fully-synced chain collapses to a single interval instead of millions of entries. Overlapping pushes are rejected (returns false), keeping the set strictly sorted and non-overlapping.

2. "Tell me what to fetch next — newest first — and stop two workers grabbing the same range."

PopGaps(bound, maxSize, maxCount) is the heart of the library. In a single backward sweep it:

  • computes the complement of the processed set within bound (the unprocessed gaps),
  • returns them newest-first — highest heights come back first, because fresh blocks matter most,
  • caps the batch at maxSize values (and optionally maxCount gaps) — this is the unit of work one worker receives,
  • and writes each returned gap back into the queue as claimed, so the next call — even from another worker — resumes where this one stopped and never hands out the same range twice.

See PopGaps in detail below.

3. "Don't lose the historical holes I can't reach yet."

PopGaps only looks inside bound. A gap below the current window (history the node hasn't exposed yet) is never returned — but never dropped either. Widen bound later and the hole surfaces on the next call. No gap is permanently lost. See Gaps outside available are preserved.

4. "Let me hand a range back, or carve one out."

Cut(interval) removes a range, splitting an interval in two if the cut lands in its middle. Use it to release a claim or invalidate a bad fetch.

5. "Tell me how much work is left."

RemainingSize(available) reports uncovered values inside a window; Size() totals every value tracked across all intervals.

Core types

Interval[Earliest, Latest)

Method Description
Size() Number of values in the interval
Valid() Size() > 0
Chunks(n) Split into sub-intervals of length n
Clamp(bound) Restrict to lie within bound
LeftLimit(n) Shrink from the left to at most n values, keeping Latest fixed
Contains(other) Reports full containment
Overlaped(other) Reports any overlap

IntervalQueue[]Interval

Invariant: sorted by Earliest, no two items overlap or touch.

Method Description
Push(it) Insert, merging touching intervals. Returns false on invalid or overlap.
PopGaps(bound, maxSize, maxCount) Return unprocessed gaps newest-first; simultaneously claim them
PeekGaps(bound, maxSize, maxCount) Same as PopGaps but read-only — nothing is claimed
Cut(it) Remove a range, splitting items as needed
Merge(other) Push every interval from another queue into this one
ContainsHeight(h) Binary search: is height h covered?
FullyCovered(bound) Is every value in bound covered?
Frontier() [earliest.Earliest, latest.Latest) spanning the whole queue
GapCount(bound) Number of unprocessed gaps within bound (read-only)
RemainingSize(available) Uncovered value count within available
Len() Number of distinct intervals (not total values)
Size() Total values across all intervals
AllHeights() Expand to flat []int64 slice
Copy() Deep copy
SortByLatest() Sort by .Latest ascending

PopGaps in detail

processed = IntervalQueue{ [100, 200), [300, 400) }
bound     = [1, 500)
maxSize   = 50

gaps → [ [450, 500), [250, 300) ]   (newest-first, ≤50 values total)

processed is now:
          { [100, 200), [250, 300), [300, 400), [450, 500) }
       →  { [100, 200), [250, 400), [450, 500) }  (touching merged)

Calling PopGaps again continues from where it left off.

Gaps outside available are preserved

PopGaps only finds and claims ranges within bound. If a gap exists at a height that is currently outside available (e.g. the node hasn't synced that far yet, or the window starts above some historical hole), it will never be returned — but it is also never lost.

processed = IntervalQueue{ [500, 600) }
available = [500, 700)     ← window doesn't reach heights 1..499

gaps → [ [600, 700) ]      ← only the gap inside available is returned
       hole [1, 500) is untouched

When available is later extended to cover those heights, PopGaps will find them naturally on the next call:

available = [1, 700)       ← extended window

gaps → [ [600, 700), ... [1, 500) ]   ← historical hole now visible

This means no gap is permanently lost — it just waits until available grows to include it. The typical pattern is to move available.Earliest backwards over time as historical data becomes accessible, or to move available.Latest forward as the chain tip advances.

Go usage

import hi "github.com/alexes.dev/height_interval"

processed := hi.IntervalQueue{}
available := hi.Interval{Earliest: 1_000_000, Latest: 2_000_000}

// claim up to 128 blocks, newest-first
gaps := processed.PopGaps(available, 128, 0)
for _, g := range gaps {
    fetchRange(g.Earliest, g.Latest) // [earliest, latest)
}

Python usage

from height_interval import Interval, IntervalQueue

processed = IntervalQueue()
available = Interval(1_000_000, 2_000_000)

gaps = processed.pop_gaps(available, max_size=128)
for g in gaps:
    fetch_range(g.earliest, g.latest)

TypeScript usage

import { Interval, IntervalQueue } from "./height_interval";

const processed = new IntervalQueue();
const available = new Interval(1_000_000, 2_000_000);

const gaps = processed.popGaps(available, 128, 0);
for (const g of gaps) {
    fetchRange(g.earliest, g.latest); // [earliest, latest)
}

Running tests

# Go (native only)
go test ./...

# All three implementations (Go + Python + TypeScript) via Docker
docker compose up --abort-on-container-exit --exit-code-from go-tests

The Docker run starts Python and TypeScript HTTP servers, waits for them to be healthy, then runs the Go conformance suite against all three backends in parallel. Exit code mirrors the test result.

License

MIT — see LICENSE.md.

About

Sorted interval queue with gap-aware batch claiming for parallel block-height extraction. Go, Python, TypeScript.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors