Skip to content

moq-net(stats): aggregate per-node into a single gzipped broadcast#1517

Open
kixelated wants to merge 4 commits into
mainfrom
claude/youthful-goldberg-990757
Open

moq-net(stats): aggregate per-node into a single gzipped broadcast#1517
kixelated wants to merge 4 commits into
mainfrom
claude/youthful-goldberg-990757

Conversation

@kixelated
Copy link
Copy Markdown
Collaborator

Summary

  • Collapses the per-(level × node) stats fan-out into a single .stats/node/{name} broadcast per relay (or .stats/node when unset). The four tier × role tracks remain, renamed to .json.gz.
  • Each frame is a gzipped JSON map of broadcast path to cumulative counter snapshot for every broadcast active in the last retention_ticks ticks. Tier / role / node are implied by the track and broadcast paths, not repeated in the payload.
  • --stats-levels is gone; level aggregation moves to a separate downstream binary. New flags: --stats-tick-secs (default 1) and --stats-retention-ticks (default 10), both Option<T> per the TOML-merge rule.

Why

Per-(level × node) broadcasts grow as (N + K) × M (broadcasts × levels × relays), and a browser dashboard subscribing to all of them hit the 100-subscription ceiling almost immediately. Per-frame payload now grows linearly in active-broadcast count instead, and gzip handles the repetitive JSON shape so the wire cost stays bounded.

Wire format

// .stats/node/sjc/1, track publisher.json.gz, gunzipped
{
  \"demo/bbb\": { \"broadcasts\": 1, \"broadcasts_closed\": 0, \"subscriptions\": 5,
                 \"subscriptions_closed\": 2, \"bytes\": 12345, \"frames\": 678, \"groups\": 9 },
  \"anon/foo\": { \"broadcasts\": 1, \"broadcasts_closed\": 0, \"subscriptions\": 2,
                 \"subscriptions_closed\": 0, \"bytes\": 234,   \"frames\": 12,  \"groups\": 1 }
}

A broadcast appears in the frame for a given (tier, role) while it has at least one active subscription, and lingers for retention_ticks ticks after the last one drops. Counters are cumulative; downstream computes rates.

Implementation notes

  • BroadcastEntry holds four Counters and four last_active_tick atomics (one per tier × role). The single snapshot task ticks every tick_secs, builds per-slot frames, writes them gzipped, and GCs entries that have no outstanding RAII guards (Arc::strong_count == 1) and are past the retention window. The strong-count check is the safety hatch: a bump on an orphaned Arc after GC can't be silently lost because we never GC while a guard is still alive.
  • The task lazily spawns on the first broadcast() call and exits after 2 × retention_ticks of an empty entry map (next bump respawns it). One broadcast and four tracks per node lifetime.
  • Idle-frame skipping is preserved by byte-comparing the per-slot JSON payload to the last emitted one.

Reviewer notes

  • Stats::new signature changed (levels: u32tick: Duration, retention_ticks: u32). Only moq-relay constructs Stats so the rest of moq-net's BroadcastStats / guard surface is unchanged.
  • js/net doesn't consume the stats wire format today, so no JS sync is needed per the Cross-Package Sync table; doc/bin/relay/config.md is updated with the new schema, flags, env vars.

Test plan

  • cargo test --workspace --all-targets (sans moq-gst, which needs system libs locally) — 11 new stats tests cover per-broadcast isolation, tier independence, single-broadcast announce, gzip round-trip, retention-keeps and retention-evicts, prefix/disabled no-ops, plus existing 78 moq-relay tests including the updated TOML/CLI regression test for tick_secs / retention_ticks.
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --all --check
  • RUSTDOCFLAGS=-D warnings cargo doc --no-deps --workspace
  • cargo shear / cargo sort --workspace --check
  • End-to-end against a local relay: run moq-relay --stats-enabled --stats-node sjc/1, publish a couple broadcasts, subscribe to .stats/node/sjc/1/publisher.json.gz through gunzip, drop a publisher and confirm the entry lingers ~10 ticks then disappears.

(Written by Claude)

Per-broadcast level fan-out produced K*N stats broadcasts (K aggregation
levels times N served broadcasts times M relays) and quickly hit the
browser's 100-subscription ceiling. Collapse to one .stats/node/{name}
broadcast per relay carrying a gzipped JSON map of broadcast path to
cumulative counters for every broadcast active in the last
retention_ticks ticks. Dashboard-shape aggregation moves to a separate
downstream binary.

- Drop the Level / level_keys / advertised_path fan-out machinery; replace
  with one BroadcastEntry per path and a single snapshot task that ticks
  every tick_secs (default 1s) and writes gzipped JSON frames per the
  four existing tier x role tracks (renamed .json.gz).
- GC entries that have no outstanding guards (Arc::strong_count == 1) and
  haven't been observed active for retention_ticks (default 10) ticks.
- Replace --stats-levels with --stats-tick-secs and
  --stats-retention-ticks. Both Option<T> per the TOML-merge rule;
  regression test updated to cover them.
- Document the new wire format, broadcast path, tracks, and flags in
  doc/bin/relay/config.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 67ad88c1-d83b-4805-8c0b-0fda5d8e4bab

📥 Commits

Reviewing files that changed from the base of the PR and between 2790272 and 73536b7.

📒 Files selected for processing (4)
  • demo/relay/localhost.toml
  • doc/bin/relay/config.md
  • rs/moq-net/src/stats.rs
  • rs/moq-relay/src/stats.rs
✅ Files skipped from review due to trivial changes (1)
  • doc/bin/relay/config.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • demo/relay/localhost.toml
  • rs/moq-relay/src/stats.rs

Walkthrough

This PR refactors stats aggregation from a multi-level bucketing model to per-broadcast-path publishing with configurable tick intervals and retention windows. The Stats::new() signature changes to accept tick: Duration and retention_ticks: u32 instead of levels: u32. A single shared publisher task now builds four gzipped JSON frames per tick—one for each (tier, role) combination—mapping broadcast paths to cumulative counter snapshots. Idle frames are skipped when JSON bytes match the previously emitted frame. Entries are garbage-collected when they have no active subscriptions and fall outside the retention window. Configuration is added to relay's TOML/CLI, documentation updated, and demo configs migrated to the new tick/retention model.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: collapsing per-(level × node) stats fan-out into a single per-node broadcast. It directly reflects the core objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, covering the summary, rationale, wire format, implementation details, and test plan for the stats aggregation changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/youthful-goldberg-990757

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rs/moq-net/src/stats.rs`:
- Around line 687-691: The code advances last_payload (`*last = json`) even when
`track.write_frame(compressed)` fails; change the control flow so `*last = json`
is executed only on success—i.e., call `track.write_frame(compressed)` and if it
returns Err log the error and do not update `last`, otherwise update `last` (for
example, move the `*last = json` into the success branch or use an early
continue/return on error). Ensure you modify the block around
`track.write_frame(compressed)` and the `*last = json` assignment so failed
writes do not advance `last_payload`.
- Around line 635-641: The retention window check is off-by-one: change the
condition that currently uses `current_tick.saturating_sub(last) <
retention_ticks as u64` to use `<=` so it becomes
`current_tick.saturating_sub(last) <= retention_ticks as u64`; this preserves
the intended "linger for N ticks" semantics and ensures `retention_ticks == 0`
still allows emission while `counters.active()` updates `last_tick` (identify
the code around `last_tick`, `current_tick`, `retention_ticks`,
`counters.active()` and the `frame.insert(entry.path.as_str().to_string(),
snap)` insertion).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 698f1649-c811-477a-846e-61356adc8158

📥 Commits

Reviewing files that changed from the base of the PR and between e9a1086 and 07a2584.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • doc/bin/relay/config.md
  • rs/moq-net/Cargo.toml
  • rs/moq-net/src/stats.rs
  • rs/moq-relay/src/config.rs
  • rs/moq-relay/src/stats.rs

Comment thread rs/moq-net/src/stats.rs
Comment thread rs/moq-net/src/stats.rs Outdated
kixelated and others added 3 commits May 27, 2026 12:29
CodeRabbit caught two issues:

1. Retention window was `< retention_ticks`, dropping entries one tick
   earlier than the documented "linger for N ticks after". Also broke
   retention_ticks=0 (the relay CLI clamps to >= 1, but the library API
   accepts u32 raw). Now uses `<=` and a separate active-branch insert
   so retention=0 emits while subs are live and never lingers.

2. `last_payload` advanced even when `track.write_frame()` failed,
   silently dropping the snapshot on the retry. Now updates only on
   success so the next tick re-attempts the same payload.

Adds retention_boundary_keeps_last_idle_tick to lock in the boundary
behaviour precisely (diff==N kept, diff==N+1 GC'd) so a future regression
in either direction is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-node stats refactor in the prior commit removed --stats-levels;
the demo TOMLs still set it and would fail to load under
deny_unknown_fields. Also retire the now-stale comment block describing
.stats/prefix/<level> paths in favor of the new .stats/node/<node>
schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-frame payload isn't large enough yet to justify the flate2 dep, and a
future moq-lite revision is expected to negotiate transparent compression
at the protocol layer. Tracks rename `.json.gz` -> `.json`; subscribers
consume the raw JSON directly. Also lowers the default retention window
to 1 tick to match the typical reconnect cadence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant