moq-net(stats): aggregate per-node into a single gzipped broadcast#1517
moq-net(stats): aggregate per-node into a single gzipped broadcast#1517kixelated wants to merge 4 commits into
Conversation
Per-broadcast level fan-out produced K*N stats broadcasts (K aggregation
levels times N served broadcasts times M relays) and quickly hit the
browser's 100-subscription ceiling. Collapse to one .stats/node/{name}
broadcast per relay carrying a gzipped JSON map of broadcast path to
cumulative counters for every broadcast active in the last
retention_ticks ticks. Dashboard-shape aggregation moves to a separate
downstream binary.
- Drop the Level / level_keys / advertised_path fan-out machinery; replace
with one BroadcastEntry per path and a single snapshot task that ticks
every tick_secs (default 1s) and writes gzipped JSON frames per the
four existing tier x role tracks (renamed .json.gz).
- GC entries that have no outstanding guards (Arc::strong_count == 1) and
haven't been observed active for retention_ticks (default 10) ticks.
- Replace --stats-levels with --stats-tick-secs and
--stats-retention-ticks. Both Option<T> per the TOML-merge rule;
regression test updated to cover them.
- Document the new wire format, broadcast path, tracks, and flags in
doc/bin/relay/config.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughThis PR refactors stats aggregation from a multi-level bucketing model to per-broadcast-path publishing with configurable tick intervals and retention windows. The 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rs/moq-net/src/stats.rs`:
- Around line 687-691: The code advances last_payload (`*last = json`) even when
`track.write_frame(compressed)` fails; change the control flow so `*last = json`
is executed only on success—i.e., call `track.write_frame(compressed)` and if it
returns Err log the error and do not update `last`, otherwise update `last` (for
example, move the `*last = json` into the success branch or use an early
continue/return on error). Ensure you modify the block around
`track.write_frame(compressed)` and the `*last = json` assignment so failed
writes do not advance `last_payload`.
- Around line 635-641: The retention window check is off-by-one: change the
condition that currently uses `current_tick.saturating_sub(last) <
retention_ticks as u64` to use `<=` so it becomes
`current_tick.saturating_sub(last) <= retention_ticks as u64`; this preserves
the intended "linger for N ticks" semantics and ensures `retention_ticks == 0`
still allows emission while `counters.active()` updates `last_tick` (identify
the code around `last_tick`, `current_tick`, `retention_ticks`,
`counters.active()` and the `frame.insert(entry.path.as_str().to_string(),
snap)` insertion).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 698f1649-c811-477a-846e-61356adc8158
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
doc/bin/relay/config.mdrs/moq-net/Cargo.tomlrs/moq-net/src/stats.rsrs/moq-relay/src/config.rsrs/moq-relay/src/stats.rs
CodeRabbit caught two issues: 1. Retention window was `< retention_ticks`, dropping entries one tick earlier than the documented "linger for N ticks after". Also broke retention_ticks=0 (the relay CLI clamps to >= 1, but the library API accepts u32 raw). Now uses `<=` and a separate active-branch insert so retention=0 emits while subs are live and never lingers. 2. `last_payload` advanced even when `track.write_frame()` failed, silently dropping the snapshot on the retry. Now updates only on success so the next tick re-attempts the same payload. Adds retention_boundary_keeps_last_idle_tick to lock in the boundary behaviour precisely (diff==N kept, diff==N+1 GC'd) so a future regression in either direction is caught. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-node stats refactor in the prior commit removed --stats-levels; the demo TOMLs still set it and would fail to load under deny_unknown_fields. Also retire the now-stale comment block describing .stats/prefix/<level> paths in favor of the new .stats/node/<node> schema. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-frame payload isn't large enough yet to justify the flate2 dep, and a future moq-lite revision is expected to negotiate transparent compression at the protocol layer. Tracks rename `.json.gz` -> `.json`; subscribers consume the raw JSON directly. Also lowers the default retention window to 1 tick to match the typical reconnect cadence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Actionable comments posted: 0 |
Summary
.stats/node/{name}broadcast per relay (or.stats/nodewhen unset). The fourtier × roletracks remain, renamed to.json.gz.retention_ticksticks. Tier / role / node are implied by the track and broadcast paths, not repeated in the payload.--stats-levelsis gone; level aggregation moves to a separate downstream binary. New flags:--stats-tick-secs(default1) and--stats-retention-ticks(default10), bothOption<T>per the TOML-merge rule.Why
Per-(level × node) broadcasts grow as
(N + K) × M(broadcasts × levels × relays), and a browser dashboard subscribing to all of them hit the 100-subscription ceiling almost immediately. Per-frame payload now grows linearly in active-broadcast count instead, and gzip handles the repetitive JSON shape so the wire cost stays bounded.Wire format
A broadcast appears in the frame for a given (tier, role) while it has at least one active subscription, and lingers for
retention_ticksticks after the last one drops. Counters are cumulative; downstream computes rates.Implementation notes
BroadcastEntryholds fourCountersand fourlast_active_tickatomics (one pertier × role). The single snapshot task ticks everytick_secs, builds per-slot frames, writes them gzipped, and GCs entries that have no outstanding RAII guards (Arc::strong_count == 1) and are past the retention window. The strong-count check is the safety hatch: a bump on an orphanedArcafter GC can't be silently lost because we never GC while a guard is still alive.broadcast()call and exits after2 × retention_ticksof an empty entry map (next bump respawns it). One broadcast and four tracks per node lifetime.Reviewer notes
Stats::newsignature changed (levels: u32→tick: Duration, retention_ticks: u32). Only moq-relay constructsStatsso the rest of moq-net'sBroadcastStats/ guard surface is unchanged.js/netdoesn't consume the stats wire format today, so no JS sync is needed per the Cross-Package Sync table;doc/bin/relay/config.mdis updated with the new schema, flags, env vars.Test plan
cargo test --workspace --all-targets(sansmoq-gst, which needs system libs locally) — 11 new stats tests cover per-broadcast isolation, tier independence, single-broadcast announce, gzip round-trip, retention-keeps and retention-evicts, prefix/disabled no-ops, plus existing 78 moq-relay tests including the updated TOML/CLI regression test fortick_secs/retention_ticks.cargo clippy --workspace --all-targets -- -D warningscargo fmt --all --checkRUSTDOCFLAGS=-D warnings cargo doc --no-deps --workspacecargo shear/cargo sort --workspace --checkmoq-relay --stats-enabled --stats-node sjc/1, publish a couple broadcasts, subscribe to.stats/node/sjc/1/publisher.json.gzthroughgunzip, drop a publisher and confirm the entry lingers ~10 ticks then disappears.(Written by Claude)