feat(config): nano-replica memory profile — run the replica in 512 MiB–1 GiB VMs#10491
Draft
Dfinity-Bjoern wants to merge 14 commits into
Scale down the replica's memory capacities, reservations and limits so it can run on a 512 MiB–1 GiB VM (down from the 512 GiB mainnet footprint), accepting a substantially reduced subnet capacity.

execution_environment.rs:
- subnet memory capacity 2 TiB -> 512 MiB, threshold -> 384 MiB
- guaranteed-response msg mem 15 GiB -> 64 MiB, best-effort 5 GiB -> 32 MiB
- ingress history 4 GiB -> 32 MiB, wasm custom sections 2 GiB -> 16 MiB
- execution threads 4 -> 1, query threads 4 -> 1
- subnet memory reservation 2560 -> 64 MiB per thread
- callback soft limit 1,000,000 -> 4,096
- subnet heap delta capacity 140 GiB -> 96 MiB
- query cache 200 -> 16 MiB, compilation cache 10 GiB -> 64 MiB

embedders.rs (OOM-cliff fix: bound a single execution's resident set):
- stable dirty/accessed page limits 1-8 GiB -> 32/128 MiB
- max dirty pages without optimization 1 GiB -> 32 MiB
- sandbox count 10,000 -> 32, idle time 30m -> 2m
- rayon compilation/page-allocator threads 10/8 -> 2/2
- query threads per canister 2 -> 1

subnet_config.rs:
- heap delta initial reserve 32 GiB -> 32 MiB (must be <= capacity)
- max paused (DTS) executions 4 -> 1
- per-canister heap delta rate limit 75 -> 32 MiB

sandboxed_execution_controller.rs:
- decouple max sandbox RSS from heap delta via a 128 MiB floor (MIN_SANDBOXES_RSS), so a tiny heap delta no longer starves sandboxes
- eviction batch 1 GiB -> 64 MiB

message_routing.rs:
- XNet stream target size 10 -> 2 MiB, max stream messages 10,000 -> 1,000

Verified: rustfmt, clippy (clean), cargo test -p ic-config (19 passed), bazel build //rs/config:config //rs/canister_sandbox:backend_lib.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
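For readers unfamiliar with the sandbox-RSS change, the 128 MiB floor can be pictured roughly as follows. This is only a sketch: `MIN_SANDBOXES_RSS` and its value come from the commit message, but the helper name `max_sandboxes_rss` and its `heap_delta_derived_rss` parameter are hypothetical stand-ins for whatever the controller actually derives from the heap-delta configuration.

```rust
const MIB: u64 = 1024 * 1024;

/// Floor named in the commit message (128 MiB).
const MIN_SANDBOXES_RSS: u64 = 128 * MIB;

/// Hypothetical helper: `heap_delta_derived_rss` stands in for the budget the
/// controller derives from the heap-delta configuration. The floor keeps a
/// tiny heap delta from shrinking the sandbox budget below 128 MiB.
fn max_sandboxes_rss(heap_delta_derived_rss: u64) -> u64 {
    heap_delta_derived_rss.max(MIN_SANDBOXES_RSS)
}

fn main() {
    for derived in [32 * MIB, 96 * MIB, 512 * MIB] {
        println!(
            "derived {:>3} MiB -> sandbox RSS budget {:>3} MiB",
            derived / MIB,
            max_sandboxes_rss(derived) / MIB
        );
    }
}
```

Small derived budgets are lifted to the floor, while budgets already above 128 MiB pass through unchanged.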
The DTS scheduler computes allocatable compute capacity as `(scheduler_cores - 1) * 100%` (round_schedule::compute_capacity_percent). With NUMBER_OF_EXECUTION_THREADS = 1 this is 0%, so the invariant `total_compute_allocation + 1% <= compute_capacity` fails on every round and the replica panics in the MR Batch Processor on restart. Bump to 2 (the scheduler floor). Memory cost is negligible: the extra execution thread's Wasm address space is virtual, resident usage stays bounded by the per-message dirty-page limits and the shared sandbox-RSS budget, and SUBNET_MEMORY_RESERVATION is 64 MiB x 2 = 128 MiB (< the 512 MiB subnet cap). Found by running a 4-node local-net subnet with the nano profile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
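The arithmetic behind the panic can be sketched as below. The formula and the invariant are the ones quoted in this commit message; the function bodies, the `invariant_holds` name and the use of `saturating_sub` are illustrative assumptions, not the scheduler's actual code.

```rust
/// Allocatable compute capacity in percent, following the formula quoted
/// above: (scheduler_cores - 1) * 100%. `saturating_sub` is an assumption
/// made here to keep the sketch safe for 0 cores.
fn compute_capacity_percent(scheduler_cores: u64) -> u64 {
    scheduler_cores.saturating_sub(1) * 100
}

/// Invariant from the commit message:
/// total_compute_allocation + 1% <= compute_capacity.
fn invariant_holds(total_compute_allocation_percent: u64, scheduler_cores: u64) -> bool {
    total_compute_allocation_percent + 1 <= compute_capacity_percent(scheduler_cores)
}

fn main() {
    for cores in [1u64, 2, 4] {
        println!(
            "cores={} capacity={}% invariant(with 0% allocated)={}",
            cores,
            compute_capacity_percent(cores),
            invariant_holds(0, cores)
        );
    }
}
```

With one core the capacity is 0%, so even an empty allocation violates the invariant; two cores is the smallest value that satisfies it.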
Standalone stress driver for a local subnet, driven over the public
endpoint with the in-repo ic-canister-client Agent (no dfx needed):
deploys N universal canisters via provisional_create_canister_with_cycles,
then runs throughput / compute / dirty-page / memory-growth phases and
reports throughput, latency and error classes.
Run:
UNIVERSAL_CANISTER_WASM_PATH=/path/to/universal_canister.wasm \
cargo run -p ic-canister-client --example hammer -- http://localhost:8080
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Disable the storage cycle-reservation mechanism on the nano profile so canisters can freely allocate up to the subnet memory capacity:
- SUBNET_MEMORY_THRESHOLD = SUBNET_MEMORY_CAPACITY (512 MiB). When the threshold is >= capacity the subnet is never "high usage", so growth never triggers cycle reservations (whose mainnet-calibrated pricing otherwise rejects growth on a tiny subnet, hitting the reserved-cycles limit).
- SUBNET_MEMORY_RESERVATION = 8 MiB/thread (was 64), so the response-callback reservation no longer caps usable storage well below capacity.

Also bake the matching hypervisor override into dev/local-net/prep.sh so the local 4-node net inherits it across resets.

Verified on the local-net: with reservation disabled, a single message writing 24 MiB of stable memory succeeds while 48 MiB traps with "Exceeded the limit for the number of accessed pages ... limit 32768 KB" (the nano 32 MiB per-message stable limit), and the subnet keeps finalizing with no replica panic. That is, the per-message limit, not an OOM-kill, bounds a single execution's working set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
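Why setting the threshold equal to the capacity switches the mechanism off can be illustrated with a small predicate. This is a sketch under assumptions: `is_high_usage` is a hypothetical name, and the comparison (usage strictly above the threshold, with usage never exceeding capacity) is inferred from the description above rather than taken from the replica source.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical predicate: the subnet counts as "high usage" once its memory
/// usage exceeds the threshold. Usage is assumed never to exceed `capacity`,
/// so a threshold >= capacity can never be crossed.
fn is_high_usage(used: u64, capacity: u64, threshold: u64) -> bool {
    used.min(capacity) > threshold
}

fn main() {
    let capacity = 512 * MIB;
    // Earlier nano value from the first commit: threshold 384 MiB.
    println!("400 MiB used, 384 MiB threshold: {}", is_high_usage(400 * MIB, capacity, 384 * MIB));
    // This commit: threshold equals capacity.
    println!("512 MiB used, 512 MiB threshold: {}", is_high_usage(512 * MIB, capacity, capacity));
}
```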
- HAMMER_MODE=probe runs only the per-message dirty/accessed-page-limit probe (skips the throughput/compute/growth storms).
- Grow stable memory in its own committed message, then fill 24 MiB (under the 32 MiB limit, expect OK) and 48 MiB (over, expect trap), so the limit is isolated from subnet-capacity effects.
- Widen error-class output so full canister reject reasons are visible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
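The expected probe outcomes (24 MiB OK, 48 MiB trap) follow from a simple bound check, sketched here. The 32 MiB limit is the nano value quoted in these commits; the function name `probe_fill` and its error text are hypothetical and do not reproduce the replica's real check or trap message.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Nano per-message stable-memory limit quoted in the commits (32 MiB).
const PER_MESSAGE_STABLE_LIMIT: u64 = 32 * MIB;

/// Hypothetical model of the probe: a single message touching more stable
/// memory than the per-message limit is rejected, anything at or below
/// the limit succeeds.
fn probe_fill(bytes_touched: u64) -> Result<(), String> {
    if bytes_touched > PER_MESSAGE_STABLE_LIMIT {
        Err(format!(
            "sketch: touched {} KB, limit {} KB",
            bytes_touched / KIB,
            PER_MESSAGE_STABLE_LIMIT / KIB
        ))
    } else {
        Ok(())
    }
}

fn main() {
    for mib in [24u64, 48] {
        println!("{} MiB fill -> {:?}", mib, probe_fill(mib * MIB));
    }
}
```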
The nano heap-delta capacity (96 MiB) is small relative to the default ~500-round checkpoint interval, so a memory-write-heavy workload fills the heap delta in a few rounds and then execution stalls until the next checkpoint flushes it (consensus keeps finalizing throughout: graceful, but execution duty-cycle collapses).

Pass --dkg-interval-length 49 to ic-prep so checkpoints happen every ~50 rounds. Measured effect under the same hammer workload:
- heap-delta round-skips during the run: ~880 -> ~150
- compute phase drains ~3x faster; execution advances in short bursts instead of multi-minute stalls.

Checkpoint cadence follows the DKG interval (CUP heights); cheap here because the nano subnet state is only a few hundred MB.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
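A toy model makes the duty-cycle argument concrete. Everything here is an assumption for illustration: the workload is taken to write a constant 32 MiB of heap delta per round, execution is taken to skip every round once the 96 MiB capacity is reached, and the checkpoint is taken to reset the heap delta to zero. The function names are hypothetical and the numbers are not the measured ones above.

```rust
/// Rounds of execution before the heap delta reaches capacity, assuming a
/// constant write rate per round (ceiling division).
fn rounds_until_full(capacity_mib: u64, write_mib_per_round: u64) -> u64 {
    assert!(write_mib_per_round > 0, "write rate must be positive");
    (capacity_mib + write_mib_per_round - 1) / write_mib_per_round
}

/// Rounds skipped per checkpoint interval in this toy model: every round
/// after the heap delta fills up is skipped until the checkpoint.
fn skipped_rounds(capacity_mib: u64, write_mib_per_round: u64, interval_rounds: u64) -> u64 {
    interval_rounds.saturating_sub(rounds_until_full(capacity_mib, write_mib_per_round))
}

fn main() {
    for interval in [500u64, 50] {
        let skipped = skipped_rounds(96, 32, interval);
        println!(
            "interval {:>3} rounds: {} executed, {} skipped",
            interval,
            interval - skipped,
            skipped
        );
    }
}
```

In this model the number of executed rounds per interval stays the same, so shortening the interval raises the fraction of rounds that do useful work.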
HAMMER_MODE=read populates N canisters with large stable state, then runs read-heavy 24 MiB stable_read calls — updates on all-but-one canister and queries on the last — concurrently, plus a 48 MiB single-execution read probe to exercise the per-message/query stable accessed-page limit. storm() gains an is_query flag to drive query calls via execute_query. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
HAMMER_MODE=heap mirrors the stable-memory tests on Wasm heap memory: per-message heap-write probe (24/48/96 MiB in one message), heap-write storm (8 MiB/call), and a heap-read storm (40 MiB get_global_data reads, updates + queries). Demonstrates that heap has no per-execution dirty/accessed cap (the 32 MiB limits are stable-only): all three single-message heap writes and the 40 MiB heap reads succeed, whereas the stable equivalents trap at 32 MiB. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- read mode: cycle read offsets across the FULL populated range (not 4 fixed windows) and error-check the populate, so reads pull distinct state and all canisters are actually large.
- heapread mode: build a large per-canister heap global via append_to_global_data and query-read it (96 MiB/read). Surfaces that large heap state is ~2.5x more expensive than stable (wasm heap never shrinks + realloc on build), so 3x96 MiB heap globals OOM the 512 MiB subnet while 3x128 MiB stable fits, and that large heap reads via update OOM (the get_global_data copy grows heap).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- read storm now queries ALL canisters cycling the full populated range (clean read pressure; queries don't replicate or dirty).
- populate grows+fills in 24 MiB increments (a single 128 MiB grow can be rejected; small incremental grows reliably build the state).

Used to measure read memory/perf under a container RAM cap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
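The incremental populate can be pictured as a chunk plan like the one below. This is a sketch, not the code in hammer.rs: `plan_chunks` is a hypothetical helper, and only the 128 MiB target and the 24 MiB increment are taken from the commits.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical helper: split `total` bytes into (offset, len) pieces of at
/// most `chunk` bytes, so each grow+fill message stays under the 32 MiB
/// per-message limit.
fn plan_chunks(total: u64, chunk: u64) -> Vec<(u64, u64)> {
    assert!(chunk > 0, "chunk size must be positive");
    let mut plan = Vec::new();
    let mut offset = 0;
    while offset < total {
        let len = chunk.min(total - offset);
        plan.push((offset, len));
        offset += len;
    }
    plan
}

fn main() {
    for (offset, len) in plan_chunks(128 * MIB, 24 * MIB) {
        println!("fill offset {:>3} MiB, len {:>2} MiB", offset / MIB, len / MIB);
    }
}
```

For 128 MiB in 24 MiB increments this yields five full chunks and one final 8 MiB chunk.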
HAMMER_MODE=calls: each ingress makes the target canister start a HAMMER_CALL_DEPTH-hop chain of update calls around the canister ring (nested via call_args().other_side), generating ~2*depth inter-canister messages per ingress. Used to stress message routing, callbacks and the guaranteed-response memory reservation under the nano profile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iplier)

HAMMER_MODE=fanout: each ingress fires N parallel fire-and-forget update calls (no-op callbacks), leaving N outstanding inter-canister calls per in-flight ingress to stress the guaranteed-response memory reservation and callback limits. HAMMER_FANOUT_MULT repeats the fan-out so a single message issues N*mult calls (all reservations taken before any drain), which exposes the 64 MiB guaranteed-response cap (~32 simultaneous calls).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
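The "~32 simultaneous calls" figure is consistent with each outstanding call reserving about 2 MiB against the 64 MiB cap. The sketch below works through that arithmetic; the 2 MiB per-call reservation is inferred from the two numbers in this commit message and is an assumption, as are the function names.

```rust
const MIB: u64 = 1024 * 1024;

/// Calls issued by one ingress in fanout mode: N parallel calls, repeated
/// `mult` times (HAMMER_FANOUT_MULT), as described above.
fn calls_per_ingress(n: u64, mult: u64) -> u64 {
    n * mult
}

/// How many calls can hold a reservation at once under the cap.
/// `per_call_reservation` is an assumed value, not read from the replica.
fn max_outstanding_calls(cap: u64, per_call_reservation: u64) -> u64 {
    assert!(per_call_reservation > 0, "reservation must be positive");
    cap / per_call_reservation
}

fn main() {
    let limit = max_outstanding_calls(64 * MIB, 2 * MIB);
    let issued = calls_per_ingress(16, 4);
    println!("issued {} calls, at most {} can hold a reservation", issued, limit);
}
```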
HAMMER_MODE=hybrid runs three storms concurrently over the canister pool: query reads (24 MiB stable_read), update writes (8 MiB stable_fill), and 3-hop inter-canister call chains — splitting the concurrency budget. Shows read/update path isolation and update-path contention under mixed load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MAX_HEAP_DELTA_PER_ITERATION was 200 MB > SUBNET_HEAP_DELTA_CAPACITY (96 MiB), so a single execution round could push the in-memory heap delta far past the cap before the next round's skip-check — a transient spike of unreclaimable (anonymous) resident memory (~200-300 MB) that threatens a 512 MiB VM under write load. Lower it to 64 MB so one round cannot overshoot the cap, tightening the anonymous-memory ceiling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
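One way to keep this class of mistake from recurring is to check the relationships between the heap-delta constants, as sketched below. The constant names mirror the ones in this PR, but the plain `u64` types, the decimal reading of "64 MB", and the `check_heap_delta_config` helper are assumptions for illustration; the real config uses `NumBytes`.

```rust
const MIB: u64 = 1024 * 1024;

// Nano values from this PR, expressed as plain byte counts for the sketch.
const SUBNET_HEAP_DELTA_CAPACITY: u64 = 96 * MIB;
const HEAP_DELTA_INITIAL_RESERVE: u64 = 32 * MIB;
const MAX_HEAP_DELTA_PER_ITERATION: u64 = 64 * 1_000_000; // "64 MB", assumed decimal

// Compile-time guards: the build fails if either bound is violated.
const _: () = assert!(MAX_HEAP_DELTA_PER_ITERATION <= SUBNET_HEAP_DELTA_CAPACITY);
const _: () = assert!(HEAP_DELTA_INITIAL_RESERVE <= SUBNET_HEAP_DELTA_CAPACITY);

/// Hypothetical runtime variant of the same checks, usable in a unit test.
fn check_heap_delta_config(per_iteration: u64, reserve: u64, capacity: u64) -> Result<(), &'static str> {
    if per_iteration > capacity {
        return Err("per-iteration heap delta exceeds heap-delta capacity");
    }
    if reserve > capacity {
        return Err("initial reserve exceeds heap-delta capacity");
    }
    Ok(())
}

fn main() {
    let nano = check_heap_delta_config(
        MAX_HEAP_DELTA_PER_ITERATION,
        HEAP_DELTA_INITIAL_RESERVE,
        SUBNET_HEAP_DELTA_CAPACITY,
    );
    let before_fix = check_heap_delta_config(200_000_000, HEAP_DELTA_INITIAL_RESERVE, SUBNET_HEAP_DELTA_CAPACITY);
    println!("nano profile: {:?}", nano);
    println!("with the old 200 MB value: {:?}", before_fix);
}
```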
Pull request overview
Experimental “nano-replica” configuration intended to shrink replica memory footprints (targeting 512 MiB–1 GiB VMs) by aggressively reducing subnet memory caps, heap-delta limits, sandbox/resource limits, and XNet stream sizes, plus adding a standalone Rust “hammer” example to stress the local 4-node network.
Changes:
- Reduce multiple replica/subnet memory and concurrency limits (heap delta, message routing streams, sandbox resources, query/execution threads).
- Add `hammer.rs` load driver example and wire in universal canister dependency for it.
- Adjust `dev/local-net` prep to bake nano hypervisor overrides and shorten DKG interval.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| rs/config/src/subnet_config.rs | Shrinks heap-delta iteration cap, reserve, paused DTS executions, and per-canister heap-delta rate limit. |
| rs/config/src/message_routing.rs | Reduces XNet stream target size and max messages per stream. |
| rs/config/src/execution_environment.rs | Cuts subnet memory/message capacities and execution/query parallelism; lowers caches and reservations. |
| rs/config/src/embedders.rs | Lowers stable-memory per-message dirty/accessed limits, sandbox counts/idle time, and compilation/page-copying parallelism. |
| rs/canister_sandbox/src/replica_controller/sandboxed_execution_controller.rs | Adds a minimum sandbox RSS floor and reduces eviction RSS batch size. |
| rs/canister_client/examples/hammer.rs | New stress-test tool for deploying universal canisters and generating mixed load patterns. |
| rs/canister_client/Cargo.toml | Adds ic-universal-canister as a dev-dependency for the new example. |
| dev/local-net/prep.sh | Applies nano hypervisor overrides and sets a shorter DKG interval for local-net. |
| Cargo.lock | Lockfile update for the added dev-dependency. |
Comment on lines +106 to +110

    // Nano-replica profile: keep a single round's heap-delta production below the
    // SUBNET_HEAP_DELTA_CAPACITY (96 MiB) so one round cannot overshoot the cap and
    // spike unreclaimable (anonymous) resident memory. This bounds the per-round
    // dirty working set so writes stay safe on a 512 MiB - 1 GiB VM.
    const MAX_HEAP_DELTA_PER_ITERATION: NumBytes = NumBytes::new(64 * M);

Comment on lines 33 to +41

      /// This specifies the threshold in bytes at which the subnet memory usage is
      /// considered to be high. If this value is greater or equal to the subnet
      /// capacity, then the subnet is never considered to have high usage.
    - const SUBNET_MEMORY_THRESHOLD: NumBytes = NumBytes::new(750 * GIB);
    + // Nano-replica profile: set equal to the subnet memory capacity so the subnet
    + // is never considered "high usage" and the storage cycle-reservation mechanism
    + // stays disabled — canisters can allocate freely up to the subnet capacity
    + // without reserving cycles (reservation pricing is calibrated for mainnet and
    + // would otherwise reject growth on a tiny subnet).
    + const SUBNET_MEMORY_THRESHOLD: NumBytes = NumBytes::new(512 * MIB);

Comment on lines +593 to +596

    // ---- Heap-read storm (analogue of the stable READ test) ----
    // get_global_data reads the whole 40 MiB global in one execution — more
    // than the 32 MiB stable per-message accessed limit would ever allow.
    let qry_cans = Arc::new(vec![canisters[canisters.len() - 1]]);

Comment on lines +606 to +607

    us.report("HEAP-READ-UPDATE (40 MiB heap read)", t.elapsed());
    qs.report("HEAP-READ-QUERY (40 MiB heap read)", t.elapsed());

Comment on lines +439 to +442

    // Populate each canister with ~120 MiB of real stable data (written in
    // <=24 MiB chunks to respect the 32 MiB per-message dirty limit).
    const BIG_MIB: u32 = 128;
    let chunk: u32 = 24 * MIB;

Comment on lines +666 to +668

    println!("\n[5/5] MEMORY-GROWTH storm: grow 16 MiB + fill per call across all canisters until rejected");
    let grow = Arc::new(Stats::default());
    let total_mib = Arc::new(AtomicU64::new(0));

    for h in handles {
        let _ = h.await;
    }
    grow.report("MEMORY-GROWTH", Duration::from_secs(1));

Comment on lines 68 to 73

    /// The number of sandbox processes to evict in one go in order to amortize
    /// for the eviction cost. A large number could lead to the eviction
    /// of many sandboxes and increased system load. The number was chosen
    /// based on the assumption of 800 canister executions per round
    /// distributed across 4 execution cores.
    const SANDBOX_PROCESSES_TO_EVICT: usize = 200;

What this is
An experiment to run the ICP replica in 512 MiB–1 GiB VMs (mainnet uses ~512 GiB), accepting a much smaller subnet: 2 execution threads, 1 query thread, a few-hundred-MB subnet state, best-effort-leaning messaging. The premise is that the 512 GiB figure is mostly worst-case capacity bounds and reservations, not steady-state resident memory — so the work is shrinking every pool/limit by ~100–1000× and fixing the few places where a single execution could OOM a tiny node.
All changes are gated to the nano profile's constants in `rs/config`; nothing here is meant for mainnet defaults as-is.

Config changes (`rs/config`)
- `execution_environment.rs`: subnet memory capacity 2 TiB→512 MiB; exec threads 4→2, query threads 4→1; heap-delta capacity 140 GiB→96 MiB; guaranteed-response msg mem 15 GiB→64 MiB, best-effort 5 GiB→32 MiB; ingress history/custom sections/caches shrunk; callback soft-limit 1M→4096; memory reservation 2560→8 MiB/thread; `SUBNET_MEMORY_THRESHOLD` = capacity (disables storage cycle-reservation so canisters can use the full cap).
- `embedders.rs` (the OOM-cliff fix): per-message stable dirty/accessed page limits 1–8 GiB → 32/128 MiB; sandbox count 10000→32, idle 30m→2m; rayon threads 10/8→2/2.
- `subnet_config.rs`: `MAX_HEAP_DELTA_PER_ITERATION` 200 MB→64 MiB (so a single round can't overshoot the 96 MiB heap-delta cap, which bounds the unreclaimable resident spike under writes); heap-delta initial reserve 32 GiB→32 MiB; max paused (DTS) execs 4→1; per-canister heap-delta rate-limit 75→32 MiB.
- `canister_sandbox/.../sandboxed_execution_controller.rs`: decoupled max sandbox RSS from heap-delta via a 128 MiB floor (so shrinking heap delta doesn't starve sandboxes); eviction batch 1 GiB→64 MiB.
- `message_routing.rs`: XNet stream size 10→2 MiB, max stream messages 10000→1000.
- `dev/local-net/prep.sh`: bakes the nano `hypervisor` overrides into generated configs and shortens the DKG/checkpoint interval to ~50 rounds.

Two correctness fixes were found by actually running it: the DTS scheduler requires ≥2 cores (`(cores-1)*100%` capacity → 1 core trips an invariant), and `MAX_HEAP_DELTA_PER_ITERATION` must stay ≤ the heap-delta cap.

Load driver (`rs/canister_client/examples/hammer.rs`)
A self-contained stress tool driven over the public endpoint via the in-repo `ic-canister-client` (no dfx). Deploys universal canisters and hammers them. Modes: `read` (stable reads), `heap`/`heapread` (heap memory), `calls` (inter-canister chains), `fanout` (parallel calls), `hybrid` (reads+writes+messaging at once), plus a per-message dirty/accessed-limit probe. Run it with the `cargo run -p ic-canister-client --example hammer` command given in the hammer commit message above, setting `HAMMER_MODE` to pick a mode.

Key findings (from `dev/local-net`, container RAM hard-capped)
- The `MAX_HEAP_DELTA_PER_ITERATION` fix keeps anonymous resident memory bounded (no 200 MB overshoot).

Not done / caveats
- `bazel test` sweep has not been run (CI-scale). Compile + `ic-config` unit tests + targeted bazel builds pass.

🤖 Generated with Claude Code