Supervoxel splitting with base+fork support and locks#534
Open
akhileshh wants to merge 74 commits into
Open
Conversation
Drop the inline propagate_to_coarser_scales call from write_seg; coarser mips are now the async downsample worker's responsibility. write_seg is back to a single base-scale tensorstore write, so SV splits no longer block on the full pyramid update. TestWriteSeg updated to assert the coarser scales stay zero after write_seg (propagation tested separately via TestPropagateToCoarserScales).
split_supervoxel now returns its base-resolution bbox. split_with_sv_splits collects one per call and attaches the list to Result.seg_bbox (new optional field). publish_edit includes the list in the payload and sets downsample="true" so the worker only runs on edits that touched base seg. List kept unmerged — lets the worker skip tiles outside the actual change region.
workers/downsample_worker.py consumes edits-exchange messages flagged downsample="true" and writes each non-base mip within the SV-split bbox. graph/downsample.py splits the region into pyramid_blocks (sized so no two blocks share a storage chunk at any mip), then either tinybrain'd in one call (fast path, typical small edits) or per-mip (fallback when base read exceeds memory budget). Write filtering keeps OCDBT delta proportional to the actual change. DownsampleBlockLock serializes overlapping jobs via kvdbclient's row-key lock API; 26-byte hash-prefixed keys avoid tablet hot-spots. Depends on kvdbclient lock_by_row_key / unlock_by_row_key / renew_lock_by_row_key landing first.
Used by the async downsample worker as the mip-pyramid kernel.
Closes a concurrency gap where two SV splits on overlapping L2 chunks but distinct roots can't be serialized by root locks — they acquire disjoint root-lock sets and race on seg state. L2ChunkLock serializes them via the kvdbclient row-key lock primitive. Row key = 2-byte blake2b hash + 8-byte uint64 chunk_id (10 bytes). Hash prefix keeps spatially-clustered L2 chunks from hot-spotting a single bigtable tablet under concurrent load. Primitive only — callers land separately. RowKeyLockRegistry helper moved to tests/helpers.py so L2ChunkLock and DownsampleBlockLock tests share it instead of duplicating.
Callers now get Cut(atomic_edges) | PreviewCut(ccs, illegal_split) | SvSplitRequired(sv_remapping) instead of unwrapping-by-convention or catching SupervoxelSplitRequiredError. The exception still unwinds inside LocalMincutGraph — cheapest way to bail out of deep path code — but it's caught once at the run_multicut boundary and never escapes, so callers don't use raise/catch for control flow.
MulticutOperation._apply dispatches on the tagged multicut result and, when an SV split is needed, calls the new edits_sv.split_supervoxels under its surrounding RootLock, refreshes source/sink SV IDs from seg, and retries multicut against the post-split graph. Root lock spans the whole critical section; L2ChunkLock held only around the split loop. This closes two races that existed when split_with_sv_splits handled the flow outside any lock: same-root (root lock now never released between multicut and commit) and cross-root (L2ChunkLock serializes overlapping split regions). split_with_sv_splits is deleted; handle_split calls cg.remove_edges directly.
Pre-compute each rep's bbox from the chunk coords of its CC members in sv_remapping (no coord-padding, no resolution-axis assumption). split_supervoxels builds the union lock set across reps — sparse chunks plus one L2-chunk margin for update_edges's 1-voxel overlap read — acquires once, then loops per-rep splits. _update_chunks surfaces the change_chunks that actually got new SV IDs; write_seg_chunks fires one tensorstore future per change-chunk and awaits together, so only chunks with real label changes hit OCDBT. Gap chunks between CC pieces and neighbor chunks read for the overlap never get rewritten, keeping the delta proportional to the edit. split_supervoxels also threads back the fresh source/sink SV IDs from the in-memory new_seg block (same bytes that just landed on storage), so the retry multicut sees current IDs without an extra seg read. Drops _get_whole_sv (dead since the sv_remapping switch). Adds a high-level architecture doc covering the end-to-end flow, concurrency design, and durable invariants.
Enable opening a CG's OCDBT at a prior commit via the driver's `version` spec field (int generation or ISO-8601 commit_time upper bound). Groundwork for operator-driven replay of failed SV splits, which needs clean pre-op reads against append-only storage. OCDBT stamps commits from absl::Now() with no caller-override hook, so pins will use OperationTimeStamp captured under the L2 chunk lock rather than aligning OCDBT commit times to operation time.
This comment was marked as spam.
This comment was marked as spam.
split_supervoxels is now a pure planner returning a SplitResult (seg_bboxes, source_ids_fresh, sink_ids_fresh, seg_writes, bigtable_rows). No lock acquisition, no writes. The caller (MulticutOperation._apply) holds the L2 chunk locks and fires the consolidated persist — OCDBT chunks + bigtable rows — inside an inner lock scope. seg_writes is a flat list of (voxel_slices, data) pairs across all reps so write_seg_chunks fires every chunk write as one parallel tensorstore batch. Removes the per-rep serialization in the old write_seg_chunks loop. get_seg_source_and_destination_ocdbt gains a pinned_at kwarg, forwarded to build_cg_ocdbt_spec — used later by the recovery path.
New L2 chunk counterpart to IndefiniteRootLock, keyed by chunk row. L2ChunkLock now acquires via lock_by_row_key_with_indefinite so a temporal acquire sees a crashed op's indefinite cell and refuses. IndefiniteL2ChunkLock records its chunk scope on the op-log row's L2ChunkLockScope column at __enter__ and clears it on clean exit, giving recovery a durable scope without a bigtable-wide scan. Both indefinite locks (root and L2 chunk) now short-circuit __exit__ when an exception is propagating: cells stay held, scope stays set. Partial writes may exist after an exception; leaving the cells forces subsequent ops to refuse at lock-acquire and the operator to run recovery explicitly. privileged_mode=True on either lock is the operator recovery escape hatch: skips acquire, pre-populates acquired_keys so __exit__'s value-matched release deletes the crashed op's cells. RowKeyLockRegistry (test helper) gains the three new kvdbclient primitives.
Operator recovery for SV-split ops that crashed mid-write. A worker death inside IndefiniteL2ChunkLock leaves per-chunk indefinite cells set and records the chunk scope on the op-log row. Recovery reverts partial OCDBT writes using a version-pinned read of pre-op voxels, then replays the op normally. list_stuck scans OperationLogs for ops still at CREATED past a min-age threshold. replay(cg, op_id) runs cleanup_partial_writes followed by repair.edits.repair_operation(..., unlock=True); IndefiniteL2ChunkLock's privileged-mode __exit__ deletes the crashed op's pre-existing cells after the replay's writes land. Architecture-level operator guide at docs/sv_splitting_recovery.md, linked from docs/sv_splitting.md's Concurrency section.
Pass the op's timestamp from execute → _apply → split_supervoxels → split_supervoxel → copy_parents_and_add_lineage / add_new_edges so every new-SV bigtable mutation lands at the op's logical write time. Gets atomic visibility under a parent_ts filter and makes replay's override_ts actually control what time-filtered readers see after a repair_operation. Parent-copy and Child-list writes deliberately keep the old cell's timestamp so pre-op readers still see the old hierarchy. Replace the seven-tuple in/out soup in split_supervoxels with named dataclasses: SvSplitTask (plan_sv_splits → split_supervoxel input) and SvSplitOutcome (split_supervoxel's per-task output). Drop the two unused return fields on split_supervoxel.
list_stuck filter switches from Status==CREATED to "L2ChunkLockScope set and Status != SUCCESS past min_age". The authoritative signal for stuck-ness is "scope recorded, not cleared" — worker crash (Status stays CREATED) and Python exception during persist (Status=EXCEPTION after Fix 1) both fall under it. Ops without scope aren't blocking other ops and are outside stuck_ops' concern. replay now verifies each chunk in the recorded scope actually has Concurrency.IndefiniteLock held by this op_id before running cleanup or repair. If cells are missing or held by a different op, raises with a clear error. Protects against double-replay (first run already released cells) and out-of-band clearing (manual bigtable edit, buggy release path) — both would have cleanup_partial_writes revert chunks that aren't ours.
fork_base_manifest is now an explicit step — invoked from the ingest CLI's --ocdbt path or the seg_ocdbt notebook — rather than being auto-triggered on first ws_ocdbt_scales access. ws_ocdbt_scales asserts fork_exists() so a missing fork fails with a clear pointer instead of a tensorstore mismatch/not-found error.
Stuck-op detection keys off L2ChunkLockScope being populated, not Status=CREATED — that filter also covers caught-exception paths (Status=FAILED) where cells are held but the row isn't CREATED. Recovery now verifies each chunk's IndefiniteLock is actually held by the op before cleaning up, so a stale scope can't have us revert chunks another op owns. Reflect both in docs and the list CLI help. Drop "mode-downsample" from the SV-split diagram — tinybrain owns the algorithm; the doc shouldn't pin it.
`_rep_bbox` enveloped every piece of the cross-chunk-connected rep — for physical SVs split into many pieces across chunks, the bbox grew far wider than the cut surface needs. Replace with `_coords_bbox`: envelope of the user-placed source/sink coords plus a one-chunk margin (matches the existing L2 lock margin and 1-voxel shell). After the seg read, `cut_supervoxels` is intersected with the IDs present in seg, so the "whole sv" set names only the rep pieces the bbox actually touches. Pieces of the rep outside the bbox keep their existing IDs — their cross-chunk edges to in-bbox split fragments are routed via the 1-voxel shell, edges between two unsplit pieces don't change. Adds `TestCoordsBbox` covering envelope+margin, volume-bound clipping, and that `plan_sv_splits` returns a tight bbox regardless of how distant the rep's other pieces sit.
handle_supervoxel_id_lookup and id_helpers.get_atomic_ids_from_coords both short-circuit to lookup_svs_from_seg whenever ocdbt_seg is true, regardless of node-id layer. 2D slice clicks send L1 IDs from a view that may be stale after an SV split; 3D mesh clicks send a root and no L1 at all. Either way the current SV is what matters, so we read seg at the click coords and let downstream same/different-root checks surface any staleness with the sv_id->root diagnostic. Also vectorize lookup_svs_from_seg's per-coord indexing into a single advanced-index op.
`_schema_from_src` was passing both `domain` and `shape` to ts.open when cloning the source schema to the destination. For sources with a non-zero `voxel_offset`, `domain` carries absolute bounds (e.g. [17756, 62244)) while `shape` implies an origin of 0, and tensorstore refuses to merge them — base creation fails on any precomputed source that doesn't start at the origin. `domain` already encodes both extent and offset, so passing only it is sufficient and avoids the conflict.
cloudbuild now uses BuildKit registry cache (`:buildcache`) so unchanged stages reuse the prior build's layer artifacts and already-warm nodes skip re-downloading them on pull. The fresh-ingest CLI no longer infers `ocdbt_populate_base` from `base_exists` — manifest presence didn't reflect whether chunks were actually copied. Replaced with an explicit `--populate-base` flag; operator sets it on first ingest and omits on subsequent runs.
In-repo notes on every OCDBT spec field, config sub-field, ocdbt_coordinator resource field, DistributedCoordinatorServer constructor field, and runtime defaults — verified empirically against the venv's tensorstore binary by probing with intentional-bad-value plus spec round-trips. Captures the distributed-mode constraints: atomic transactions forbidden, cooperator-forwarded RPC bounded at gRPC's 4 MiB default with no Python knob to raise it, leases held per btree node so disjoint user-key writes still cross-forward. Catalogs the levers that do exist (max_decoded_node_bytes, max_inline_value_bytes, lease_duration) and ones that look promising but aren't fields.
9b86434 to
5873aa8
Compare
Reading tensorstore's distributed btree_writer.cc StagePending: values ≤ max_inline_value_bytes are carried inline in the encoded mutation, so the cooperator's per-leaf WriteRequest packs all of them together. With the previous 1 MiB threshold, every compressed_segmentation chunk that landed below 1 MiB stayed inline and contributed its full bytes to the gRPC forward — which blows past tensorstore's hardcoded 4 MiB gRPC max-receive whenever a few inline chunks land on the same btree leaf. Dropping to 4 KiB pushes every chunk value through WriteData into a d/ file and replaces it in the mutation with an IndirectDataReference, so forwarded RPCs carry only small refs regardless of value size. Small metadata (info JSON, populate-marker files) still stays inline. Tradeoff: the old comment cited "7× GCS bloat" at the 100-byte default — that ratio came from per-chunk independent zstd framing dominating tiny payloads, not chunk-sized ones. At 4 KiB+ payloads the per-chunk zstd overhead is in the single-digit-percent range. Reference doc updated to credit max_inline_value_bytes as the actual RPC-size lever and correct the cooperator-batching section.
tensorstore raises ValueError with an absl status-code prefix on transport-level failures. A DNS hiccup or a GCS 5xx in is_chunk_populated or mark_chunk_populated was killing the whole populate task. Wrap the four marker / populate-meta helpers in a tenacity retry that matches UNAVAILABLE / DEADLINE_EXCEEDED / ABORTED / INTERNAL prefixes only — NOT_FOUND, INVALID_ARGUMENT, RESOURCE_EXHAUSTED still propagate.
…ask-body mode flags
`ingest layer N` now owns the whole OCDBT lifecycle when N matches
ocdbt_populate_layer: idempotently runs setup_base (base + fork +
config reconcile), starts the coordinator, queues tasks. Moves
setup_base out of `ingest graph` so there's a single command that
manages OCDBT. Re-pickles the IngestionManager so workers see the
resolved config.
Adds three flags to `ingest layer`:
--queue-only/-q : skip the coordinator (one is running elsewhere).
--ocdbt-only/-o : workers run only OCDBT populate, skipping
add_parent_chunk. For re-populating a freshly created/wiped OCDBT
base against an already-built bigtable graph.
--ingest-only/-i : workers run only add_parent_chunk, skipping
OCDBT populate. For rebuilding bigtable against an existing
OCDBT base.
Refactors create_parent_chunk to a single function with a `mode`
parameter bound at queue time via functools.partial — no duplicated
OCDBT-populate block. _post_task_completion runs unconditionally so
layer-progress tracking in redis stays consistent across modes.
Adds IngestionManager.is_ocdbt_populate_layer(layer) so the compound
guard isn't repeated at every call site.
`setup_base` calls `fork_base_manifest` once at graph creation, before populate has committed most of its writes to base. Any base commit after fork-creation doesn't propagate into the fork's manifest, so kvstack-routed reads through the fork miss every chunk written to base afterwards — meshing (and any other OCDBT seg read) returns zeros. Adds `ensure_fork_synced(ws, graph_id)` in graph/ocdbt/main.py and calls it from `ChunkedGraphMeta.ws_ocdbt_scales` lazy init. The helper re-snapshots the fork manifest from base when the fork's data prefix is empty (no edits) and the manifests differ. Edit-free is the safety guard: an SV-split writes value/btree files under <graph_id>_d/ before updating the fork manifest, so a non-empty prefix means an edit either landed or is in flight and must be preserved. The check is race-free vs concurrent populate (which writes only to base). Re-enables the OCDBT path in `get_local_segmentation` that was temporarily disabled while the kvstack reads returned zeros.
The unsharded-dynamic-mesh subdir name has been a load-bearing
constant ("dynamic") shared implicitly between mesh_worker (reads
`custom_data["mesh"].get("dynamic_mesh_dir", "dynamic")`) and
neuroglancer (hardcodes the same literal in graphene/backend.ts).
Surface it through:
- `handle_info` (the Flask `/info` endpoint NG fetches) — so
`info["dynamic_mesh_dir"]` is exposed alongside `mesh_dir`.
- `get_json_info` (used internally by meshing modules to build
CloudVolume `info=` kwargs) — for consistency.
Default preserves existing datasets. Requires a matching NG-side
patch (graphene backend) to read this field instead of the
hardcoded literal.
After a bigtable row-copy clones an existing graph into a new
graph_id, two graph_id-bearing pieces of runtime state must be
reconciled before the new table is usable:
1. `custom_data["mesh"]["dynamic_mesh_dir"]` carries forward the
source table's value, which means the new table's edited mesh
fragments would write into the source's dynamic mesh dir and
overwrite each other at the same fragment-id keys. Extends the
existing `graph_id != self.graph_id` shim in ChunkedGraph.__init__
to also rewrite this field — only when the key is present in the
copied custom_data — using the new graph_id directly:
custom_data["mesh"]["dynamic_mesh_dir"] = f"dynamic_{graph_id}"
`mesh.dir` (initial sharded meshes) is dataset-scoped and
intentionally shared, so it's not touched.
2. The per-CG OCDBT fork directory doesn't exist for the new
graph_id, so `ws_ocdbt_scales` would fire `assert fork_exists`
and any OCDBT-side read (meshing, sv-lookup, edits) would
blow up. Replaces the assertion with `if not fork_exists:
fork_base_manifest(...)`. Idempotent and race-safe: concurrent
opens write identical base-manifest bytes to the same path;
can't race with an edit because edits pre-suppose the fork.
`ensure_fork_synced` immediately after keeps it in step with
base.
Net effect: copy the bigtable rows, then `ChunkedGraph(graph_id=...)`
once. The shim updates meta in a single update_meta call and the
ws_ocdbt_scales lazy-init creates+syncs the fork. No operator step.
Two symptoms of the same root cause — upstream `tiangolo/uwsgi-nginx-flask:python3.12` was rebuilt, its digest changed, and that broke us two ways: 1) Stage-1 cache invalidated → builds went from ~1.5min to 7+min (full `conda env create` re-runs on every CI build because the FROM layer hash now changes whenever upstream rebuilds the tag). 2) The new base image carried a newer transitive Python package set into the conda env, leaving `importlib_metadata` in a partial-overlay state in `/app/venv` — `__init__.py` from one version on top of file tree from another, missing `_context.py`. Imports through messagingclient → google.cloud.pubsub_v1 → opentelemetry → opentelemetry.util._importlib_metadata crashed with `ModuleNotFoundError: No module named 'importlib_metadata._context'`. Pin the base image by digest so subsequent builds get the same Python file set and the layer cache stays warm. Bump opentelemetry (via pip-compile --upgrade) to 1.42.x, which dropped the third-party `importlib_metadata` wrapper in favor of stdlib `importlib.metadata` — verified inside the 1.42.1 wheel (opentelemetry/util/_importlib_metadata.py imports `from importlib.metadata import ...`). With opentelemetry no longer pulling in the third-party package, no Python 3.12 consumer remains, so pip-compile correctly removed `importlib-metadata` from the lockfile entirely. Net dep changes: opentelemetry-api/sdk 1.39.1 → 1.42.1 opentelemetry-semantic-conventions 0.60b1 → 0.63b1 google-cloud-pubsub 2.35.0 → 2.38.0 importlib-metadata 8.7.1 → dropped + routine minor bumps across pip-compiled deps requirements.in unchanged. Bump the Dockerfile digest manually when you want a fresher base.
- conftest bootstrap fixtures unpack the OcdbtConfig slot - ws_cv mock assertions accept the loader's two CloudVolume constructions (info-fetch + cached-handle with info=) - ws_ocdbt tests bypass OCDBT meta resolution and the fork-sync side-channel that hit real GCS, and relax the call-args check now that get_seg_source_and_destination_ocdbt takes the OcdbtConfig positionally - cleanup_partial_writes patches _read_source_scales at the call site (ocdbt.main) instead of the package re-export
Opening an existing OCDBT must not embed a `config` block — tensorstore validates every field against the on-disk manifest and raises FAILED_PRECONDITION on any drift. Drift bricks every base that was created with the older default for `max_inline_value_bytes`. Only `create_base_ocdbt` keeps `config`; `build_cg_ocdbt_spec` and `open_base_ocdbt` no longer pass it. Functions still accept the `OcdbtConfig` arg so callers don't change; it's just not asserted into the open-time spec.
…state warning The function's sole job is closing the setup-base race: snapshot the fork manifest from base before the fork's first edit. Once any edit lands the function has nothing to do — base is immutable post-setup, so manifests *will* diverge by design as the fork moves forward. Short-circuit on edit-files first (lists `<graph_id>_d/`, no manifest reads on the hot path) and remove the warning that fired on every runtime open of every edited CG with a misleading "stale vs base" framing.
Writes every mesh.* field a new or freshly-copied graph needs to serve fragments — watershed-cv `info["mesh"]`, sharded mesh spec per layer, `info["mesh_metadata"]` (uniform draco grid + dynamic dir), and the bigtable `custom_data["mesh"]` block including `initial_ts`. Static values come from a `mesh_config:` block in the dataset yaml, parsed via the new `MeshConfig` dataclass (mirrors `OcdbtConfig`). No in-code defaults — yaml must supply every static field; only the graph-id-derived `dynamic_mesh_dir` is optional. `initial_ts` is set once and preserved on re-runs (changing it would reclassify node ids and silently break served manifests). When unset, it's derived from a root id sampled near the volume center. The CLI subcommand is `ingest mesh_meta GRAPH_ID DATASET_YAML`. It does not gate on redis ingest progress — the operator decides when the graph is ready.
…a graph Every cached manifest fragment (initial + dynamic, present fragments and DOES_NOT_EXIST markers) now expires after 3 days. Bounds the cache against stale "X" markers that survived a remesh and against fragments whose path/offset changed under them. New POST /table/<table_id>/clear_manifest_cache (no node_id) wipes every cached fragment under the graph's namespace via SCAN+DEL, batched so a large fragment set doesn't block redis. Reuses the existing admin permission decorator used by the per-node clear route.
profile() gains three opt-in kwargs (with_memory, with_rss, counters)
that capture tracemalloc Python heap peak, psutil RSS peak via a 50 ms
sampler thread, and per-call deltas on a caller-supplied counter dict.
Results land in a new self.blocks list of BlockMetrics; metrics_report()
prints them as a fixed-width table.
Default-off so existing profile("name") callers see no behavior change.
Sampler and tracemalloc state are torn down on context exit.
Adds pychunkedgraph.graph.dry_run.is_dry_run() (strict "1" match on PCG_DRY_RUN) and wires it through every write site in the SV-split edit flow: - operation._write short-circuits via is_dry_run() instead of the cg.meta.READ_ONLY @Property (which stays in place for unrelated callers). - A new GraphEditOperation._persist_rows centralizes BT mutation writes from _apply and _create_log_record paths; the three external cg.client.write callsites in operation.py collapse to one-liners. - ocdbt.write_seg_chunks returns early so OCDBT delta writes are skipped. - RootLock / IndefiniteRootLock / L2ChunkLock / IndefiniteL2ChunkLock __enter__ and __exit__ gain an early-return placed before the existing privileged_mode check, so dry-run wins regardless of replay state. Replaces the harness-side monkey-patch design with a single global toggle: setting PCG_DRY_RUN=1 makes any edit operation a read-only dry run. Default-off; "true" / "0" / "" / unset all leave production behavior unchanged.
split_profile drives an SV split end-to-end under PCG_DRY_RUN=1, records per-stage timing + memory + IO into a HierarchicalProfiler, and snapshots each stage's intermediate value into SplitInputs so the user can replay a single heavy stage standalone after editing its source. The full run is pickled under <tempfile.gettempdir()>/pcg_split_profile/<graph_id>/<payload_sha>/ with a collision check on save and the cache path echoed at the end of run_split_profile so the run can be copied for retention before the OS tmp policy reclaims it. Adds dry_run_scope() context manager next to is_dry_run() so the harness (and any future caller) sets/restores PCG_DRY_RUN through one helper rather than poking os.environ.
metrics_report now prints a compact human-readable table — no banner, auto-scaled time (ms / s) and bytes (KB / MB / GB), thousands-separator counters, signed rss delta, and all-zero counter columns hidden. Format helpers _fmt_time / _fmt_bytes / _fmt_count live module-private. print_report stays unchanged for timing-only callers. split_profile.profile_call(cg, name, fn, *args, **kwargs) wraps one callable under dry_run_scope + count_io + profiler.profile, returning (profiler, result). Lets the user replay a single stage standalone after editing its source — pairs with load_run() to iterate on one heavy stage without redoing the full flow. ocdbt.ensure_fork_synced honors PCG_DRY_RUN — a defensive gate on the one OCDBT write outside the gated edit flow. Steady-state no-op for graphs with prior edits.
Inline get_profiler().profile() blocks now wrap the suspect operations inside split_supervoxel, update_edges, and split_supervoxel_helper — seg_read, binary_seg, geodesic_split, build_coords, kdtrees, get_subgraph, edge_dedup, get_roots_inner, _get_new_edges, connect_seeds, split_growing, etc. Each captures wall time, Python heap peak, RSS peak, and IO counters as one BlockMetrics row. HierarchicalProfiler.__init__ gains with_memory / with_rss instance defaults (both True) and a default_counters attribute, so inline call sites stay short — just `with _prof.profile(name):`. run_split_profile drives the global profiler (reset + enable on entry, disable on exit) so all inline blocks land in one profiler.blocks list. Drops the previous top-level profile() wrappers around the mincut / plan / split_supervoxels / remove_edges stages; the wrappers stay as capture-only hooks for SplitInputs replay. New overwrite=False kwarg on run_split_profile; when True, wipes the cache dir before starting. Profiler stays a no-op when not explicitly enabled.
…a fastremap build_coords_by_label now delegates to fastremap.point_cloud — a single C++ pass that emits uint16 coords grouped by label, treating 0 as background. Before the call, update_edges masks new_seg in place with fastremap.mask_except(new_seg, new_ids ∪ all_edge_svs) so point_cloud only emits coords for labels the downstream readers (kdtrees + coords_by_label.get in _get_new_edges) will actually query. update_edges is reordered so get_subgraph and get_roots_inner run first to supply that label set. The caller does not read seg after update_edges returns, so the in-place mutation is safe. remap_to_root drops fastremap.remap(..., in_place=False) — which allocated a full-size shadow array of seg just to compare against the target root — for fastremap.mask_except(seg, root_labels, in_place=True). Same filter, no shadow. binary_seg replaces np.isin(seg, supervoxel_ids) — which allocates int-array auxiliaries the size of seg — with a per-SV `==` / `|=` loop. Computed over seg[voxel_overlap_crop] directly so the boolean is only as big as the region split_supervoxel_helper consumes. Empty supervoxel_ids returns all-False, matching the prior np.isin contract. split_supervoxel computed fastremap.unique(seg) twice — once for cut_supervoxels narrowing, once in get_roots — on the same unmutated seg. Hoisted into a single seg_unique block whose result both readers reuse, so the sort buffer is paid once at low rss residual. Harness-side: metrics_report and print_report no longer gate on self.enabled (the harness disables the profiler before printing, which was silencing the reports). run_split_profile saves the cache before disabling. The metrics table adds cum_wall (from the first profile() block since reset), rss_start, rss_peak columns; drops the noisy ocdbt_reads counter.
_resolve_label3_touching_vectorized decided each stray label-3 component's side by dilating the label-1 and label-2 masks over the whole volume and intersecting with the component mask — two full-volume binary_dilations that dominated enforce_cc. The label-3 voxels occupy a small region, so dilate the side masks only inside the label-3 bounding box, padded by one voxel so the dilation still reaches the bordering side voxels. The per-component border counts are identical to the full-volume dilation, so the split result is unchanged; only the dilated region shrinks to the label-3 neighborhood. Enforcement stays at full resolution after upsampling; moving it to the downsampled grid fragments the labels under the foreground mask and breaks the single-connected-component guarantee _update_chunks relies on (covered by test_single_cc_guarantee_with_downsample). Profiler-side: metrics_report lays the blocks out as a tree (one column per nesting level, parent-first). Inline profile() blocks remain only on the steps still time- or memory-suspect (>=2s wall or >=~0.5GB rss); the cheap steps earlier profiling cleared are unwrapped to keep the report focused.
03d8c3d to
b3887d4
Compare
The SV-split path called cg.get_roots() without a timestamp in both split_supervoxel (for building sv_root_map) and update_edges (for mapping edge SVs to roots). Every other operation in the codebase pins these reads to parent_ts — the pre-op read snapshot passed to execute() — so a stuck-op replay sees the same graph state as the original attempt. Threads parent_ts from MulticutOperation._apply() → split_supervoxels → split_supervoxel → update_edges, and passes it to both get_roots calls. Normal first-attempt runs pass parent_ts=None, which resolves to current time as before; only replay paths (stuck_ops.replay) supply a pinned snapshot.
Group the supervoxel-split modules under graph/sv_split/ (cutting, edges, edits, profile, debug) and move the five splitting design docs into the same package as README/algorithm/design/edges/recovery, so the split flow and its documentation live behind one boundary instead of being scattered across graph/, debug/, and docs/. sv_split/__init__ exposes only cutting/edges/edits; profile and debug stay out of it to keep their app-layer (Flask) and dry-run deps off the core import path. Promote the hierarchical profiler out of debug/ into a standalone profiler/ package: hierarchical (HierarchicalProfiler, BlockMetrics, RSS sampler), utils (formatters), main (global instance + accessors). operation.py calls plan_sv_splits / split_supervoxels via the sv_split.edits module object rather than a direct symbol import, so profile.py's capture wrappers patch the live attribute at call time. Repoint stuck_ops.py's docstring at the relocated recovery doc.
e212d23 to
6cd5e2e
Compare
Seed snapping ran a cKDTree over every voxel of the supervoxel, twice per split (~47s, ~40% of a split), because the caller disables the boundary-only and downsample candidate reductions for safety. Add a use_bbox lever to snap_seeds_to_segment that restricts the candidate scan to a box around the seeds — the nearest true voxel is always near a seed, so the snapped result is identical while the KDTree shrinks by orders of magnitude. A grow-until-nonempty loop guarantees the window always contains the true nearest voxel, so the box size is never a correctness constraint. Both split_supervoxel_helper call sites opt in. Also: add a plain-terms algorithm overview, and correct the README bbox description (the read/cut region is the seed envelope plus a one-chunk margin, not the rep's full piece-set envelope).
The stray-fragment resolver's first action is a full-volume connected-components labeling of label-3 voxels, after which it early-returns when there are none. Splits that produce no strays still paid for that whole-volume scan just to find nothing to do. Guard the resolver behind a label-3 voxel count so the CC labeling runs only when a fragment was actually stranded; the count is created solely by the single-CC enforcement above, so an empty check is exactly the resolver's own no-op condition. Route cutting.py's per-stage logging through the package logger (logger.debug, with the label-3 count at logger.note) instead of the verbose-gated print helper, and drop the now-unused verbose parameter from the split functions and their callers.
The supervoxel-split step copied the full overlap crop so the fresh IDs lived in a buffer separate from the later in-place masks. Write them straight into seg's crop view instead, and capture the OCDBT write payloads plus the source/sink id lookups from that crop before masking — so they keep the unmasked neighbour SV IDs the write must preserve. The root mask then runs only for update_edges, which reads only its own wanted-label set and so is unaffected by the masked neighbours. Peak memory drops by roughly the crop size; the only remaining copies are per-changed-chunk, proportional to the edit.
…ot mask
Make pinky (nonzero voxel_offset) SV splits work end to end. Three
coupled changes plus the seg_unique optimization, all in the SV-split
path:
- chunks_overlapping_bbox gains an `origin` arg; the SV-split callers
pass meta.voxel_bounds[:, 0] so the chunk lattice is anchored where
PCG's chunks actually start. The origin-0 lattice put new supervoxel
IDs in the wrong chunk on any offset dataset (and could split a grid
block across two real chunks); zero-offset graphs were unaffected,
which is why this stayed hidden.
- Remove the root mask in _route_edges_and_rows. update_edges already
trims seg to {new ids} ∪ {edge-partner svs} internally; the outer
mask, keyed on pre-split ids, deleted the freshly written new-SV
voxels from seg once those ids were written in place — update_edges
then raised KeyError building their kdtrees.
- seg_unique computes the distinct SV set per chunk over the segment-id
field, restoring the chunk bits before the final union, instead of a
full-uint64 unique over the whole read.
Add _assert_same_chunk: every new supervoxel must share a chunk with the
supervoxel it split from. split_supervoxel decomposed into _SplitCtx /
stage helpers. profile.py saves the failing traceback to the run dir
(load_traceback to read it back) and logs only the final exception line.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Supervoxel splits in OCDBT-backed segmentations are now atomic under worker failure, with an operator tool for recovering stuck ops.
Locks
Two indefinite locks, both holding their cells on exception:
On unclean exit, both hold and the L2 scope is written to the op-log row.
Stuck-op signal
Non-empty scope field on the op-log row. Covers both
CREATED(worker crash) andFAILED(caught exception) paths; status alone missedFAILED. Clean exit clears it.Cleanup-then-replay
Concurrent ops on non-overlapping chunks advance OCDBT during the outage, so recovery can't use one pinned view.
OCDBT time-travel
Tensorstore
versionfield in the spec (integer generation or ISO-8601Ztimestamp, interpreted ascommit_time ≤ T). Threaded aspinned_at. Pinned handles are read-only.Reader contract
dataset_infonow publishes the full kvstore spec (kvstack layers + OCDBT config + data prefixes) instead of path fragments, so readers pass it verbatim to tensorstore. The multi-scale open asserts fork presence, failing with a clear message instead of a tensorstore internal.