Summary
Online, an encrypted bucket fails to list when its forest references a HAMT node (or manifest page) that a server-side ipfs repo gc orphaned: the node's storage-key -> CID mapping is gone from the gateway index (gateway returns 404 NoSuchKey for the storage-key), even though the node's block still exists in IPFS by CID. The forest walk fetches that node by storage-key, gets a 404, and the entire walk aborts -> the bucket won't list.
The data is not lost: the manifest carries each node's content CID (walkable-v8 LinkV2 pointer), and the block is fetchable by CID. Offline mode already recovers (the walker's fetch falls back to a gateway-race-by-CID), which is why forcing the app offline lists the affected buckets. The gap is that the online path doesn't engage that recovery on a 404.
Root cause
FulaClient::get_object_with_offline_fallback_known_cid (fula-client/src/client.rs) engages the verified gateway-race-by-CID only on is_master_unreachable_error, never on a 404:
    match self.get_object_with_metadata(bucket, key).await {
        Ok(result) => { /* cache + return */ }
        // Only a master-DOWN error races the CID.
        Err(e) if is_master_unreachable_error(&e) => {
            self.try_offline_fallback_with_cid_hint(bucket, key, cid_hint, e).await
        }
        // A 404 dies here -> walk aborts.
        Err(e) => Err(e),
    }
is_master_unreachable_error deliberately excludes legitimate S3 errors (NoSuchKey/404), so an orphaned-node 404 never triggers the CID-race when the master is up.
Fix
Add a forest-scoped wrapper get_forest_object_known_cid that engages the verified CID-race also on a 404 (e.is_not_found()), and route the two forest-infrastructure callers through it:
- EncryptedClient::load_manifest_pages (manifest pages)
- S3BlobBackend::get_with_cid_hint (HAMT nodes)
The generic get_object_with_offline_fallback_known_cid keeps its existing propagate-404 invariant, and its test test_cid_hint_master_4xx_propagates_without_fallback is left unchanged. Only forest infrastructure opts into 404-recovery (design per advisor review: "the invariant is the asset" -- don't let a future non-forest caller silently inherit hide-404 behavior).
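The split between the two entry points can be sketched as a small synchronous model. This is an illustration under assumptions, not the code in fula-client: the real functions are async methods on the client, and FetchError, get_object_known_cid, and the race_by_cid closure are hypothetical stand-ins introduced here. Only get_forest_object_known_cid, is_master_unreachable_error, and is_not_found are names taken from this description.

```rust
// Hypothetical error type standing in for the client's real error enum.
#[derive(Debug, Clone, PartialEq)]
enum FetchError {
    MasterUnreachable, // master is down or not reachable
    NotFound,          // 404 NoSuchKey from the gateway index
    Other(String),     // any other legitimate S3 error, e.g. 403
}

impl FetchError {
    fn is_not_found(&self) -> bool {
        matches!(self, FetchError::NotFound)
    }
}

fn is_master_unreachable_error(e: &FetchError) -> bool {
    matches!(e, FetchError::MasterUnreachable)
}

/// Generic path (model): races the CID only when the master is unreachable.
/// A 404 propagates unchanged, which is the invariant being preserved.
fn get_object_known_cid(
    primary: Result<Vec<u8>, FetchError>,
    race_by_cid: impl FnOnce() -> Result<Vec<u8>, FetchError>,
) -> Result<Vec<u8>, FetchError> {
    match primary {
        Ok(bytes) => Ok(bytes),
        Err(e) if is_master_unreachable_error(&e) => race_by_cid(),
        Err(e) => Err(e),
    }
}

/// Forest-scoped path (model): additionally races the CID on a 404.
/// Every other error still propagates.
fn get_forest_object_known_cid(
    primary: Result<Vec<u8>, FetchError>,
    race_by_cid: impl FnOnce() -> Result<Vec<u8>, FetchError>,
) -> Result<Vec<u8>, FetchError> {
    match primary {
        Ok(bytes) => Ok(bytes),
        Err(e) if is_master_unreachable_error(&e) || e.is_not_found() => race_by_cid(),
        Err(e) => Err(e),
    }
}
```

In this model a 404 recovers only through the forest-scoped function, so a non-forest caller of the generic function keeps seeing NotFound.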
Why this is safe (the CID is the capability)
- fetch_verified content-checks the fetched bytes against the manifest-supplied CID, so a gateway cannot serve different/forged bytes.
- The node store re-decrypts (AEAD) and recomputes the storage-key + page-id/seq, binding the bytes to this walk.
- The CID comes from the freshly-decrypted, authoritative manifest, so it is the current node; stale-root reads are already rejected by the seq/version guards.
- Worst case is a benign consistency window, never unauthorized or forged data.
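The first point (bytes are accepted only if they match the expected content address) can be illustrated with a toy check. This is a sketch under loud assumptions: toy_digest uses the standard library's DefaultHasher, which is NOT cryptographic and is not how a CID is computed, and first_verified_response is a hypothetical helper, not fetch_verified itself. It only shows the shape of the check: a response that does not hash to the expected value is discarded.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a content address. ASSUMPTION: a real CID is derived
/// from a cryptographic multihash; DefaultHasher is used here only to keep
/// the sketch dependency-free.
fn toy_digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: returns the first gateway response whose digest
/// matches the expected value, or None if every response fails the check.
fn first_verified_response(
    expected: u64,
    responses: impl IntoIterator<Item = Vec<u8>>,
) -> Option<Vec<u8>> {
    responses
        .into_iter()
        .find(|bytes| toy_digest(bytes) == expected)
}
```

A gateway that returns different bytes for the requested address is simply skipped; if no gateway returns matching bytes, the fetch fails rather than yielding unverified data.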
Scope
This is the native path (cfg(not(target_arch = "wasm32"))). The wasm S3BlobBackend::get_with_cid_hint degrades to plain get() (no gateway pool on web), so the web client (pinning-webui) is NOT fixed by this and needs separate work -- it also currently lists via the HEAD-per-object path rather than the forest walk.
Self-heal (deliberately out of scope here)
The recovered node is not re-uploaded on the read path (avoids PUT-on-read permission/latency/consistency issues, per advisor review). The gateway 404 therefore persists and is re-raced on each read until the user's next forest write (flush) re-pins the node and restores the index mapping. Chosen tradeoff.
Verification
- New unit/integration test: S3BlobBackend::get_with_cid_hint with master -> 404 NoSuchKey + a gateway serving the block by CID -> must return the block. Fails before the fix, passes after. The existing propagate-404 test is retained.
- E2E (not committed; uses real credentials): native fula-client with gateway_fallback_enabled, a real walkable-v8 bucket with a known orphaned node, master up -> bucket lists end-to-end.
Review
Concept and design reviewed by independent advisors (Gemini + Copilot): approved. Design (scoped wrapper, keep the existing invariant + its test, add a new test) is per their recommendation. (Codex + Cursor were unavailable at review time.)