Skip to content

Self healing#273

Open
j-rafique wants to merge 7 commits intomasterfrom
self-healing
Open

Self healing#273
j-rafique wants to merge 7 commits intomasterfrom
self-healing

Conversation

@j-rafique
Copy link
Contributor

No description provided.

@roomote
Copy link

roomote bot commented Feb 23, 2026

Rooviewer Clock   See task

Reviewed commit 3bf7e44 ("preserve originals in rebalance"). The IsOriginal guard correctly prevents deletion of original uploads, resolving the previously flagged issue. One new indentation issue introduced by this commit.

  • Rebalance worker can delete original copies (is_original=true) without guarding against it -- consider skipping deletion for original uploads (rebalance_worker.go:176)
  • Indentation mismatch in the new IsOriginal guard and surrounding delete-confirm block -- the added lines are one tab deeper than the enclosing scope, making the brace structure misleading (rebalance_worker.go:176-189)
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces “self-healing” behavior in the P2P DHT by adding a rebalance worker and new probe/status plumbing, while also refining chain-state eligibility gating (routing vs store/write) and bootstrap behavior to better recover from transient chain/API issues.

Changes:

  • Add a rebalance background worker that scans local keys, heals under-replicated keys, and deletes redundant replicas when safe.
  • Introduce side-effect-free local key status APIs (RetrieveBatchLocalStatus, ListLocalKeysPage) plus a new network message (BatchProbeKeys) to probe key presence across peers.
  • Split chain-state eligibility into routing-eligible vs store-eligible allowlists (Active+Postponed vs Active-only), and adjust bootstrap + batch-store selection accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
supernode/adaptors/lumera.go Forces top-supernode selection to be Active-only for write/finalization eligibility.
p2p/kademlia/store/sqlite/replication.go Adds sqlite implementations for local key status + paged key listing.
p2p/kademlia/store/mem/mem.go Adds mem-store implementations for local key status + paged key listing (currently inconsistent with hex-key expectations).
p2p/kademlia/store.go Extends store interface with side-effect-free local key status APIs and defines LocalKeyStatus.
p2p/kademlia/rebalance_worker.go New rebalance worker that probes holders, heals replicas, and deletes redundant local copies after confirmation.
p2p/kademlia/node_activity.go Refines node activity handling to distinguish routing eligibility vs store/replication eligibility.
p2p/kademlia/network.go Adds BatchProbeKeys handler + timeouts/dispatch wiring.
p2p/kademlia/message.go Adds new gob-registered message types for BatchProbeKeys.
p2p/kademlia/dht_batch_store_test.go Updates expected error text for new “eligible store peers” behavior.
p2p/kademlia/dht.go Adds store allowlist, store-eligibility gating in write paths, starts rebalance worker, and adjusts batch-store candidate selection.
p2p/kademlia/bootstrap.go Splits routing vs store allowlists from chain state (Active+Postponed vs Active-only) and makes bootstrap sync more resilient with periodic retries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +176 to +189
// Preserve uploader/original copies even when this node is not in the
// current owner set and replica count is healthy.
if selfStatus.IsOriginal {
delete(deleteConfirm, keyHex)
continue
}

if rebalanceShouldTrackDeleteConfirm(isOwner, holders) {
deleteConfirm[keyHex]++
if rebalanceShouldDelete(deleteConfirm[keyHex], deleted) {
if err := s.store.BatchDeleteRecords([]string{keyHex}); err != nil {
logtrace.Error(ctx, "rebalance: local delete failed", logtrace.Fields{
logtrace.FieldModule: "p2p",
logtrace.FieldError: err.Error(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new IsOriginal guard and the re-indented rebalanceShouldTrackDeleteConfirm block are one tab deeper than the surrounding code (4 tabs vs the expected 3). Because Go compiles on brace matching, not whitespace, this won't break at runtime, but it makes the control flow misleading -- the } else { on line 197 visually looks like it belongs to a different block than it actually closes. The logtrace.Error call inside the if err != nil block also ended up at the same indentation as its opening brace. Running gofmt on this file should fix it in one pass.

Suggested change
// Preserve uploader/original copies even when this node is not in the
// current owner set and replica count is healthy.
if selfStatus.IsOriginal {
delete(deleteConfirm, keyHex)
continue
}
if rebalanceShouldTrackDeleteConfirm(isOwner, holders) {
deleteConfirm[keyHex]++
if rebalanceShouldDelete(deleteConfirm[keyHex], deleted) {
if err := s.store.BatchDeleteRecords([]string{keyHex}); err != nil {
logtrace.Error(ctx, "rebalance: local delete failed", logtrace.Fields{
logtrace.FieldModule: "p2p",
logtrace.FieldError: err.Error(),
// Preserve uploader/original copies even when this node is not in the
// current owner set and replica count is healthy.
if selfStatus.IsOriginal {
delete(deleteConfirm, keyHex)
continue
}
if rebalanceShouldTrackDeleteConfirm(isOwner, holders) {
deleteConfirm[keyHex]++
if rebalanceShouldDelete(deleteConfirm[keyHex], deleted) {
if err := s.store.BatchDeleteRecords([]string{keyHex}); err != nil {
logtrace.Error(ctx, "rebalance: local delete failed", logtrace.Fields{

Fix it with Roo Code or mention @roomote and request a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants