Skip to content

fix(hardlink)!: use (device, inode) pair as deduplication key#347

Draft
Copilot wants to merge 4 commits intomasterfrom
copilot/check-nlink-hardlink-deduplication
Draft

fix(hardlink)!: use (device, inode) pair as deduplication key#347
Copilot wants to merge 4 commits intomasterfrom
copilot/check-nlink-hardlink-deduplication

Conversation

Copy link
Contributor

Copilot AI commented Mar 10, 2026

  • Add (device, inode) pair as deduplication key
  • Add test proving device number is used in deduplication key
  • Fix Reflection doc comment: inode numbers are NOT guaranteed unique (only (dev, ino) pairs are)
  • Add deterministic tie-breaker sort by (ino, dev) before dropping dev in From<HardlinkList>
  • Fix misleading comment: duplicate inode from JSON is rejected (DuplicatedInode), not silently merged

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…ross-filesystem false deduplication

Co-authored-by: KSXGitHub <11488886+KSXGitHub@users.noreply.github.com>
Copilot AI changed the title [WIP] Investigate nlink usage for hardlink deduplication fix(hardlink): use (device, inode) pair as deduplication key Mar 10, 2026
@KSXGitHub KSXGitHub changed the title fix(hardlink): use (device, inode) pair as deduplication key fix(hardlink)!: use (device, inode) pair as deduplication key Mar 10, 2026
Copy link
Owner

@KSXGitHub KSXGitHub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Very cool to point this out. But can you check whether you actually use the device number to in deduplication?

@github-actions
Copy link

github-actions bot commented Mar 10, 2026

Performance Regression Reports

commit: fe98f44

There are no regressions.

…on key

Co-authored-by: KSXGitHub <11488886+KSXGitHub@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 10, 2026

@copilot Very cool to point this out. But can you check whether you actually use the device number to in deduplication?

Yes, the device number is used. HardlinkList is keyed on InodeKey { dev: u64, ino: InodeNumber } (a private struct in hardlink_list.rs), so two files with the same inode number but different device numbers hash to different map entries and are never merged.

Added a dedicated test same_ino_on_different_devices_are_treated_separately (commit 38aacab) that inserts four files — two sharing (dev=1, ino=100) and two sharing (dev=2, ino=100) — and asserts the list has exactly 2 entries with 2 paths each, confirming cross-device isolation.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes hardlink deduplication across multiple filesystems by switching the internal key from inode-only to a (device, inode) pair, preventing false merges when scanning multiple roots on different devices while keeping the public API and JSON schema unchanged.

Changes:

  • Introduce an internal InodeKey { dev, ino } and key the HardlinkList map by it.
  • Thread dev through hardlink recording and update iterators accordingly.
  • Keep reflection/JSON ino-only (with a documented limitation) and update tests to cover same-inode/different-device behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/hardlink/hardlink_list.rs Adds InodeKey and rekeys the internal DashMap; updates add() to accept dev.
src/hardlink/hardlink_list/iter.rs Updates iterator key types to InodeKey; Item::ino() returns key.ino.
src/hardlink/hardlink_list/reflection.rs Strips dev when producing reflection; uses dev=0 placeholder on load; updates docs.
src/hardlink/aware.rs Extracts dev from metadata and passes it into HardlinkList::add().
src/hardlink/hardlink_list/test.rs Updates test table/call sites for dev and adds a cross-device same-inode test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@KSXGitHub
Copy link
Owner

@copilot The following reviews:

Do you agree with these reviews?

If you do, implement them.

If you don't, provide a reason why you disagree and skip it.

…TryFrom

Co-authored-by: KSXGitHub <11488886+KSXGitHub@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 10, 2026

@copilot The following reviews:

All three reviews are correct and have been implemented in commit eb92bdc:

  1. Doc comment — updated to state that only (device, inode) pairs are unique, not inode numbers alone, and explicitly notes the multi-filesystem limitation.

  2. Deterministic sort tie-breakerFrom<HardlinkList> now sorts by (ino, dev) before stripping dev, so entries that share an inode number across filesystems get a stable, reproducible order in JSON output.

  3. Misleading "incorrectly merged" comment — corrected to accurately describe the actual behaviour: such duplicates cause ConversionError::DuplicatedInode and are rejected, not silently merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants