MES-710: Add an async icache #31

Merged
markovejnovic merged 57 commits into main from marko/mes-710-async-icache
Feb 11, 2026
@markovejnovic
Collaborator

No description provided.

@mesa-dot-dev
mesa-dot-dev bot commented Feb 10, 2026

Mesa Description

This pull request introduces a major architectural refactoring of the MesaCloud filesystem, replacing the synchronous inode cache (ICache) with a new, fully asynchronous implementation (AsyncICache). This fundamental change improves concurrency and performance by eliminating blocking I/O during filesystem operations. The new design is built around a generic resolver pattern, decoupling the cache from the logic of how inode metadata is fetched.

Key architectural changes include:

  • Asynchronous Inode Cache (AsyncICache): A new, thread-safe cache that manages the lifecycle of Inode Control Blocks (ICBs) asynchronously. It coalesces concurrent requests for the same inode, preventing redundant network calls and I/O.
  • Resolver Pattern (IcbResolver): Introduced a generic IcbResolver trait to abstract away the data-fetching logic. Filesystem layers (RepoFs, OrgFs, MesaFS) now provide their own resolver implementations, making the system more modular and testable.
  • CompositeFs for Delegation: A new CompositeFs struct was introduced to centralize and deduplicate the logic for managing nested filesystems (e.g., an Org containing multiple Repos). MesaFS and OrgFs are now simplified wrappers around CompositeFs, streamlining inode/file handle translation and FUSE operation delegation.
  • Dedicated FileTable: File handle allocation has been moved from the cache into a new, dedicated FileTable component, improving separation of concerns.
  • Observability: Optional OpenTelemetry (OTLP) tracing support has been added to provide better insight into the performance and behavior of the new concurrent system.
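The request coalescing and resolver decoupling described in the list above can be illustrated with a minimal sketch. It is a synchronous, std-only simplification and not the PR's code: `Resolver`, `CoalescingCache`, `Icb`, and `CountingResolver` are hypothetical stand-ins for `IcbResolver` and `AsyncICache`, and `OnceLock` plays the role that an awaitable in-flight slot would presumably play in the async version.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

type Inode = u64;

/// Hypothetical, simplified stand-in for an inode control block.
#[derive(Clone, Debug, PartialEq)]
struct Icb {
    path: String,
}

/// Hypothetical synchronous analogue of a resolver trait: the cache asks the
/// resolver for data and knows nothing about where that data comes from.
trait Resolver: Send + Sync {
    fn resolve(&self, ino: Inode) -> Icb;
}

/// Cache that runs the resolver at most once per inode, even under contention.
struct CoalescingCache<R: Resolver> {
    resolver: R,
    slots: Mutex<HashMap<Inode, Arc<OnceLock<Icb>>>>,
}

impl<R: Resolver> CoalescingCache<R> {
    fn new(resolver: R) -> Self {
        Self { resolver, slots: Mutex::new(HashMap::new()) }
    }

    fn get_or_resolve(&self, ino: Inode) -> Icb {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(ino).or_default().clone()
        };
        // OnceLock runs a single initializer; other callers block and share the result.
        slot.get_or_init(|| self.resolver.resolve(ino)).clone()
    }
}

/// Resolver that counts how many times it is invoked.
struct CountingResolver {
    calls: Arc<AtomicUsize>,
}

impl Resolver for CountingResolver {
    fn resolve(&self, ino: Inode) -> Icb {
        self.calls.fetch_add(1, Ordering::SeqCst);
        Icb { path: format!("/ino/{ino}") }
    }
}

/// Spawns `threads` concurrent lookups of one inode and returns the resolver call count.
fn resolver_calls_under_contention(threads: usize) -> usize {
    let calls = Arc::new(AtomicUsize::new(0));
    let cache = Arc::new(CoalescingCache::new(CountingResolver { calls: Arc::clone(&calls) }));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || cache.get_or_resolve(42))
        })
        .collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap().path, "/ino/42");
    }
    calls.load(Ordering::SeqCst)
}
```

Because the cache only sees the `Resolver` trait, a test can swap in a counting resolver like the one above without touching the cache itself, which is the testability benefit the description attributes to the resolver pattern.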

Description generated by Mesa.

@markovejnovic markovejnovic force-pushed the marko/mes-710-async-icache branch from 3482369 to 9f3fd04 Compare February 10, 2026 18:28
@markovejnovic markovejnovic force-pushed the marko/mes-710-async-icache branch from 9f3fd04 to 4f222d5 Compare February 10, 2026 18:29
Fix all clippy warnings and dead code warnings introduced during the
async icache migration (Tasks 1-7). Zero warnings from cargo clippy.
Also includes prerequisite: make AsyncICache::contains synchronous
and add contains_resolved method.
@markovejnovic markovejnovic force-pushed the marko/mes-710-async-icache branch from 38d1b32 to 1ed29f8 Compare February 11, 2026 03:10
@markovejnovic markovejnovic marked this pull request as ready for review February 11, 2026 03:10
@mesa-dot-dev mesa-dot-dev bot left a comment

Performed full review of cc7ead7...1ed29f8

Analysis

  1. The new architecture relies heavily on each resolver's needs_resolve semantics, creating a potential point of failure if implementations have inconsistent behaviors.

  2. Premature marking of entries as "resolved" in pre-populated attrs or children (e.g., in readdir) would prevent the resolver from being invoked again, potentially leading to stale or incomplete data.

  3. The bidirectional bridges in CompositeFs are now the single source of truth for inode mappings, creating a tight coupling where inconsistencies between ICB eviction and bridge clearing could leak stale pointers.

  4. Long-running mounts are particularly vulnerable to memory leaks if the ICBs and bridges fall out of sync, requiring careful coordination between these components.

18 files reviewed | 2 comments

…esolver

Replace unreachable!() with proper error handling in RepoResolver::resolve when
stub is None. Now returns LookupError::InodeNotFound instead of panicking.
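The shape of this change can be sketched as follows. Only `LookupError::InodeNotFound` comes from the commit message; the `Stub` type and the `resolve_name` function are hypothetical placeholders for whatever `RepoResolver::resolve` actually works with.

```rust
/// Error type named in the commit message; other variants are omitted here.
#[derive(Debug, PartialEq)]
enum LookupError {
    InodeNotFound,
}

/// Hypothetical placeholder for the stub data a resolver receives.
struct Stub {
    name: String,
}

/// A missing stub becomes a recoverable error instead of an `unreachable!()` panic.
fn resolve_name(stub: Option<Stub>) -> Result<String, LookupError> {
    let stub = stub.ok_or(LookupError::InodeNotFound)?;
    Ok(stub.name)
}
```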

readdir was caching FileAttr::RegularFile { size: 0, blocks: 0 } for
every file child. Since needs_resolve() returns false for files with
any attr set, subsequent lookups via get_or_resolve would return the
stale size=0 instead of calling the resolver for the real file size.

Only cache directory attrs in readdir. File attrs are left as None so
that lookup triggers the resolver.
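A minimal sketch of why the placeholder attr went stale, under simplifying assumptions: `Icb`, `populate_from_readdir`, and `lookup` are hypothetical, the cache is a plain `HashMap`, and `needs_resolve` is reduced to "no attr cached yet", which may be narrower than the real rule.

```rust
use std::collections::HashMap;

type Inode = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
enum FileAttr {
    Directory,
    RegularFile { size: u64, blocks: u64 },
}

/// Hypothetical, simplified inode control block.
#[derive(Debug, Default)]
struct Icb {
    attr: Option<FileAttr>,
}

impl Icb {
    /// Assumed rule: any cached attr counts as resolved.
    fn needs_resolve(&self) -> bool {
        self.attr.is_none()
    }
}

/// readdir-style population: cache attrs for directories only, leave files as None.
fn populate_from_readdir(cache: &mut HashMap<Inode, Icb>, entries: &[(Inode, bool)]) {
    for &(ino, is_dir) in entries {
        let attr = if is_dir { Some(FileAttr::Directory) } else { None };
        cache.insert(ino, Icb { attr });
    }
}

/// Lookup: call the resolver only when the entry still needs resolving.
fn lookup(
    cache: &mut HashMap<Inode, Icb>,
    ino: Inode,
    resolve: impl Fn(Inode) -> FileAttr,
) -> FileAttr {
    let icb = cache.entry(ino).or_default();
    if icb.needs_resolve() {
        icb.attr = Some(resolve(ino));
    }
    icb.attr.expect("attr is set once resolved")
}

/// Runs readdir followed by a lookup of a file child and returns the reported size.
fn size_after_readdir_then_lookup() -> u64 {
    let mut cache = HashMap::new();
    populate_from_readdir(&mut cache, &[(2, true), (3, false)]);
    match lookup(&mut cache, 3, |_| FileAttr::RegularFile { size: 4096, blocks: 8 }) {
        FileAttr::RegularFile { size, .. } => size,
        FileAttr::Directory => 0,
    }
}
```

Had `populate_from_readdir` stored `RegularFile { size: 0, blocks: 0 }` for inode 3, `needs_resolve` would have returned false and the lookup would have reported the placeholder size.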

delegated_forget was propagating forget(inner_root_ino, nlookups) to
the inner filesystem. The inner root's rc=1 is an initialization
invariant independent of outer FUSE lookups. When nlookups >= 1, the
inner root was evicted, making the inner filesystem non-functional
on re-access (readdir/lookup would fail with InodeNotFound).

Now child-root inodes (those in child_inodes) skip inner forget
propagation entirely.
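The guard can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: `InnerFs`, `Composite`, and `outer_to_inner` are hypothetical, and a single inner filesystem stands in for the per-slot delegation the real `CompositeFs` performs.

```rust
use std::collections::{HashMap, HashSet};

type Inode = u64;

/// Hypothetical inner filesystem that tracks a reference count per inode.
struct InnerFs {
    rc: HashMap<Inode, u64>,
}

impl InnerFs {
    const ROOT: Inode = 1;

    fn new() -> Self {
        // The root starts at rc=1 as an initialization invariant.
        Self { rc: HashMap::from([(Self::ROOT, 1)]) }
    }

    /// Decrement the count and evict the entry once it reaches zero.
    fn forget(&mut self, ino: Inode, nlookups: u64) {
        if let Some(rc) = self.rc.get_mut(&ino) {
            *rc = rc.saturating_sub(nlookups);
            if *rc == 0 {
                self.rc.remove(&ino);
            }
        }
    }
}

struct Composite {
    /// Outer inodes that stand for the root of a child filesystem.
    child_inodes: HashSet<Inode>,
    /// Outer inode -> inner inode (a simplified stand-in for the bridge).
    outer_to_inner: HashMap<Inode, Inode>,
    inner: InnerFs,
}

impl Composite {
    fn delegated_forget(&mut self, outer: Inode, nlookups: u64) {
        // Child roots never propagate: the inner root's rc is not driven by FUSE lookups.
        if self.child_inodes.contains(&outer) {
            return;
        }
        if let Some(&inner_ino) = self.outer_to_inner.get(&outer) {
            self.inner.forget(inner_ino, nlookups);
        }
    }
}

/// Forgets a child root and an ordinary inode, then checks which inner entries remain.
fn inner_root_survives_outer_forget() -> bool {
    let mut fs = Composite {
        child_inodes: HashSet::from([10]),
        outer_to_inner: HashMap::from([(10, InnerFs::ROOT), (11, 2)]),
        inner: InnerFs::new(),
    };
    fs.inner.rc.insert(2, 1);
    fs.delegated_forget(10, 1); // child root: skipped
    fs.delegated_forget(11, 1); // ordinary inode: propagated and evicted
    fs.inner.rc.contains_key(&InnerFs::ROOT) && !fs.inner.rc.contains_key(&2)
}
```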

evict_zero_rc_children called AsyncICache::forget directly, bypassing
CompositeFs cleanup. The inode_to_slot and bridge inode_map entries for
evicted inodes were never removed, causing unbounded memory growth.

Now returns Vec<Inode> of evicted inodes so delegated_readdir can clean
up the associated CompositeFs state.
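A rough sketch of the pattern, assuming plain `HashMap`s in place of the real cache and bridge types. `Cache`, `Composite`, and `refresh_children` are hypothetical names; only `evict_zero_rc_children`, `inode_to_slot`, and `inode_map` appear in the commit message.

```rust
use std::collections::HashMap;

type Inode = u64;

/// Hypothetical cache keyed by inode, storing only a reference count.
struct Cache {
    rc: HashMap<Inode, u64>,
}

impl Cache {
    /// Evict every listed child whose count is zero and report what was evicted.
    fn evict_zero_rc_children(&mut self, children: &[Inode]) -> Vec<Inode> {
        let mut evicted = Vec::new();
        for &ino in children {
            if self.rc.get(&ino) == Some(&0) {
                self.rc.remove(&ino);
                evicted.push(ino);
            }
        }
        evicted
    }
}

struct Composite {
    cache: Cache,
    /// Outer inode -> index of the child filesystem that owns it.
    inode_to_slot: HashMap<Inode, usize>,
    /// Simplified stand-in for one direction of a bridge mapping.
    inode_map: HashMap<Inode, Inode>,
}

impl Composite {
    /// The caller uses the returned list to drop its own side-table entries.
    fn refresh_children(&mut self, children: &[Inode]) {
        for ino in self.cache.evict_zero_rc_children(children) {
            self.inode_to_slot.remove(&ino);
            self.inode_map.remove(&ino);
        }
    }
}

/// Evicts one of two children and returns the remaining side-table sizes.
fn side_tables_after_eviction() -> (usize, usize) {
    let mut fs = Composite {
        cache: Cache { rc: HashMap::from([(10, 0), (11, 2)]) },
        inode_to_slot: HashMap::from([(10, 0), (11, 0)]),
        inode_map: HashMap::from([(10, 100), (11, 101)]),
    };
    fs.refresh_children(&[10, 11]);
    (fs.inode_to_slot.len(), fs.inode_map.len())
}
```

Returning the evicted inodes keeps the cache unaware of the side tables while still giving the owner enough information to stay in sync with it.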

attr_backward silently returned the raw inner inode number when a
bridge mapping was missing. Now logs a warning so the issue is visible
in traces.
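As a small illustration of the fallback, assuming the bridge's backward direction is a plain inner-to-outer `HashMap` and using `eprintln!` as a stand-in for whatever tracing macro the project uses:

```rust
use std::collections::HashMap;

type Inode = u64;

/// Translate an inner inode to its outer number; warn and fall back when unmapped.
fn attr_backward(inner_to_outer: &HashMap<Inode, Inode>, inner: Inode) -> Inode {
    match inner_to_outer.get(&inner) {
        Some(&outer) => outer,
        None => {
            // Stand-in for a tracing warning so the missing mapping is visible.
            eprintln!("warning: no bridge mapping for inner inode {inner}; returning it unchanged");
            inner
        }
    }
}
```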

- Document ensure_child_ino TOCTOU invariant (safe due to &mut self)
- Add MAX_DEPTH=1024 cycle protection to build_repo_path and path_of_inode
- Warn on bridge reset in register_repo_slot (leaks inner icache entries)
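The depth cap from the second bullet can be sketched as a bounded parent walk. The `parents` map, the `ROOT` constant, and the `Option<String>` return type are assumptions made for illustration; only the function name `path_of_inode` and `MAX_DEPTH = 1024` come from the commit message.

```rust
use std::collections::HashMap;

type Inode = u64;

const MAX_DEPTH: usize = 1024;
const ROOT: Inode = 1;

/// Walk parent links from `ino` up to the root, taking at most MAX_DEPTH steps.
/// `parents` maps each inode to its (parent inode, name) pair.
fn path_of_inode(parents: &HashMap<Inode, (Inode, String)>, ino: Inode) -> Option<String> {
    let mut parts = Vec::new();
    let mut cur = ino;
    for _ in 0..MAX_DEPTH {
        if cur == ROOT {
            parts.reverse();
            return Some(format!("/{}", parts.join("/")));
        }
        // An unknown inode ends the walk with None.
        let (parent, name) = parents.get(&cur)?;
        parts.push(name.as_str());
        cur = *parent;
    }
    // Depth limit reached: treat the chain as a cycle rather than loop forever.
    None
}
```

Without the bound, two entries that name each other as parent would keep the walk spinning; with it, the lookup fails after a fixed number of steps.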
@markovejnovic markovejnovic merged commit b6f03ac into main Feb 11, 2026
6 checks passed