Unified storage #20

@cgwalters

The goal

A bootc system has, historically, kept the OS image in one place (ostree, or
the composefs object store) and any other container images — Logically Bound
Images — in a separate bootc-owned containers-storage. "Unified storage" is
about collapsing that split so that the same on-disk layer data is shared
across every consumer, with each consumer seeing it through its own lens:

  • containers-storage so you can trivially podman run the booted (or any
    deployed) image, and so LBIs share layers with the base.
  • the composefs object store (for the composefs backend)
  • the ostree repo (for ostree-backed deployments)

Key design goal: On a reflink-capable filesystem (xfs, btrfs) we can deduplicate all of the content; we just have 3 different ways to represent metadata.

Concrete user-visible wins this unlocks:

  • podman run <the image you booted> just works.
  • zstd:chunked and other registry-side dedup benefit both the OS and LBIs.
  • Locally-built derived images can be booted without a second fetch.

Status at a glance

Status below is as of PR #2205, which is in flight and not yet merged to
main. Items marked ✅ are implemented on that branch; "🚧 planned" / "🔭
future" describe work that follows.

| Capability | Backend | State |
|---|---|---|
| Three-store pipeline (cstor → composefs → ostree, reflinked) | ostree | ✅ implemented |
| Unified pull: registry → containers-storage → composefs (reflink) | composefs-native | ✅ implemented (PR #2205) |
| Non-unified → unified migration: backfill existing deployments into containers-storage | composefs-native | ✅ implemented (PR #2205) |
| bootc image set-unified onboarding | both | |
| bootc image sync (manual reconcile) | both | |
| bootc internals fsck images [--repair] | both | |
| Install-time opt-in ([install.storage]) | both | |
| Reflink-sharing of backfilled layer blobs (no recompress on migration) | composefs-native | 🚧 planned |
| podman run integration via additional-image-store | both | 🚧 planned |
| Native Rust fetcher (replace podman pull stage) | both | 🔭 future |
| Global composefs object store shared with podman/flatpak | | 🔭 future |

Unified storage is opt-in today. It's enabled either at install time via the
[install.storage] config key, or post-install via bootc image set-unified.
Whether it's active on a given system is recorded in composefs/bootc.json.

Once enabled, the steady-state pull on composefs-native is the forward reflink
path: every image is pulled into containers-storage first and then reflinked
into the composefs object store, exactly as on the ostree backend. The reconcile
machinery described below is not the steady-state pull; it is the one-time
bridge for systems that were running non-unified (image only in the composefs
repo) before the flag was set.


Architecture

Three stores, shared blocks

flowchart LR
    subgraph disk["Shared on-disk layer data (FICLONE extents)"]
      CS["containers-storage\n/sysroot/.../bootc/storage\n(overlay; config digest = image ID)"]
      CFR["composefs object store\n/sysroot/composefs/objects/\n(SHA-512 fsverity)"]
      OST["ostree bare repo\n/sysroot/ostree/repo/\n(SHA-256 + per-inode metadata)"]
    end
    CS -. reflink .- CFR
    CFR -. reflink .- OST

    POD["podman run\nLBIs"] --> CS
    BOOT["composefs-boot\nEROFS overlay"] --> CFR
    DEPLOY["deploy / rollback\nfsck / deltas"] --> OST

Why composefs sits in the middle, and why reflink rather than hardlink between
composefs and ostree, is documented in the crate::store module docs — the
short version: composefs is content-addressed by raw bytes, but ostree stores
SELinux labels per-inode, so two files with identical content but different
labels must be distinct inodes (ruling out hardlinks) while still sharing
extents (reflink).
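
The distinction can be illustrated with a toy model. This is only a sketch, not the crate::store implementation: `FileEntry`, `content_key`, `inode_key`, and `count_objects` are hypothetical names, and `DefaultHasher` stands in for the real SHA digests. It shows that keying on raw bytes yields one shared object, while keying on bytes plus the SELinux label yields two inodes that would need to share extents rather than be hardlinked.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical view of a file: raw content plus its SELinux label.
pub struct FileEntry<'a> {
    pub content: &'a [u8],
    pub selinux_label: &'a str,
}

/// Toy digest for illustration only; DefaultHasher is NOT cryptographic
/// and merely stands in for the SHA-256/SHA-512 digests the stores use.
fn toy_digest<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// composefs-style key: derived from the raw bytes only.
pub fn content_key(file: &FileEntry) -> u64 {
    toy_digest(file.content)
}

/// ostree-style key: derived from the bytes and the per-inode label.
pub fn inode_key(file: &FileEntry) -> u64 {
    toy_digest(&(file.content, file.selinux_label))
}

/// Returns (distinct content objects, distinct inodes needed).
pub fn count_objects(files: &[FileEntry]) -> (usize, usize) {
    let contents: HashSet<u64> = files.iter().map(content_key).collect();
    let inodes: HashSet<u64> = files.iter().map(inode_key).collect();
    (contents.len(), inodes.len())
}
```

Two files with the same bytes but different labels count as one content object and two inodes, which is the case where a hardlink would force a single label and a reflink does not.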

ostree backend: forward pipeline (implemented)

When unified storage is enabled, ostree-backend upgrades/switches route through
pull_via_composefs:

flowchart LR
    R[OCI registry] -->|"① skopeo pull"| CS["bootc containers-storage"]
    CS -->|"② composefs_oci::pull (ZeroCopy FICLONE)"| CFR["composefs object store"]
    CFR -->|"③ import_from_composefs_repo (FICLONE)"| OST["ostree bare repo"]

On ext4 (no reflinks) the enabled-with-copy config value permits a byte-copy
fallback at each arrow.
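
As a rough sketch of that fallback decision (not bootc's actual code): only the enabled-with-copy value comes from the text above. The `UnifiedMode::Enabled` variant, the `Transfer` type, `plan_transfer`, and the choice to return an error when reflinks are unavailable in strict mode are all assumptions made for illustration.

```rust
/// Config values for unified storage. Only `EnabledWithCopy` is named in
/// the text ("enabled-with-copy"); `Enabled` is a hypothetical strict mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnifiedMode {
    Enabled,
    EnabledWithCopy,
}

/// How data moves across one arrow of the pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transfer {
    Reflink,
    ByteCopy,
}

/// Decide how to move data between two stores.
/// Assumption: strict mode refuses to proceed without reflink support.
pub fn plan_transfer(mode: UnifiedMode, fs_supports_reflink: bool) -> Result<Transfer, String> {
    match (fs_supports_reflink, mode) {
        // Reflink-capable filesystem (xfs, btrfs): always share extents.
        (true, _) => Ok(Transfer::Reflink),
        // No reflinks (e.g. ext4), but copying was explicitly permitted.
        (false, UnifiedMode::EnabledWithCopy) => Ok(Transfer::ByteCopy),
        // No reflinks and no permission to copy.
        (false, UnifiedMode::Enabled) => {
            Err("filesystem lacks reflink support and byte-copy fallback is not enabled".to_string())
        }
    }
}
```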

composefs-native backend: unified pull (implemented)

Once unified storage is enabled, the composefs-native pull takes the same
forward path as the ostree backend: the image is pulled into
containers-storage first and then reflinked into the composefs object store.
This is pull_composefs_unified, selected by the use_unified flag in
pull_composefs_repo:

flowchart LR
    R[OCI registry] -->|"① podman/skopeo pull"| CS["bootc containers-storage"]
    CS -->|"② composefs_oci::pull (ZeroCopy FICLONE)"| CFR["composefs object store"]

So in steady state the booted image is already in containers-storage as a
side effect of the pull — podman run works, and fsck images has nothing to
repair. (Before unified storage is enabled, the legacy pull_composefs_direct
path fetches straight into composefs, skipping containers-storage.)
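
The difference between the pull paths can be summarized as the ordered list of stores each one touches. The sketch below is a model only; `Store`, `Backend`, `unified_pull_path`, `legacy_composefs_pull_path`, and `visible_to_podman` are hypothetical names and do not mirror the real function signatures.

```rust
/// The stores an image can land in during a pull.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Store {
    ContainersStorage,
    ComposefsObjects,
    OstreeRepo,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Ostree,
    ComposefsNative,
}

/// Stores visited, in order, once unified storage is enabled.
pub fn unified_pull_path(backend: Backend) -> Vec<Store> {
    // Both backends go registry -> containers-storage -> composefs.
    let mut path = vec![Store::ContainersStorage, Store::ComposefsObjects];
    // The ostree backend adds a third hop into the ostree repo.
    if backend == Backend::Ostree {
        path.push(Store::OstreeRepo);
    }
    path
}

/// Stores visited by the legacy composefs-native pull
/// (pull_composefs_direct in the text): composefs only.
pub fn legacy_composefs_pull_path() -> Vec<Store> {
    vec![Store::ComposefsObjects]
}

/// podman can only see images that passed through containers-storage.
pub fn visible_to_podman(path: &[Store]) -> bool {
    path.contains(&Store::ContainersStorage)
}
```

In this model the legacy path is the only one for which `visible_to_podman` is false, which is exactly the gap the migration below has to close.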

Migration: non-unified → unified (the reconcile pass)

The one case where containers-storage is missing an image is the transition: a
system that ran non-unified pulled its deployments straight into composefs,
so its existing booted/rollback/staged images were never written to
containers-storage. Enabling unified storage must backfill them. That's what
reconcile does, taking the reverse direction (composefs → containers-storage):

flowchart LR
    BL["Bootloader entries\n/boot/loader/entries/*.conf"] -->|"list_bootloader_entries\n+ get_imginfo"| PS["Pinned set\n(config digests)"]
    PS -->|"missing from containers-storage?"| CFR
    CFR["composefs object store\n(SplitStream per layer)"] -->|"export_composefs_to_oci_dir\n(byte-exact replay)"| OCI["temp OCI dir"]
    OCI -->|"skopeo copy → import_from_oci_dir"| CS["bootc containers-storage"]

Two design points make this correct:

  1. The bootloader entries are the source of truth for which images are
    "pinned" (i.e. still bootable). The reconcile set is derived from BLS entries
    via list_bootloader_entries/get_imginfo, not the ostree origin state
    dirs, which can drift.

  2. The OCI config digest is preserved exactly. containers-storage uses the
    config digest as its image ID, and that's what fsck images matches on. The
    export replays each layer's uncompressed tar byte-for-byte from the composefs
    splitstream (SplitStreamReader::cat(), a tested roundtrip that preserves
    diff_ids) and writes the original config JSON verbatim. Layer blobs are
    re-gzipped (so the manifest digest changes), but that's irrelevant to
    identity — only the config digest matters.
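
Put together, the set of images to backfill is a set difference keyed on config digest. A minimal sketch, assuming the bootloader entries have already been parsed: `BootEntry`, `pinned_set`, and `missing_from_storage` are hypothetical names, and the real list_bootloader_entries/get_imginfo return types are not shown here.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for one parsed bootloader entry together with
/// the OCI config digest of the image it boots.
pub struct BootEntry {
    pub name: String,
    pub config_digest: String,
}

/// The pinned set: config digests of every image that is still bootable.
/// Entries that boot the same image collapse into one digest.
pub fn pinned_set(entries: &[BootEntry]) -> BTreeSet<String> {
    entries.iter().map(|e| e.config_digest.clone()).collect()
}

/// Pinned images whose config digest is not yet an image ID in
/// containers-storage; these are the ones reconcile must export.
pub fn missing_from_storage(
    pinned: &BTreeSet<String>,
    storage_image_ids: &BTreeSet<String>,
) -> Vec<String> {
    pinned.difference(storage_image_ids).cloned().collect()
}
```

Because identity is the config digest, a re-gzipped layer blob (and therefore a different manifest digest) does not change the result of this comparison.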

Onboarding and consistency

bootc image set-unified pulls the booted image into containers-storage,
backfills the rest via reconcile, and writes the bootc.json flag last
so a failure leaves the system un-marked rather than half-migrated:

flowchart TD
    SU["bootc image set-unified"] --> PB["pull booted image → containers-storage"]
    PB --> REC["reconcile_unified_storage\n(backfill rollback/staged)"]
    REC --> META["write composefs/bootc.json\n(flag written last)"]
    SYNC["bootc image sync"] --> REC2["reconcile_unified_storage"]
    FSCK["bootc internals fsck images"] --> INSP["inspect (read-only)"]
    FSCKR["fsck images --repair"] --> REC2

set-unified and sync are no-ops where bootc.json is absent (unified
storage not enabled); fsck images reports "unified storage not enabled" and
exits cleanly.
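
The "flag written last" ordering used by set-unified can be sketched as three fallible steps chained with `?`. This is an illustration of the ordering only: the function name and closure parameters are hypothetical, and the real steps operate on storage rather than closures.

```rust
/// Run the onboarding steps in order. Any failure returns early, so the
/// flag-writing step runs only after the pull and the backfill succeeded.
pub fn set_unified<P, R, W>(pull_booted: P, reconcile: R, write_flag: W) -> Result<(), String>
where
    P: FnOnce() -> Result<(), String>,
    R: FnOnce() -> Result<(), String>,
    W: FnOnce() -> Result<(), String>,
{
    // 1. Pull the booted image into containers-storage.
    pull_booted()?;
    // 2. Backfill rollback/staged images via reconcile.
    reconcile()?;
    // 3. Only now record that unified storage is enabled.
    write_flag()
}
```

If the reconcile step fails, the third closure is never called, so the system stays un-marked and the command can simply be retried.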


Direction / open work

The forward reflink pull is the steady-state path on both backends today, and
the migration reconcile makes the non-unified → unified transition work. The
remaining gaps are about efficiency and reach, not correctness:

  • Reflink-sharing of backfilled blobs. The steady-state forward pull
    reflinks layers between stores. But the migration reconcile (reverse path)
    currently re-gzips layer blobs when exporting from composefs, so the
    backfilled copy in containers-storage doesn't share extents with composefs.
    A native splitfdstream-based writer will eliminate the recompression. There
    are TODO(unified-storage) markers on the export/import functions.
  • podman run integration. Wiring bootc's store as an additional image
    store in storage.conf so podman run <booted image> resolves without an
    explicit --storage-opt.

Further out:

  • Global composefs object store ("composefs-as-storage"): podman's composefs
    backend writes objects directly to /sysroot/composefs, so podman, flatpak,
    and bootc share a single content-addressed pool. See "bootc composefs-native
    backend" (#1190).

Open questions

  • Pull progress reporting through the podman/skopeo stage (progress-fd issue tracker #1016).
  • Cross-device copy behavior when the install target and source differ.
  • Migrating the remaining fixme_skip_if_composefs integration tests (e.g.
    test-switch-to-unified) onto the derived-image switch technique now that the
    composefs path passes.

Dependencies

History

  • Original Phase-0 proposal (2022): pull into c/storage via podstorage, import
    into ostree via the containers-storage: transport, then composefs-rs. Draft
    PR: "WIP: Use podman pull to fetch containers" (#215).
  • 2026-04 plan: composefs-native unified forward pull
    (pull_composefs_unified), GC protecting live + LBI images, and an
    additional-image-store podman run integration. That plan described the
    forward reflink direction as the target; PR #2205 ("install: reflink-capable
    filesystems with ostree pull through composefs first") delivers that forward
    pull together with the migration reconcile path and the consistency tooling
    (sync, fsck images).
