Initial LTFS order awareness functionality #1001

Open
hugo-hur wants to merge 8 commits into RsyncProject:master from hugo-hur:ltfs-aware

@hugo-hur

Motivation

I'm running several LTFS-formatted LTO tapes for incremental backups. Backing up to tape has been working fine with rsync (using the --whole-file and -t options), but reading from the tape is basically impossible for the reasons stated below.

Summary

Adds a --ltfs option that makes rsync read an LTFS-tape source in physical
block order instead of name order, so a restore streams the tape forward in a
single pass instead of seeking back and forth ("shoe-shining").

On an LTFS volume, each file's metadata — name, size, mtime, and the block where
its data begins — lives in the volume index and is cheap to read. File content,
however, requires physically positioning the tape. rsync's normal name-sorted
traversal bears no relation to physical layout, so an unordered restore can take
many times longer than one forward pass.

What it does

  • The sender reads each file's starting block from the ltfs.startblock
    virtual xattr (also honoring a user.ltfs.startblock alias) and transmits it
    as a new optional 64-bit file-list extra.
  • The generator drives the transfer in ascending start-block order. Entries
    with no start block (directories, symlinks) sort first, which front-loads
    creation of the destination directory tree before the bulk data read.
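
To make the ordering rule above concrete, here is a minimal Python sketch of it. This is not the C code from the PR: `Entry` and `ltfs_read_order` are hypothetical names, and a missing start block is modeled as `None`.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    start_block: Optional[int]  # None for directories, symlinks, etc.


def ltfs_read_order(entries):
    """Return entries ordered for a single forward pass over the tape."""
    # sorted() is stable, so entries without a start block keep their
    # original relative order and all land ahead of the data files.
    return sorted(
        entries,
        key=lambda e: (e.start_block is not None, e.start_block or 0),
    )


# Name-sorted file list whose start blocks run against name order.
flist = [
    Entry("a.bin", 900),
    Entry("dir", None),
    Entry("dir/b.bin", 300),
    Entry("dir/link", None),
    Entry("z.bin", 10),
]
order = [e.name for e in ltfs_read_order(flist)]
print(order)  # ['dir', 'dir/link', 'z.bin', 'dir/b.bin', 'a.bin']
```

The two-part key keeps a legitimate start block of 0 behind the entries that have no block at all.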

Behavior & safeguards

--ltfs is opinionated about the things that would silently defeat it:

  • Implies --whole-file — a delta transfer would re-read the source off
    tape anyway.
  • Forces --no-inc-recursive — the complete file list is needed before a
    read order can be chosen.
  • Refuses --checksum — it would read every byte of every file off the tape
    just to decide what to transfer.
  • Implies --times (just like --archive) — the whole benefit is that the index
    serves size+mtime for free so unchanged files are skipped without reading
    content; that only holds across runs if mtimes are preserved. A later
    --no-times overrides it but emits a warning.
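
The rules above could be modeled roughly as follows. This is an illustrative Python sketch, not rsync's actual option parser: `resolve_ltfs_options`, the dictionary keys, and the message texts are all assumptions made for the example.

```python
def resolve_ltfs_options(argv):
    """Walk options in order and apply the --ltfs rules from the list above."""
    opts = {
        "ltfs": False,
        "whole_file": False,
        "inc_recursive": True,
        "times": False,
        "checksum": False,
    }
    warnings = []
    for arg in argv:
        if arg == "--ltfs":
            opts["ltfs"] = True
            opts["times"] = True   # implied in option order, like --archive
        elif arg in ("-t", "--times"):
            opts["times"] = True
        elif arg == "--no-times":
            opts["times"] = False  # a later --no-times still wins
        elif arg in ("-c", "--checksum"):
            opts["checksum"] = True
    if opts["ltfs"]:
        if opts["checksum"]:
            # Hypothetical wording; the PR only says the combination is refused.
            raise ValueError("--ltfs cannot be combined with --checksum")
        opts["whole_file"] = True      # a delta transfer would re-read the tape
        opts["inc_recursive"] = False  # the full list is needed before ordering
        if not opts["times"]:
            warnings.append("--ltfs without mtimes: unchanged files cannot be skipped")
    return opts, warnings


opts, warnings = resolve_ltfs_options(["-r", "--ltfs", "--no-times"])
print(opts["times"], len(warnings))  # False 1
```

Handling --ltfs inside the loop rather than after it is what lets option order decide whether mtimes end up preserved.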

Compatibility / requirements

  • Reading the start block needs xattr support, so the feature is gated on
    SUPPORT_XATTRS: the file-list extra is only registered when xattrs are
    available, and a build without them refuses --ltfs (like --crtimes)
    rather than silently no-op'ing.
  • Over ssh, both ends must support --ltfs (the sender reads the start block, the
    generator orders the reads); --ltfs is forwarded to the server side.
  • Only the read (restore) direction is affected.

Testing

  • New testsuite/ltfs_test.py exercises ordering on an ordinary filesystem via
    the user.ltfs.startblock alias: round-trip integrity, physical-block read
    order across subdirectories (directories handled first), the implied -t /
    --no-times warning, and that --checksum is refused. It skips cleanly when
    the build lacks xattr support or the platform/filesystem can't set a user.*
    xattr (e.g. Windows, where os.setxattr is absent).
  • Full suite green; builds warning-free both with and without
    --disable-xattr-support.
  • Validated against a real LTFS volume: --ltfs read three files
    whose start blocks ran opposite to name order in correct forward order
    (verified via strace on the drive), all MD5s matched the tape, and a second
    -t pass skipped every file from index metadata alone — zero content reads.
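
For readers who want to reproduce the alias-based setup by hand, the idea can be sketched as below. This is not the contents of testsuite/ltfs_test.py: the helper names are hypothetical, and storing the block as ASCII decimal text is an assumption about the value format.

```python
import os


def reversed_block_plan(names, first_block=100, stride=100):
    """Give each name a fake start block so block order is the reverse of name order."""
    ordered = sorted(names)
    return {
        name: first_block + stride * i
        for i, name in enumerate(reversed(ordered))
    }


def tag_start_block(path, block, attr="user.ltfs.startblock"):
    """Set the alias xattr on path; return False where that is not possible."""
    setxattr = getattr(os, "setxattr", None)  # absent on e.g. Windows
    if setxattr is None:
        return False
    try:
        # Assumption: the value is stored as ASCII decimal text.
        setxattr(path, attr, str(block).encode("ascii"))
    except OSError:
        return False  # filesystem rejects user.* xattrs
    return True


plan = reversed_block_plan(["a.bin", "b.bin", "sub/c.bin"])
print(sorted(plan, key=plan.get))  # ['sub/c.bin', 'b.bin', 'a.bin']
```

A `False` return from `tag_start_block` is the condition under which a test would skip rather than fail.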

Not in scope (possible follow-ups)

  • Add a partition check to the sort for the special case where small files are stored in the LTFS index partition instead of the data partition.

hugo-hur added 8 commits June 12, 2026 01:33
Introduce the machinery for tape-aware transfers without yet changing the
read order.  On an LTFS volume each file's metadata -- including the block
where its data begins -- lives in the volume index and is cheap to read,
while reading content requires physically positioning the tape.

Add a new ltfs.c module that reads a file's starting block from the
ltfs.startblock virtual xattr (also honoring a user.ltfs.startblock alias so
the feature can be exercised on an ordinary filesystem).  The sender records
that block into a new optional 64-bit file-list extra and transmits it, so
the generator -- which on a local copy is chdir'd into the destination and
cannot read the source xattrs itself -- receives each file's physical
position.
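
As a rough illustration of the lookup (in Python rather than the C of ltfs.c), the sketch below tries the virtual xattr first and then the alias. The function names are hypothetical, and treating the value as ASCII decimal text is an assumption.

```python
import os

# Attribute names taken from the description above, tried in this order.
LTFS_ATTRS = ("ltfs.startblock", "user.ltfs.startblock")


def parse_start_block(raw):
    """Parse an xattr value into a block number, or None if it is unusable."""
    try:
        value = int(raw.rstrip(b"\x00").decode("ascii").strip())
    except ValueError:  # also covers UnicodeDecodeError
        return None
    return value if value >= 0 else None


def read_start_block(path):
    """Return the file's start block, or None if no usable xattr is present."""
    getxattr = getattr(os, "getxattr", None)  # Linux-only in the stdlib
    if getxattr is None:
        return None
    for attr in LTFS_ATTRS:
        try:
            raw = getxattr(path, attr)
        except OSError:
            continue  # attribute missing or namespace unsupported
        block = parse_start_block(raw)
        if block is not None:
            return block
    return None


print(parse_start_block(b"1234"))  # 1234
```

Returning `None` rather than raising matches the description's treatment of entries with no start block.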

Wire up the --ltfs option: it allocates the extra, implies --whole-file (a
delta transfer would re-read the source off tape anyway), forces
--no-inc-recursive (the full list is needed before a read order can be
chosen), and refuses --checksum (which would read the whole tape just to
decide what to transfer).  The option is propagated to the server side so
both ends agree on the file-list extra layout.

Reading the start block requires xattr support, so the whole feature is
gated on SUPPORT_XATTRS: the file-list extra is only registered when xattrs
are available, and a build without them refuses --ltfs (like --crtimes and
friends) rather than silently accepting an inert option.

With the start block of every file now available on the generator side,
order the data-read phase by ascending block instead of by name when --ltfs
is in effect.  The drive then makes a single forward streaming pass rather
than seeking back and forth ("shoe-shining"), which on a real tape can cut a
restore from hours to one pass.

Entries with no start block (directories, symlinks, anything not on tape)
sort first in their original order, which conveniently front-loads creation
of the destination directory tree before the bulk data read begins.  A NULL
ordering (ltfs off, no metadata negotiated, or an empty range) falls back to
the natural low..high sweep.

Add the option summary line and a full description covering what LTFS
ordering does, the options it implies (--whole-file, --no-inc-recursive) and
refuses (--checksum), that the fast index metadata still drives the normal
size+mtime quick check, and that only the read (restore) direction is
optimized.

Exercise --ltfs on an ordinary filesystem via the user.ltfs.startblock alias,
assigning blocks that run opposite to name order.  The test verifies
round-trip integrity, that the itemized output (rsync's observable processing
order) comes out in physical block order across subdirectories with
directories handled first, and that --ltfs --checksum is refused.  Skips
cleanly when the build lacks xattr support or the scratch filesystem rejects
a user.* xattr.

--ltfs's value is that an LTFS source serves size and mtime from the tape
index for free, letting rsync's quick check skip unchanged files without
reading their content off the tape. That only works if the destination keeps
the source mtimes: without -t, every run sees a time mismatch and re-reads
the whole tape, defeating the purpose.
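
The quick check this relies on compares only size and mtime. A simplified Python sketch follows: `Meta` and `quick_check_unchanged` are hypothetical names, and refinements such as --modify-window are ignored.

```python
from typing import NamedTuple, Optional


class Meta(NamedTuple):
    size: int
    mtime: int  # seconds since the epoch


def quick_check_unchanged(src: Meta, dst: Optional[Meta]) -> bool:
    """True when the destination can be skipped without reading source content."""
    return dst is not None and src.size == dst.size and src.mtime == dst.mtime


on_tape = Meta(size=4096, mtime=1_700_000_000)
kept_mtime = Meta(size=4096, mtime=1_700_000_000)   # destination written with -t
fresh_mtime = Meta(size=4096, mtime=1_800_000_000)  # destination written without -t

print(quick_check_unchanged(on_tape, kept_mtime))   # True
print(quick_check_unchanged(on_tape, fresh_mtime))  # False
```

The second case is the one described above: identical content, but the time mismatch sends rsync back to the tape.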

Process --ltfs in the option loop (via OPT_LTFS) and set preserve_mtimes
there, the same way --archive implies -t, so a later --no-times can still
override it in option order. When it is overridden, warn that unchanged
files can no longer be skipped, rather than silently doing the slow thing.

Found while testing against a real LTO-5 LTFS volume: a bare -r --ltfs left
current mtimes on the destination, so the next run wanted to re-read every
file.

Note in the manpage that --ltfs enables mtime preservation (like --archive)
so the index quick-check can skip unchanged files across runs, and that a
later --no-times overrides it with a warning.

Verify that --ltfs preserves source mtimes without an explicit -t (so an
immediate re-run finds nothing to transfer) and that an explicit --no-times
still runs but emits a warning.
Comment thread rsync.1.md
implies [`--whole-file`](#opt) (a delta transfer would re-read the source
file anyway) and forces [`--no-inc-recursive`](#opt) (the complete file
list is needed before the read order can be chosen). It also refuses
[`--checksum`](#opt), which would read every byte of every file off the

Something I commented offline, but I'll mention it here as well:

I would not refuse --checksum, since there are use cases where one would like to ensure that the contents of e.g. an offsite mirror match what is on tape. The tape drive might be as much as two orders of magnitude faster than the internet link to an offsite archive, so it makes sense to
a) ensure matching checksums
b) minimize the amount of data transferred over the internet link

An extra linear scan on a fast drive is a much smaller issue than shoe-shining.
