309 changes: 309 additions & 0 deletions add-optional-cache.md


2 changes: 2 additions & 0 deletions docs/backends.rst
Original file line number Diff line number Diff line change
@@ -6,6 +6,8 @@ basic operations.

Existing backends are listed below; more might come in the future.

See also :doc:`store_caching` for optional Store-level caching with a secondary backend.

posixfs
-------

1 change: 1 addition & 0 deletions docs/index.rst
@@ -7,6 +7,7 @@

installation
store
store_caching
backends
servers
changes
33 changes: 30 additions & 3 deletions docs/store.rst
@@ -22,12 +22,39 @@ API can be much simpler:
- defrag: general purpose defragmentation helper (copies blocks to new items)
- quota: return quota limit and usage (-1 if quotas not enabled or not supported)
- stats: API call counters, time spent in API methods, data volume/throughput.
-- latency/bandwidth emulator: can emulate higher latency (via BORGSTORE_LATENCY
-  [us]) and lower bandwidth (via BORGSTORE_BANDWIDTH [bit/s]) than what is
-  actually provided by the backend.
+- latency/bandwidth emulator: see :ref:`store-latency-bandwidth-emulator`.

Store operations (and per-op timing and volume) are logged at DEBUG log level.

See also :doc:`store_caching` for optional Store-level caching with a secondary backend.


.. _store-latency-bandwidth-emulator:

Latency and bandwidth emulator
------------------------------

The Store can emulate slower backend behavior using environment variables:

- ``BORGSTORE_LATENCY``: per-primary-call latency in microseconds (``[us]``).
- ``BORGSTORE_BANDWIDTH``: primary-call bandwidth limit in bits per second
  (``[bit/s]``).

Current behavior with Store caching enabled:

- Emulation applies to **primary backend** operations.
- Emulation does **not** apply to **cache backend** operations.

This means:

- On cache miss paths (for example writethrough/mirror reads that load from the
  primary backend), emulation affects the primary backend calls.
- On cache hit paths, cached reads avoid primary backend load operations and
  therefore do not incur emulated bandwidth delay for the cache backend read.
- Name resolution for Store operations still uses primary backend lookups, so
  the configured latency can still be visible even when the data comes from
  the cache.
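As a rough illustration of how the two variables could combine, the sketch below computes the delay an emulator might add to a single primary backend call. The function name ``emulated_delay`` and the formula (fixed latency plus transfer time for the payload) are assumptions made for illustration, not the borgstore implementation.

```python
def emulated_delay(nbytes, latency_us=0, bandwidth_bps=0):
    """Return the artificial delay in seconds for one primary backend call.

    Hypothetical helper: latency_us mirrors BORGSTORE_LATENCY [us],
    bandwidth_bps mirrors BORGSTORE_BANDWIDTH [bit/s]; 0 disables that part.
    """
    delay = latency_us / 1_000_000  # microseconds -> seconds
    if bandwidth_bps > 0:
        delay += nbytes * 8 / bandwidth_bps  # bytes -> bits, then bits / (bit/s)
    return delay


# 1 MiB over an emulated 8 Mbit/s link with 50 ms latency per call
print(emulated_delay(1024 * 1024, latency_us=50_000, bandwidth_bps=8_000_000))
```

Under this model a cache hit would only pay for whatever primary calls remain (for example name lookups), which matches the behavior described above.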

Keys
----

Expand Down
93 changes: 93 additions & 0 deletions docs/store_caching.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
Store caching
=============

The ``Store`` can optionally use a second backend as a local cache for selected
namespaces, which is especially useful when the primary backend is remote,
slower, or otherwise more "expensive" than the cache.

Configuration
-------------

- ``cache_url`` or ``cache_backend``: where cached data is stored
- ``cache``: mapping of namespace to cache policy

Each cache policy can be provided either as:

- ``CachePolicy(mode=..., max_age=..., size=...)``
- ``{"mode": ..., "max_age": ..., "size": ...}``

``mode`` accepts ``CacheMode`` values or string aliases:

- ``CacheMode.C_OFF`` or ``"off"``: bypass cache completely.
- ``CacheMode.C_MIRROR`` or ``"mirror"``: always read from primary backend,
  but update the cache after successful primary backend reads and writes.
- ``CacheMode.C_WRITETHROUGH`` or ``"writethrough"``: read-through + write-through.
  For now, only content-hash addressed namespaces should use this mode.
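The read paths of the three modes can be sketched as follows. This is a simplified model, not the Store code: plain dicts stand in for the primary and cache backends, the mode is passed as its string alias, and ``cached_load`` is a hypothetical name.

```python
def cached_load(name, primary, cache, mode):
    """Load `name` according to a cache mode; `primary` and `cache` are dicts.

    Hypothetical sketch: dicts stand in for backends, strings for CacheMode values.
    """
    if mode == "off":
        return primary[name]  # cache is bypassed completely
    if mode == "writethrough" and name in cache:
        return cache[name]  # cache hit: primary is not read
    value = primary[name]  # mirror always reads primary; writethrough on a miss
    cache[name] = value  # update the cache after a successful primary read
    return value


primary = {"abc123": b"chunk"}
cache = {}
cached_load("abc123", primary, cache, "writethrough")  # miss: fills the cache
primary["abc123"] = b"changed"
print(cached_load("abc123", primary, cache, "writethrough"))  # served from cache
print(cached_load("abc123", primary, cache, "mirror"))  # read from primary
```

In this model a writethrough hit never consults the primary backend, so a changed object goes unnoticed. That is presumably why the mode is recommended only for content-hash addressed namespaces, where a name always refers to the same content.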

``max_age`` is optional and expressed in seconds since last access. The default
is ``None`` (no age limit).

``size`` is optional and expressed in bytes. It sets a per-namespace cache size
budget enforced during ``Store.close()`` by evicting least-recently-used items
until the namespace total size is within the configured budget.

Example::

    from borgstore.store import Store, CacheMode

    store = Store(
        url="sftp://user@host/repo",
        levels={"data": [2], "meta": [1]},
        cache={
            "data": {"mode": "writethrough", "max_age": 3600, "size": 4 * 1024**3},
            "meta": {"mode": CacheMode.C_MIRROR},
        },
        cache_url="file:///home/user/.cache/borgstore/repo",
    )

Behavior
--------

- Cache keys are identical to primary backend keys (same nesting).
- Soft-deleted items are cached under the same ``.del`` name as primary.
- Soft delete/undelete (``move(delete=True|undelete=True)``) renames cache
  entries in lockstep with primary backend names.
- If ``max_age`` is configured and a cache item is expired, it is deleted from
  the cache and treated as a cache miss.
- On ``Store.close()``, cache-enabled namespaces are scanned before closing the
  cache backend. Cleanup order per namespace is:

  1. remove expired cache objects when ``max_age`` is configured,
  2. if ``size`` is configured, evict the least-recently-used remaining items
     until the namespace total size is ``<= size``.

  Expired entries are always removed first, even if total size is already below
  the ``size`` limit.
- Cache failures are non-fatal and logged as warnings.
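The close-time cleanup order described above can be sketched as a pure function over one namespace's cache entries. ``Entry`` and ``plan_cleanup`` are hypothetical names, and the strict ``>`` comparison used for expiry is an assumption; the actual implementation may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for a cache entry; only size and atime matter here.
Entry = namedtuple("Entry", "name size atime")


def plan_cleanup(entries, now, max_age=None, size=None):
    """Return the names to delete from one namespace, in deletion order."""
    to_delete = []
    remaining = []
    # Step 1: expired entries go first, regardless of the size budget.
    for e in entries:
        if max_age is not None and now - e.atime > max_age:
            to_delete.append(e.name)
        else:
            remaining.append(e)
    # Step 2: evict least-recently-used entries until total size <= size.
    if size is not None:
        total = sum(e.size for e in remaining)
        for e in sorted(remaining, key=lambda e: e.atime):  # oldest access first
            if total <= size:
                break
            to_delete.append(e.name)
            total -= e.size
    return to_delete


entries = [Entry("a", 10, 100), Entry("b", 10, 900), Entry("c", 10, 950)]
print(plan_cleanup(entries, now=1000, max_age=500, size=10))
```

In the usage example, ``a`` is removed because it is older than ``max_age``, then ``b`` is evicted as the least recently used of the remaining entries to bring the total down to the budget.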

Limitations
-----------

- Eviction is close-time only (on ``Store.close()``), not continuous during
  ``store()``/``load()`` operations.
- No proactive cache validation/revalidation.
- If an object is deleted in the primary backend by another client, the local
  cache will still have a stale object.
- ``max_age`` and LRU-by-``size`` depend on backend ``ItemInfo.atime`` support.
  If ``atime`` is 0 (not implemented):

  - using ``max_age`` would empty the cache on ``Store.close()``
  - using ``size`` would not work in LRU order, because order can't be determined
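To see why a missing ``atime`` empties the cache when ``max_age`` is set, consider a minimal expiry check. ``is_expired`` is a hypothetical helper and the exact comparison is an assumption, but any check that measures age from the last access time behaves the same way for ``atime == 0``.

```python
import time


def is_expired(atime, max_age, now):
    """Hypothetical expiry check: age is measured from the last access time."""
    return now - atime > max_age


now = time.time()
print(is_expired(now - 60, max_age=3600, now=now))  # accessed a minute ago
print(is_expired(0, max_age=3600, now=now))  # backend reports atime=0
```

With ``atime`` at 0, the computed age is the full UNIX timestamp (decades), so every entry looks expired for any practical ``max_age``.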

Statistics
----------

``Store.stats`` includes cache counters:

- ``cache_hits``
- ``cache_misses``
- ``cache_errors``
- ``cache_bytes_read``
- ``cache_bytes_written``
- ``cache_hit_ratio``
- ``cache_disabled``
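The diff does not show how ``cache_hit_ratio`` is derived. One plausible definition, stated here purely as an assumption, is the share of cache lookups that were hits:

```python
def cache_hit_ratio(cache_hits, cache_misses):
    """Assumed definition: share of cache lookups that were hits (0.0 if none)."""
    lookups = cache_hits + cache_misses
    return cache_hits / lookups if lookups else 0.0


print(cache_hit_ratio(3, 1))
```

Guarding the zero-lookup case avoids a ``ZeroDivisionError`` for a Store that was opened but never read from.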
3 changes: 2 additions & 1 deletion src/borgstore/backends/_base.py
@@ -11,7 +11,8 @@

from ..constants import MAX_NAME_LENGTH, TMP_SUFFIX, HID_SUFFIX

-ItemInfo = namedtuple("ItemInfo", "name exists size directory")
+# atime is the last read access UNIX timestamp [s] or 0 if not implemented
+ItemInfo = namedtuple("ItemInfo", "name exists size directory atime", defaults=(0,))


def validate_name(name):
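The ``defaults=(0,)`` argument keeps existing call sites working: ``namedtuple`` applies defaults to the rightmost fields, so a single default covers only ``atime``. A small standalone check of that behavior (the sample values are made up):

```python
from collections import namedtuple

# Same definition as in the diff: defaults apply to the rightmost fields,
# so a single default covers only atime.
ItemInfo = namedtuple("ItemInfo", "name exists size directory atime", defaults=(0,))

# A call site that predates the atime field still works and reports atime 0.
old_style = ItemInfo(name="x", exists=True, size=3, directory=False)
# A backend that supports atime passes it explicitly.
new_style = ItemInfo(name="x", exists=True, size=3, directory=False, atime=1700000000.0)

print(old_style.atime)  # 0
print(new_style.atime)
```

Backends that do not pass ``atime`` therefore report 0, the "not implemented" value mentioned in the comment above the definition.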
4 changes: 2 additions & 2 deletions src/borgstore/backends/posixfs.py
@@ -205,7 +205,7 @@ def info(self, name):
return ItemInfo(name=path.name, exists=False, directory=False, size=0)
else:
is_dir = stat.S_ISDIR(st.st_mode)
-return ItemInfo(name=path.name, exists=True, directory=is_dir, size=st.st_size)
+return ItemInfo(name=path.name, exists=True, directory=is_dir, size=st.st_size, atime=st.st_atime)

def load(self, name, *, size=None, offset=0):
if not self.opened:
@@ -361,7 +361,7 @@ def list(self, name):
pass
else:
is_dir = stat.S_ISDIR(st.st_mode)
-yield ItemInfo(name=p.name, exists=True, size=st.st_size, directory=is_dir)
+yield ItemInfo(name=p.name, exists=True, size=st.st_size, directory=is_dir, atime=st.st_atime)

def quota(self) -> dict:
"""Return quota information: limit and usage in bytes. -1 means not set / not tracked."""