30 changes: 20 additions & 10 deletions API-INTERNAL.md
@@ -79,12 +79,17 @@ If the requested key is a collection, it will return an object with all the coll
<dd><p>Remove a key from Onyx and update the subscribers</p>
</dd>
<dt><a href="#retryOperation">retryOperation()</a></dt>
<dd><p>Handles storage operation failures based on the error type:</p>
<dd><p>Handles storage operation failures based on the error class (see lib/storage/errors.ts).
The connection layer (createStore) owns connection/transport recovery; this operation layer owns
capacity recovery (eviction) so that a given failure is retried by exactly one layer:</p>
<ul>
<li>Storage capacity errors: evicts data and retries the operation</li>
<li>Invalid data errors: logs an alert and throws an error</li>
<li>Non-retriable errors: logs an alert and resolves without retrying</li>
<li>Other errors: retries the operation</li>
<li>INVALID_DATA: logs an alert and throws (the same data will always fail).</li>
<li>TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.</li>
<li>CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.</li>
<li>UNKNOWN: bounded retry.</li>
</ul>
</dd>
<dt><a href="#broadcastUpdate">broadcastUpdate()</a></dt>
@@ -318,11 +323,16 @@ Remove a key from Onyx and update the subscribers
<a name="retryOperation"></a>

## retryOperation()
Handles storage operation failures based on the error type:
- Storage capacity errors: evicts data and retries the operation
- Invalid data errors: logs an alert and throws an error
- Non-retriable errors: logs an alert and resolves without retrying
- Other errors: retries the operation
Handles storage operation failures based on the error class (see lib/storage/errors.ts).
The connection layer (createStore) owns connection/transport recovery; this operation layer owns
capacity recovery (eviction) so that a given failure is retried by exactly one layer:
- INVALID_DATA: logs an alert and throws (the same data will always fail).
- TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.
- CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.
- UNKNOWN: bounded retry.

**Kind**: global function
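
The class-to-action mapping described above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code in `lib/OnyxUtils.ts`: `decideAction`, `Action`, and the `breakerOpen` parameter are hypothetical names, and the real function performs the eviction and retry itself instead of returning a label. The sketch also leaves out two capacity sub-cases (the failure that trips the breaker, and the case where no evictable key is left).

```typescript
// Hypothetical labels for what the operation layer does with a failed write.
type ErrorClass = 'transient' | 'capacity' | 'invalidData' | 'fatal' | 'unknown';
type Action = 'drop' | 'throw' | 'skip' | 'giveUp' | 'retry' | 'evictAndRetry';

// Mirrors MAX_STORAGE_OPERATION_RETRY_ATTEMPTS.
const MAX_ATTEMPTS = 5;

function decideAction(errorClass: ErrorClass, retryAttempt: number, breakerOpen: boolean): Action {
    // Breaker already open: drop capacity writes without logging.
    if (errorClass === 'capacity' && breakerOpen) {
        return 'drop';
    }
    // The same data will always fail, so surface the error.
    if (errorClass === 'invalidData') {
        return 'throw';
    }
    // Owned by the connection layer; do not retry here.
    if (errorClass === 'transient' || errorClass === 'fatal') {
        return 'skip';
    }
    // Per-operation retry budget exhausted.
    if (retryAttempt + 1 > MAX_ATTEMPTS) {
        return 'giveUp';
    }
    // UNKNOWN: bounded retry without eviction.
    if (errorClass !== 'capacity') {
        return 'retry';
    }
    // CAPACITY: evict the least recently accessed key, then retry.
    return 'evictAndRetry';
}
```

The order of the checks follows the order in the function body, so an open breaker takes precedence over the retry budget for capacity errors.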
<a name="broadcastUpdate"></a>
85 changes: 46 additions & 39 deletions lib/OnyxUtils.ts
@@ -6,8 +6,9 @@ import * as Logger from './Logger';
import type Onyx from './Onyx';
import cache, {TASK} from './OnyxCache';
import OnyxKeys from './OnyxKeys';
import * as Str from './Str';
import StorageCircuitBreaker from './StorageCircuitBreaker';
import Storage from './storage';
import {StorageErrorClass, classifyStorageError} from './storage/errors';
import type {
CollectionKeyBase,
ConnectOptions,
@@ -49,26 +50,6 @@ const METHOD = {
CLEAR: 'clear',
} as const;

// IndexedDB errors that indicate storage capacity issues where eviction can help
const IDB_STORAGE_ERRORS = [
'quotaexceedederror', // Browser storage quota exceeded
] as const;

// SQLite errors that indicate storage capacity issues where eviction can help
const SQLITE_STORAGE_ERRORS = [
'database or disk is full', // Device storage is full
] as const;

const STORAGE_ERRORS = [...IDB_STORAGE_ERRORS, ...SQLITE_STORAGE_ERRORS];

// IndexedDB errors where retrying is futile because the underlying connection/store is broken.
// The healing path (separate from retryOperation) is responsible for recovery.
const IDB_NON_RETRIABLE_ERRORS = [
'internal error opening backing store', // LevelDB backing store is broken at the filesystem level
] as const;

const NON_RETRIABLE_ERRORS = [...IDB_NON_RETRIABLE_ERRORS];

// Max number of retries for failed storage operations
const MAX_STORAGE_OPERATION_RETRY_ATTEMPTS = 5;

@@ -791,8 +772,11 @@ function remove<TKey extends OnyxKey>(key: TKey, isProcessingCollectionUpdate?:
function reportStorageQuota(error?: Error): Promise<void> {
return Storage.getDatabaseSize()
.then(({bytesUsed, bytesRemaining, usageDetails}) => {
// `bytesRemaining` comes from navigator.storage.estimate() and is an ORIGIN-WIDE estimate,
// not headroom for this database. The browser allocates IndexedDB storage dynamically, so a
// QuotaExceededError can legitimately occur even when this number still looks large.
Logger.logInfo(
`Storage Quota Check -- bytesUsed: ${bytesUsed} bytesRemaining: ${bytesRemaining}${
`Storage Quota Check -- bytesUsed: ${bytesUsed} originWideBytesRemaining (estimate, not per-DB headroom): ${bytesRemaining}${
usageDetails ? ` usageDetails: ${JSON.stringify(usageDetails)}` : ''
}. Original error: ${error}`,
);
@@ -803,43 +787,64 @@ function reportStorageQuota(error?: Error): Promise<void> {
}

/**
* Handles storage operation failures based on the error type:
* - Storage capacity errors: evicts data and retries the operation
* - Invalid data errors: logs an alert and throws an error
* - Non-retriable errors: logs an alert and resolves without retrying
* - Other errors: retries the operation
* Handles storage operation failures based on the error class (see lib/storage/errors.ts).
* The connection layer (createStore) owns connection/transport recovery; this operation layer owns
* capacity recovery (eviction) so that a given failure is retried by exactly one layer:
* - INVALID_DATA: logs an alert and throws (the same data will always fail).
* - TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
* and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.
* - CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
* circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
* progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.
* - UNKNOWN: bounded retry.
*/
function retryOperation<TMethod extends RetriableOnyxOperation>(error: Error, onyxMethod: TMethod, defaultParams: Parameters<TMethod>[0], retryAttempt: number | undefined): Promise<void> {
const currentRetryAttempt = retryAttempt ?? 0;
const nextRetryAttempt = currentRetryAttempt + 1;
const errorClass = classifyStorageError(error);

Logger.logInfo(`Failed to save to storage. Error: ${error}. onyxMethod: ${onyxMethod.name}. retryAttempt: ${currentRetryAttempt}/${MAX_STORAGE_OPERATION_RETRY_ATTEMPTS}`);
// Once the breaker is open, every capacity write is going to fail the same way. Drop it silently —
// the breaker already emitted its single alert, and logging per failed write is exactly the storm
// we are suppressing. (We return before the log line below on purpose.)
if (errorClass === StorageErrorClass.CAPACITY && StorageCircuitBreaker.isTripped()) {
return Promise.resolve();
}

if (error && Str.startsWith(error.message, "Failed to execute 'put' on 'IDBObjectStore'")) {
Logger.logInfo(
`Failed to save to storage. Error: ${error}. class: ${errorClass}. onyxMethod: ${onyxMethod.name}. retryAttempt: ${currentRetryAttempt}/${MAX_STORAGE_OPERATION_RETRY_ATTEMPTS}`,
);

if (errorClass === StorageErrorClass.INVALID_DATA) {
Logger.logAlert(`Attempted to set invalid data set in Onyx. Please ensure all data is serializable. Error: ${error}`);
throw error;
}

const errorMessage = error?.message?.toLowerCase?.();
const errorName = error?.name?.toLowerCase?.();
const isStorageCapacityError = STORAGE_ERRORS.some((storageError) => errorName?.includes(storageError) || errorMessage?.includes(storageError));
const isNonRetriableError = NON_RETRIABLE_ERRORS.some((nonRetriableError) => errorName?.includes(nonRetriableError) || errorMessage?.includes(nonRetriableError));

if (isNonRetriableError) {
Logger.logAlert(`Storage operation skipped retry for non-retriable error. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
if (errorClass === StorageErrorClass.TRANSIENT || errorClass === StorageErrorClass.FATAL) {
Logger.logInfo(`Storage operation skipped retry; ${errorClass} errors are handled by the connection layer. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
return Promise.resolve();
}

if (nextRetryAttempt > MAX_STORAGE_OPERATION_RETRY_ATTEMPTS) {
Logger.logAlert(`Storage operation failed after 5 retries. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
Logger.logAlert(`Storage operation failed after ${MAX_STORAGE_OPERATION_RETRY_ATTEMPTS} retries. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
return Promise.resolve();
}

if (!isStorageCapacityError) {
if (errorClass !== StorageErrorClass.CAPACITY) {
// UNKNOWN error — bounded retry without eviction.
// @ts-expect-error No overload matches this call.
return onyxMethod(defaultParams, nextRetryAttempt);
}

// CAPACITY: feed the session-level circuit breaker before evicting. The per-operation budget above
// cannot stop a session-wide storm — each evicted key triggers an OnyxDerived recompute that spawns
// a fresh write with its own budget — so the breaker is what actually halts the meltdown. (The
// already-open case returned silently at the top of this function.)
StorageCircuitBreaker.recordCapacityFailure();
if (StorageCircuitBreaker.isTripped()) {
// This failure tripped the breaker; it already emitted its single alert. Stop here.
return Promise.resolve();
}

// Find the least recently accessed evictable key that we can remove
const keyForRemoval = cache.getKeyForEviction();
if (!keyForRemoval) {
@@ -850,9 +855,11 @@ function retryOperation<TMethod extends RetriableOnyxOperation>(error: Error, on
return reportStorageQuota(error);
}

// Remove the least recently accessed key and retry.
// Remove the least recently accessed key and retry. Tell the breaker we evicted so that, if the
// retry comes back as another capacity failure, it counts as a no-progress cycle.
Logger.logInfo(`Out of storage. Evicting least recently accessed key (${keyForRemoval}) and retrying. Error: ${error}`);
reportStorageQuota(error);
StorageCircuitBreaker.recordEviction();

Review comment (P1): Clear pending eviction after a successful retry

When a capacity failure evicts a key and the subsequent retry succeeds, this flag is never cleared, so the next quota error within the rolling window is counted as a “no-progress” eviction even though the previous eviction did make progress. With intermittent quota pressure where each eviction successfully frees enough space, five such successful cycles will still trip the breaker and then silently skip later storage writes for 60s; the pending result should be cleared/reset on successful retry instead of only on the next capacity failure.
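
A minimal sketch of one way to do this, under assumptions: `recordSuccess` is a hypothetical function that does not exist in this PR, and the state below is a stripped-down stand-in for `lib/StorageCircuitBreaker.ts` that keeps only the no-progress tracking (no rolling window, no alert, no self-reset).

```typescript
const NO_PROGRESS_CAP = 5;

let consecutiveNoProgressEvictions = 0;
let evictionAwaitingResult = false;
let tripped = false;

// Called right after a key was evicted, as in the PR.
function recordEviction(): void {
    evictionAwaitingResult = true;
}

// Hypothetical addition: called when the retried write succeeds, so the
// eviction that preceded it is credited as progress.
function recordSuccess(): void {
    evictionAwaitingResult = false;
    consecutiveNoProgressEvictions = 0;
}

// Called once per capacity failure, before deciding whether to evict.
function recordCapacityFailure(): void {
    if (evictionAwaitingResult) {
        // The previous eviction was followed directly by another capacity failure.
        consecutiveNoProgressEvictions += 1;
        evictionAwaitingResult = false;
    }
    if (consecutiveNoProgressEvictions >= NO_PROGRESS_CAP) {
        tripped = true;
    }
}

function isTripped(): boolean {
    return tripped;
}
```

With this change, repeated failure, eviction, success cycles leave the counter at zero, while back-to-back failures after evictions still trip the breaker. Where to call `recordSuccess` from `retryOperation` is left open here: the promise returned by the retried method may also resolve when a nested retry gives up, so resolution alone may not prove that the write succeeded.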


// @ts-expect-error No overload matches this call.
return remove(keyForRemoval).then(() => onyxMethod(defaultParams, nextRetryAttempt));
106 changes: 106 additions & 0 deletions lib/StorageCircuitBreaker.ts
@@ -0,0 +1,106 @@
import * as Logger from './Logger';

/**
* Process-scoped circuit breaker for storage CAPACITY failures.
*
* The per-operation retry budget in `OnyxUtils.retryOperation` cannot stop a session-level storm:
* each evict -> OnyxDerived recompute -> new write starts its own fresh budget, so a full disk or
* exhausted quota can drive tens of thousands of evict+retry cycles that never make progress and
* freeze the app. This breaker is the session-level brake — `retryOperation` consults it before
* every eviction.
*
* It trips when EITHER:
* - capacity failures within {@link ROLLING_WINDOW_MS} exceed {@link FAILURE_THRESHOLD}, or
* - {@link NO_PROGRESS_CAP} consecutive evictions are each immediately followed by another capacity
* failure (the eviction freed nothing the next write could use — a no-progress cycle). This is a
* cheap proxy for `getDatabaseSize()`, which is costly and only reports origin-wide usage.
*
* On trip it emits exactly ONE alert and self-resets once the rolling window clears, so a persistent
* condition produces at most one alert per window instead of one log line per failed write.
*/

/** Rolling window over which capacity failures are counted, and how long a trip stays open. */
const ROLLING_WINDOW_MS = 60 * 1000;

/** Capacity failures within the window above which the breaker trips (storm backstop). */
const FAILURE_THRESHOLD = 50;

/** Consecutive no-progress evictions (evict -> still capacity failure) at which the breaker trips (the check is `>=`). */
const NO_PROGRESS_CAP = 5;

let failureTimestamps: number[] = [];
let consecutiveNoProgressEvictions = 0;
let evictionAwaitingResult = false;
let trippedUntil = 0;

function reset(): void {
failureTimestamps = [];
consecutiveNoProgressEvictions = 0;
evictionAwaitingResult = false;
trippedUntil = 0;
}

/** Whether the breaker is currently open. Self-resets once the window since the trip has cleared. */
function isTripped(): boolean {
if (trippedUntil === 0) {
return false;
}
if (Date.now() >= trippedUntil) {
reset();
return false;
}
return true;
}

function trip(reason: string): void {
trippedUntil = Date.now() + ROLLING_WINDOW_MS;
Logger.logAlert(`Storage circuit breaker tripped: ${reason}. Halting eviction/retry for ${ROLLING_WINDOW_MS / 1000}s to stop a storage failure storm.`);
}

/**
* Record a CAPACITY failure. Call once per capacity failure in `retryOperation`, BEFORE deciding
* whether to evict; then check {@link isTripped} to decide whether to proceed.
*/
function recordCapacityFailure(): void {
// While open, recording is a no-op: no extra timestamps, no second alert, and nothing to keep the
// window from clearing. `isTripped()` self-resets here once the window has elapsed.
if (isTripped()) {
return;
}

const now = Date.now();
failureTimestamps = failureTimestamps.filter((timestamp) => now - timestamp < ROLLING_WINDOW_MS);

// A fresh storm (nothing left in the window) resets the no-progress tracking so a stale eviction
// from an earlier, unrelated incident can't be miscounted as no-progress for this one.
if (failureTimestamps.length === 0) {
consecutiveNoProgressEvictions = 0;
evictionAwaitingResult = false;
}

// We evicted on the previous cycle and we're back here with another capacity failure, so that
// eviction freed no usable space.
if (evictionAwaitingResult) {
consecutiveNoProgressEvictions += 1;
evictionAwaitingResult = false;
}

failureTimestamps.push(now);

if (failureTimestamps.length > FAILURE_THRESHOLD) {
trip(`${failureTimestamps.length} capacity failures within ${ROLLING_WINDOW_MS / 1000}s`);
return;
}
if (consecutiveNoProgressEvictions >= NO_PROGRESS_CAP) {
trip(`${consecutiveNoProgressEvictions} consecutive evictions freed no usable space`);
}
}

/** Record that `retryOperation` just evicted a key, so the next capacity failure counts as no-progress. */
function recordEviction(): void {
evictionAwaitingResult = true;
}

const StorageCircuitBreaker = {recordCapacityFailure, recordEviction, isTripped, reset, ROLLING_WINDOW_MS, FAILURE_THRESHOLD, NO_PROGRESS_CAP};

export default StorageCircuitBreaker;
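
The rolling-window half of the breaker can be exercised in isolation with an injectable clock. The sketch below is a stand-in under stated assumptions, not the module above: `createWindowBreaker` and its `now` parameter are hypothetical, and the no-progress tracking and the alert are left out.

```typescript
type WindowBreaker = {
    recordFailure: () => void;
    isTripped: () => boolean;
};

// Hypothetical factory: same window/threshold logic, but the clock is passed in
// so tests do not depend on Date.now().
function createWindowBreaker(windowMs: number, threshold: number, now: () => number): WindowBreaker {
    let timestamps: number[] = [];
    let trippedUntil = 0;

    function isTripped(): boolean {
        if (trippedUntil === 0) {
            return false;
        }
        if (now() >= trippedUntil) {
            // The window since the trip has cleared: self-reset.
            timestamps = [];
            trippedUntil = 0;
            return false;
        }
        return true;
    }

    function recordFailure(): void {
        // While open, recording is a no-op.
        if (isTripped()) {
            return;
        }
        const current = now();
        // Keep only the failures that are still inside the rolling window.
        timestamps = timestamps.filter((timestamp) => current - timestamp < windowMs);
        timestamps.push(current);
        if (timestamps.length > threshold) {
            trippedUntil = current + windowMs;
        }
    }

    return {recordFailure, isTripped};
}

// Usage with a fake clock: 51 failures at the same instant exceed a threshold of 50.
let fakeTime = 1000;
const breaker = createWindowBreaker(60 * 1000, 50, () => fakeTime);
for (let i = 0; i < 51; i++) {
    breaker.recordFailure();
}
const trippedAfterStorm = breaker.isTripped();

// Advancing the clock by one full window closes the breaker again.
fakeTime += 60 * 1000;
const trippedAfterWindow = breaker.isTripped();
```

Passing the clock in is a design choice for testability only; the module above reads `Date.now()` directly.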
77 changes: 77 additions & 0 deletions lib/storage/errors.ts
@@ -0,0 +1,77 @@
import type {ValueOf} from 'type-fest';

/**
* Single source of truth for classifying storage (IndexedDB / SQLite) write failures.
*
* Both layers that react to storage errors consult this:
* - the connection layer (`createStore`) recovers TRANSIENT and FATAL errors by reopening the DB, and
* - the operation layer (`OnyxUtils.retryOperation`) recovers CAPACITY by eviction and retries UNKNOWN.
*
* Keeping the matchers here (instead of duplicated string lists in each layer) guarantees the two
* layers agree on what an error *is*, even though they react to it differently. This module has no
* Onyx dependencies so it can live in the storage layer without creating an import cycle.
*/
const StorageErrorClass = {
/** Connection/transport failure (stale connection). Owner: connection layer — reopen + retry once. */
TRANSIENT: 'transient',
/** Quota exceeded / disk full. Owner: operation layer — evict and retry. */
CAPACITY: 'capacity',
/** Non-serializable payload. Never retriable — the same data will always fail. */
INVALID_DATA: 'invalidData',
/** Backing-store corruption. Owner: connection layer — budgeted heal, then give up. */
FATAL: 'fatal',
/** Unmatched. Owner: operation layer — bounded retry. */
UNKNOWN: 'unknown',
} as const;

type StorageErrorClassValue = ValueOf<typeof StorageErrorClass>;

function getErrorParts(error: unknown): {name: string; message: string} {
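// Guard the DOMException check: the global is not guaranteed to exist in every JavaScript runtime.
if (error instanceof Error || (typeof DOMException !== 'undefined' && error instanceof DOMException)) {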
return {name: (error.name ?? '').toLowerCase(), message: (error.message ?? '').toLowerCase()};
}
return {name: '', message: String(error ?? '').toLowerCase()};
}

/**
* Classifies a storage write error into one of the {@link StorageErrorClass} buckets.
* Matching is done on the lowercased error name and message.
*/
function classifyStorageError(error: unknown): StorageErrorClassValue {
const {name, message} = getErrorParts(error);

// Non-serializable data passed to IDBObjectStore.put — retrying is futile.
if (message.includes("failed to execute 'put' on 'idbobjectstore'")) {
return StorageErrorClass.INVALID_DATA;
}

// Storage capacity: browser quota exceeded (IDB) or device disk full (SQLite).
if (name.includes('quotaexceedederror') || message.includes('quotaexceedederror') || message.includes('database or disk is full')) {
return StorageErrorClass.CAPACITY;
}

// Backing-store corruption (Chromium LevelDB). Recoverable only via a budgeted reopen.
if (message.includes('internal error opening backing store')) {
return StorageErrorClass.FATAL;
}

// Transient connection/transport failures — the cached connection is stale and a reopen fixes it:
// - InvalidStateError: connection closed between getDB() resolving and db.transaction().
// - AbortError: write transaction aborted (connection close / versionchange / sibling abort).
// - Safari/WebKit IDB server termination for backgrounded tabs.
if (
name.includes('invalidstateerror') ||
name.includes('aborterror') ||
message.includes('connection to indexed database server lost') ||
message.includes('connection is closing') ||
// This is related to https://github.com/Expensify/react-native-onyx/pull/796 — remove this comment when #796 is merged.
message.includes('idb write transaction aborted without an error')
) {
return StorageErrorClass.TRANSIENT;
}

return StorageErrorClass.UNKNOWN;
}

export {StorageErrorClass, classifyStorageError};
export type {StorageErrorClassValue};
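
A short usage sketch of the classifier. To keep it self-contained, `classify` below is a condensed copy of the matching rules in `classifyStorageError` (an illustration, not a replacement), `makeError` is a hypothetical helper, and the sample errors are constructed for the example and were not captured from a real browser or device.

```typescript
type StorageErrorClassValue = 'transient' | 'capacity' | 'invalidData' | 'fatal' | 'unknown';

// Condensed copy of the rules above: match on the lowercased name and message.
function classify(error: unknown): StorageErrorClassValue {
    const name = error instanceof Error ? error.name.toLowerCase() : '';
    const message = (error instanceof Error ? error.message : String(error ?? '')).toLowerCase();

    if (message.includes("failed to execute 'put' on 'idbobjectstore'")) {
        return 'invalidData';
    }
    if (name.includes('quotaexceedederror') || message.includes('quotaexceedederror') || message.includes('database or disk is full')) {
        return 'capacity';
    }
    if (message.includes('internal error opening backing store')) {
        return 'fatal';
    }
    if (
        name.includes('invalidstateerror') ||
        name.includes('aborterror') ||
        message.includes('connection to indexed database server lost') ||
        message.includes('connection is closing') ||
        message.includes('idb write transaction aborted without an error')
    ) {
        return 'transient';
    }
    return 'unknown';
}

// Hypothetical helper that builds an Error with a custom name.
function makeError(name: string, message: string): Error {
    const error = new Error(message);
    error.name = name;
    return error;
}

const quota = classify(makeError('QuotaExceededError', 'The quota has been exceeded.')); // 'capacity'
const diskFull = classify('database or disk is full'); // 'capacity' (non-Error input)
const corrupt = classify(new Error('Internal error opening backing store for indexedDB.open.')); // 'fatal'
const stale = classify(makeError('InvalidStateError', 'The database connection is closing.')); // 'transient'
const other = classify(new Error('something else went wrong')); // 'unknown'

// Order matters: the 'put' message check runs before the capacity check.
const putWithQuotaName = classify(makeError('QuotaExceededError', "Failed to execute 'put' on 'IDBObjectStore': example")); // 'invalidData'
```

The last sample shows that the first matching rule wins: an error whose message starts with the `put` failure text is classified as invalid data even if its name says `QuotaExceededError`. Whether any runtime actually produces that combination is not claimed here.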