Problem
- ~81% of bucket objects are version-related; ~67% are empty (one R2 object per edit, metadata only, no body).
- `.da-versions` at the org root (`{Org}/.da-versions/{FileID}/`) is a single huge prefix: slow to list, and it doesn't scale.
- Two concepts are mixed: (1) real version snapshots (contentLength > 0, explicit "Save version" or Restore Point), and (2) audit-only entries (empty objects created on every PUT for "Collab Parse" and similar). The latter inflate the object count without adding real versions.
Plan (condensed)
1. Labelled versions only as R2 objects
- Remove the "Collab Parse" version: stop creating the automatic first-save snapshot and the empty version objects on every PUT. Only create version objects for explicit labelled versions (Save version, future preview/publish) or a Restore Point.
- New path: `{Org}/{Repo}/.da-versions/{FileID}/{VersionUUID}.{ext}`, moved under the repo so listing is per-repo, not org-wide.
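The key layout and the "snapshots only for labelled versions" rule can be sketched as a few pure helpers. This is a minimal illustration, not the da-admin code: the function names, the `VersionKind` type, and the assumption that legacy keys also end in `{VersionUUID}.{ext}` are hypothetical.

```typescript
// Hypothetical helpers; only the key layouts come from the plan.

// Assumed classification of a versionable PUT.
type VersionKind = "edit" | "labelled" | "restore-point";

// New layout: version objects are scoped per repo.
function versionKey(org: string, repo: string, fileId: string, versionUuid: string, ext: string): string {
  return `${org}/${repo}/.da-versions/${fileId}/${versionUuid}.${ext}`;
}

// Legacy layout: one org-wide prefix (read-only during migration).
// Assumption: legacy object names also follow {VersionUUID}.{ext}.
function legacyVersionKey(org: string, fileId: string, versionUuid: string, ext: string): string {
  return `${org}/.da-versions/${fileId}/${versionUuid}.${ext}`;
}

// Only explicit labelled versions and Restore Points get an R2 snapshot object;
// plain edits (including the former "Collab Parse" save) create none.
function shouldCreateSnapshot(kind: VersionKind): boolean {
  return kind !== "edit";
}
```

For example, `versionKey("acme", "site", "f1", "abc", "html")` yields `acme/site/.da-versions/f1/abc.html`, so listing one repo's versions never touches another repo's prefix.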
2. Single audit file per file (read-before-write dedupe)
- Path: `{Org}/{Repo}/.da-versions/{FileID}/audit.txt`
- Format: one line per entry, tab-separated:
  `timestamp \t users \t path \t versionLabel \t versionId`
  - path: stored without the repo prefix (e.g. `/surf-copy.html`) so the file is readable.
  - versionLabel: human-readable name when the entry is a labelled version (e.g. "v1", "Restore Point"); empty for edits.
  - versionId: snapshot id without extension when the entry is a version (e.g. the UUID); empty for edits.
  - Backward compat: 3-column (path only) and 4-column (path + versionId) lines are still parsed.
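A sketch of reading and writing this line format, including the 3- and 4-column fallbacks. The `AuditEntry` type and function names are hypothetical, and the timestamp is assumed to be epoch milliseconds (the plan does not fix its encoding).

```typescript
// Hypothetical in-memory form of one audit.txt line.
interface AuditEntry {
  timestamp: number;    // assumption: epoch milliseconds
  users: string;
  path: string;         // without repo prefix, e.g. "/surf-copy.html"
  versionLabel: string; // "" for plain edits
  versionId: string;    // "" for plain edits; no extension
}

// Always writes the current 5-column format; empty columns stay as empty fields.
function formatAuditLine(e: AuditEntry): string {
  return [String(e.timestamp), e.users, e.path, e.versionLabel, e.versionId].join("\t");
}

// Accepts 5-column lines plus the legacy 3-column (path only)
// and 4-column (path + versionId) variants; anything else is skipped.
function parseAuditLine(line: string): AuditEntry | null {
  const cols = line.split("\t");
  if (cols.length < 3 || cols.length > 5) return null;
  const [ts, users, path] = cols;
  const timestamp = Number(ts);
  if (!Number.isFinite(timestamp)) return null;
  let versionLabel = "";
  let versionId = "";
  if (cols.length === 4) {
    versionId = cols[3];
  } else if (cols.length === 5) {
    versionLabel = cols[3];
    versionId = cols[4];
  }
  return { timestamp, users, path, versionLabel, versionId };
}

// Parses a whole audit.txt body, ignoring blank or malformed lines.
function parseAuditFile(text: string): AuditEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => parseAuditLine(line))
    .filter((e): e is AuditEntry => e !== null);
}
```

Writing only the 5-column form while still reading the older forms lets existing files upgrade in place as new entries are appended.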
- Write: on every versionable PUT, append to or update `audit.txt`. Read-before-write with a 30-minute window: if the last line is from the same user, is within 30 minutes, and both the last and the new entries are edits (no version), overwrite that line with the new timestamp; otherwise append. Labelled version entries always append and are never replaced; they "interrupt" the dedup window (e.g. edit at 12:23, version at 12:25, edit at 12:40 → three entries). No empty version objects are created.
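The dedup rule can be expressed as a pure function over the already-read entries: read `audit.txt`, apply the function, write the result back. This is a sketch under assumptions: `recordAuditEntry` is a hypothetical name, timestamps are epoch milliseconds, and an entry counts as a version when it carries a label or a versionId.

```typescript
interface AuditEntry {
  timestamp: number; // assumption: epoch milliseconds
  users: string;
  path: string;
  versionLabel: string;
  versionId: string;
}

const DEDUP_WINDOW_MS = 30 * 60 * 1000;

// Assumption: any entry with a label or a snapshot id is a version entry.
function isVersion(e: AuditEntry): boolean {
  return e.versionId !== "" || e.versionLabel !== "";
}

// Returns the new entry list after recording `next` (read-before-write).
function recordAuditEntry(entries: AuditEntry[], next: AuditEntry): AuditEntry[] {
  const last = entries.length > 0 ? entries[entries.length - 1] : undefined;
  if (
    last !== undefined &&
    last.users === next.users &&
    next.timestamp - last.timestamp <= DEDUP_WINDOW_MS &&
    !isVersion(last) &&
    !isVersion(next)
  ) {
    // Same user, edit after edit, inside the window: overwrite the last line's timestamp.
    return [...entries.slice(0, -1), { ...last, timestamp: next.timestamp }];
  }
  // Version entries, other users, or gaps over 30 minutes always append.
  return [...entries, next];
}
```

Replaying the issue's example (edit at 12:23, version at 12:25, edit at 12:40) through this function yields three entries, while two edits by the same user at 12:23 and 12:40 collapse into one line stamped 12:40. Because the window is measured from the last line's (updated) timestamp, a continuous editing session keeps collapsing into a single line.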
3. API behaviour during migration
- List: prefer the new path (list `repo/.da-versions/{id}/` and read `audit.txt`). Always merge with legacy (list `org/.da-versions/{id}/`) so old versions and audit entries show up until migration is complete. The response adds the repo prefix back to `path` and the extension back to `versionId` so the API contract matches the previous implementation.
- GET: try the new key first, then the legacy key.
- PUT/POST: new writes go only to the new structure (snapshots + `audit.txt`). No new writes under `org/.da-versions`.
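The dual-read behaviour for GET and List can be sketched against a minimal store abstraction. `ObjectStore` is a hypothetical stand-in, not the Workers R2 binding API. The shape of the restored path (`/{repo}` + stored path) and the newest-first ordering are also assumptions, and the merge does not deduplicate entries that exist in both locations.

```typescript
// Hypothetical minimal store; a real version would wrap the R2 bucket binding.
interface ObjectStore {
  get(key: string): Promise<string | null>;
}

// GET: new per-repo key first, legacy org-wide key as a fallback.
async function getVersion(store: ObjectStore, newKey: string, legacyKey: string): Promise<string | null> {
  const fromNew = await store.get(newKey);
  if (fromNew !== null) return fromNew;
  return store.get(legacyKey);
}

interface ApiVersion {
  timestamp: number;
  users: string;
  path: string;
  versionLabel: string;
  versionId: string;
}

// audit.txt stores path without the repo and versionId without the extension;
// restore both so the response matches the previous contract.
function toApiVersion(repo: string, ext: string, e: ApiVersion): ApiVersion {
  return {
    ...e,
    path: `/${repo}${e.path}`,
    versionId: e.versionId === "" ? "" : `${e.versionId}.${ext}`,
  };
}

// List: combine new-path and legacy entries, newest first (assumed ordering).
function mergeListings(fromNew: ApiVersion[], fromLegacy: ApiVersion[]): ApiVersion[] {
  return [...fromNew, ...fromLegacy].sort((a, b) => b.timestamp - a.timestamp);
}
```

Once migration is complete, the legacy branch in `getVersion` and the `fromLegacy` argument can be dropped without touching callers' view of the response.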
4. Migration
- Scripts (in `scripts/`): (1) Analyse: list version folders and count empty vs non-empty objects; (2) Migrate: copy snapshots to `org/repo/.da-versions/fileId/`, build `audit.txt` from empty-object metadata using the same 5-column format (path without repo, versionId without extension) and the same dedup rule (30-minute window; version entries do not collapse), and merge with any existing `audit.txt` at the new path (hybrid case); (3) Validate: compare list/GET results, old vs new, for a sample path.
- Dual-read: keep supporting both old and new paths until migration is complete, then remove the legacy fallback.
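The core of the Analyse and Migrate steps for one file can be sketched as a pure planning function: classify legacy objects, compute snapshot copies, and rebuild a deduped `audit.txt` merged with any entries already at the new path. Everything here is assumed rather than taken from the repo's scripts: the `LegacyObject` metadata fields, the legacy `path` carrying a `/{repo}` prefix, and each snapshot also contributing a version entry to the audit log.

```typescript
// Hypothetical view of one object under {Org}/.da-versions/{FileID}/.
interface LegacyObject {
  key: string;       // e.g. "acme/.da-versions/f1/<uuid>.html"
  size: number;      // contentLength; 0 for audit-only objects
  timestamp: number; // assumption: epoch ms from object metadata
  users: string;
  path: string;      // assumption: "/{repo}/..." with repo prefix
  label: string;     // "" when unlabelled
}

interface AuditEntry { timestamp: number; users: string; path: string; versionLabel: string; versionId: string; }

interface MigrationPlan { copies: Array<[string, string]>; audit: AuditEntry[]; emptyCount: number; snapshotCount: number; }

const DEDUP_WINDOW_MS = 30 * 60 * 1000;
const isEdit = (e: AuditEntry): boolean => e.versionId === "" && e.versionLabel === "";

function planFileMigration(org: string, repo: string, fileId: string, objects: LegacyObject[], existing: AuditEntry[]): MigrationPlan {
  const repoPrefix = `/${repo}`;
  const copies: Array<[string, string]> = [];
  const fromLegacy: AuditEntry[] = [];
  for (const o of objects) {
    const fileName = o.key.slice(o.key.lastIndexOf("/") + 1);
    const dot = fileName.lastIndexOf(".");
    const idNoExt = dot === -1 ? fileName : fileName.slice(0, dot);
    // Store path without the repo prefix.
    const path = o.path.startsWith(`${repoPrefix}/`) ? o.path.slice(repoPrefix.length) : o.path;
    const isSnapshot = o.size > 0;
    if (isSnapshot) copies.push([o.key, `${org}/${repo}/.da-versions/${fileId}/${fileName}`]);
    fromLegacy.push({
      timestamp: o.timestamp,
      users: o.users,
      path,
      versionLabel: isSnapshot ? o.label : "",
      versionId: isSnapshot ? idNoExt : "",
    });
  }
  // Hybrid case: merge with the new audit.txt, then replay chronologically through the dedup rule.
  const merged = [...existing, ...fromLegacy].sort((a, b) => a.timestamp - b.timestamp);
  const audit: AuditEntry[] = [];
  for (const e of merged) {
    const last = audit.length > 0 ? audit[audit.length - 1] : undefined;
    if (last !== undefined && last.users === e.users && e.timestamp - last.timestamp <= DEDUP_WINDOW_MS && isEdit(last) && isEdit(e)) {
      audit[audit.length - 1] = { ...last, timestamp: e.timestamp };
    } else {
      audit.push(e);
    }
  }
  return { copies, audit, emptyCount: objects.length - copies.length, snapshotCount: copies.length };
}
```

Keeping the plan pure (no bucket calls) makes the Validate step easier: the same plan can be diffed against what the List and GET endpoints return for a sample path.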
5. Benefits
- Far fewer objects: no per-edit empty version files; one `audit.txt` per file with collapsed entries.
- Faster listing: `.da-versions` is scoped per repo, not one giant org prefix.
- Clear separation: real versions (snapshots) vs audit log (single file, deduped, human-readable labels in the file).