Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 60 additions & 0 deletions scripts/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Version storage migration scripts

These scripts migrate version data from the legacy layout (`org/.da-versions/fileId/`) to the new layout (`org/repo/.da-versions/fileId/` plus `audit.txt`).

## Prerequisites

- Node.js (ESM)
- Environment: set `AEM_BUCKET_NAME`, `ORG`, and S3 credentials. Easiest: copy `.dev.vars` to `.env` or export vars, and ensure `scripts/load-env.js` is imported so `.dev.vars` / `.env` are loaded.

## Scripts

### 1. Analyse (`version-migrate-analyse.js`)

Lists all version folders under `org/.da-versions/` and samples object counts (empty vs with content).

```bash
ORG=myorg AEM_BUCKET_NAME=mybucket node scripts/version-migrate-analyse.js
# or with .dev.vars present:
node scripts/version-migrate-analyse.js myorg
```

### 2. Migrate (`version-migrate-run.js`)

For each file ID under `org/.da-versions/`:

- Copies snapshot objects (contentLength > 0) to `org/repo/.da-versions/fileId/versionId.ext` (repo from object metadata `path`).
- Builds `audit.txt`: deduplicates legacy empty-version metadata (same user + 30 min window), **merges with any existing `audit.txt` already in the new path** (hybrid case: project not yet migrated but new PUTs have been writing audit there), then writes the combined, deduplicated result.

**Dry run (no writes):**

```bash
DRY_RUN=1 ORG=myorg AEM_BUCKET_NAME=mybucket node scripts/version-migrate-run.js
```

**Execute:**

```bash
ORG=myorg AEM_BUCKET_NAME=mybucket node scripts/version-migrate-run.js
```

### 3. Validate (`version-migrate-validate.js`)

Compares object counts for a single document: legacy prefix vs new prefix.

```bash
ORG=myorg node scripts/version-migrate-validate.js myorg repo/path/to/file.html
```

## Env vars

| Variable | Description |
|---------------------|--------------------------------|
| `AEM_BUCKET_NAME` | R2/S3 bucket name |
| `ORG` | Org slug (e.g. `kptdobe`) |
| `S3_ACCESS_KEY_ID` | S3/R2 access key |
| `S3_SECRET_ACCESS_KEY` | S3/R2 secret key |
| `S3_DEF_URL` | S3/R2 endpoint URL |
| `DRY_RUN` | Set to `1` to skip writes (migrate script) |

Load from `.dev.vars` or `.env` by ensuring the script imports `./load-env.js` first (already done in each script).
35 changes: 35 additions & 0 deletions scripts/load-env.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
/*
* Copyright 2025 Adobe. All rights reserved.
* This file is licensed to you under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License. You may obtain a copy
* of the License at http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under
* the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR REPRESENTATIONS
* OF ANY KIND, either express or implied. See the License for the specific language
* governing permissions and limitations under the License.
*/
import { readFileSync, existsSync } from 'fs';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';

const scriptDir = dirname(fileURLToPath(import.meta.url));
const root = join(scriptDir, '..');

function loadFile(name) {
const path = join(root, name);
if (!existsSync(path)) return;
const content = readFileSync(path, 'utf8');
for (const line of content.split('\n')) {
const t = line.trim();
const eq = t.indexOf('=');
if (t && !t.startsWith('#') && eq > 0) {
const key = t.slice(0, eq).trim();
const value = t.slice(eq + 1).trim();
if (!process.env[key]) process.env[key] = value;
}
}
}

loadFile('.dev.vars');
loadFile('.env');
170 changes: 170 additions & 0 deletions scripts/version-migrate-analyse.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
/*
* Copyright 2025 Adobe. All rights reserved.
* This file is licensed to you under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License. You may obtain a copy
* of the License at http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under
* the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR REPRESENTATIONS
* OF ANY KIND, either express or implied. See the License for the specific language
* governing permissions and limitations under the License.
*/
/* eslint-disable no-await-in-loop -- script: list + concurrency use await in loops */
import './load-env.js';
import {
S3Client,
ListObjectsV2Command,
} from '@aws-sdk/client-s3';

const Bucket = process.env.AEM_BUCKET_NAME;
const Org = process.env.ORG || process.argv[2];

/** Process N file IDs in parallel. */
const CONCURRENCY = parseInt(process.env.MIGRATE_ANALYSE_CONCURRENCY || '25', 10);

if (!Bucket || !Org) {
console.error('Set AEM_BUCKET_NAME and ORG (or pass org as first arg)');
process.exit(1);
}

const config = {
region: 'auto',
endpoint: process.env.S3_DEF_URL,
credentials: {
accessKeyId: process.env.S3_ACCESS_KEY_ID,
secretAccessKey: process.env.S3_SECRET_ACCESS_KEY,
},
};
if (process.env.S3_FORCE_PATH_STYLE === 'true') config.forcePathStyle = true;

const client = new S3Client(config);
const prefix = `${Org}/.da-versions/`;

async function runWithConcurrency(limit, items, fn) {
const results = [];
const executing = new Set();
for (const item of items) {
const p = Promise.resolve().then(() => fn(item));
results.push(p);
executing.add(p);
p.finally(() => {
executing.delete(p);
});
if (executing.size >= limit) {
await Promise.race(executing);
}
}
return Promise.all(results);
}

async function listFileIds() {
const ids = [];
let token;
do {
const cmd = new ListObjectsV2Command({
Bucket,
Prefix: prefix,
Delimiter: '/',
MaxKeys: 1000,
ContinuationToken: token,
});
const resp = await client.send(cmd);
(resp.CommonPrefixes || []).forEach((cp) => {
const p = cp.Prefix.slice(prefix.length).replace(/\/$/, '');
if (p) ids.push(p);
});
token = resp.NextContinuationToken;
} while (token);
return ids;
}

/**
* Count objects for one file ID using list only (Size in list response; no HEAD).
* @returns {{ fileId: string, total: number, empty: number, nonEmpty: number }}
*/
async function countObjects(fileId) {
const listPrefix = `${prefix}${fileId}/`;
let total = 0;
let empty = 0;
let nonEmpty = 0;
let token;
do {
const cmd = new ListObjectsV2Command({
Bucket,
Prefix: listPrefix,
MaxKeys: 1000,
ContinuationToken: token,
});
const resp = await client.send(cmd);
for (const obj of resp.Contents || []) {
total += 1;
const size = obj.Size ?? 0;
if (size === 0) empty += 1;
else nonEmpty += 1;
}
token = resp.NextContinuationToken;
} while (token);
return {
fileId, total, empty, nonEmpty,
};
}

async function main() {
const verbose = process.argv.includes('--verbose') || process.argv.includes('-v');

console.log(`Org: ${Org}, Bucket: ${Bucket}, prefix: ${prefix}`);
const fileIds = await listFileIds();
console.log(`File IDs (version folders): ${fileIds.length}`);
if (fileIds.length === 0) {
console.log('Nothing to analyse.');
return;
}

console.log(`Analysing all ${fileIds.length} folders (concurrency: ${CONCURRENCY})...`);
const startMs = Date.now();
const results = await runWithConcurrency(CONCURRENCY, fileIds, countObjects);
const elapsedSec = (Date.now() - startMs) / 1000;

let totalObjects = 0;
let totalEmpty = 0;
let totalNonEmpty = 0;
const withData = results.filter((r) => r.total > 0);

for (const r of results) {
totalObjects += r.total;
totalEmpty += r.empty;
totalNonEmpty += r.nonEmpty;
}

// Clear summary: what you have and what Migrate will do
console.log('');
console.log('--- Summary ---');
console.log(` File IDs (version folders): ${fileIds.length}`);
console.log(` Total objects (legacy): ${totalObjects}`);
console.log(` Empty (metadata only): ${totalEmpty} → will be converted to audit entries`);
console.log(` With content (snapshots): ${totalNonEmpty} → will be copied to org/repo/.da-versions/fileId/`);
console.log('');
console.log(' Migrate will:');
console.log(` • Copy ${totalNonEmpty} snapshot(s) to the new path (one per repo/fileId).`);
console.log(` • Convert ${totalEmpty} empty object(s) to audit lines in audit.txt (same-user + 30 min dedup per file; version entries do not collapse). Final line count is lower — run Migrate with DRY_RUN=1 to see exact numbers.`);
console.log(' • Merge with any existing audit.txt already in the new path (hybrid case).');
console.log('');
const idsPerSec = fileIds.length / elapsedSec;
const objectsPerSec = totalObjects / elapsedSec;
console.log(
` Timing: ${elapsedSec.toFixed(1)}s total | ${idsPerSec.toFixed(0)} file IDs/s | ${objectsPerSec.toFixed(0)} objects/s`,
);

if (verbose && withData.length > 0) {
console.log('');
console.log('--- Per-file breakdown ---');
for (const r of withData.sort((a, b) => b.total - a.total)) {
console.log(` ${r.fileId}: total=${r.total} empty=${r.empty} nonEmpty=${r.nonEmpty}`);
}
}
}

main().catch((e) => {
console.error(e);
process.exit(1);
});
Loading
Loading