RSM-1639: research — DLA integration into `studio code` by Poliuk · Pull Request #3277 · Automattic/studio

Poliuk · 2026-04-29T09:48:50Z

Related issues

Related to RSM-1639 (https://linear.app/a8c/issue/RSM-1639/figure-out-how-to-make-the-data-liberation-agent-dla-available-within)
Related to RSM-1675 (implementation phase)
Project: https://linear.app/a8c/project/bring-data-liberation-into-studio-code-8e53bd986fcc

Status: draft — DO NOT MERGE. This PR holds two phases of work in a single branch by explicit owner request: the research artifacts under issues/rsm-1639-dla-integration/ (RSM-1639) and the implementation that turns the recommendation into shipped code (RSM-1675, eight [code] tasks T1–T8 plus two [docs] tasks T9–T10). It stays draft after review so the open questions called out below can be triaged before any merge attempt.

How AI was used in this PR

Both phases were AI-orchestrated through the orchestrator skill. The roster:

Research phase (RSM-1639). A research-lead agent planned the investigation in four parallel waves; four researcher agents executed each wave with file-level evidence-gathering inside apps/cli/, the Claude Agent SDK at node_modules/@anthropic-ai/claude-agent-sdk@0.2.117, and a shallow checkout of the private Automattic/data-liberation-agent repo; the lead synthesised findings into the recommendation report. Every concrete file path, line number, type signature, MCP tool name, and bundling size in research-report.md is sourced directly from the codebase, not from the model's training data.
Implementation phase (RSM-1675). A planner agent decomposed the recommendation into ten ordered atomic tasks (plan.md); eight implementer agents (one per [code] task) landed T1–T8 against the plan; a code-reviewer agent ran the full vitest workspace, typecheck, lint, and dev/prod CLI builds, and approved the implementation (review-1.md); two documentator agents wrote T9 and T10; this PR description and the documentation review (doc-review-1.md) are produced by a final doc-reviewer pass.

The orchestrator log, individual researcher findings, planning artifacts, code-review evidence, and the doc-review report all live under issues/rsm-1639-dla-integration/ for inspection.

Proposed Changes

Research artifacts (`issues/rsm-1639-dla-integration/`)

research-plan.md — research question, sub-questions, four-wave plan, findings log, evaluation against "research complete" criteria.
tasks/wave-1-*.md — four task briefs assigned to wave-1 researchers (DLA inventory; Studio Code skill/MCP/slash-command plumbing; Claude Agent SDK plugin/MCP loading semantics; CLI bundling and distribution constraints).
findings/wave-1-*.md — four exhaustive researcher reports, each with evidence (file paths, line numbers, manifest contents verbatim, MCP type signatures, disk sizes, release cadences).
research-report.md — synthesis. Five integration approaches investigated (vendor + stdio MCP; vendor + in-process MCP; npm dep; runtime fetch; handler-only CLI spawn); head-to-head comparison; opinionated recommendation; eight open questions for the implementation phase.
plan.md — atomic, ordered task plan derived from research-report.md. Tracks which open questions each task resolves.
review-1.md — code-reviewer's verdict (approved) for T1–T8 with command outputs, per-task acceptance verification, and a list of non-blocking nits.
verification/ — captured stdout for each verification command in the code review.
doc-review-1.md — this doc-reviewer's verdict for T9–T10.
PR-description.md — this document.

Implementation (`apps/cli/`, `tools/common/ai/`, root `scripts/`, root `package.json`, root `vitest.config.ts`)

The implementation realises Approach A from research-report.md: vendor DLA's plugin tree under apps/cli/ai/dla/, load it as a second local SDK plugin alongside apps/cli/ai/plugin/, and boot DLA's MCP server as a stdio child-process entry alongside Studio's in-process MCP. A new /migrate slash command surfaces this through a thin Studio-side wrapper skill that drives DLA's liberate_* workflow and uses DLA's existing delegate: true import mode to hand artifacts back to Studio's site_create and wp_cli plumbing.

Eight [code] tasks (each a single commit on the branch):

T1 — apps/cli/vite.config.prod.ts static-copy fix. Adds the viteStaticCopy block that the dev/npm configs already have for ai/plugin. This was a pre-existing latent bug independently surfaced by the research (research-report.md Open Question 1) — landing it first makes T5 verifiable. Confirmed by inspecting apps/cli/dist/cli/plugin/.claude-plugin/plugin.json after npm run cli:build:prod. (Commit 14a58dda.)
T2 — scripts/download-data-liberation-agent.ts. Build-time fetch script modeled on scripts/download-agent-skills.ts. Pinned SHA, GH_PAT/GH_TOKEN auth, graceful skip when no token, tarball download, atomic-ish staging swap, tsc pre-compile (no runtime tsx dependency), dist-vendored/ → src/ rename, vendored PHP under src/lib/preview/scripts/ preserved, .dla-pinned-sha provenance file, --update/STUDIO_REFRESH_DLA=1 opt-in. Four vitest cases. (Commit d329634b.)
T3 — Postinstall + runtime deps. Wires the fetch script into the root package.json postinstall chain after download-agent-skills.ts; adds fast-xml-parser ^5.7.2 and papaparse ^5.5.3 to apps/cli/package.json; adds apps/cli/ai/dla/ to .gitignore. (Commit 20169f27.)
T4 — Vite static-copy targets for ai/dla. All three Vite configs (vite.config.dev.ts, vite.config.npm.ts, vite.config.prod.ts) gain an existsSync-guarded ai/dla target so the build still works on contributors without the vendored tree. Cross-config snapshot test in apps/cli/tests/vite-config.test.ts. (Commit 5785f4d5.)
T5 — agent.ts plugin + MCP wiring. apps/cli/ai/agent.ts registers DLA conditionally on dlaAvailable (fs.existsSync(path.resolve(import.meta.dirname, 'dla'))). DLA loads as a second local plugin and as a stdio MCP server keyed under data-liberation, spawned with process.execPath against the absolute path to the pre-compiled mcp-server.js. STUDIO_WPCOM_TOKEN is forwarded explicitly; LIBERATION_TOKEN/SHOPIFY_ADMIN_TOKEN are forwarded transitively via ...resolvedEnv. The wpcomAccessToken read in apps/cli/commands/ai/index.ts is moved out of the site?.remote guard so the token is available to DLA regardless of site type. Four agent.test.ts cases cover the conditional wiring. (Commit 4231813d.)
T6 — /migrate slash command. tools/common/ai/slash-commands.ts appends { name: 'migrate', description: __('Migrate a site from a closed platform into Studio') } to AI_SKILL_COMMANDS. The shared list auto-wires the chat dispatcher in apps/cli/commands/ai/index.ts, the autocomplete provider in apps/cli/ai/ui.ts, the desktop IPC dispatcher in apps/studio/src/ipc-handlers.ts, and the renderer composer slash hints — no Electron-side change required because the skill-based path was chosen specifically to satisfy the existing IPC dispatcher's AI_SKILL_COMMANDS filter. Test in tools/common/ai/tests/slash-commands.test.ts. (Commit aaa50d3d.)
T7 — apps/cli/ai/plugin/skills/migrate/SKILL.md. Studio-side wrapper skill with user-invocable: true (with C, not the K typo present in older Studio skills), argument-hint: <source-url>, and a precise allowed-tools list scoped to the DLA tools we use plus the Studio MCP tools we drive. Body covers Steps 1–9 (identify, inspect, confirm, extract, verify, setup-with-delegate, create Studio site, import-with-delegate, wrap up) plus an explicit "What this skill does NOT do" footer documenting the deferral of headless mode (Approach E). Includes the importWxr blueprint shape so the model can construct it for very large WXR exports that would otherwise hit the WP-CLI 120s IPC timeout. Vitest case in apps/cli/ai/tests/plugin-skills.test.ts. (Commit 202302c2.)
T8 — canUseTool permission scoping. apps/cli/ai/dla-permissions.ts exports buildDlaCanUseTool(options); agent.ts wires it to query()'s canUseTool only when DLA is available. Read-only DLA tools (liberate_detect, liberate_discover, liberate_inspect, liberate_status, liberate_verify) auto-allow; liberate_import with delegate: true auto-allows; ask-once tools (liberate_extract, liberate_setup, liberate_map_apis, liberate_probe, plus liberate_import without delegate) prompt via the existing onAskUser plumbing and memoise per-session via a closure-scoped Set; default-deny when onAskUser is missing (with a TODO referencing OQ2 for a future non-interactive opt-in flag) and on unrecognised DLA tools. 11 vitest cases cover the policy. (Commit e57e012f.)
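As a rough illustration of the existsSync guard described in T4, the sketch below keeps only the static-copy targets whose source tree is present. The helper name `guardedCopyTargets`, the `root` field, and the example `src`/`dest` values are hypothetical and not taken from this PR; only the `src`/`dest` pair mirrors the target shape that vite-plugin-static-copy accepts.

```typescript
// Minimal mirror of a vite-plugin-static-copy target (src/dest only).
interface CopyTarget {
  src: string;
  dest: string;
}

// Hypothetical candidate: a target plus the directory that must exist
// on disk for the target to be included in the build.
interface CandidateTarget extends CopyTarget {
  root: string;
}

// Keep only targets whose root exists, so a checkout without the
// vendored ai/dla tree still builds. `exists` is injected so the
// function can be exercised without touching the filesystem; a real
// config would pass `existsSync` from node:fs.
function guardedCopyTargets(
  candidates: CandidateTarget[],
  exists: (path: string) => boolean
): CopyTarget[] {
  return candidates
    .filter((candidate) => exists(candidate.root))
    .map(({ src, dest }) => ({ src, dest }));
}

// Example candidates; the glob and dest values are assumptions.
const exampleCandidates: CandidateTarget[] = [
  { root: 'ai/plugin', src: 'ai/plugin/*', dest: 'plugin' },
  { root: 'ai/dla', src: 'ai/dla/*', dest: 'dla' },
];
```

In a Vite config the result would be passed as the `targets` option of `viteStaticCopy`, so a contributor without the vendored tree gets the plugin target only and the build does not fail on a missing directory.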

Two [docs] tasks:

T9 — apps/cli/README.md. New "Migrate from a closed platform" section (after "Studio Code"). Documents the eight supported platforms, the /migrate and /migrate <url> invocation shapes, the agent-driven flow (inspect → extract → verify → site-create → import), the LIBERATION_TOKEN/SHOPIFY_ADMIN_TOKEN requirement for Webflow/Shopify, the ~/Studio/ landing dir, the GH_PAT requirement at install time, and a credit line for the Data Liberation Agent. ToC updated. (Commit fb85ddce.)
T10 — docs/design-docs/cli.md. New "Data Liberation Agent integration" architecture section. Covers vendoring (the fetch script, tsc pre-compile, dist-vendored/ → src/ rename, vendored PHP, .dla-pinned-sha), plugin and MCP wiring (process.execPath, absolute path to mcp-server.js, no cwd field on McpStdioServerConfig, env forwarding), the delegate: true handoff contract, the canUseTool permission policy, and the SHA-pin update cadence. Cross-links to research-report.md for trade-off rationale. (Commit c18568a0.)

Electron-side gotchas surfaced and resolved

vite.config.prod.ts plugin-copy gap (was Open Question 1). Independently fixed by T1 — adding the viteStaticCopy block for ai/plugin ensures the Electron-bundled studio code actually loads the SDK plugin tree. The gap was pre-existing and would have silently shipped without DLA forcing a verification.
Electron IPC dispatcher requirement (apps/studio/src/ipc-handlers.ts:295-306). Only forwards AI_SKILL_COMMANDS entries; handler-only slash commands would not appear in the desktop slash menu. The skill-based shape of /migrate (T6 + T7) satisfies this constraint by construction — no Electron-side change required.

Out of scope (consciously deferred)

/migrate --headless (Approach E from research-report.md) — documented as deferred in T7's "What this skill does NOT do" footer.
Migrating existing Studio skills off the user-invokable (with K) typo — orthogonal cleanup, T7 just uses the correct user-invocable (with C) for the new skill.
Anything under apps/studio/ — out of scope by construction (CLI-only PR per the research scope).
Multi-plugin name namespacing verification (Open Question 4) and DLA orphan-cleanup behaviour (Open Question 7) — deferred upstream; require a real cli:package run with DLA vendored to verify.

Testing Instructions

Reviewing the research

Read issues/rsm-1639-dla-integration/research-report.md end-to-end. The Executive Summary and Recommendation sections are load-bearing.
For each claim that influences your decision, cross-check against the corresponding findings/wave-1-*.md file — every claim cites file paths and line numbers.
The "Approaches Investigated" and "Comparison" sections cover the four alternatives to the recommended Approach A (Approaches B through E) with concrete pros/cons and explicit rejection reasons.
The "Open Questions" section enumerates items deferred to the implementation phase. plan.md records which open questions each implementation task resolves.

Reviewing the implementation

The full vitest workspace passes (1474 tests across 158 files), typecheck is clean, lint is clean on every file the patches touched, and dev/prod CLI builds both succeed without DLA vendored. To reproduce locally:

npm install                                 # postinstall skips DLA fetch when GH_PAT is unset
npm run typecheck                           # all four workspaces
npx vitest run                              # full workspace
npx vitest run --project=cli                # CLI tests
npx vitest run --project=common --project=scripts
npm run cli:build                           # dev build
npm run cli:build:prod                      # prod build, validates T1
node apps/cli/dist/cli/main.mjs code --help # smoke check, default locale
LANG=fr_FR.UTF-8 node apps/cli/dist/cli/main.mjs code --help # smoke check, French

Captured outputs from the code-reviewer's verification run live at issues/rsm-1639-dla-integration/verification/review-1-*.txt for direct inspection.

Exercising `/migrate` end-to-end

The full /migrate flow can only be exercised with a vendored DLA tree, which requires read access to the private Automattic/data-liberation-agent repo:

Set GH_PAT in your environment with read access to the repo.
Run npm install (the postinstall step will run scripts/download-data-liberation-agent.ts and populate apps/cli/ai/dla/).
Run npm run cli:build and confirm apps/cli/dist/cli/dla/.claude-plugin/plugin.json is present.
Run node apps/cli/dist/cli/main.mjs code and type /migrate (or /migrate https://example.wixsite.com/foo).
The agent should walk you through inspect → extract → verify → site-create → import. The canUseTool policy will prompt before liberate_extract and memoise the answer for the rest of the session, so the same tool does not prompt again until a new session starts.

Without GH_PAT, the postinstall logs a warning and exits 0; /migrate is gated on fs.existsSync(apps/cli/ai/dla) at runtime in agent.ts, so the agent runs normally without DLA but the /migrate skill is unavailable. The unit-test path for T2/T5/T7/T8 covers both branches without needing DLA vendored.

Pre-merge Checklist

This PR is intentionally draft — owner has flagged it as DO NOT MERGE. The items below need to be resolved before any merge attempt.

Have you checked for TypeScript, React or other console errors? Verified clean by review-1.md.
Real npm run cli:package run with DLA vendored. Confirms the apps/cli/vite.config.prod.ts plugin-copy fix lands DLA into the Electron extra-resource bundle (Open Question 1, resolved by T1, but verification with DLA vendored is still pending — the code-reviewer ran the prod build without DLA on disk).
GH_PAT distribution decision (Open Question 6). Right now contributors must supply their own PAT or the install silently skips DLA. Long-term resolution: push the DLA team for tagged public releases so we can move to an npm dep (Approach C from research-report.md). Short-term: decide whether the team uses a shared GitHub App token, per-user PATs, or a public-mirror-of-tagged-releases workflow.
Bump DLA_PINNED_SHA from the 'main' placeholder. Currently scripts/download-data-liberation-agent.ts pins to 'main'; before merge it must be a real commit SHA (TODO comment in the script tracks this).
Multi-plugin name namespacing verification (Open Question 4). A 10-minute test during implementation: spawn studio code with DLA vendored and confirm DLA's MCP tools surface as mcp__data-liberation__* exactly once (no double-registration via DLA's own .claude-plugin/plugin.json#mcp colliding with our query()-time mcpServers entry).
DLA orphan-cleanup behaviour (Open Question 7). Confirm DLA's src/lib/preview/studio.ts:266-279 rmSync of orphan Studio site dirs does not double-manage sites Studio Code creates directly.
Recommendation reviewed by a Studio CLI maintainer.
Recommendation reviewed by a DLA maintainer (specifically: vendoring at a pinned SHA, the assumption that delegate: true is the canonical handoff for hosts like Studio Code, and the request to consider tagged public releases so we can move to an npm dep long-term).
Linear tickets RSM-1639 and RSM-1675 updated with the recommendation, the implementation summary, and a link to this PR.
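The namespacing check for Open Question 4 could be scripted along these lines. This is a sketch that assumes the tool names exposed to a running session have already been collected into an array; how that list is obtained is not covered here, and the function name `findDuplicateDlaTools` is hypothetical. The `mcp__data-liberation__` prefix comes from the checklist item above.

```typescript
// Prefix under which DLA's MCP tools are expected to surface.
const DLA_TOOL_PREFIX = 'mcp__data-liberation__';

// Return every DLA tool name that appears more than once in the list.
// An empty result means each DLA tool was registered exactly once.
function findDuplicateDlaTools(
  toolNames: string[],
  prefix: string = DLA_TOOL_PREFIX
): string[] {
  const counts = new Map<string, number>();
  for (const name of toolNames) {
    if (!name.startsWith(prefix)) {
      continue; // ignore Studio's own tools and built-ins
    }
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  const duplicates: string[] = [];
  counts.forEach((count, name) => {
    if (count > 1) {
      duplicates.push(name);
    }
  });
  return duplicates;
}
```

A non-empty result would point at double registration, for example DLA's own plugin manifest and the `query()`-time `mcpServers` entry both declaring the same server.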

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Vendors the Data Liberation Agent at a pinned git SHA into apps/cli/ai/dla/ for runtime use by the studio code agent. Modeled on download-agent-skills.ts. Skips gracefully when GH_PAT is unset so installs keep working for contributors without access to the private upstream repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
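The graceful-skip behaviour can be pictured as a small decision step that runs before any download. The sketch below is not the script's actual code: the `FetchDecision` type, the function name `decideDlaFetch`, the warning text, and the choice to prefer `GH_PAT` over `GH_TOKEN` are all assumptions made for illustration.

```typescript
// Outcome of the pre-download check (hypothetical type).
type FetchDecision =
  | { action: 'skip'; reason: string }
  | { action: 'fetch'; token: string; ref: string };

// Decide whether the vendoring step should run. Taking `env` as a
// parameter keeps the function testable; a real script would pass
// `process.env`. Preferring GH_PAT over GH_TOKEN is an assumption.
function decideDlaFetch(
  env: Record<string, string | undefined>,
  pinnedRef: string
): FetchDecision {
  const token = env.GH_PAT || env.GH_TOKEN;
  if (!token) {
    // The caller is expected to log this and exit 0 so that
    // `npm install` keeps working without repo access.
    return {
      action: 'skip',
      reason: 'No GitHub token set; skipping Data Liberation Agent fetch.',
    };
  }
  return { action: 'fetch', token, ref: pinnedRef };
}
```

Returning a value instead of exiting inside the function leaves the exit-code decision to the caller, which is also what makes both branches easy to cover in unit tests.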

Hooks `scripts/download-data-liberation-agent.ts` (added in T2) into the root `postinstall` chain right after `download-agent-skills.ts`, mirroring the existing `ts-node ./scripts/...` invocation pattern. Without `GH_PAT` the script logs a clear warning and exits 0, keeping installs working for contributors without access to the private DLA repo.

Adds the two DLA runtime deps that aren't currently workspace-hoisted:

- `fast-xml-parser@^5.7.2`: DLA's XML/RSS content parser.
- `papaparse@^5.5.3`: DLA's WooCommerce CSV importer.

Both are pinned to the latest stable major. Once DLA is vendored at a real SHA (RSM-1675 T2 TODO), these should be reconciled with DLA's own pins; verify via `apps/cli/ai/dla/package.json` after vendoring.

Per plan §T3 and `wave-1-dla-inventory.md` §4, `ink` is intentionally NOT added: DLA's `src/ui/*.tsx` Ink screens are CLI-only and not invoked by the MCP server path. The MCP server emits progress via `sendLoggingMessage` from `@modelcontextprotocol/sdk` (already in `apps/cli/dependencies`).

Adds an `it.todo(...)` placeholder in `apps/cli/ai/tests/agent.test.ts` for the missing-DLA-dir branch; T5 implements the conditional registration and turns this test on.

Verified locally: `npm install` from repo root succeeds, the DLA fetch hits its skip-when-missing-`GH_PAT` branch and logs the expected warning without failing the install. `npm run typecheck` passes across all workspaces. `npx vitest run apps/cli/ai/tests/agent.test.ts` shows 2 passing and 1 todo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Conditionally registers the Data Liberation Agent (DLA) when its tree is present at apps/cli/ai/dla/. The agent stays runnable without DLA (the common case for contributors without a GH_PAT to vendor it); registration gates on fs.existsSync.

When DLA is available:

- Adds `data-liberation` as a stdio MCP server, spawning `src/mcp-server.js` via `process.execPath` so the Electron-bundled Node runtime matches the host.
- Appends DLA's directory as a second local plugin entry, exposing its `/migrate` slash command surface.
- Forwards `STUDIO_WPCOM_TOKEN` (plus the broader resolvedEnv) so DLA tools targeting WordPress.com sites have credentials available.

Drops the `cwd` field from the stdio MCP config: the Anthropic Agent SDK's `McpStdioServerConfig` type does not declare it, and inspecting `sdk.mjs` confirms `cwd` is honored only for the host Claude Code process, not forwarded to MCP children. We pass `mcp-server.js` as an absolute path instead, which lets DLA's `import.meta.url` peer lookups resolve regardless of working directory and keeps the config strictly typesafe with no cast.

Also moves the wpcomAccessToken read in commands/ai/index.ts out from behind the `site?.remote` guard. DLA's MCP server may need the token even when the active Studio site is local (e.g. migrating a local source into a WP.com target).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
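A minimal sketch of the conditional wiring described in this commit, under stated assumptions: `StdioMcpConfig` is a local stand-in for the SDK's `McpStdioServerConfig` (reduced to the fields used here), and `buildDlaMcpServers` with its option names is hypothetical rather than the code in `agent.ts`.

```typescript
// Local stand-in for the SDK's stdio MCP config; note there is no
// `cwd` field, matching the constraint described above.
interface StdioMcpConfig {
  type: 'stdio';
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical inputs; a real caller would pass process.execPath and
// the absolute path to the pre-compiled mcp-server.js.
interface DlaWiringOptions {
  dlaAvailable: boolean;
  execPath: string;
  serverEntry: string;
  resolvedEnv: Record<string, string>;
  wpcomToken?: string;
}

// Return the extra MCP servers to register: empty when DLA is not
// vendored, otherwise a single stdio entry keyed `data-liberation`.
function buildDlaMcpServers(
  options: DlaWiringOptions
): Record<string, StdioMcpConfig> {
  if (!options.dlaAvailable) {
    return {};
  }
  // Copy so the caller's environment object is never mutated.
  const env: Record<string, string> = { ...options.resolvedEnv };
  if (options.wpcomToken) {
    env.STUDIO_WPCOM_TOKEN = options.wpcomToken;
  }
  return {
    'data-liberation': {
      type: 'stdio',
      command: options.execPath,
      args: [options.serverEntry],
      env,
    },
  };
}
```

The result can be spread into the session's `mcpServers` map next to the in-process Studio server, which keeps the DLA-free path a no-op.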

Implements RSM-1675 T8: per-tool permission policy for the Data Liberation Agent's MCP tools. Without this, `permissionMode: 'auto'` would auto-approve write-to-disk and remote-write tools alongside genuinely read-only ones.

Policy summary (in `apps/cli/ai/dla-permissions.ts`):

- Auto-approve `liberate_detect` / `_discover` / `_inspect` / `_status` / `_verify` (read-only).
- `liberate_import` is auto-approved when `tool_input.delegate === true` (DLA returns a manifest; Studio handles the actual import via `wp_cli`).
- `liberate_extract` / `_setup` / `_map_apis` / `_probe` / `_import` (without `delegate: true`) ask the user once per session through the existing `AskUserQuestion` plumbing; the answer is memoised in a closure-scoped `Set` so the user is not re-prompted later in the session.
- Non-DLA tools pass through with `{ behavior: 'allow', updatedInput }`.
- Unrecognised DLA tools deny by default.

Wired into `query()` only when DLA is vendored; non-DLA sessions keep the SDK's default classifier path untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
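The policy summarised in this commit can be sketched as a synchronous classifier plus an async wrapper that memoises approvals. This is an illustration, not the contents of `dla-permissions.ts`: the names `classifyDlaTool` and `sketchDlaPolicy`, the two-argument callback shape, the boolean `AskUser` callback, and the decision to memoise approvals only are assumptions. The tool prefix follows the `mcp__data-liberation__*` naming from the pre-merge checklist.

```typescript
type PermissionResult =
  | { behavior: 'allow'; updatedInput: Record<string, unknown> }
  | { behavior: 'deny'; message: string };

// Hypothetical prompt callback: resolves true when the user approves.
type AskUser = (tool: string) => Promise<boolean>;

const DLA_PREFIX = 'mcp__data-liberation__';
const READ_ONLY = new Set([
  'liberate_detect', 'liberate_discover', 'liberate_inspect',
  'liberate_status', 'liberate_verify',
]);
const ASK_ONCE = new Set([
  'liberate_extract', 'liberate_setup', 'liberate_map_apis',
  'liberate_probe', 'liberate_import',
]);

type Verdict = 'passthrough' | 'allow' | 'ask' | 'deny';

// Pure classification step: no prompting, no session state.
function classifyDlaTool(
  toolName: string,
  input: Record<string, unknown>
): Verdict {
  if (!toolName.startsWith(DLA_PREFIX)) return 'passthrough';
  const tool = toolName.slice(DLA_PREFIX.length);
  if (READ_ONLY.has(tool)) return 'allow';
  if (tool === 'liberate_import' && input.delegate === true) return 'allow';
  return ASK_ONCE.has(tool) ? 'ask' : 'deny';
}

// Wrap the classifier with per-session memoisation of approvals.
function sketchDlaPolicy(onAskUser?: AskUser) {
  const approved = new Set<string>(); // closure-scoped, one per session
  return async (
    toolName: string,
    input: Record<string, unknown>
  ): Promise<PermissionResult> => {
    const allow: PermissionResult = { behavior: 'allow', updatedInput: input };
    const verdict = classifyDlaTool(toolName, input);
    if (verdict === 'passthrough' || verdict === 'allow') return allow;
    if (verdict === 'deny') {
      return { behavior: 'deny', message: `Unrecognised DLA tool: ${toolName}` };
    }
    if (approved.has(toolName)) return allow;
    if (!onAskUser) {
      // Default-deny when nobody can be asked.
      return { behavior: 'deny', message: `${toolName} needs user approval` };
    }
    if (await onAskUser(toolName)) {
      approved.add(toolName);
      return allow;
    }
    return { behavior: 'deny', message: `User declined ${toolName}` };
  };
}
```

Keeping the classifier pure means the allow/ask/deny table can be tested without a prompt, while the wrapper owns the only mutable state.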


Poliuk added 2 commits April 29, 2026 00:30

Add ticket context (orchestrator)

c022296

Add research report (research-lead)

322a5c3

github-actions Bot assigned Poliuk Apr 29, 2026

Poliuk and others added 13 commits April 29, 2026 12:09

Add implementation plan (planner)

826c2fd

Fix vite.config.prod.ts static-copy gap for ai/plugin (implementer)

14a58dd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add ai/dla static-copy targets to Vite configs (implementer)

5785f4d

Register /migrate slash command (implementer)

aaa50d3

Add /migrate wrapper skill (implementer)

202302c

Add review 1 (code-reviewer)

ca0b8e1

Document /migrate in CLI README (documentator)

fb85ddc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Document DLA integration in CLI design doc (documentator)

c18568a

Add documentation review 1 and PR description (doc-reviewer)

59f380f

Poliuk wants to merge 15 commits into trunk from rsm-1639-dla-integration
