Skip to content

Commit b31b91c

Browse files
kapaleshreyasclaude
andcommitted
fix: multi-turn agent.chat() replay (#2) + example FS sweep dump (#3)
#2 — SDK: each new agent.chat() opened a fresh SSE without Last-Event-ID, so the replay buffer re-emitted turn 1's events forever. Fixed by: - always creating sessions in streaming-input mode (engine stays alive) - always pushing user messages via /messages (never via createSession body) - passing Last-Event-ID on every /events open so already-seen events are skipped - terminating each chat handle's iterable on the turn's terminal result SDK message (synthesizing a ca_session_ended for drain()) - POSTing /end-input on dispose so the engine drains cleanly Adds a regression test that two sequential chat()s produce distinct results. #3 — examples/marketing-agent.ts: the FS sweep at the end vacuumed the materialized GAP repo (32 SKILL.md files + .git internals + agent.yaml + SOUL.md + RULES.md + README.md = ~50 stray files) into the user's project dir. Fixed by snapshotting the workdir tree right after turn 1 and only fetching files that are new or modified relative to the snapshot. Also made saveTurnOutput skip empty bodies so failed turns don't leave header- only stubs. Verified end-to-end: all 4 turns produce distinct deliverables, the agent demonstrates real multi-turn memory (turn 4 references the Builder/Scale/ Enterprise tiers it designed in turn 3), and the final outputs dir contains only the 3 deliverable .md files plus the SessionStore JSONL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a5b691b commit b31b91c

5 files changed

Lines changed: 325 additions & 74 deletions

File tree

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
---
2+
"@computeragent/sdk": patch
3+
---
4+
5+
Fix multi-turn `agent.chat()` replaying first-turn events (issue #2).
6+
7+
Sequential `agent.chat()` calls on the same agent now produce distinct
8+
responses. Previously the second `chat()` would re-yield the first turn's
9+
events from the SSE replay buffer, so every turn returned the same answer.
10+
11+
The fix:
12+
- Sessions are always created in streaming-input mode internally so the
13+
engine stays alive across turns
14+
- Every `chat()` pushes via `/messages` (never via the createSession body)
15+
- `/events` is opened with `Last-Event-ID: <highest seen>` so the replay
16+
buffer skips events the SDK already saw
17+
- Each chat handle's iterable terminates on the turn's `result` SDK message
18+
(synthesizing a `ca_session_ended` for `ChatHandle.drain()`)
19+
- `dispose()` POSTs `/end-input` so the engine drains cleanly
20+
21+
`consumeSseEvents` now yields `{ id?, event }` envelopes instead of bare
22+
events — the only in-package caller (`ComputerAgent.openTurnEventStream`)
23+
uses the id for `Last-Event-ID` tracking. Other consumers were not affected
24+
because the function was internal.

examples/marketing-agent.ts

Lines changed: 102 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -62,11 +62,14 @@ function yellow(s: string): string {
6262
async function streamTurn(
6363
agent: ComputerAgent,
6464
message: string,
65-
): Promise<{ sessionId: string; result: string; harnessUrl: string }> {
65+
): Promise<{ sessionId: string; result: string; harnessUrl: string; ok: boolean }> {
6666
console.log(`\n${yellow("You:")} ${message}\n`);
6767

6868
let result = "";
69+
let assistantText = "";
6970
let sessionId = "";
71+
let ok = true;
72+
let endError: string | undefined;
7073

7174
const handle = agent.chat(message);
7275

@@ -82,15 +85,14 @@ async function streamTurn(
8285
for (const block of msg?.content ?? []) {
8386
const b = block as { type?: string; text?: string; name?: string; input?: unknown };
8487
if (b.type === "text" && b.text) {
85-
// Stream text progressively — print in chunks
8688
process.stdout.write(b.text);
89+
assistantText += b.text;
8790
} else if (b.type === "tool_use") {
8891
const input = JSON.stringify(b.input ?? {});
8992
console.log(`\n${dim(` → ${b.name}(${input.slice(0, 120)}${input.length > 120 ? "…" : ""})`)}`);
9093
}
9194
}
9295
} else if (p.type === "tool") {
93-
// tool result
9496
const content = typeof p.content === "string" ? p.content : JSON.stringify(p.content);
9597
console.log(dim(` ← ${content.slice(0, 200)}${content.length > 200 ? "…" : ""}`));
9698
} else if (p.type === "result" && typeof p.result === "string") {
@@ -102,12 +104,43 @@ async function streamTurn(
102104
console.log(`\n${dim(`[usage: in=${u.input_tokens} out=${u.output_tokens} cost=$${u.cost_usd?.toFixed(4)}]`)}`);
103105
}
104106
} else if (ev.kind === "ca_session_ended") {
105-
console.log(`\n${dim(`[ended: ${ev.reason}]`)}`);
107+
if (ev.reason !== "complete") {
108+
ok = false;
109+
endError = ev.errorMessage ?? ev.reason;
110+
}
111+
console.log(`\n${dim(`[ended: ${ev.reason}${ev.errorMessage ? ` — ${ev.errorMessage}` : ""}]`)}`);
106112
}
107113
}
108114

115+
// Fall back to assembled assistant text if there was no terminal `result`.
116+
// Helps when the engine errors mid-turn — we still surface what came through.
117+
if (!result && assistantText) result = assistantText;
118+
109119
const harnessUrl = await agent.harnessUrl();
110-
return { sessionId, result, harnessUrl };
120+
if (!ok) {
121+
console.log(`\n${dim(`[turn failed: ${endError ?? "unknown"} — saved files may be empty/partial]`)}`);
122+
}
123+
return { sessionId, result, harnessUrl, ok };
124+
}
125+
126+
interface FsEntry {
127+
path: string;
128+
type: string;
129+
size: number;
130+
}
131+
132+
/** Snapshot of the workdir tree at a point in time. Used to diff at session end. */
133+
async function snapshotWorkdir(harnessUrl: string, sessionId: string): Promise<Map<string, FsEntry>> {
134+
const out = new Map<string, FsEntry>();
135+
try {
136+
const res = await fetch(`${harnessUrl}/v1/sessions/${sessionId}/fs/tree?depth=10`);
137+
if (!res.ok) return out;
138+
const tree = (await res.json()) as { entries: FsEntry[] };
139+
for (const e of tree.entries) out.set(e.path, e);
140+
} catch {
141+
// FS API unavailable — return empty snapshot; downstream diff will skip.
142+
}
143+
return out;
111144
}
112145

113146
async function saveOutput(filename: string, content: string): Promise<void> {
@@ -116,6 +149,21 @@ async function saveOutput(filename: string, content: string): Promise<void> {
116149
console.log(`\n${green("✓")} Saved → ${path}`);
117150
}
118151

152+
async function saveTurnOutput(
153+
filename: string,
154+
header: string,
155+
body: string,
156+
): Promise<void> {
157+
if (!body.trim()) {
158+
console.log(`\n${dim(`(skipped ${filename} — empty response from agent)`)}`);
159+
return;
160+
}
161+
await saveOutput(
162+
filename,
163+
`# ${header}\n\nGenerated by marketing-agent via ComputerAgent\n\n---\n\n${body}\n`,
164+
);
165+
}
166+
119167
async function fetchAgentFile(harnessUrl: string, sessionId: string, path: string): Promise<string | null> {
120168
try {
121169
const res = await fetch(`${harnessUrl}/v1/sessions/${sessionId}/fs/file?path=${encodeURIComponent(path)}`);
@@ -157,12 +205,19 @@ await using agent = new ComputerAgent({
157205
...(priorSessionId ? { sessionId: priorSessionId } : {}),
158206
});
159207

208+
// Pre-session-start snapshot is empty by definition (we don't have a sessionId
209+
// yet). The snapshot is taken right AFTER turn 1 — at that point the harness
210+
// has materialized the GAP repo into the workdir but the agent hasn't yet
211+
// produced any deliverable files. Anything that appears after this snapshot
212+
// is something the AGENT wrote (issue #3 fix).
213+
let workdirBaseline: Map<string, FsEntry> = new Map();
214+
160215
// ── Turn 1: Product context ───────────────────────────────────────────────────
161216

162217
if (!RESUME) {
163218
hr("Turn 1 — Product Context");
164219

165-
const { sessionId } = await streamTurn(
220+
const { sessionId, harnessUrl } = await streamTurn(
166221
agent,
167222
`Here is our product context. Please confirm you've loaded it and identify the top 3
168223
marketing challenges you'd recommend we tackle first.
@@ -176,6 +231,10 @@ Current channels: mostly outbound sales, some inbound content, limited PLG motio
176231
Top conversion barrier: enterprises want a PoC before committing — long evaluation cycles (60-90 days).`,
177232
);
178233

234+
// Snapshot once — captures the materialized GAP repo (agent.yaml, SOUL.md,
235+
// RULES.md, .claude/skills/**, .git/**) so the final FS sweep can exclude it.
236+
workdirBaseline = await snapshotWorkdir(harnessUrl, sessionId);
237+
179238
await writeFile(SESSION_FILE, sessionId, "utf8");
180239
console.log(`\nSession ID saved → ${SESSION_FILE}`);
181240
}
@@ -192,9 +251,10 @@ Angle: they are likely already using LangChain or CrewAI and hitting reliability
192251
Format each email with: Subject line, Body, Send timing.`,
193252
);
194253

195-
await saveOutput(
254+
await saveTurnOutput(
196255
"cold-email-sequence.md",
197-
`# Cold Email Sequence — VP Engineering @ Series B+ SaaS\n\nGenerated by marketing-agent via ComputerAgent\n\n---\n\n${emailResult}`,
256+
"Cold Email Sequence — VP Engineering @ Series B+ SaaS",
257+
emailResult,
198258
);
199259

200260
// ── Turn 3: Pricing strategy ──────────────────────────────────────────────────
@@ -213,10 +273,7 @@ const { result: pricingResult } = await streamTurn(
213273
- A/B test ideas for the pricing page`,
214274
);
215275

216-
await saveOutput(
217-
"pricing-strategy.md",
218-
`# Pricing Strategy — Lyzr AI\n\nGenerated by marketing-agent via ComputerAgent\n\n---\n\n${pricingResult}`,
219-
);
276+
await saveTurnOutput("pricing-strategy.md", "Pricing Strategy — Lyzr AI", pricingResult);
220277

221278
// ── Turn 4: Launch strategy for self-serve tier ───────────────────────────────
222279

@@ -231,40 +288,47 @@ Include: Pre-launch checklist, launch day playbook, week-by-week activation plan
231288
success metrics, and the top 3 risks with mitigations.`,
232289
);
233290

234-
await saveOutput(
291+
await saveTurnOutput(
235292
"launch-strategy.md",
236-
`# Self-Serve Launch Strategy — 30-Day Plan\n\nGenerated by marketing-agent via ComputerAgent\n\n---\n\n${launchResult}`,
293+
"Self-Serve Launch Strategy — 30-Day Plan",
294+
launchResult,
237295
);
238296

239-
// ── Bonus: check if agent wrote any files to its workdir ─────────────────────
297+
// ── Bonus: capture files the AGENT wrote (not the materialized GAP repo) ─────
298+
//
299+
// Diffs the current workdir against the baseline snapshot taken after turn 1.
300+
// Anything new (or grown) is something the agent itself produced via its
301+
// Write/Bash tools. The materialized GAP repo (agent.yaml, SKILL.md files,
302+
// .git/**) is excluded. See issue #3 for context.
240303

241304
hr("Harness Filesystem");
242305

243-
try {
244-
const treeRes = await fetch(
245-
`${harnessUrl}/v1/sessions/${finalSession}/fs/tree?depth=2`,
246-
);
247-
if (treeRes.ok) {
248-
const tree = (await treeRes.json()) as { entries: { path: string; type: string; size: number }[] };
249-
const interesting = tree.entries.filter((e) => e.path !== "/" && !e.path.startsWith("/."));
250-
if (interesting.length > 0) {
251-
console.log("\nFiles the agent wrote to its workspace:");
252-
for (const e of interesting) {
253-
console.log(` ${e.type.padEnd(4)} ${e.path.padEnd(50)} ${e.size}b`);
254-
if (e.type === "file") {
255-
const content = await fetchAgentFile(harnessUrl, finalSession, e.path);
256-
if (content) {
257-
const outName = e.path.replace(/^\//, "").replace(/\//g, "-");
258-
await saveOutput(`agent-workdir-${outName}`, content);
259-
}
260-
}
261-
}
262-
} else {
263-
console.log(dim(" (agent workdir is empty — all output was inline text)"));
306+
const finalTree = await snapshotWorkdir(harnessUrl, finalSession);
307+
const newFiles: FsEntry[] = [];
308+
for (const [path, entry] of finalTree) {
309+
if (entry.type !== "file") continue;
310+
const before = workdirBaseline.get(path);
311+
if (!before) {
312+
newFiles.push(entry); // brand new file
313+
} else if (before.size !== entry.size) {
314+
newFiles.push(entry); // existed before but was modified
315+
}
316+
}
317+
318+
if (newFiles.length === 0) {
319+
console.log(dim(" (agent didn't write any files to its workspace — all output was inline text)"));
320+
console.log(dim(" Note: the materialized GAP repo is excluded from this view."));
321+
} else {
322+
console.log("\nFiles the agent wrote during this session:");
323+
for (const e of newFiles) {
324+
console.log(` file ${e.path.padEnd(50)} ${e.size}b`);
325+
const content = await fetchAgentFile(harnessUrl, finalSession, e.path);
326+
if (content) {
327+
// Drop leading slash; preserve subdirectory structure with "/" → "_"
328+
const outName = e.path.replace(/^\//, "").replace(/\//g, "_");
329+
await saveOutput(outName, content);
264330
}
265331
}
266-
} catch {
267-
// FS API optional — harness may not expose it in all configurations
268332
}
269333

270334
// ── Summary ───────────────────────────────────────────────────────────────────

packages/sdk/src/computer-agent.test.ts

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -157,6 +157,47 @@ describe("ComputerAgent — multi-turn", () => {
157157
expect(first.sessionId).toMatch(/^sess_/);
158158
expect(agent.sessionId).toBe(first.sessionId);
159159
});
160+
161+
it("two sequential .chat() calls produce distinct responses (issue #2)", async () => {
162+
// Two turns scripted in one engine session. The engine waits for each
163+
// user message in turn, then emits a result. If the SDK's multi-turn
164+
// wiring is right, turn 2's response must be "response-2", not the
165+
// replayed "response-1" from turn 1.
166+
const engine = new MockEngine([
167+
{ kind: "wait_for_user_message" },
168+
{ kind: "emit", payload: { type: "result", text: "response-1" } },
169+
{ kind: "wait_for_user_message" },
170+
{ kind: "emit", payload: { type: "result", text: "response-2" } },
171+
]);
172+
serverHandle = await bootServer(engine);
173+
174+
const agent = new ComputerAgent({
175+
source: { type: "local", path: "/tmp" },
176+
harness: "mock",
177+
identityLoader: "mock",
178+
harnessUrl: serverHandle.url,
179+
});
180+
181+
const r1 = await agent.chat("turn 1");
182+
const r1Text = (r1.messages.find(
183+
(m): m is { type: "result"; text: string } =>
184+
(m as { type?: string }).type === "result",
185+
))?.text;
186+
expect(r1Text).toBe("response-1");
187+
188+
const r2 = await agent.chat("turn 2");
189+
const r2Text = (r2.messages.find(
190+
(m): m is { type: "result"; text: string } =>
191+
(m as { type?: string }).type === "result",
192+
))?.text;
193+
expect(r2Text).toBe("response-2");
194+
195+
// Same session across both turns
196+
expect(r1.sessionId).toBe(r2.sessionId);
197+
198+
// Engine actually saw both user messages
199+
expect(engine.received.userMessages).toHaveLength(2);
200+
});
160201
});
161202

162203
describe("ComputerAgent — Substrate runtime", () => {

0 commit comments

Comments
 (0)