Commit 9ce4fe2

Merge pull request #15 from open-gitagent/docs/computeragent-info
docs(computeragent): add info.md — architecture + harness protocol tour
2 parents f91d254 + 66c34c9 commit 9ce4fe2

1 file changed

Lines changed: 352 additions & 0 deletions

packages/computeragent/info.md

@@ -0,0 +1,352 @@
1+
# info — architecture & non-obvious bits
2+
3+
A guided tour of the parts of ComputerAgent that read well in a slide deck, a talk, or an HN comment. It is a sibling to `README.md`, which is the "how do I use this" intro; this file is the "what's actually going on under the hood" companion.
4+
5+
---
6+
7+
## TL;DR — four orthogonal ports
8+
9+
ComputerAgent decomposes the agent stack into four pluggable axes. Every axis is one TypeScript interface; you swap any one without touching the others.
10+
11+
```
12+
┌───────────────────────────────────┐
13+
│ ComputerAgent │
14+
│ (one constructor call) │
15+
└──────────────┬────────────────────┘
16+
17+
┌──────────────┬───────────────┼───────────────┬───────────────┐
18+
│ │ │ │ │
19+
▼ ▼ ▼ ▼ ▼
20+
WHAT HOW WHERE REMEMBER AUDIT
21+
IdentityLoader EngineDriver Substrate SessionStore AuditSink
22+
(agent) (loop) (sandbox) (memory) (telemetry)
23+
24+
GAP git repo | claude-agent-sdk | Local | in-memory | Mongo
25+
inline yaml | deepagents | Bwrap | file/jsonl | OTel + ClickHouse
26+
local folder | gitagent | E2B | Mongo | Honeycomb / Datadog
27+
VZ/Tart | SQLite | console
28+
```
29+
30+
That makes five interfaces in total: the four constructor ports plus `AuditSink`. `AuditSink` sits on top of the SDK rather than inside ComputerAgent's constructor (callers that want telemetry wire it explicitly), but it has the same shape: one method, one swap.
31+
32+
---
33+
34+
## 1. Git URL is the agent identity
35+
36+
Most agent frameworks invent a registry (UUIDs, names, versions). ComputerAgent collapses that:
37+
38+
```ts
39+
new ComputerAgent({
40+
source: { type: "git", url: "github.com/acme/triage-agent" }
41+
})
42+
```
43+
44+
The git URL **is** the canonical name. Versioning is `?ref=v1.2` or a commit SHA. Discovery is `git clone`. The Mongo `agent_registry` is a cache + telemetry index — **not** the source of truth. You can delete the entire registry and re-create it by running agents.
45+
46+
Implication: agents share an identity across every machine that runs them. The same git URL fired from a customer's Temporal worker and from your laptop writes to the **same** `agent_logs` document. Cross-machine deduplication, free.
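
To make the "URL plus `?ref=`" naming concrete, here is a minimal sketch of splitting a source string into its canonical name and pinned version. The `parseGitSource` helper and the `GitSource` shape are hypothetical, written for illustration only; the real `IdentityLoader` may parse sources differently.

```typescript
// Hypothetical shape: not taken from the ComputerAgent codebase.
interface GitSource {
  type: "git";
  url: string;        // canonical agent name, without the query string
  ref: string | null; // tag, branch, or commit SHA; null means "default branch"
}

// Split "host/org/repo?ref=v1.2" into the canonical URL and the pinned ref.
function parseGitSource(raw: string): GitSource {
  const queryStart = raw.indexOf("?");
  if (queryStart === -1) {
    return { type: "git", url: raw, ref: null };
  }
  const url = raw.slice(0, queryStart);
  const query = raw.slice(queryStart + 1);
  let ref: string | null = null;
  for (const pair of query.split("&")) {
    const [key, value] = pair.split("=");
    if (key === "ref" && value) {
      ref = decodeURIComponent(value);
    }
  }
  return { type: "git", url, ref };
}
```

Because the canonical `url` excludes the ref, two runs pinned to different versions would still share one identity under this sketch, with the version carried separately.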
47+
48+
---
49+
50+
## 2. Substrate-agnostic agent code
51+
52+
The agent doesn't know — or care — where it runs:
53+
54+
| Substrate | What it actually is | Use when |
55+
|---|---|---|
56+
| `LocalSubstrate` | A subprocess on the same host | dev, library-mode (in someone's existing worker) |
57+
| `BwrapSubstrate` | Linux user-namespaces (bubblewrap) | "isolation without containers" — fast, ~ms startup |
58+
| `E2BSubstrate` | Firecracker microVM in the cloud | strong isolation, untrusted code |
59+
| `VZSubstrate` | Apple VZ.framework via Tart | macOS-native VM, full OS + persistent disk |
60+
61+
```ts
62+
new ComputerAgent({
63+
source: { type: "git", url: "..." },
64+
runtime: new LocalSubstrate(), // ← only the deploy story changes
65+
});
66+
```
67+
68+
You change one constructor arg. Not the agent. Not the harness. Not the tools. There's a **substrate × source × engine matrix test** that fires every cell of the grid — adding a new substrate adds one column, not three months of edge-case chasing.
69+
70+
---
71+
72+
## 3. Harness protocol — the layer most frameworks don't have
73+
74+
Between "the SDK calling Anthropic" and "the substrate running it" there's a **harness** boundary. It's a tiny HTTP server (Hono on Bun/Node) speaking SSE + plain JSON, and it's the thing that makes claude-agent-sdk, gitagent, and deepagents fungible.
75+
76+
```
77+
Client (SDK) Harness Engine
78+
│ │ │
79+
│ POST /v1/sessions │ │
80+
│ { source, harness, runtime } │ │
81+
│ ─────────────────────────────▶ │ │
82+
│ │ EngineDriver.startSession
83+
│ │ ──────────────────────▶ │
84+
│ │ │
85+
│ Content-Type: text/event-stream │
86+
│ ◀───────────────────────────── │ │
87+
│ event: ca_session_started │ │
88+
│ data: { sessionId, engine } │ ◀─── EngineEvent stream │
89+
│ │ │
90+
│ event: sdk_message │ ◀── { type: "assistant" }│
91+
│ event: ca_permission_request │ │
92+
│ POST /v1/sessions/:id/permission/:callId │
93+
│ { decision: "allow" } │ │
94+
│ ─────────────────────────────▶ │ │
95+
│ │ │
96+
│ event: ca_usage_snapshot │ │
97+
│ event: ca_session_ended │ │
98+
│ ◀───────────────────────────── │ │
99+
```
100+
101+
The wire is documented under `packages/protocol/src/` and verified by a Zod-schema test suite (`harness-rest.test.ts`, `sse-events.test.ts`). `curl` can drive every endpoint. No proprietary RPC.
102+
103+
### Why a separate harness process?
104+
105+
Three reasons that compound:
106+
107+
1. **Engine portability.** claude-agent-sdk wants `$HOME/.claude/projects/*.jsonl`. gitclaw wants `$GITCLAW_MODEL_BASE_URL`. deepagents is built on LangChain. Wrapping each in a uniform `EngineDriver` interface and putting them all behind one HTTP shape means the client SDK never speaks engine-specific dialects.
108+
109+
2. **Substrate boundary == process boundary.** When you swap from `LocalSubstrate` to `E2BSubstrate`, the harness moves to a different machine. Same wire protocol, different physical location. Your SDK code doesn't notice.
110+
111+
3. **Resumability.** Every SSE event has a monotonic `id`. If the client disconnects, it reconnects with `Last-Event-ID: <last-id>` and the harness server replays from a per-session ring buffer (default: last 1,000 events or 5 minutes). Critical when running over flaky networks.
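
The replay behaviour in point 3 can be sketched as a bounded buffer keyed by a monotonic event id. This is an illustrative sketch, not the harness server's actual implementation: the `ReplayBuffer` class and its method names are assumptions, and it reads "last 1,000 events or 5 minutes" as "whichever bound is hit first".

```typescript
interface BufferedEvent {
  id: number;       // monotonic, assigned by the buffer
  data: string;     // serialized SSE payload
  storedAt: number; // epoch milliseconds
}

// Hypothetical per-session buffer: bounded by event count and by age.
class ReplayBuffer {
  private events: BufferedEvent[] = [];
  private nextId = 1;
  private readonly maxEvents: number;
  private readonly maxAgeMs: number;

  constructor(maxEvents = 1000, maxAgeMs = 5 * 60 * 1000) {
    this.maxEvents = maxEvents;
    this.maxAgeMs = maxAgeMs;
  }

  // Store an event and return the id the client will later echo back.
  push(data: string, now: number): number {
    const id = this.nextId++;
    this.events.push({ id, data, storedAt: now });
    this.evict(now);
    return id;
  }

  // Everything the client missed after the id sent in `Last-Event-ID`.
  replayAfter(lastEventId: number, now: number): BufferedEvent[] {
    this.evict(now);
    return this.events.filter((e) => e.id > lastEventId);
  }

  // Drop events that are too old, then trim to the newest `maxEvents`.
  private evict(now: number): void {
    const fresh = this.events.filter((e) => now - e.storedAt <= this.maxAgeMs);
    this.events = fresh.slice(Math.max(0, fresh.length - this.maxEvents));
  }
}
```

A client that reconnects after the window has passed gets only what is still buffered, so a real implementation would also need a way to signal that a gap occurred.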
112+
113+
### Harness events (the wire protocol)
114+
115+
```ts
116+
type HarnessEvent =
117+
| { kind: "ca_session_started"; sessionId; engine; identity; capabilities }
118+
| { kind: "sdk_message"; sessionId; payload } // engine-native
119+
| { kind: "ca_permission_request"; sessionId; callId; toolName; input; risk }
120+
| { kind: "ca_permission_decision";sessionId; callId; decision; reason? }
121+
| { kind: "ca_turn_started"; sessionId; userTextLen? }
122+
| { kind: "ca_usage_snapshot"; sessionId; inputTokens?; outputTokens?;
123+
costUsd?; costSemantic? } // see §6
124+
| { kind: "ca_session_ended"; sessionId; reason; errorMessage? };
125+
```
126+
127+
`sdk_message.payload` is **opaque** — it's whatever the engine's native message shape is. The client SDK doesn't try to normalize it; the engine knows how to emit, the consumer knows how to consume.
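
On the client side, each event arrives as a standard SSE frame (`id:`, `event:`, `data:` fields). The following is a minimal sketch of a frame parser, assuming the harness puts one JSON document in `data` as the sequence diagram suggests; `parseSseFrame` and `SseFrame` are hypothetical names, not part of the SDK.

```typescript
interface SseFrame {
  id: string | null; // value to send back as Last-Event-ID on reconnect
  event: string;     // e.g. "ca_session_started", "sdk_message"
  data: string;      // raw payload, typically JSON
}

// Parse one SSE frame (the text between two blank lines).
function parseSseFrame(frame: string): SseFrame {
  let id: string | null = null;
  let event = "message"; // SSE default when no `event:` field is present
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line === "" || line.startsWith(":")) continue; // blank line or comment
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
    if (field === "id") id = value;
    else if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  // Multiple `data:` lines are joined with newlines, per the SSE format.
  return { id, event, data: dataLines.join("\n") };
}
```

A consumer would then switch on `event`, calling `JSON.parse(frame.data)` for the `ca_*` events and handing `sdk_message` payloads through untouched.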
128+
129+
---
130+
131+
## 4. AuditSink — telemetry as a protocol
132+
133+
There's no logger interface and no metrics interface. There's `AuditSink`:
134+
135+
```ts
136+
interface AuditSink {
137+
emit(event: AgentEvent): Promise<void> | void;
138+
}
139+
```
140+
141+
One method. Plug in any of:
142+
143+
- `MongoTelemetry` — persists turn history to `agent_registry` + `agent_logs`
144+
- `OtelAuditSink` — emits `gen_ai.*` OpenTelemetry spans → OTLP → ClickHouse / Datadog / Honeycomb / your APM
145+
- `console` — dev
146+
- Chain them: `[mongoSink, otelSink, consoleSink]` — the SDK fires `emit()` on each, fire-and-forget
147+
148+
We were early adopters of the **OpenTelemetry `gen_ai.*` semantic conventions**: `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.response.cost_usd`. So your existing Grafana board built for OTel renders agent traffic out of the box.
149+
150+
> AuditSink is fire-and-forget by contract. The SDK catches thrown errors and never propagates them up. Telemetry must never break an agent run.
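
A minimal sketch of what that contract implies for a chained sink follows. The `AuditSink` interface is copied from above; `AgentEvent` is reduced to a hypothetical placeholder shape, and `chainSinks` is an illustrative helper rather than the SDK's own fan-out code.

```typescript
// Placeholder shape: the real AgentEvent type is defined by the SDK.
type AgentEvent = { kind: string; [key: string]: unknown };

interface AuditSink {
  emit(event: AgentEvent): Promise<void> | void;
}

// Fan one event out to every sink; no failure ever reaches the caller.
function chainSinks(sinks: AuditSink[]): AuditSink {
  return {
    emit(event: AgentEvent): void {
      for (const sink of sinks) {
        try {
          const pending = sink.emit(event);
          if (pending instanceof Promise) {
            pending.catch(() => {}); // swallow async failures
          }
        } catch {
          // swallow sync failures; remaining sinks still run
        }
      }
    },
  };
}
```

The `emit` call returns without awaiting anything, so a slow or broken telemetry backend cannot stall or fail the agent run in this sketch.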
151+
152+
---
153+
154+
## 5. Library-mode vs server-mode
155+
156+
Most agent platforms force you into their server. ComputerAgent has **two equally first-class modes**:
157+
158+
```
159+
server-mode library-mode
160+
──────────── ────────────
161+
your customers ──→ AgentOS UI your existing worker
162+
──→ computeragent-server imports `computeragent`
163+
──→ harness imports it
164+
──→ Anthropic imports it
165+
└→ harness ──→ Anthropic
166+
167+
(new pods, new auth, new ingress) (zero new infra)
168+
```
169+
170+
For customers who already run Temporal / Airflow / their own job runner, library-mode means **no new pods, no new auth surface, no new ingress**: their existing worker becomes the agent runner. The de-risk spike (`spike/temporal-k8s-localsubstrate/REPORT.md`) demonstrates a 7.3s end-to-end Claude turn from inside a Temporal activity in a K8s pod with no `Service`, no `Ingress`, and no new RBAC.
171+
172+
---
173+
174+
## 6. Cost semantics — the subtle bit
175+
176+
`ChatHandle` aggregates per-message usage snapshots into a single `ChatResult.usage`. Tokens always SUM. Cost depends on the **engine's `costSemantic`**:
177+
178+
| Semantic | Engine | Aggregation |
179+
|---|---|---|
180+
| `cumulative` | claude-agent-sdk | take the **MAX** value seen (each snapshot is a running total) |
181+
| `delta` | gitclaw | **SUM** per-message deltas |
182+
| `undefined` | legacy | treat as cumulative (safe — never double-count) |
183+
| mixed (defensive) | hypothetical chained engines | prefer cumulative |
184+
185+
This is the kind of invariant that is easy to get subtly wrong with no live harness — so it's nailed down by 7 dedicated unit tests in `packages/sdk/src/chat-handle.test.ts`.
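
The aggregation table can be sketched as a single function. This is an illustration under stated assumptions, not the code in `chat-handle.ts`: `aggregateUsage` and the type names are hypothetical, and "prefer cumulative" in the mixed case is read as "take the maximum cumulative value and ignore the deltas".

```typescript
type CostSemantic = "cumulative" | "delta";

interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
  costUsd?: number;
  costSemantic?: CostSemantic; // undefined is treated as cumulative
}

interface AggregatedUsage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

function aggregateUsage(snapshots: UsageSnapshot[]): AggregatedUsage {
  let inputTokens = 0;
  let outputTokens = 0;
  let deltaSum = 0;
  let cumulativeMax = 0;
  let sawCumulative = false;

  for (const s of snapshots) {
    // Tokens always sum, whatever the cost semantic is.
    inputTokens += s.inputTokens ?? 0;
    outputTokens += s.outputTokens ?? 0;
    if (s.costUsd === undefined) continue;
    if (s.costSemantic === "delta") {
      deltaSum += s.costUsd;
    } else {
      // "cumulative" or undefined: each value is a running total.
      sawCumulative = true;
      cumulativeMax = Math.max(cumulativeMax, s.costUsd);
    }
  }

  // Mixed input: prefer the cumulative reading so cost is never double-counted.
  const costUsd = sawCumulative ? cumulativeMax : deltaSum;
  return { inputTokens, outputTokens, costUsd };
}
```

For example, two cumulative snapshots of $0.25 and $0.50 aggregate to $0.50, while the same two values tagged `delta` aggregate to $0.75.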
186+
187+
---
188+
189+
## 7. JSONL session replay (auditor-friendly by accident)
190+
191+
claude-agent-sdk persists each session as a JSONL file in `~/.claude/projects/<encoded>/<session-id>.jsonl`. Append-only, plain text, one event per line. We didn't invent this — but two things fall out for free:
192+
193+
- **Resumable across crashes** — restart the worker, replay the JSONL, continue
194+
- **Audit trail with no extra plumbing**: `grep`, `jq`, ship to S3. Compliance team smiles.
195+
196+
The dashboard reads these directly when you click into a session — no proprietary log store.
197+
198+
---
199+
200+
## 8. SessionStore — swappable conversation memory
201+
202+
Replace `agent.sessionStore` with one constructor arg:
203+
204+
| Kind | Backend | Use |
205+
|---|---|---|
206+
| `"memory"` | in-process map | dev / tests |
207+
| `"file"` | JSONL on disk | local persistence, no infra |
208+
| `"mongo"` | MongoDB collection | shared memory across worker pods |
209+
| `"sqlite"` | local SQLite file | embedded, queryable, fast |
210+
211+
```ts
212+
new ComputerAgent({
213+
source: { type: "git", url: "..." },
214+
sessionStore: { kind: "mongo", options: { url: MONGO_URL, database: "agentos" } },
215+
});
216+
```
217+
218+
Same SDK call. The engine doesn't know which backend is in play. **Resume across process restart, host change, substrate teardown** is built-in — not a per-integration manual replay job.
219+
220+
---
221+
222+
## 9. IRSA, no static AWS keys
223+
224+
For Bedrock, most other frameworks' setup instructions tell you to set `AWS_ACCESS_KEY_ID` in the pod env. We refuse to do that.
225+
226+
Instead, the pod's ServiceAccount has an `eks.amazonaws.com/role-arn` annotation. The AWS SDK's default-credential-chain finds `AWS_ROLE_ARN` + `AWS_WEB_IDENTITY_TOKEN_FILE` (auto-injected by the EKS pod-identity webhook), assumes the role, and Bedrock calls just work.
227+
228+
The harness explicitly allow-lists those env vars from the host process to the engine subprocess (see `engine-claude-agent-sdk/src/engine.ts:inheritEssentialHostEnv`). The 9 keys it passes:
229+
230+
```
231+
CLAUDE_CODE_USE_BEDROCK
232+
AWS_REGION
233+
AWS_DEFAULT_REGION
234+
AWS_BEDROCK_MODEL_ID
235+
AWS_ROLE_ARN ← IRSA-injected
236+
AWS_WEB_IDENTITY_TOKEN_FILE ← IRSA-injected
237+
AWS_PROFILE
238+
AWS_SHARED_CREDENTIALS_FILE
239+
AWS_CONFIG_FILE
240+
```
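
The allow-list idea can be sketched as a filter over the host environment. The key list is the one above; `pickEssentialEnv` is a hypothetical stand-in and does not reproduce the body of `inheritEssentialHostEnv`.

```typescript
// The nine keys listed above, in the same order.
const ESSENTIAL_ENV_KEYS = [
  "CLAUDE_CODE_USE_BEDROCK",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_BEDROCK_MODEL_ID",
  "AWS_ROLE_ARN",
  "AWS_WEB_IDENTITY_TOKEN_FILE",
  "AWS_PROFILE",
  "AWS_SHARED_CREDENTIALS_FILE",
  "AWS_CONFIG_FILE",
] as const;

// Copy only allow-listed, defined variables into the subprocess environment.
function pickEssentialEnv(
  hostEnv: Record<string, string | undefined>,
): Record<string, string> {
  const picked: Record<string, string> = {};
  for (const key of ESSENTIAL_ENV_KEYS) {
    const value = hostEnv[key];
    if (value !== undefined) {
      picked[key] = value;
    }
  }
  return picked;
}
```

Anything not on the list, including a stray `AWS_ACCESS_KEY_ID` on the host, is simply never copied across in this sketch.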
241+
242+
Empirically verified in the spike: `bedrock-2023-05-31` invoke against Claude Haiku 4.5 in us-east-2, 7.3s, $0.035, `is_error: false`. No static keys anywhere in the cluster.
243+
244+
---
245+
246+
## 10. Permission protocol — every tool call is auditable
247+
248+
Every `Bash`, `Read`, `Edit` call by an agent goes through a permission check that emits a `ca_permission_decision` event. This event includes:
249+
250+
- the tool name
251+
- the tool arguments (`Bash` command, `Read` path)
252+
- the decision (`allow` / `deny` / `ask`)
253+
- *why* (the matching policy rule, if any)
254+
255+
```
256+
engine harness client (or policy decider)
257+
│ │ │
258+
│ permission_request │ │
259+
│ ──────────────────────▶│ │
260+
│ │ ca_permission_request │
261+
│ │ (SSE event) │
262+
│ │ ─────────────────────────▶ │
263+
│ │ │
264+
│ │ POST /permission/:callId │
265+
│ │ { decision: "allow" } │
266+
│ │ ◀──────────────────────────┤
267+
│ PermissionResult │ │
268+
│ ◀──────────────────────┤ │
269+
│ │ ca_permission_decision │
270+
│ │ → AuditSink │
271+
```
272+
273+
The harness can short-circuit: if there's an in-process `PolicyDecider` (Cedar/OPA via SRS), the harness resolves the decision without a client round-trip. Same wire event still flows to `AuditSink` for the audit trail.
274+
275+
Pipe `ca_permission_decision` events into your SIEM and you have full audit-replay for every agent action.
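
The short-circuit path can be sketched as a local rule lookup that either produces a `ca_permission_decision` event or signals that a client round-trip is needed. The `PolicyRule` shape and `decideLocally` function are hypothetical; a real `PolicyDecider` backed by Cedar or OPA would evaluate far richer policies than a tool-name match.

```typescript
type Decision = "allow" | "deny" | "ask";

interface PermissionRequest {
  sessionId: string;
  callId: string;
  toolName: string;
  input: unknown;
}

interface PermissionDecisionEvent {
  kind: "ca_permission_decision";
  sessionId: string;
  callId: string;
  decision: Decision;
  reason?: string;
}

// Hypothetical rule shape, for illustration only.
interface PolicyRule {
  name: string;
  toolName: string;
  decision: Decision;
}

// Returns an event when policy settles the call locally, or null when the
// harness must forward a ca_permission_request to the client instead.
function decideLocally(
  request: PermissionRequest,
  rules: PolicyRule[],
): PermissionDecisionEvent | null {
  const rule = rules.find((r) => r.toolName === request.toolName);
  if (rule === undefined || rule.decision === "ask") {
    return null;
  }
  return {
    kind: "ca_permission_decision",
    sessionId: request.sessionId,
    callId: request.callId,
    decision: rule.decision,
    reason: `matched rule ${rule.name}`,
  };
}
```

Either way the resulting decision event goes to the `AuditSink`, so the audit trail looks the same whether a policy or a human made the call.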
276+
277+
---
278+
279+
## 11. Conformance suite for third-party plug-ins
280+
281+
`@computeragent/testing` exports a **table-driven conformance suite** that any third-party `EngineDriver` / `Substrate` / `SessionStore` implementation can run against itself:
282+
283+
```ts
284+
import { runEngineConformance } from "@computeragent/testing";
285+
286+
runEngineConformance(myCustomEngine, {
287+
  capabilities: { streamingInput: true, permissionCallback: true, /* ... */ },
288+
});
289+
```
290+
291+
The suite asserts: engine emits the right events in the right order, respects abort signals, surfaces tool calls through the permission protocol, doesn't crash on empty input. About 30 invariants. Plug-in authors discover protocol violations at `vitest run`, not in production.
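
As a rough illustration of what an event-ordering invariant might look like, the sketch below checks a recorded sequence of event kinds. The specific rules (session start first, session end last, each at most once) are assumptions made for this example and are not taken from `@computeragent/testing`; `checkEventOrder` is a hypothetical name.

```typescript
// Returns a list of human-readable violations; an empty list means the
// recorded sequence satisfies these (assumed) ordering rules.
function checkEventOrder(kinds: string[]): string[] {
  const violations: string[] = [];
  if (kinds[0] !== "ca_session_started") {
    violations.push("first event must be ca_session_started");
  }
  if (kinds[kinds.length - 1] !== "ca_session_ended") {
    violations.push("last event must be ca_session_ended");
  }
  const started = kinds.filter((k) => k === "ca_session_started").length;
  const ended = kinds.filter((k) => k === "ca_session_ended").length;
  if (started > 1) {
    violations.push("ca_session_started must appear at most once");
  }
  if (ended > 1) {
    violations.push("ca_session_ended must appear at most once");
  }
  return violations;
}
```

A table-driven suite would run many such checks against the events an engine emits for a fixed set of scripted inputs.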
292+
293+
---
294+
295+
## 12. OTLP everywhere, vendor nowhere
296+
297+
The harness exports OTel via plain `OTEL_EXPORTER_OTLP_ENDPOINT`. That's it. The harness doesn't know:
298+
299+
- ❌ "We use Datadog"
300+
- ❌ "We use ClickHouse"
301+
- ❌ "We use Honeycomb"
302+
303+
It knows: "POST traces to this URL." An OTel Collector sitting next to it does the demux. Your vendor of choice is a collector config away — no recompilation, no harness restart, no new code path.
304+
305+
---
306+
307+
## End-to-end flow — a single chat turn
308+
309+
The pieces above tied together, for one `agent.chat("hello")` call against a remote E2B substrate:
310+
311+
```
312+
1. agent.chat("hello")
313+
314+
▼ POST {harnessUrl}/v1/sessions
315+
┌──────────────────────────────────┐
316+
│ Substrate (E2B microVM, remote) │
317+
│ ┌────────────────────────────┐ │
318+
│ │ Harness server (Hono) │ │
319+
│ │ ┌──────────────────────┐ │ │
320+
│ │ │ EngineDriver │ │ │ 2. starts session
321+
│ │ │ (claude-agent-sdk) │ │ │ 3. invokes Claude API
322+
│ │ │ + AuditSink chain │ │ │
323+
│ │ └─────┬────────────────┘ │ │
324+
│ │ │ │ │
325+
│ └─────────┼───────────────────┘ │
326+
└────────────┼─────────────────────┘
327+
328+
▼ SSE: ca_session_started, sdk_message, ca_usage_snapshot, ca_session_ended
329+
┌────────────────────┐
330+
│ ChatHandle │ 5. yields raw events as `for await of handle`
331+
│ (client SDK) │ 6. drains to ChatResult on `await handle`
332+
└────┬───────────────┘
333+
334+
├─→ MongoTelemetry (agent_logs row)
335+
└─→ OtelAuditSink (gen_ai.* spans → OTel Collector → ClickHouse)
336+
337+
4. Engine fires AuditSink.emit() on every event, fire-and-forget.
338+
```
339+
340+
The interesting part is how little of this the **agent code** has to know. The agent's `agent.yaml` + `SOUL.md` files (its GAP manifest) describe what it does. ComputerAgent figures out where to run it, who tracks it, and how its output gets to the dashboard.
341+
342+
---
343+
344+
## See also
345+
346+
- [`README.md`](README.md) — install + quickstart
347+
- [`packages/protocol/`](../protocol/) — the wire-protocol schemas, Zod-validated
348+
- [`packages/sdk/src/chat-handle.ts`](../sdk/src/chat-handle.ts) — the client-side stream wrapper covered in §6
349+
- [`packages/engine-claude-agent-sdk/`](../engine-claude-agent-sdk/) — the reference `EngineDriver` implementation
350+
- [`packages/harness-server/`](../harness-server/) — the Hono server that hosts engines + substrates
351+
- [`@open-gitagent/agent-registry-mongo`](../agent-registry-mongo/) — the first-class `MongoTelemetry` + `AuditSink` impl
352+
- [`spike/temporal-k8s-localsubstrate/REPORT.md`](https://github.com/open-gitagent/enterprise-computeragent/blob/main/spike/temporal-k8s-localsubstrate/REPORT.md) — library-mode under Temporal + K8s (de-risk spike, runs live)
