|
| 1 | +# info — architecture & non-obvious bits |
| 2 | + |
| 3 | +A guided tour of the parts of ComputerAgent that read well in a slide deck, a talk, or an HN comment. Sibling to `README.md` (which is the "how do I use this" intro). This file is the "what's actually going on under the hood" companion. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## TL;DR — four orthogonal ports |
| 8 | + |
| 9 | +ComputerAgent decomposes the agent stack into four pluggable axes. Every axis is one TypeScript interface; you swap any one without touching the others. |
| 10 | + |
| 11 | +``` |
| 12 | + ┌───────────────────────────────────┐ |
| 13 | + │ ComputerAgent │ |
| 14 | + │ (one constructor call) │ |
| 15 | + └──────────────┬────────────────────┘ |
| 16 | + │ |
| 17 | + ┌──────────────┬───────────────┼───────────────┬───────────────┐ |
| 18 | + │ │ │ │ │ |
| 19 | + ▼ ▼ ▼ ▼ ▼ |
| 20 | + WHAT HOW WHERE REMEMBER AUDIT |
| 21 | + IdentityLoader EngineDriver Substrate SessionStore AuditSink |
| 22 | + (agent) (loop) (sandbox) (memory) (telemetry) |
| 23 | +
|
| 24 | + GAP git repo | claude-agent-sdk | Local | in-memory | Mongo |
| 25 | + inline yaml | deepagents | Bwrap | file/jsonl | OTel + ClickHouse |
| 26 | + local folder | gitagent | E2B | Mongo | Honeycomb / Datadog |
| 27 | + VZ/Tart | SQLite | console |
| 28 | +``` |
| 29 | + |
| 30 | +Five interfaces in total: the four constructor ports plus `AuditSink`. The fifth sits on top of the SDK rather than inside ComputerAgent's constructor (it's wired explicitly by callers that want telemetry), but it has the same shape: one method, one swap. |
| 31 | + |
| 32 | +--- |
| 33 | + |
| 34 | +## 1. Git URL is the agent identity |
| 35 | + |
| 36 | +Most agent frameworks invent a registry (UUIDs, names, versions). ComputerAgent collapses that: |
| 37 | + |
| 38 | +```ts |
| 39 | +new ComputerAgent({ |
| 40 | + source: { type: "git", url: "github.com/acme/triage-agent" } |
| 41 | +}) |
| 42 | +``` |
| 43 | + |
| 44 | +The git URL **is** the canonical name. Versioning is `?ref=v1.2` or a commit SHA. Discovery is `git clone`. The Mongo `agent_registry` is a cache + telemetry index — **not** the source of truth. You can delete the entire registry and re-create it by running agents. |
| 45 | + |
| 46 | +Implication: agents share an identity across every machine that runs them. The same git URL fired from a customer's Temporal worker and from your laptop writes to the **same** `agent_logs` document. Cross-machine deduplication, free. |
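
To make the "URL is the name, `?ref=` is the version" idea concrete, here is a minimal sketch of how a source URL could be split into a canonical repo name and a ref. `AgentIdentity`, `parseAgentSource`, and the `"HEAD"` fallback are hypothetical names and choices made for illustration; they are not taken from the SDK.

```ts
// Hypothetical shape: not the SDK's real IdentityLoader types.
interface AgentIdentity {
  repo: string; // canonical name, e.g. "github.com/acme/triage-agent"
  ref: string;  // tag, branch, or commit SHA
}

// Assumption: a missing ref falls back to "HEAD".
function parseAgentSource(url: string, defaultRef = "HEAD"): AgentIdentity {
  // Drop an optional scheme so both spellings of the same repo agree.
  const withoutScheme = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  const queryStart = withoutScheme.indexOf("?");
  const path = queryStart === -1 ? withoutScheme : withoutScheme.slice(0, queryStart);
  const query = queryStart === -1 ? "" : withoutScheme.slice(queryStart + 1);

  // Normalize trailing slashes and a trailing ".git".
  const repo = path.replace(/\/+$/, "").replace(/\.git$/, "");

  // Pick up `ref=<value>` from the query string, if present.
  let ref = defaultRef;
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq !== -1 && pair.slice(0, eq) === "ref" && eq < pair.length - 1) {
      ref = decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return { repo, ref };
}

const pinned = parseAgentSource("github.com/acme/triage-agent?ref=v1.2");
const floating = parseAgentSource("https://github.com/acme/triage-agent.git");
// Both resolve to the same canonical repo; only the ref differs.
```

Keying telemetry on the normalized `repo` value is what would let two machines running the same agent land on the same record.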
| 47 | + |
| 48 | +--- |
| 49 | + |
| 50 | +## 2. Substrate-agnostic agent code |
| 51 | + |
| 52 | +The agent doesn't know — or care — where it runs: |
| 53 | + |
| 54 | +| Substrate | What it actually is | Use when | |
| 55 | +|---|---|---| |
| 56 | +| `LocalSubstrate` | A subprocess on the same host | dev, library-mode (in someone's existing worker) | |
| 57 | +| `BwrapSubstrate` | Linux user-namespaces (bubblewrap) | "isolation without containers" — fast, ~ms startup | |
| 58 | +| `E2BSubstrate` | Firecracker microVM in the cloud | strong isolation, untrusted code | |
| 59 | +| `VZSubstrate` | Apple VZ.framework via Tart | macOS-native VM, full OS + persistent disk | |
| 60 | + |
| 61 | +```ts |
| 62 | +new ComputerAgent({ |
| 63 | + source: { type: "git", url: "..." }, |
| 64 | + runtime: new LocalSubstrate(), // ← only the deploy story changes |
| 65 | +}); |
| 66 | +``` |
| 67 | + |
| 68 | +You change one constructor arg. Not the agent. Not the harness. Not the tools. There's a **substrate × source × engine matrix test** that fires every cell of the grid — adding a new substrate adds one column, not three months of edge-case chasing. |
| 69 | + |
| 70 | +--- |
| 71 | + |
| 72 | +## 3. Harness protocol — the layer most frameworks don't have |
| 73 | + |
| 74 | +Between "the SDK calling Anthropic" and "the substrate running it" there's a **harness** boundary. It's a tiny HTTP server (Hono on Bun/Node) speaking SSE + plain JSON, and it's the thing that makes claude-agent-sdk, gitagent, and deepagents fungible. |
| 75 | + |
| 76 | +``` |
| 77 | + Client (SDK) Harness Engine |
| 78 | + │ │ │ |
| 79 | + │ POST /v1/sessions │ │ |
| 80 | + │ { source, harness, runtime } │ │ |
| 81 | + │ ─────────────────────────────▶ │ │ |
| 82 | + │ │ EngineDriver.startSession |
| 83 | + │ │ ──────────────────────▶ │ |
| 84 | + │ │ │ |
| 85 | + │ Content-Type: text/event-stream │ |
| 86 | + │ ◀───────────────────────────── │ │ |
| 87 | + │ event: ca_session_started │ │ |
| 88 | + │ data: { sessionId, engine } │ ◀─── EngineEvent stream │ |
| 89 | + │ │ │ |
| 90 | + │ event: sdk_message │ ◀── { type: "assistant" }│ |
| 91 | + │ event: ca_permission_request │ │ |
| 92 | + │ POST /v1/sessions/:id/permission/:callId │ |
| 93 | + │ { decision: "allow" } │ │ |
| 94 | + │ ─────────────────────────────▶ │ │ |
| 95 | + │ │ │ |
| 96 | + │ event: ca_usage_snapshot │ │ |
| 97 | + │ event: ca_session_ended │ │ |
| 98 | + │ ◀───────────────────────────── │ │ |
| 99 | +``` |
| 100 | + |
| 101 | +The wire is documented under `packages/protocol/src/` and verified by a Zod-schema test suite (`harness-rest.test.ts`, `sse-events.test.ts`). `curl` can drive every endpoint. No proprietary RPC. |
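
Because the wire is plain SSE, a client needs very little code to read it. The sketch below parses one SSE frame using the standard `id:` / `event:` / `data:` field framing; `SseFrame` and `parseSseFrame` are illustrative names, and the sample payload only mirrors the fields shown in the diagram above rather than the real schema in `packages/protocol/src/`.

```ts
interface SseFrame {
  id: string | undefined; // value of the `id:` field, if any
  event: string;          // defaults to "message" when no `event:` field is sent
  data: string;           // `data:` lines joined with "\n"
}

// Parse the text of a single SSE frame (the lines between two blank lines).
function parseSseFrame(frame: string): SseFrame {
  let id: string | undefined;
  let event = "message";
  const dataLines: string[] = [];

  for (const line of frame.split(/\r\n|\n|\r/)) {
    if (line === "" || line.startsWith(":")) continue; // blank or comment line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "id") id = value;
    else if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  return { id, event, data: dataLines.join("\n") };
}

// Sample frame: field names follow the sequence diagram, values are made up.
const started = parseSseFrame(
  'id: 7\nevent: ca_session_started\ndata: {"sessionId":"s1","engine":"claude-agent-sdk"}',
);
const startedPayload = JSON.parse(started.data);
```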
| 102 | + |
| 103 | +### Why a separate harness process? |
| 104 | + |
| 105 | +Three reasons that compound: |
| 106 | + |
| 107 | +1. **Engine portability.** claude-agent-sdk wants `$HOME/.claude/projects/*.jsonl`. gitclaw wants `$GITCLAW_MODEL_BASE_URL`. deepagents is built on LangChain. Wrapping each in a uniform `EngineDriver` interface and putting them all behind one HTTP shape means the client SDK never speaks engine-specific dialects. |
| 108 | + |
| 109 | +2. **Substrate boundary == process boundary.** When you swap from `LocalSubstrate` to `E2BSubstrate`, the harness moves to a different machine. Same wire protocol, different physical location. Your SDK code doesn't notice. |
| 110 | + |
| 111 | +3. **Resumability.** Every SSE event has a monotonic `id`. If the client disconnects, it reconnects with `Last-Event-ID: <last-id>` and the harness server replays from a per-session ring buffer (default: last 1,000 events or 5 minutes). Critical when running over flaky networks. |
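
The resumability point can be sketched as a small per-session buffer. This is an illustration of the idea, not the harness server's code: `ReplayBuffer` and its method names are hypothetical, and it assumes "last 1,000 events or 5 minutes" means both limits apply, with whichever is hit first evicting older events.

```ts
interface BufferedEvent {
  id: number;    // monotonic per-session id, sent as the SSE `id:` field
  event: string; // e.g. "sdk_message"
  data: string;  // JSON-encoded payload
  at: number;    // epoch millis when the event was recorded
}

class ReplayBuffer {
  private events: BufferedEvent[] = [];
  private nextId = 1;
  private readonly maxEvents: number;
  private readonly maxAgeMs: number;

  constructor(maxEvents = 1_000, maxAgeMs = 5 * 60_000) {
    this.maxEvents = maxEvents;
    this.maxAgeMs = maxAgeMs;
  }

  // Record an event and assign it the next monotonic id.
  push(event: string, data: string, now: number = Date.now()): BufferedEvent {
    const entry: BufferedEvent = { id: this.nextId++, event, data, at: now };
    this.events.push(entry);
    this.evict(now);
    return entry;
  }

  // Events a reconnecting client has not seen yet (id > Last-Event-ID).
  replayAfter(lastEventId: number, now: number = Date.now()): BufferedEvent[] {
    this.evict(now);
    return this.events.filter((e) => e.id > lastEventId);
  }

  // Drop events beyond the count limit, then any older than the age limit.
  private evict(now: number): void {
    const cutoff = now - this.maxAgeMs;
    let start = Math.max(0, this.events.length - this.maxEvents);
    while (start < this.events.length && this.events[start].at < cutoff) start++;
    if (start > 0) this.events = this.events.slice(start);
  }
}
```

A client that reconnects with a `Last-Event-ID` older than anything still buffered would silently miss events under this sketch; a real server would need to decide whether to signal that gap.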
| 112 | + |
| 113 | +### Harness events (the wire protocol) |
| 114 | + |
| 115 | +```ts |
| 116 | +type HarnessEvent = |
| 117 | + | { kind: "ca_session_started"; sessionId; engine; identity; capabilities } |
| 118 | + | { kind: "sdk_message"; sessionId; payload } // engine-native |
| 119 | + | { kind: "ca_permission_request"; sessionId; callId; toolName; input; risk } |
| 120 | + | { kind: "ca_permission_decision";sessionId; callId; decision; reason? } |
| 121 | + | { kind: "ca_turn_started"; sessionId; userTextLen? } |
| 122 | + | { kind: "ca_usage_snapshot"; sessionId; inputTokens?; outputTokens?; |
| 123 | + costUsd?; costSemantic? } // see §6 |
| 124 | + | { kind: "ca_session_ended"; sessionId; reason; errorMessage? }; |
| 125 | +``` |
| 126 | + |
| 127 | +`sdk_message.payload` is **opaque** — it's whatever the engine's native message shape is. The client SDK doesn't try to normalize it; the engine knows how to emit, the consumer knows how to consume. |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +## 4. AuditSink — telemetry as a protocol |
| 132 | + |
| 133 | +There's no logger interface and no metrics interface. There's `AuditSink`: |
| 134 | + |
| 135 | +```ts |
| 136 | +interface AuditSink { |
| 137 | + emit(event: AgentEvent): Promise<void> | void; |
| 138 | +} |
| 139 | +``` |
| 140 | + |
| 141 | +One method. Plug in any of: |
| 142 | + |
| 143 | +- `MongoTelemetry` — persists turn history to `agent_registry` + `agent_logs` |
| 144 | +- `OtelAuditSink` — emits `gen_ai.*` OpenTelemetry spans → OTLP → ClickHouse / Datadog / Honeycomb / your APM |
| 145 | +- `console` — dev |
| 146 | +- Chain them: `[mongoSink, otelSink, consoleSink]` — the SDK fires `emit()` on each, fire-and-forget |
| 147 | + |
| 148 | +We were early adopters of the **OpenTelemetry `gen_ai.*` semantic conventions** (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`), and we add a `gen_ai.response.cost_usd` attribute for cost, which is our own extension rather than part of the upstream conventions. So your existing Grafana board built for OTel `gen_ai.*` attributes renders agent traffic out of the box. |
| 149 | + |
| 150 | +> AuditSink is fire-and-forget by contract. The SDK catches thrown errors and never propagates them up. Telemetry must never break an agent run. |
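
The fire-and-forget contract can be sketched as a fan-out wrapper that is itself an `AuditSink`. This is a sketch under assumptions: `fanOut` is a hypothetical helper, and `AgentEvent` is reduced to a placeholder shape because its real definition is not shown here.

```ts
// Placeholder shape; the real AgentEvent type lives in the SDK.
type AgentEvent = { kind: string; [key: string]: unknown };

interface AuditSink {
  emit(event: AgentEvent): Promise<void> | void;
}

// Combine several sinks into one. A failing sink never affects the others
// or the caller, whether it throws synchronously or rejects later.
function fanOut(sinks: AuditSink[]): AuditSink {
  return {
    emit(event: AgentEvent): void {
      for (const sink of sinks) {
        try {
          // Not awaited: attach a no-op handler so rejections are swallowed.
          Promise.resolve(sink.emit(event)).catch(() => {});
        } catch {
          // Synchronous throw from a sink: swallow and move on.
        }
      }
    },
  };
}

const seenKinds: string[] = [];
const chain = fanOut([
  { emit() { throw new Error("sync failure"); } },
  { emit: async () => { throw new Error("async failure"); } },
  { emit(e) { seenKinds.push(e.kind); } },
]);
chain.emit({ kind: "ca_turn_started" });
```

Swallowing every error keeps agent runs safe but also hides broken sinks, so a production version would likely count or log failures somewhere out of band.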
| 151 | +
|
| 152 | +--- |
| 153 | + |
| 154 | +## 5. Library-mode vs server-mode |
| 155 | + |
| 156 | +Most agent platforms force you into their server. ComputerAgent has **two equally first-class modes**: |
| 157 | + |
| 158 | +``` |
| 159 | + server-mode library-mode |
| 160 | + ──────────── ──────────── |
| 161 | + your customers ──→ AgentOS UI your existing worker |
| 162 | + ──→ computeragent-server imports `computeragent` |
| 163 | + ──→ harness imports it |
| 164 | + ──→ Anthropic imports it |
| 165 | + └→ harness ──→ Anthropic |
| 166 | +
|
| 167 | + (new pods, new auth, new ingress) (zero new infra) |
| 168 | +``` |
| 169 | + |
| 170 | +For customers who already run Temporal / Airflow / their own job runner, library-mode means **no new pods, no new auth surface, no new ingress**: their existing worker becomes the agent runner. The de-risk spike (`spike/temporal-k8s-localsubstrate/REPORT.md`) demonstrates a 7.3s end-to-end Claude turn from inside a Temporal activity in a K8s pod with no `Service`, no `Ingress`, and no new RBAC. |
| 171 | + |
| 172 | +--- |
| 173 | + |
| 174 | +## 6. Cost semantics — the subtle bit |
| 175 | + |
| 176 | +`ChatHandle` aggregates per-message usage snapshots into a single `ChatResult.usage`. Tokens always SUM. Cost depends on the **engine's `costSemantic`**: |
| 177 | + |
| 178 | +| Semantic | Engine | Aggregation | |
| 179 | +|---|---|---| |
| 180 | +| `cumulative` | claude-agent-sdk | take the **MAX** value seen (each snapshot is a running total) | |
| 181 | +| `delta` | gitclaw | **SUM** per-message deltas | |
| 182 | +| `undefined` | legacy | treat as cumulative (safe — never double-count) | |
| 183 | +| mixed (defensive) | hypothetical chained engines | prefer cumulative | |
| 184 | + |
| 185 | +This is the kind of invariant that is easy to get subtly wrong with no live harness — so it's nailed down by 7 dedicated unit tests in `packages/sdk/src/chat-handle.test.ts`. |
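
The table above can be read as the following aggregation rule. This is a sketch, not the code in `chat-handle.ts`: `aggregateUsage` and the type names are hypothetical, and "prefer cumulative" is interpreted here as "if any snapshot is cumulative or unlabelled, ignore the deltas and take the maximum cumulative value".

```ts
type CostSemantic = "cumulative" | "delta";

interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
  costUsd?: number;
  costSemantic?: CostSemantic; // undefined is treated as cumulative
}

interface AggregatedUsage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

function aggregateUsage(snapshots: UsageSnapshot[]): AggregatedUsage {
  let inputTokens = 0;
  let outputTokens = 0;
  let cumulativeMax: number | undefined; // running totals: keep the largest
  let deltaSum = 0;                      // per-message deltas: add them up

  for (const s of snapshots) {
    inputTokens += s.inputTokens ?? 0;   // tokens always sum
    outputTokens += s.outputTokens ?? 0;
    if (s.costUsd === undefined) continue;
    if (s.costSemantic === "delta") deltaSum += s.costUsd;
    else cumulativeMax = Math.max(cumulativeMax ?? 0, s.costUsd);
  }

  // Mixed input: the cumulative reading wins, so deltas are never added on top.
  return { inputTokens, outputTokens, costUsd: cumulativeMax ?? deltaSum };
}

const cumulativeRun = aggregateUsage([
  { inputTokens: 10, outputTokens: 1, costUsd: 0.25, costSemantic: "cumulative" },
  { inputTokens: 20, outputTokens: 2, costUsd: 0.5, costSemantic: "cumulative" },
]);
const deltaRun = aggregateUsage([
  { costUsd: 0.25, costSemantic: "delta" },
  { costUsd: 0.5, costSemantic: "delta" },
]);
```

With these inputs the cumulative run reports the last running total (0.5) while the delta run reports the sum (0.75); summing the cumulative snapshots instead would double-count.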
| 186 | + |
| 187 | +--- |
| 188 | + |
| 189 | +## 7. JSONL session replay (auditor-friendly by accident) |
| 190 | + |
| 191 | +claude-agent-sdk persists each session as a JSONL file in `~/.claude/projects/<encoded>/<session-id>.jsonl`. Append-only, plain text, one event per line. We didn't invent this — but two things fall out for free: |
| 192 | + |
| 193 | +- **Resumable across crashes** — restart the worker, replay the JSONL, continue |
| 194 | +- **Audit trail with no extra plumbing** — `grep`, `jq`, ship to S3. Compliance team smiles. |
| 195 | + |
| 196 | +The dashboard reads these directly when you click into a session — no proprietary log store. |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 200 | +## 8. SessionStore — swappable conversation memory |
| 201 | + |
| 202 | +Replace `agent.sessionStore` with one constructor arg: |
| 203 | + |
| 204 | +| Kind | Backend | Use | |
| 205 | +|---|---|---| |
| 206 | +| `"memory"` | in-process map | dev / tests | |
| 207 | +| `"file"` | JSONL on disk | local persistence, no infra | |
| 208 | +| `"mongo"` | MongoDB collection | shared memory across worker pods | |
| 209 | +| `"sqlite"` | local SQLite file | embedded, queryable, fast | |
| 210 | + |
| 211 | +```ts |
| 212 | +new ComputerAgent({ |
| 213 | + source: { type: "git", url: "..." }, |
| 214 | + sessionStore: { kind: "mongo", options: { url: MONGO_URL, database: "agentos" } }, |
| 215 | +}); |
| 216 | +``` |
| 217 | + |
| 218 | +Same SDK call. The engine doesn't know which backend is in play. **Resume across process restart, host change, substrate teardown** is built-in — not a per-integration manual replay job. |
| 219 | + |
| 220 | +--- |
| 221 | + |
| 222 | +## 9. IRSA, no static AWS keys |
| 223 | + |
| 224 | +For Bedrock, every other framework's instructions tell you to set `AWS_ACCESS_KEY_ID` in the pod env. We refuse to do that. |
| 225 | + |
| 226 | +Instead, the pod's ServiceAccount has an `eks.amazonaws.com/role-arn` annotation. The AWS SDK's default-credential-chain finds `AWS_ROLE_ARN` + `AWS_WEB_IDENTITY_TOKEN_FILE` (auto-injected by the EKS pod-identity webhook), assumes the role, and Bedrock calls just work. |
| 227 | + |
| 228 | +The harness explicitly allow-lists those env vars from the host process to the engine subprocess (see `engine-claude-agent-sdk/src/engine.ts:inheritEssentialHostEnv`). The 9 keys it passes: |
| 229 | + |
| 230 | +``` |
| 231 | +CLAUDE_CODE_USE_BEDROCK |
| 232 | +AWS_REGION |
| 233 | +AWS_DEFAULT_REGION |
| 234 | +AWS_BEDROCK_MODEL_ID |
| 235 | +AWS_ROLE_ARN ← IRSA-injected |
| 236 | +AWS_WEB_IDENTITY_TOKEN_FILE ← IRSA-injected |
| 237 | +AWS_PROFILE |
| 238 | +AWS_SHARED_CREDENTIALS_FILE |
| 239 | +AWS_CONFIG_FILE |
| 240 | +``` |
| 241 | + |
| 242 | +Empirically verified in the spike: `bedrock-2023-05-31` invoke against Claude Haiku 4.5 in us-east-2, 7.3s, $0.035, `is_error: false`. No static keys anywhere in the cluster. |
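
The allow-list idea itself is simple to sketch: copy only the named keys into the child environment and drop everything else. The helper below is illustrative and is not the body of `inheritEssentialHostEnv`; `pickAllowedEnv` is a hypothetical name, and the environment is passed in as a plain record so the sketch has no runtime dependencies.

```ts
// The nine keys listed above.
const ALLOWED_ENV_KEYS = [
  "CLAUDE_CODE_USE_BEDROCK",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_BEDROCK_MODEL_ID",
  "AWS_ROLE_ARN",
  "AWS_WEB_IDENTITY_TOKEN_FILE",
  "AWS_PROFILE",
  "AWS_SHARED_CREDENTIALS_FILE",
  "AWS_CONFIG_FILE",
] as const;

// Return a new environment containing only allow-listed keys that are set.
function pickAllowedEnv(
  hostEnv: Record<string, string | undefined>,
): Record<string, string> {
  const childEnv: Record<string, string> = {};
  for (const key of ALLOWED_ENV_KEYS) {
    const value = hostEnv[key];
    if (value !== undefined) childEnv[key] = value;
  }
  return childEnv;
}

// Example host environment with made-up values.
const childEnv = pickAllowedEnv({
  AWS_ROLE_ARN: "arn:aws:iam::123456789012:role/example",
  AWS_REGION: "us-east-2",
  AWS_ACCESS_KEY_ID: "should-never-be-forwarded",
  HOME: "/home/worker",
});
```

An allow-list fails closed: a static key that does end up in the host environment is still not forwarded to the engine subprocess unless someone adds it to the list.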
| 243 | + |
| 244 | +--- |
| 245 | + |
| 246 | +## 10. Permission protocol — every tool call is auditable |
| 247 | + |
| 248 | +Every `Bash`, `Read`, `Edit` call by an agent goes through a permission check that emits a `ca_permission_decision` event. This event includes: |
| 249 | + |
| 250 | +- the tool name |
| 251 | +- the tool arguments (`Bash` command, `Read` path) |
| 252 | +- the decision (`allow` / `deny` / `ask`) |
| 253 | +- *why* (the matching policy rule, if any) |
| 254 | + |
| 255 | +``` |
| 256 | + engine harness client (or policy decider) |
| 257 | + │ │ │ |
| 258 | + │ permission_request │ │ |
| 259 | + │ ──────────────────────▶│ │ |
| 260 | + │ │ ca_permission_request │ |
| 261 | + │ │ (SSE event) │ |
| 262 | + │ │ ─────────────────────────▶ │ |
| 263 | + │ │ │ |
| 264 | + │ │ POST /permission/:callId │ |
| 265 | + │ │ { decision: "allow" } │ |
| 266 | + │ │ ◀──────────────────────────┤ |
| 267 | + │ PermissionResult │ │ |
| 268 | + │ ◀──────────────────────┤ │ |
| 269 | + │ │ ca_permission_decision │ |
| 270 | + │ │ → AuditSink │ |
| 271 | +``` |
| 272 | + |
| 273 | +The harness can short-circuit: if there's an in-process `PolicyDecider` (Cedar/OPA via SRS), the harness resolves the decision without a client round-trip. Same wire event still flows to `AuditSink` for the audit trail. |
| 274 | + |
| 275 | +Pipe `ca_permission_decision` events into your SIEM and you have full audit-replay for every agent action. |
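
As a rough illustration of the short-circuit path, the sketch below resolves a request against a first-match rule list and produces a `ca_permission_decision`-shaped record. It is a stand-in for a real Cedar/OPA decider, not an implementation of one: `PolicyRule`, `decide`, and the single `argument` string (instead of the full tool `input`) are simplifying assumptions, as is falling back to `ask` when no rule matches.

```ts
type Decision = "allow" | "deny" | "ask";

interface PermissionRequest {
  sessionId: string;
  callId: string;
  toolName: string; // e.g. "Bash", "Read", "Edit"
  argument: string; // simplified: the Bash command or the file path
}

interface PolicyRule {
  name: string;     // reported as the reason when the rule matches
  toolName: string;
  pattern: RegExp;  // tested against the argument
  decision: Decision;
}

interface PermissionDecisionEvent {
  kind: "ca_permission_decision";
  sessionId: string;
  callId: string;
  decision: Decision;
  reason?: string;
}

// First matching rule wins; no match means the client must be asked.
function decide(rules: PolicyRule[], req: PermissionRequest): PermissionDecisionEvent {
  const match = rules.find(
    (r) => r.toolName === req.toolName && r.pattern.test(req.argument),
  );
  const base = {
    kind: "ca_permission_decision" as const,
    sessionId: req.sessionId,
    callId: req.callId,
  };
  return match
    ? { ...base, decision: match.decision, reason: match.name }
    : { ...base, decision: "ask" };
}

const rules: PolicyRule[] = [
  { name: "deny-rm-rf", toolName: "Bash", pattern: /\brm\s+-rf\b/, decision: "deny" },
  { name: "allow-read-src", toolName: "Read", pattern: /^src\//, decision: "allow" },
];

const denied = decide(rules, {
  sessionId: "s1", callId: "c1", toolName: "Bash", argument: "rm -rf /",
});
```

Every branch returns an event, so the audit trail gets a record whether the decision came from a rule or was deferred to the client.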
| 276 | + |
| 277 | +--- |
| 278 | + |
| 279 | +## 11. Conformance suite for third-party plug-ins |
| 280 | + |
| 281 | +`@computeragent/testing` exports a **table-driven conformance suite** that any third-party `EngineDriver` / `Substrate` / `SessionStore` implementation can run against itself: |
| 282 | + |
| 283 | +```ts |
| 284 | +import { runEngineConformance } from "@computeragent/testing"; |
| 285 | + |
| 286 | +runEngineConformance(myCustomEngine, { |
| 287 | + capabilities: { streamingInput: true, permissionCallback: true, /* … */ }, |
| 288 | +}); |
| 289 | +``` |
| 290 | + |
| 291 | +The suite asserts: engine emits the right events in the right order, respects abort signals, surfaces tool calls through the permission protocol, doesn't crash on empty input. About 30 invariants. Plug-in authors discover protocol violations at `vitest run`, not in production. |
| 292 | + |
| 293 | +--- |
| 294 | + |
| 295 | +## 12. OTLP everywhere, vendor nowhere |
| 296 | + |
| 297 | +The harness exports OTel via plain `OTEL_EXPORTER_OTLP_ENDPOINT`. That's it. The harness doesn't know: |
| 298 | + |
| 299 | +- ❌ "We use Datadog" |
| 300 | +- ❌ "We use ClickHouse" |
| 301 | +- ❌ "We use Honeycomb" |
| 302 | + |
| 303 | +It knows: "POST traces to this URL." An OTel Collector sitting next to it does the demux. Your vendor of choice is a collector config away — no recompilation, no harness restart, no new code path. |
| 304 | + |
| 305 | +--- |
| 306 | + |
| 307 | +## End-to-end flow — a single chat turn |
| 308 | + |
| 309 | +The pieces above tied together, for one `agent.chat("hello")` call against a remote E2B substrate: |
| 310 | + |
| 311 | +``` |
| 312 | + 1. agent.chat("hello") |
| 313 | + │ |
| 314 | + ▼ POST {harnessUrl}/v1/sessions |
| 315 | + ┌──────────────────────────────────┐ |
| 316 | + │ Substrate (E2B microVM, remote) │ |
| 317 | + │ ┌────────────────────────────┐ │ |
| 318 | + │ │ Harness server (Hono) │ │ |
| 319 | + │ │ ┌──────────────────────┐ │ │ |
| 320 | + │ │ │ EngineDriver │ │ │ 2. starts session |
| 321 | + │ │ │ (claude-agent-sdk) │ │ │ 3. invokes Claude API |
| 322 | + │ │ │ + AuditSink chain │ │ │ |
| 323 | + │ │ └─────┬────────────────┘ │ │ |
| 324 | + │ │ │ │ │ |
| 325 | + │ └─────────┼───────────────────┘ │ |
| 326 | + └────────────┼─────────────────────┘ |
| 327 | + │ |
| 328 | + ▼ SSE: ca_session_started, sdk_message, ca_usage_snapshot, ca_session_ended |
| 329 | + ┌────────────────────┐ |
| 330 | + │ ChatHandle │ 5. yields raw events as `for await of handle` |
| 331 | + │ (client SDK) │ 6. drains to ChatResult on `await handle` |
| 332 | + └────┬───────────────┘ |
| 333 | + │ |
| 334 | + ├─→ MongoTelemetry (agent_logs row) |
| 335 | + └─→ OtelAuditSink (gen_ai.* spans → OTel Collector → ClickHouse) |
| 336 | +
|
| 337 | + 4. Engine fires AuditSink.emit() on every event, fire-and-forget. |
| 338 | +``` |
| 339 | + |
| 340 | +The interesting part is how little of this the **agent code** has to know. The agent's `agent.yaml` + `SOUL.md` files (its GAP manifest) describe what it does. ComputerAgent figures out where to run it, who tracks it, and how its output gets to the dashboard. |
| 341 | + |
| 342 | +--- |
| 343 | + |
| 344 | +## See also |
| 345 | + |
| 346 | +- [`README.md`](README.md) — install + quickstart |
| 347 | +- [`packages/protocol/`](../protocol/) — the wire-protocol schemas, Zod-validated |
| 348 | +- [`packages/sdk/src/chat-handle.ts`](../sdk/src/chat-handle.ts) — the client-side stream wrapper covered in §6 |
| 349 | +- [`packages/engine-claude-agent-sdk/`](../engine-claude-agent-sdk/) — the reference `EngineDriver` implementation |
| 350 | +- [`packages/harness-server/`](../harness-server/) — the Hono server that hosts engines + substrates |
| 351 | +- [`@open-gitagent/agent-registry-mongo`](../agent-registry-mongo/) — the first-class `MongoTelemetry` + `AuditSink` impl |
| 352 | +- [`spike/temporal-k8s-localsubstrate/REPORT.md`](https://github.com/open-gitagent/enterprise-computeragent/blob/main/spike/temporal-k8s-localsubstrate/REPORT.md) — library-mode under Temporal + K8s (de-risk spike, runs live) |