A guided tour of the parts of ComputerAgent that read well in a slide deck, a talk, or an HN comment. Sibling to README.md (the "how do I use this" intro); this file is the "what's actually going on under the hood" companion.
ComputerAgent decomposes the agent stack into four pluggable axes, plus a fifth (AuditSink) that callers wire in explicitly. Every axis is one TypeScript interface; you can swap any one without touching the others.

                      ┌───────────────────────────────────┐
                      │           ComputerAgent           │
                      │      (one constructor call)       │
                      └─────────────────┬─────────────────┘
                                        │
    ┌────────────────┬──────────────────┼───────────┬──────────────┐
    │                │                  │           │              │
    ▼                ▼                  ▼           ▼              ▼
    WHAT             HOW                WHERE       REMEMBER       AUDIT
    IdentityLoader   EngineDriver       Substrate   SessionStore   AuditSink
    (agent)          (loop)             (sandbox)   (memory)       (telemetry)
    GAP git repo   | claude-agent-sdk | Local     | in-memory    | Mongo
    inline yaml    | deepagents       | Bwrap     | file/jsonl   | OTel + ClickHouse
    local folder   | gitagent         | E2B       | Mongo        | Honeycomb / Datadog
                   |                  | VZ/Tart   | SQLite       | console

Five interfaces. The fifth — AuditSink — sits on top of the SDK rather than inside ComputerAgent's constructor (it's wired explicitly by callers that want telemetry), but it's the same shape: one method, one swap.
Most agent frameworks invent a registry (UUIDs, names, versions). ComputerAgent collapses that:

    new ComputerAgent({
      source: { type: "git", url: "github.com/acme/triage-agent" }
    })

The git URL is the canonical name. Versioning is `?ref=v1.2` or a commit SHA. Discovery is `git clone`. The Mongo `agent_registry` is a cache + telemetry index, not the source of truth. You can delete the entire registry and re-create it by running agents.
Implication: agents share an identity across every machine that runs them. The same git URL fired from a customer's Temporal worker and from your laptop writes to the same agent_logs document. Cross-machine deduplication, free.
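To make the "URL is the name, `?ref=` is the version" idea concrete, here is a minimal sketch of splitting a source URL into the two parts. `parseAgentSource` and `AgentIdentity` are hypothetical names invented for this example, and the `"HEAD"` default for an omitted ref is an assumption, not documented ComputerAgent behavior.

```typescript
// Hypothetical helper for illustration; not part of the ComputerAgent API.
interface AgentIdentity {
  canonicalUrl: string; // git URL with the query string removed
  ref: string;          // tag, branch, or commit SHA
}

// Assumption: an omitted ?ref= falls back to "HEAD".
function parseAgentSource(url: string): AgentIdentity {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) {
    return { canonicalUrl: url, ref: "HEAD" };
  }
  const canonicalUrl = url.slice(0, queryStart);
  for (const pair of url.slice(queryStart + 1).split("&")) {
    const [key, value] = pair.split("=");
    if (key === "ref" && value) {
      return { canonicalUrl, ref: decodeURIComponent(value) };
    }
  }
  return { canonicalUrl, ref: "HEAD" };
}

const pinned = parseAgentSource("github.com/acme/triage-agent?ref=v1.2");
const floating = parseAgentSource("github.com/acme/triage-agent");
// Both parse to the same canonical URL; only the ref differs.
console.log(pinned, floating);
```

Two runs that share a `canonicalUrl` can be keyed to the same record regardless of which machine fired them, which is the deduplication property described above.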
The agent doesn't know — or care — where it runs:
| Substrate | What it actually is | Use when |
|---|---|---|
| `LocalSubstrate` | A subprocess on the same host | dev, library-mode (in someone's existing worker) |
| `BwrapSubstrate` | Linux user-namespaces (bubblewrap) | "isolation without containers": fast, ~ms startup |
| `E2BSubstrate` | Firecracker microVM in the cloud | strong isolation, untrusted code |
| `VZSubstrate` | Apple VZ.framework via Tart | macOS-native VM, full OS + persistent disk |

    new ComputerAgent({
      source: { type: "git", url: "..." },
      runtime: new LocalSubstrate(), // ← only the deploy story changes
    });

You change one constructor arg. Not the agent. Not the harness. Not the tools. There's a substrate × source × engine matrix test that fires every cell of the grid, so adding a new substrate adds one column, not three months of edge-case chasing.
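As a rough illustration of how such a matrix can be enumerated, the sketch below builds every substrate × source × engine cell as a plain cartesian product. The labels are taken from the diagram and table above; `buildMatrix` and `MatrixCell` are hypothetical names, and the real matrix test may be structured differently.

```typescript
// Labels come from the overview diagram; the helper itself is hypothetical.
interface MatrixCell {
  substrate: string;
  source: string;
  engine: string;
}

function buildMatrix(substrates: string[], sources: string[], engines: string[]): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const substrate of substrates) {
    for (const source of sources) {
      for (const engine of engines) {
        cells.push({ substrate, source, engine });
      }
    }
  }
  return cells;
}

const substrates = ["Local", "Bwrap", "E2B", "VZ"];
const sources = ["git", "inline-yaml", "local-folder"];
const engines = ["claude-agent-sdk", "deepagents", "gitagent"];

const cells = buildMatrix(substrates, sources, engines);
// Adding one substrate adds one column: sources.length * engines.length new cells.
const withNewSubstrate = buildMatrix([...substrates, "MyNewSubstrate"], sources, engines);
console.log(cells.length, withNewSubstrate.length);
```

Each cell can then be handed to whatever test runner drives the grid; the point is that a new substrate only extends one input array.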
Between "the SDK calling Anthropic" and "the substrate running it" there's a harness boundary. It's a tiny HTTP server (Hono on Bun/Node) speaking SSE + plain JSON, and it's the thing that makes claude-agent-sdk, gitagent, and deepagents fungible.

    Client (SDK)                              Harness                      Engine
    │                                            │                            │
    │  POST /v1/sessions                         │                            │
    │  { source, harness, runtime }              │                            │
    │ ──────────────────────────────────────────▶│                            │
    │                                            │ EngineDriver.startSession  │
    │                                            │ ──────────────────────────▶│
    │                                            │                            │
    │  Content-Type: text/event-stream           │                            │
    │ ◀──────────────────────────────────────────│                            │
    │  event: ca_session_started                 │                            │
    │  data: { sessionId, engine }               │ ◀─── EngineEvent stream    │
    │                                            │                            │
    │  event: sdk_message                        │ ◀── { type: "assistant" }  │
    │  event: ca_permission_request              │                            │
    │  POST /v1/sessions/:id/permission/:callId  │                            │
    │  { decision: "allow" }                     │                            │
    │ ──────────────────────────────────────────▶│                            │
    │                                            │                            │
    │  event: ca_usage_snapshot                  │                            │
    │  event: ca_session_ended                   │                            │
    │ ◀──────────────────────────────────────────│                            │

The wire is documented under packages/protocol/src/ and verified by a Zod-schema test suite (harness-rest.test.ts, sse-events.test.ts). curl can drive every endpoint. No proprietary RPC.
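Because the wire is plain SSE, a client needs very little to read it. The sketch below parses a `text/event-stream` body into frames using only the standard `id`, `event`, and `data` fields. `parseSseFrames` is a hypothetical helper, the sample payload values (such as `"completed"`) are made up, and a production client would parse incrementally from a stream rather than from a complete string.

```typescript
interface SseFrame {
  id: string | undefined;
  event: string;
  data: string;
}

// Parses complete frames from an SSE body. Only id, event, and data are handled.
function parseSseFrames(body: string): SseFrame[] {
  const frames: SseFrame[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    let id: string | undefined;
    let event = "message"; // SSE default event name
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      const colon = line.indexOf(":");
      if (colon <= 0) continue; // skip blank lines and ":" comment lines
      const field = line.slice(0, colon);
      const value = line.slice(colon + 1).replace(/^ /, "");
      if (field === "id") id = value;
      else if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) {
      frames.push({ id, event, data: dataLines.join("\n") });
    }
  }
  return frames;
}

const sample = [
  "id: 1",
  "event: ca_session_started",
  'data: {"sessionId":"s1","engine":"claude-agent-sdk"}',
  "",
  "id: 2",
  "event: ca_session_ended",
  'data: {"sessionId":"s1","reason":"completed"}',
  "",
  "",
].join("\n");

const frames = parseSseFrames(sample);
console.log(frames.map((f) => `${f.id}:${f.event}`));
```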
Three reasons that compound:
- Engine portability. claude-agent-sdk wants `$HOME/.claude/projects/*.jsonl`. gitclaw wants `$GITCLAW_MODEL_BASE_URL`. deepagents is built on LangChain. Wrapping each in a uniform `EngineDriver` interface and putting them all behind one HTTP shape means the client SDK never speaks engine-specific dialects.
- Substrate boundary == process boundary. When you swap from `LocalSubstrate` to `E2BSubstrate`, the harness moves to a different machine. Same wire protocol, different physical location. Your SDK code doesn't notice.
- Resumability. Every SSE event has a monotonic `id`. If the client disconnects, it reconnects with `Last-Event-ID: <last-id>` and the harness server replays from a per-session ring buffer (default: last 1,000 events or 5 minutes). Critical when running over flaky networks.
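The resumability point can be sketched as a small bounded buffer keyed by monotonic ids. This is an illustration under assumptions, not the harness server's code: `ReplayBuffer` and `BufferedEvent` are hypothetical names, and only the event-count bound is modeled (the 5-minute time bound is left out for brevity).

```typescript
interface BufferedEvent {
  id: number;    // monotonic SSE event id
  event: string; // e.g. "sdk_message"
  data: string;  // serialized JSON payload
}

// Hypothetical per-session buffer; the real harness may differ.
class ReplayBuffer {
  private readonly capacity: number;
  private readonly events: BufferedEvent[] = [];
  private nextId = 1;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  // Assign the next id, store the event, and evict the oldest entry on overflow.
  push(event: string, data: string): BufferedEvent {
    const entry: BufferedEvent = { id: this.nextId++, event, data };
    this.events.push(entry);
    if (this.events.length > this.capacity) this.events.shift();
    return entry;
  }

  // Events the client has not seen, given the numeric value of its Last-Event-ID header.
  replayAfter(lastEventId: number): BufferedEvent[] {
    return this.events.filter((e) => e.id > lastEventId);
  }
}

const buffer = new ReplayBuffer(3);
for (const name of ["ca_session_started", "sdk_message", "sdk_message", "ca_usage_snapshot"]) {
  buffer.push(name, "{}");
}
// The client reconnects with Last-Event-ID: 2 and receives everything newer that is still buffered.
const missed = buffer.replayAfter(2).map((e) => e.id);
console.log(missed);
```

A client whose last seen id has already been evicted cannot be fully caught up from the buffer alone; how the real server handles that case is not covered here.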

    type HarnessEvent =
      | { kind: "ca_session_started";     sessionId; engine; identity; capabilities }
      | { kind: "sdk_message";            sessionId; payload }  // engine-native
      | { kind: "ca_permission_request";  sessionId; callId; toolName; input; risk }
      | { kind: "ca_permission_decision"; sessionId; callId; decision; reason? }
      | { kind: "ca_turn_started";        sessionId; userTextLen? }
      | { kind: "ca_usage_snapshot";      sessionId; inputTokens?; outputTokens?;
                                          costUsd?; costSemantic? }  // see §6
      | { kind: "ca_session_ended";       sessionId; reason; errorMessage? };

`sdk_message.payload` is opaque: it's whatever the engine's native message shape is. The client SDK doesn't try to normalize it; the engine knows how to emit, the consumer knows how to consume.
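On the consumer side, a union discriminated on `kind` lends itself to an exhaustive `switch`. The sketch below uses a reduced three-variant version of the union with assumed field types (the listing leaves them untyped); `describeEvent` is a hypothetical function written for this example.

```typescript
// Reduced union with assumed field types; only three of the variants are modeled.
type HarnessEvent =
  | { kind: "ca_session_started"; sessionId: string; engine: string }
  | { kind: "sdk_message"; sessionId: string; payload: unknown }
  | { kind: "ca_session_ended"; sessionId: string; reason: string; errorMessage?: string };

function describeEvent(event: HarnessEvent): string {
  switch (event.kind) {
    case "ca_session_started":
      return `session ${event.sessionId} started on ${event.engine}`;
    case "sdk_message":
      // payload stays opaque here: it is passed through, not inspected
      return `engine message for ${event.sessionId}`;
    case "ca_session_ended":
      return `session ${event.sessionId} ended: ${event.reason}`;
    default: {
      // Compile-time exhaustiveness check: adding a variant without a case fails to type-check.
      const unreachable: never = event;
      return unreachable;
    }
  }
}

const started = describeEvent({ kind: "ca_session_started", sessionId: "s1", engine: "claude-agent-sdk" });
const message = describeEvent({ kind: "sdk_message", sessionId: "s1", payload: { type: "assistant" } });
console.log(started, message);
```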
There's no logger interface and no metrics interface. There's AuditSink:

    interface AuditSink {
      emit(event: AgentEvent): Promise<void> | void;
    }

One method. Plug in any of:
- `MongoTelemetry`: persists turn history to `agent_registry` + `agent_logs`
- `OtelAuditSink`: emits `gen_ai.*` OpenTelemetry spans → OTLP → ClickHouse / Datadog / Honeycomb / your APM
- `console`: dev
- Chain them: `[mongoSink, otelSink, consoleSink]`. The SDK fires `emit()` on each, fire-and-forget
We were early adopters of the OpenTelemetry gen_ai.* semantic conventions — gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.response.cost_usd. So your existing Grafana board built for OTel renders agent traffic out of the box.
AuditSink is fire-and-forget by contract. The SDK catches thrown errors and never propagates them up. Telemetry must never break an agent run.
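The fire-and-forget contract has to cover both sinks that throw synchronously and sinks whose promise rejects. A minimal sketch of such a fan-out follows; `emitToAll` is a hypothetical helper, and the `AgentEvent` shape is an assumption since the real type is not shown in this document.

```typescript
// Assumed minimal event shape for the example.
type AgentEvent = { kind: string; sessionId: string };

interface AuditSink {
  emit(event: AgentEvent): Promise<void> | void;
}

// Hypothetical fan-out: calls every sink, swallowing sync throws and async rejections.
function emitToAll(sinks: AuditSink[], event: AgentEvent): void {
  for (const sink of sinks) {
    try {
      // Promise.resolve wraps both void and Promise results, so one catch covers rejections.
      Promise.resolve(sink.emit(event)).catch(() => {
        /* telemetry failures are dropped on purpose */
      });
    } catch {
      // A sink that throws synchronously must not break the agent run either.
    }
  }
}

const seen: string[] = [];
const recordingSink: AuditSink = { emit: (e) => { seen.push(e.kind); } };
const throwingSink: AuditSink = { emit: () => { throw new Error("sink down"); } };
const rejectingSink: AuditSink = { emit: () => Promise.reject(new Error("network")) };

// The healthy sink still receives the event even though the two before it fail.
emitToAll([throwingSink, rejectingSink, recordingSink], { kind: "ca_turn_started", sessionId: "s1" });
console.log(seen);
```

Swallowing errors silently is a deliberate trade-off in this sketch; a real implementation might count or log dropped events somewhere that cannot itself fail the run.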
Most agent platforms force you into their server. ComputerAgent has two equally first-class modes:

    server-mode                               library-mode
    ────────────                              ────────────
    your customers ──→ AgentOS UI             your existing worker
                   ──→ computeragent-server     imports `computeragent`
                   ──→ harness                  imports it
                   ──→ Anthropic                imports it
                                                └→ harness ──→ Anthropic
    (new pods, new auth, new ingress)         (zero new infra)

For customers who already run Temporal / Airflow / their own job runner, library-mode means no new pods, no new auth surface, no new ingress — their existing worker becomes the agent runner. The de-risk spike (spike/temporal-k8s-localsubstrate/REPORT.md) demonstrates 7.3s end-to-end Claude turn from inside a Temporal activity in a K8s pod with no Service, no Ingress, no new RBAC.
ChatHandle aggregates per-message usage snapshots into a single ChatResult.usage. Tokens always SUM. Cost depends on the engine's costSemantic:
| Semantic | Engine | Aggregation |
|---|---|---|
| `cumulative` | claude-agent-sdk | take the MAX value seen (each snapshot is a running total) |
| `delta` | gitclaw | SUM per-message deltas |
| `undefined` | legacy | treat as cumulative (safe, never double-counts) |
| mixed (defensive) | hypothetical chained engines | prefer cumulative |
This is the kind of invariant that is easy to get subtly wrong with no live harness — so it's nailed down by 7 dedicated unit tests in packages/sdk/src/chat-handle.test.ts.
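The aggregation rules in the table can be sketched in a few lines. This is an illustration, not the code in `chat-handle.ts`: `aggregateUsage` and the type names are hypothetical, and "prefer cumulative" is interpreted here as "treat the whole stream as cumulative unless every snapshot is explicitly `delta`", which is an assumption about the mixed case.

```typescript
type CostSemantic = "cumulative" | "delta";

// Assumed snapshot shape, modeled on the ca_usage_snapshot fields.
interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
  costUsd?: number;
  costSemantic?: CostSemantic;
}

interface AggregatedUsage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

function aggregateUsage(snapshots: UsageSnapshot[]): AggregatedUsage {
  // Undefined or mixed semantics fall back to cumulative so cost is never double-counted.
  const allDelta = snapshots.length > 0 && snapshots.every((s) => s.costSemantic === "delta");
  const total: AggregatedUsage = { inputTokens: 0, outputTokens: 0, costUsd: 0 };
  for (const s of snapshots) {
    total.inputTokens += s.inputTokens ?? 0; // tokens always SUM
    total.outputTokens += s.outputTokens ?? 0;
    const cost = s.costUsd ?? 0;
    total.costUsd = allDelta ? total.costUsd + cost : Math.max(total.costUsd, cost);
  }
  return total;
}

const cumulative = aggregateUsage([
  { inputTokens: 10, outputTokens: 5, costUsd: 0.25, costSemantic: "cumulative" },
  { inputTokens: 20, outputTokens: 7, costUsd: 0.5, costSemantic: "cumulative" },
]);
const delta = aggregateUsage([
  { inputTokens: 10, costUsd: 0.25, costSemantic: "delta" },
  { inputTokens: 20, costUsd: 0.5, costSemantic: "delta" },
]);
console.log(cumulative, delta);
```

With the same two cost values, the cumulative stream ends at the larger running total while the delta stream ends at their sum, which is exactly the difference the table is guarding.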
claude-agent-sdk persists each session as a JSONL file in ~/.claude/projects/<encoded>/<session-id>.jsonl. Append-only, plain text, one event per line. We didn't invent this — but two things fall out for free:
- Resumable across crashes: restart the worker, replay the JSONL, continue
- Audit trail with no extra plumbing: `grep`, `jq`, ship to S3. Compliance team smiles.
The dashboard reads these directly when you click into a session — no proprietary log store.
Replace agent.sessionStore with one constructor arg:
| Kind | Backend | Use |
|---|---|---|
| `"memory"` | in-process map | dev / tests |
| `"file"` | JSONL on disk | local persistence, no infra |
| `"mongo"` | MongoDB collection | shared memory across worker pods |
| `"sqlite"` | local SQLite file | embedded, queryable, fast |

    new ComputerAgent({
      source: { type: "git", url: "..." },
      sessionStore: { kind: "mongo", options: { url: MONGO_URL, database: "agentos" } },
    });

Same SDK call. The engine doesn't know which backend is in play. Resume across process restart, host change, or substrate teardown is built in, not a per-integration manual replay job.
For Bedrock, every other framework's instructions tell you to set AWS_ACCESS_KEY_ID in the pod env. We refuse to do that.
Instead, the pod's ServiceAccount has an eks.amazonaws.com/role-arn annotation (IRSA, IAM Roles for Service Accounts). The AWS SDK's default credential chain finds AWS_ROLE_ARN + AWS_WEB_IDENTITY_TOKEN_FILE (auto-injected by the EKS pod-identity webhook), assumes the role, and Bedrock calls just work.
The harness explicitly allow-lists those env vars from the host process to the engine subprocess (see engine-claude-agent-sdk/src/engine.ts:inheritEssentialHostEnv). The 9 keys it passes:

    CLAUDE_CODE_USE_BEDROCK
    AWS_REGION
    AWS_DEFAULT_REGION
    AWS_BEDROCK_MODEL_ID
    AWS_ROLE_ARN                  ← IRSA-injected
    AWS_WEB_IDENTITY_TOKEN_FILE   ← IRSA-injected
    AWS_PROFILE
    AWS_SHARED_CREDENTIALS_FILE
    AWS_CONFIG_FILE

Empirically verified in the spike: bedrock-2023-05-31 invoke against Claude Haiku 4.5 in us-east-2, 7.3s, $0.035, is_error: false. No static keys anywhere in the cluster.
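The allow-list idea reduces to copying only named keys from the host environment into the child's. The sketch below uses the nine keys listed above; `pickAllowedEnv` is a hypothetical function and is not the real `inheritEssentialHostEnv`, whose exact behavior is not shown in this document.

```typescript
// The nine keys from the list above.
const ALLOWED_ENV_KEYS = [
  "CLAUDE_CODE_USE_BEDROCK",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_BEDROCK_MODEL_ID",
  "AWS_ROLE_ARN",
  "AWS_WEB_IDENTITY_TOKEN_FILE",
  "AWS_PROFILE",
  "AWS_SHARED_CREDENTIALS_FILE",
  "AWS_CONFIG_FILE",
] as const;

// Hypothetical helper: copies only allow-listed keys that are actually set.
function pickAllowedEnv(hostEnv: Record<string, string | undefined>): Record<string, string> {
  const childEnv: Record<string, string> = {};
  for (const key of ALLOWED_ENV_KEYS) {
    const value = hostEnv[key];
    if (value !== undefined) childEnv[key] = value;
  }
  return childEnv;
}

const childEnv = pickAllowedEnv({
  AWS_REGION: "us-east-2",
  AWS_ROLE_ARN: "arn:aws:iam::123456789012:role/example", // placeholder ARN
  AWS_ACCESS_KEY_ID: "static-key-placeholder",            // not on the list, so never forwarded
  UNRELATED_SECRET: "placeholder",                        // also dropped
});
console.log(Object.keys(childEnv));
```

In a Node process the input would typically be `process.env`; a plain object is used here so the example stays runtime-neutral.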
Every Bash, Read, Edit call by an agent goes through a permission check that emits a ca_permission_decision event. This event includes:
- the tool name
- the tool arguments (`Bash` command, `Read` path)
- the decision (`allow` / `deny` / `ask`)
- why (the matching policy rule, if any)

    engine                harness            client (or policy decider)
    │                        │                            │
    │  permission_request    │                            │
    │ ──────────────────────▶│                            │
    │                        │  ca_permission_request     │
    │                        │  (SSE event)               │
    │                        │ ──────────────────────────▶│
    │                        │                            │
    │                        │  POST /permission/:callId  │
    │                        │  { decision: "allow" }     │
    │                        │ ◀──────────────────────────│
    │  PermissionResult      │                            │
    │ ◀──────────────────────│                            │
    │                        │  ca_permission_decision    │
    │                        │  → AuditSink               │

The harness can short-circuit: if there's an in-process PolicyDecider (Cedar/OPA via SRS), the harness resolves the decision without a client round-trip. Same wire event still flows to AuditSink for the audit trail.
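A short-circuit of this kind can be pictured as a local lookup that either produces a decision event or signals "no opinion". In the sketch below a plain first-match rule list stands in for Cedar/OPA, and every name (`decideLocally`, `PolicyRule`, the field types) is an assumption made for illustration; the real `PolicyDecider` interface is not shown in this document.

```typescript
type Decision = "allow" | "deny" | "ask";

interface PolicyRule {
  id: string;
  toolName: string;
  decision: Decision;
}

interface PermissionRequest {
  sessionId: string;
  callId: string;
  toolName: string;
  input: unknown;
}

interface PermissionDecisionEvent {
  kind: "ca_permission_decision";
  sessionId: string;
  callId: string;
  decision: Decision;
  reason?: string;
}

// Returns a decision event when a rule matches; undefined means "fall back to the client round-trip".
function decideLocally(req: PermissionRequest, rules: PolicyRule[]): PermissionDecisionEvent | undefined {
  const rule = rules.find((r) => r.toolName === req.toolName);
  if (!rule) return undefined;
  return {
    kind: "ca_permission_decision",
    sessionId: req.sessionId,
    callId: req.callId,
    decision: rule.decision,
    reason: `matched rule ${rule.id}`, // the "why" carried in the audit event
  };
}

const rules: PolicyRule[] = [{ id: "deny-bash", toolName: "Bash", decision: "deny" }];
const bashCall = decideLocally({ sessionId: "s1", callId: "c1", toolName: "Bash", input: { command: "rm -rf /" } }, rules);
const readCall = decideLocally({ sessionId: "s1", callId: "c2", toolName: "Read", input: { path: "README.md" } }, rules);
console.log(bashCall, readCall);
```

Either way the resulting event has the same shape, which is what lets the audit trail stay uniform regardless of who made the decision.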
Pipe ca_permission_decision events into your SIEM and you have full audit-replay for every agent action.
@computeragent/testing exports a table-driven conformance suite that any third-party EngineDriver / Substrate / SessionStore implementation can run against itself:

    import { runEngineConformance } from "@computeragent/testing";

    runEngineConformance(myCustomEngine, {
      capabilities: { streamingInput: true, permissionCallback: true, /* … */ },
    });

The suite asserts: engine emits the right events in the right order, respects abort signals, surfaces tool calls through the permission protocol, doesn't crash on empty input. About 30 invariants. Plug-in authors discover protocol violations at `vitest run`, not in production.
The harness exports OTel via plain OTEL_EXPORTER_OTLP_ENDPOINT. That's it. The harness doesn't know:
- ❌ "We use Datadog"
- ❌ "We use ClickHouse"
- ❌ "We use Honeycomb"
It knows: "POST traces to this URL." An OTel Collector sitting next to it does the demux. Your vendor of choice is a collector config away — no recompilation, no harness restart, no new code path.
The pieces above tied together, for one agent.chat("hello") call against a remote E2B substrate:

    1. agent.chat("hello")
                │
                ▼  POST {harnessUrl}/v1/sessions
    ┌──────────────────────────────────┐
    │ Substrate (E2B microVM, remote)  │
    │  ┌────────────────────────────┐  │
    │  │ Harness server (Hono)      │  │
    │  │  ┌──────────────────────┐  │  │
    │  │  │ EngineDriver         │  │  │   2. starts session
    │  │  │ (claude-agent-sdk)   │  │  │   3. invokes Claude API
    │  │  │ + AuditSink chain    │  │  │
    │  │  └─────┬────────────────┘  │  │
    │  │        │                   │  │
    │  └────────┼───────────────────┘  │
    └───────────┼──────────────────────┘
                │
                ▼  SSE: ca_session_started, sdk_message, ca_usage_snapshot, ca_session_ended
    ┌────────────────────┐
    │ ChatHandle         │   5. yields raw events as `for await of handle`
    │ (client SDK)       │   6. drains to ChatResult on `await handle`
    └────┬───────────────┘
         │
         ├─→ MongoTelemetry  (agent_logs row)
         └─→ OtelAuditSink   (gen_ai.* spans → OTel Collector → ClickHouse)

    4. Engine fires AuditSink.emit() on every event, fire-and-forget.

The interesting part is how little of this the agent code has to know. The agent's agent.yaml + SOUL.md files (its GAP manifest) describe what it does. ComputerAgent figures out where to run it, who tracks it, and how its output gets to the dashboard.
- `README.md`: install + quickstart
- `packages/protocol/`: the wire-protocol schemas, Zod-validated
- `packages/sdk/src/chat-handle.ts`: the client-side stream wrapper covered in §6
- `packages/engine-claude-agent-sdk/`: the reference `EngineDriver` implementation
- `packages/harness-server/`: the Hono server that hosts engines + substrates
- `@open-gitagent/agent-registry-mongo`: the first-class `MongoTelemetry` + `AuditSink` impls
- `spike/temporal-k8s-localsubstrate/REPORT.md`: library-mode under Temporal + K8s (de-risk spike, runs live)