info — architecture & non-obvious bits

A guided tour of the parts of ComputerAgent that are worth showing in a slide deck, a talk, or an HN comment. It is a sibling to README.md (the "how do I use this" intro); this file is the "what's actually going on under the hood" companion.

TL;DR — four orthogonal ports

ComputerAgent decomposes the agent stack into four pluggable axes. Every axis is one TypeScript interface; you swap any one without touching the others.

                                ┌───────────────────────────────────┐
                                │            ComputerAgent          │
                                │       (one constructor call)      │
                                └──────────────┬────────────────────┘
                                               │
                ┌──────────────┬───────────────┼───────────────┬───────────────┐
                │              │               │               │               │
                ▼              ▼               ▼               ▼               ▼
            WHAT             HOW             WHERE          REMEMBER         AUDIT
        IdentityLoader   EngineDriver      Substrate     SessionStore     AuditSink
            (agent)        (loop)           (sandbox)      (memory)       (telemetry)

        GAP git repo     claude-agent-sdk  Local         in-memory        Mongo
        inline yaml      deepagents        Bwrap         file/jsonl       OTel + ClickHouse
        local folder     gitagent          E2B           Mongo            Honeycomb / Datadog
                                           VZ/Tart       SQLite           console

The diagram shows five interfaces, not four. The fifth, AuditSink, sits on top of the SDK rather than inside ComputerAgent's constructor (callers that want telemetry wire it in explicitly), but it follows the same pattern: one interface, one method, one swap.

1. Git URL is the agent identity

Most agent frameworks invent a registry (UUIDs, names, versions). ComputerAgent collapses that:

new ComputerAgent({
  source: { type: "git", url: "github.com/acme/triage-agent" }
});

The git URL is the canonical name. Versioning is ?ref=v1.2 or a commit SHA. Discovery is git clone. The Mongo agent_registry is a cache and telemetry index, not the source of truth: you can delete the entire registry and it is rebuilt as agents run.

Implication: agents share an identity across every machine that runs them. The same git URL run from a customer's Temporal worker and from your laptop writes to the same agent_logs document. Cross-machine deduplication comes for free.
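To make "the URL is the name" concrete, here is a minimal sketch of deriving one canonical identity plus an optional ref from a git source string, so every machine computes the same key. The helper name `parseAgentSource` and the normalization rules (assume https when no scheme is given, drop a trailing slash and `.git`, lowercase the host) are illustrative assumptions, not ComputerAgent's actual parser; SSH-style `git@host:path` sources are not handled.

```typescript
// Hypothetical helper: split a git agent source into (identity, ref).
// The normalization rules below are illustrative assumptions.
interface AgentSourceRef {
  identity: string;        // canonical name, identical on every machine
  ref: string | undefined; // tag, branch, or commit SHA taken from ?ref=
}

function parseAgentSource(raw: string): AgentSourceRef {
  // Accept both "github.com/acme/x" and "https://github.com/acme/x".
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(raw);
  const url = new URL(hasScheme ? raw : `https://${raw}`);
  const ref = url.searchParams.get("ref") ?? undefined;

  // Hostnames are case-insensitive; the path keeps its case.
  const path = url.pathname.replace(/\/+$/, "").replace(/\.git$/, "");
  return { identity: `${url.host.toLowerCase()}${path}`, ref };
}
```

Under these rules, `github.com/acme/triage-agent?ref=v1.2` and `https://GitHub.com/acme/triage-agent.git` map to the same identity and differ only in ref.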

2. Substrate-agnostic agent code

The agent doesn't know — or care — where it runs:

| Substrate | What it actually is | Use when |
|---|---|---|
| `LocalSubstrate` | A subprocess on the same host | dev, library-mode (in someone's existing worker) |
| `BwrapSubstrate` | Linux user namespaces (bubblewrap) | "isolation without containers": fast, ~ms startup |
| `E2BSubstrate` | Firecracker microVM in the cloud | strong isolation, untrusted code |
| `VZSubstrate` | Apple Virtualization.framework (VZ) via Tart | macOS-native VM, full OS + persistent disk |

new ComputerAgent({
  source: { type: "git", url: "..." },
  runtime: new LocalSubstrate(),       // ← only the deploy story changes
});

You change one constructor arg. Not the agent. Not the harness. Not the tools. A substrate × source × engine matrix test runs every cell of the grid, so adding a new substrate adds one column to the grid, not three months of edge-case chasing.
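The matrix is a Cartesian product. Here is a minimal sketch of how such a grid could be enumerated. The `MatrixCell` type, the `matrixCells` helper, and the short cell labels are hypothetical, not the project's test code. Every cell runs the same scenario, so adding one substrate adds exactly one source × engine slice of cells and no new test logic.

```typescript
// Hypothetical enumeration of the substrate × source × engine grid.
interface MatrixCell {
  substrate: string;
  source: string;
  engine: string;
}

function matrixCells(
  substrates: readonly string[],
  sources: readonly string[],
  engines: readonly string[],
): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const substrate of substrates) {
    for (const source of sources) {
      for (const engine of engines) {
        cells.push({ substrate, source, engine });
      }
    }
  }
  return cells;
}

// Each cell would be fed to one parameterized test; only the inputs differ.
const cells = matrixCells(
  ["local", "bwrap", "e2b", "vz"],
  ["git", "inline", "folder"],
  ["claude-agent-sdk", "deepagents", "gitagent"],
);
```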

3. Harness protocol — the layer most frameworks don't have

Between "the SDK calling Anthropic" and "the substrate running it" there's a harness boundary. It's a tiny HTTP server (Hono on Bun/Node) speaking SSE + plain JSON, and it's the thing that makes claude-agent-sdk, gitagent, and deepagents fungible.

   Client (SDK)                       Harness                     Engine
        │                                │                          │
        │  POST /v1/sessions             │                          │
        │  { source, harness, runtime }  │                          │
        │ ─────────────────────────────▶ │                          │
        │                                │  EngineDriver.startSession
        │                                │ ──────────────────────▶  │
        │                                │                          │
        │  Content-Type: text/event-stream                          │
        │ ◀───────────────────────────── │                          │
        │  event: ca_session_started     │                          │
        │  data: { sessionId, engine }   │ ◀─── EngineEvent stream  │
        │                                │                          │
        │  event: sdk_message            │ ◀── { type: "assistant" }│
        │  event: ca_permission_request  │                          │
        │  POST /v1/sessions/:id/permission/:callId                 │
        │  { decision: "allow" }         │                          │
        │ ─────────────────────────────▶ │                          │
        │                                │                          │
        │  event: ca_usage_snapshot      │                          │
        │  event: ca_session_ended       │                          │
        │ ◀───────────────────────────── │                          │

The wire is documented under packages/protocol/src/ and verified by a Zod-schema test suite (harness-rest.test.ts, sse-events.test.ts). curl can drive every endpoint. No proprietary RPC.
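Because the wire is plain SSE, each event on the stream is a few `field: value` lines terminated by a blank line, as defined by the HTML server-sent events format. Here is a minimal sketch of encoding one harness event as a frame. It assumes the `id` field carries the monotonic event counter and `data` carries a single-line JSON body. The exact body shape and the `encodeSseFrame` name are assumptions, not the contents of packages/protocol.

```typescript
// Encode one event as a Server-Sent Events frame (text/event-stream).
// Assumption: the harness puts a monotonic counter in `id` and JSON in `data`.
function encodeSseFrame(id: number, event: string, body: Record<string, unknown>): string {
  // JSON.stringify escapes newlines inside strings, so this is one line;
  // splitting on "\n" anyway keeps the frame valid for any serializer.
  const dataLines = JSON.stringify(body)
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  // A blank line terminates the frame.
  return `id: ${id}\nevent: ${event}\n${dataLines}\n\n`;
}

const frame = encodeSseFrame(1, "ca_session_started", {
  sessionId: "s-123",
  engine: "claude-agent-sdk",
});
```

Frames in this shape are what `curl -N` (no output buffering) prints line by line as the stream arrives.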

Why a separate harness process?

Three reasons that compound:

Engine portability. Each engine brings its own assumptions: claude-agent-sdk writes its sessions to $HOME/.claude/projects/*.jsonl, gitclaw expects $GITCLAW_MODEL_BASE_URL, and deepagents is built on LangChain. Wrapping each in a uniform EngineDriver interface and putting them all behind one HTTP shape means the client SDK never speaks an engine-specific dialect.
Substrate boundary == process boundary. When you swap from LocalSubstrate to E2BSubstrate, the harness moves to a different machine. Same wire protocol, different physical location. Your SDK code doesn't notice.
Resumability. Every SSE event has a monotonic id. If the client disconnects, it reconnects with Last-Event-ID: <last-id> and the harness server replays from a per-session ring buffer (default: last 1,000 events or 5 minutes). Critical when running over flaky networks.
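The replay buffer can be sketched as a bounded list keyed by event id. This minimal illustration rests on two assumptions not spelled out above. First, an event is evicted as soon as either limit (1,000 events or 5 minutes) is exceeded. Second, a client whose `Last-Event-ID` points before an evicted event is told it has a gap rather than silently given a partial stream. The class and method names are hypothetical.

```typescript
// Hypothetical per-session replay buffer backing Last-Event-ID resumption.
// Assumes monotonic ids; an event is evicted once EITHER limit is exceeded.
interface BufferedEvent {
  id: number;    // monotonic SSE event id
  at: number;    // enqueue time in ms
  frame: string; // encoded SSE frame
}

class ReplayBuffer {
  private readonly events: BufferedEvent[] = [];
  private lastEvictedId: number | undefined = undefined;
  private readonly maxEvents: number;
  private readonly maxAgeMs: number;

  constructor(maxEvents = 1_000, maxAgeMs = 5 * 60 * 1_000) {
    this.maxEvents = maxEvents;
    this.maxAgeMs = maxAgeMs;
  }

  push(id: number, frame: string, now = Date.now()): void {
    this.events.push({ id, at: now, frame });
    this.evict(now);
  }

  /** Unseen frames, or null if some of them were already evicted (a gap). */
  replayAfter(lastEventId: number, now = Date.now()): string[] | null {
    this.evict(now);
    if (this.lastEvictedId !== undefined && this.lastEvictedId > lastEventId) {
      return null; // the client missed an event we no longer hold
    }
    return this.events.filter((e) => e.id > lastEventId).map((e) => e.frame);
  }

  private evict(now: number): void {
    for (;;) {
      const oldest = this.events[0];
      const tooMany = this.events.length > this.maxEvents;
      const tooOld = oldest !== undefined && now - oldest.at > this.maxAgeMs;
      if (!tooMany && !tooOld) return;
      const dropped = this.events.shift();
      if (dropped !== undefined) this.lastEvictedId = dropped.id;
    }
  }
}
```

The server would parse the `Last-Event-ID` request header into a number before calling `replayAfter`, then write the returned frames ahead of any live events.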

Harness events (the wire protocol)

type HarnessEvent =
  | { kind: "ca_session_started";    sessionId; engine; identity; capabilities }
  | { kind: "sdk_message";           sessionId; payload }            // engine-native
  | { kind: "ca_permission_request"; sessionId; callId; toolName; input; risk }
  | { kind: "ca_permission_decision"; sessionId; callId; decision; reason? }
  | { kind: "ca_turn_started";       sessionId; userTextLen? }
  | { kind: "ca_usage_snapshot";     sessionId; inputTokens?; outputTokens?;
                                     costUsd?; costSemantic? }       // see §6
  | { kind: "ca_session_ended";      sessionId; reason; errorMessage? };

sdk_message.payload is opaque — it's whatever the engine's native message shape is. The client SDK doesn't try to normalize it; the engine knows how to emit, the consumer knows how to consume.
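Because `sdk_message.payload` is opaque, a consumer narrows on `kind` and counts or forwards engine messages without parsing them. Here is a minimal sketch over a subset of the union above. The concrete field types and the `summarize` helper are assumptions for illustration. Token counts are summed, matching §6.

```typescript
// Field types are assumptions; only a subset of the union is modeled.
type HarnessEvent =
  | { kind: "ca_session_started"; sessionId: string; engine: string }
  | { kind: "sdk_message"; sessionId: string; payload: unknown } // engine-native
  | { kind: "ca_usage_snapshot"; sessionId: string; inputTokens?: number; outputTokens?: number }
  | { kind: "ca_session_ended"; sessionId: string; reason: string; errorMessage?: string };

interface TurnSummary {
  engine?: string;
  sdkMessages: number; // counted, never parsed: the payload is opaque
  inputTokens: number;
  outputTokens: number;
  endReason?: string;
}

function summarize(events: Iterable<HarnessEvent>): TurnSummary {
  const s: TurnSummary = { sdkMessages: 0, inputTokens: 0, outputTokens: 0 };
  for (const ev of events) {
    switch (ev.kind) {
      case "ca_session_started":
        s.engine = ev.engine;
        break;
      case "sdk_message":
        s.sdkMessages += 1; // pass ev.payload to engine-aware code if needed
        break;
      case "ca_usage_snapshot":
        s.inputTokens += ev.inputTokens ?? 0;
        s.outputTokens += ev.outputTokens ?? 0;
        break;
      case "ca_session_ended":
        s.endReason = ev.reason;
        break;
    }
  }
  return s;
}
```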

4. AuditSink — telemetry as a protocol

There's no logger interface and no metrics interface. There's AuditSink:

interface AuditSink {
  emit(event: AgentEvent): Promise<void> | void;
}

One method. Plug in any of:

MongoTelemetry — persists turn history to agent_registry + agent_logs
OtelAuditSink — emits gen_ai.* OpenTelemetry spans → OTLP → ClickHouse / Datadog / Honeycomb / your APM
console — dev
Chain them: [mongoSink, otelSink, consoleSink] — the SDK fires emit() on each, fire-and-forget

We were early adopters of the OpenTelemetry gen_ai.* semantic conventions (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.response.cost_usd), so an existing Grafana board built on those conventions renders agent traffic without custom mapping.

AuditSink is fire-and-forget by contract. The SDK catches thrown errors and never propagates them up. Telemetry must never break an agent run.
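The fire-and-forget contract can be met with a small fan-out wrapper. This is a sketch, not the SDK's code. `fanOut` and its `onError` hook are hypothetical, and `AgentEvent` is reduced to a placeholder shape. The wrapper catches both synchronous throws and rejected promises, so one failing sink neither stops the others nor surfaces to the agent run.

```typescript
interface AgentEvent { kind: string; sessionId: string } // placeholder shape

interface AuditSink {
  emit(event: AgentEvent): Promise<void> | void;
}

/** Combine several sinks into one; failures go to onError and are never rethrown. */
function fanOut(
  sinks: readonly AuditSink[],
  onError: (err: unknown) => void = () => {},
): AuditSink {
  return {
    emit(event: AgentEvent): void {
      for (const sink of sinks) {
        try {
          // Promise.resolve also absorbs sync sinks (undefined) and thenables;
          // .catch keeps an async failure from becoming an unhandled rejection.
          void Promise.resolve(sink.emit(event)).catch(onError);
        } catch (err) {
          onError(err); // a synchronous throw from this sink
        }
      }
      // No await: the agent run never waits on telemetry.
    },
  };
}
```

A chain like `[mongoSink, otelSink, consoleSink]` would then be passed as `fanOut([mongoSink, otelSink, consoleSink], logToStderr)`, where `logToStderr` stands in for whatever local error reporting the caller prefers.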

5. Library-mode vs server-mode

Most agent platforms force you into their server. ComputerAgent has two equally first-class modes:

   server-mode                              library-mode
   ────────────                             ────────────
   your customers ──→ AgentOS UI            your existing worker
                  ──→ computeragent-server       imports `computeragent`
                  ──→ harness                    └→ harness ──→ Anthropic
                  ──→ Anthropic

   (new pods, new auth, new ingress)        (zero new infra)

For customers who already run Temporal, Airflow, or their own job runner, library-mode means no new pods, no new auth surface, and no new ingress: their existing worker becomes the agent runner. The de-risk spike (spike/temporal-k8s-localsubstrate/REPORT.md) demonstrated a 7.3s end-to-end Claude turn from inside a Temporal activity in a K8s pod with no Service, no Ingress, and no new RBAC.

6. Cost semantics — the subtle bit

ChatHandle aggregates per-message usage snapshots into a single ChatResult.usage. Tokens always SUM. Cost depends on the engine's costSemantic:

| Semantic | Engine | Aggregation |
|---|---|---|
| `cumulative` | claude-agent-sdk | take the MAX value seen (each snapshot is a running total) |
| `delta` | gitclaw | SUM per-message deltas |
| `undefined` | legacy | treat as cumulative (safe: never double-counts) |
| mixed (defensive) | hypothetical chained engines | prefer cumulative |

This invariant is easy to get subtly wrong when there is no live harness to check against, so it is pinned down by 7 dedicated unit tests in packages/sdk/src/chat-handle.test.ts.
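The table translates into a small fold over snapshots. This is a minimal sketch, not the code in chat-handle.ts. The `UsageSnapshot` shape and `aggregateUsage` name are assumptions, and the "mixed" row is interpreted here as: if any snapshot is cumulative (or unlabelled), use the largest such cost and ignore deltas.

```typescript
type CostSemantic = "cumulative" | "delta" | undefined;

interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
  costUsd?: number;
  costSemantic?: CostSemantic;
}

interface AggregatedUsage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

function aggregateUsage(snapshots: readonly UsageSnapshot[]): AggregatedUsage {
  let inputTokens = 0;
  let outputTokens = 0;
  let maxCumulative: number | undefined; // cumulative and legacy (undefined)
  let deltaSum = 0;

  for (const s of snapshots) {
    // Tokens always sum, whatever the cost semantic.
    inputTokens += s.inputTokens ?? 0;
    outputTokens += s.outputTokens ?? 0;
    if (s.costUsd === undefined) continue;
    if (s.costSemantic === "delta") {
      deltaSum += s.costUsd;
    } else {
      // "cumulative" or legacy undefined: each value is a running total.
      maxCumulative = Math.max(maxCumulative ?? 0, s.costUsd);
    }
  }
  // Mixed streams prefer cumulative: never add deltas on top of a running total.
  return { inputTokens, outputTokens, costUsd: maxCumulative ?? deltaSum };
}
```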

7. JSONL session replay (auditor-friendly by accident)

claude-agent-sdk persists each session as a JSONL file in ~/.claude/projects/<encoded>/<session-id>.jsonl. Append-only, plain text, one event per line. We didn't invent this — but two things fall out for free:

Resumable across crashes — restart the worker, replay the JSONL, continue
Audit trail with no extra plumbing — grep, jq, ship to S3. Compliance team smiles.

The dashboard reads these directly when you click into a session — no proprietary log store.
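Reading such a file back is a line-by-line JSON parse. Here is a minimal sketch that assumes only the properties stated above (one JSON value per line, append-only). The helper name is hypothetical, and the sketch makes no assumption about claude-agent-sdk's record fields. One crash-specific detail matters: a process killed mid-write can leave a torn final line, which is worth skipping rather than failing on.

```typescript
// Parse an append-only JSONL transcript (one JSON value per line).
// A malformed LAST line is skipped as a torn write from a crash; a malformed
// line anywhere else is treated as real corruption and rethrown.
function parseJsonl(text: string): unknown[] {
  const lines = text.split("\n");
  const records: unknown[] = [];
  lines.forEach((line, i) => {
    if (line.trim() === "") return; // blank lines, including the final newline
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      const isTail = lines.slice(i + 1).every((rest) => rest.trim() === "");
      if (!isTail) throw err;
    }
  });
  return records;
}
```

In Node this would typically be called as `parseJsonl(readFileSync(path, "utf8"))` with `readFileSync` from `node:fs`.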

8. SessionStore — swappable conversation memory

Swap the conversation store with one constructor arg, sessionStore:

| Kind | Backend | Use |
|---|---|---|
| `"memory"` | in-process map | dev / tests |
| `"file"` | JSONL on disk | local persistence, no infra |
| `"mongo"` | MongoDB collection | shared memory across worker pods |
| `"sqlite"` | local SQLite file | embedded, queryable, fast |

new ComputerAgent({
  source: { type: "git", url: "..." },
  sessionStore: { kind: "mongo", options: { url: MONGO_URL, database: "agentos" } },
});

Same SDK call; the engine doesn't know which backend is in play. Resuming after a process restart, a host change, or a substrate teardown is built in (for the persistent backends), not a manual replay job you write per integration.
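The `{ kind, options }` object suggests a factory that maps a discriminated config to a store. Here is a minimal sketch with only the in-memory backend implemented. The `SessionStore` method names, the `SessionStoreConfig` type (including the `dir` and `path` options for the file and SQLite kinds), and `createSessionStore` are all hypothetical. The persistent branches are left unimplemented rather than guessed at.

```typescript
// Hypothetical interface: what a conversation-memory backend must provide.
interface SessionStore {
  load(sessionId: string): Promise<unknown[]>; // [] for an unknown session
  append(sessionId: string, messages: unknown[]): Promise<void>;
}

type SessionStoreConfig =
  | { kind: "memory" }
  | { kind: "file"; options: { dir: string } }
  | { kind: "mongo"; options: { url: string; database: string } }
  | { kind: "sqlite"; options: { path: string } };

class MemorySessionStore implements SessionStore {
  private readonly sessions = new Map<string, unknown[]>();

  async load(sessionId: string): Promise<unknown[]> {
    return [...(this.sessions.get(sessionId) ?? [])]; // copy, so callers can't mutate
  }

  async append(sessionId: string, messages: unknown[]): Promise<void> {
    const existing = this.sessions.get(sessionId) ?? [];
    this.sessions.set(sessionId, [...existing, ...messages]);
  }
}

function createSessionStore(config: SessionStoreConfig): SessionStore {
  switch (config.kind) {
    case "memory":
      return new MemorySessionStore();
    default:
      // Persistent backends would live here; deliberately not sketched.
      throw new Error(`session store "${config.kind}" is not implemented in this sketch`);
  }
}
```

The design point is that the engine only ever sees the `SessionStore` interface, so which branch the factory took is invisible to it.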

9. IRSA, no static AWS keys

For Bedrock, most frameworks' setup instructions tell you to set AWS_ACCESS_KEY_ID in the pod env. We refuse to do that.

Instead, the pod's ServiceAccount carries an eks.amazonaws.com/role-arn annotation. The EKS pod identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into the pod; the AWS SDK's default credential provider chain picks them up, assumes the role with the web identity token, and Bedrock calls just work.

The harness explicitly allow-lists those env vars from the host process to the engine subprocess (see engine-claude-agent-sdk/src/engine.ts:inheritEssentialHostEnv). The 9 keys it passes:

CLAUDE_CODE_USE_BEDROCK
AWS_REGION
AWS_DEFAULT_REGION
AWS_BEDROCK_MODEL_ID
AWS_ROLE_ARN                       ← IRSA-injected
AWS_WEB_IDENTITY_TOKEN_FILE        ← IRSA-injected
AWS_PROFILE
AWS_SHARED_CREDENTIALS_FILE
AWS_CONFIG_FILE
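An allow-list like this amounts to a few lines of code. Here is a minimal sketch, not the actual `inheritEssentialHostEnv`; `pickAllowedEnv` is a hypothetical name. It copies only the listed keys from the host environment into the child's environment, so nothing else (an unrelated secret, say) leaks into the engine subprocess.

```typescript
// The nine keys listed above; everything else in the host env is dropped.
const ESSENTIAL_HOST_ENV = [
  "CLAUDE_CODE_USE_BEDROCK",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_BEDROCK_MODEL_ID",
  "AWS_ROLE_ARN",                // IRSA-injected
  "AWS_WEB_IDENTITY_TOKEN_FILE", // IRSA-injected
  "AWS_PROFILE",
  "AWS_SHARED_CREDENTIALS_FILE",
  "AWS_CONFIG_FILE",
] as const;

function pickAllowedEnv(
  hostEnv: Record<string, string | undefined>,
  allow: readonly string[] = ESSENTIAL_HOST_ENV,
): Record<string, string> {
  const childEnv: Record<string, string> = {};
  for (const key of allow) {
    const value = hostEnv[key];
    if (value !== undefined) childEnv[key] = value; // unset keys stay unset
  }
  return childEnv;
}
```

In Node it would be called with `process.env`, and the result merged into the `env` option of `child_process.spawn`. Note that a static AWS_ACCESS_KEY_ID is not on the list, so it would not reach the engine even if someone set it on the host.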

Empirically verified in the spike: bedrock-2023-05-31 invoke against Claude Haiku 4.5 in us-east-2, 7.3s, $0.035, is_error: false. No static keys anywhere in the cluster.

10. Permission protocol — every tool call is auditable

Every Bash, Read, and Edit call an agent makes goes through a permission check. The check emits a ca_permission_request event and a matching ca_permission_decision event (joined on callId), which together record:

the tool name
the tool arguments (Bash command, Read path)
the decision (allow / deny / ask)
why (the matching policy rule, if any)

  engine                  harness                  client (or policy decider)
     │                        │                            │
     │  permission_request    │                            │
     │ ──────────────────────▶│                            │
     │                        │  ca_permission_request     │
     │                        │  (SSE event)               │
     │                        │ ─────────────────────────▶ │
     │                        │                            │
     │                        │  POST /permission/:callId  │
     │                        │  { decision: "allow" }     │
     │                        │ ◀──────────────────────────┤
     │  PermissionResult      │                            │
     │ ◀──────────────────────┤                            │
     │                        │  ca_permission_decision    │
     │                        │  → AuditSink               │

The harness can short-circuit: if there's an in-process PolicyDecider (Cedar/OPA via SRS), the harness resolves the decision without a client round-trip. Same wire event still flows to AuditSink for the audit trail.

Pipe ca_permission_decision events into your SIEM and you have full audit-replay for every agent action.
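Since the request carries the tool name and input while the decision carries the outcome and reason, an audit record is a join on `callId`. Here is a minimal sketch with assumed field types (the union in §3 leaves them untyped) and a hypothetical `joinPermissionEvents` helper. Keys are scoped per session, on the assumption that callIds may only be unique within a session. A request with no decision yet is reported as pending rather than dropped.

```typescript
type PermissionEvent =
  | { kind: "ca_permission_request"; sessionId: string; callId: string;
      toolName: string; input: unknown; risk: string }
  | { kind: "ca_permission_decision"; sessionId: string; callId: string;
      decision: "allow" | "deny" | "ask"; reason?: string };

interface AuditRecord {
  sessionId: string;
  callId: string;
  toolName: string;
  input: unknown;
  decision: "allow" | "deny" | "ask" | "pending";
  reason?: string;
}

function joinPermissionEvents(events: Iterable<PermissionEvent>): AuditRecord[] {
  const byCall = new Map<string, AuditRecord>();
  for (const ev of events) {
    const key = `${ev.sessionId}/${ev.callId}`;
    if (ev.kind === "ca_permission_request") {
      byCall.set(key, {
        sessionId: ev.sessionId, callId: ev.callId,
        toolName: ev.toolName, input: ev.input, decision: "pending",
      });
    } else {
      const rec = byCall.get(key);
      if (rec === undefined) continue; // decision without a request: skip here
      rec.decision = ev.decision;
      if (ev.reason !== undefined) rec.reason = ev.reason;
    }
  }
  return [...byCall.values()];
}
```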

11. Conformance suite for third-party plug-ins

@computeragent/testing exports a table-driven conformance suite that any third-party EngineDriver / Substrate / SessionStore implementation can run against itself:

import { runEngineConformance } from "@computeragent/testing";

runEngineConformance(myCustomEngine, {
  capabilities: { streamingInput: true, permissionCallback: true, /* … */ },
});

The suite asserts that an engine emits the right events in the right order, respects abort signals, surfaces tool calls through the permission protocol, and doesn't crash on empty input: about 30 invariants in all. Plug-in authors discover protocol violations at vitest run time, not in production.
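To give a flavor of one such invariant, here is a minimal sketch of an ordering check over the event kinds an engine emits. The specific rules are assumptions inferred from the event list in §3, not the suite's actual assertions: exactly one `ca_session_started` first, exactly one `ca_session_ended` last, and every permission decision preceded by its request. `checkEventOrder` is hypothetical.

```typescript
type OrderedEvent =
  | { kind: "ca_session_started" }
  | { kind: "ca_session_ended" }
  | { kind: "ca_permission_request"; callId: string }
  | { kind: "ca_permission_decision"; callId: string }
  | { kind: "sdk_message" }
  | { kind: "ca_turn_started" }
  | { kind: "ca_usage_snapshot" };

/** Returns the list of violations; an empty list means the stream passed. */
function checkEventOrder(events: readonly OrderedEvent[]): string[] {
  const problems: string[] = [];
  const first = events[0];
  const last = events[events.length - 1];
  if (first?.kind !== "ca_session_started") problems.push("first event must be ca_session_started");
  if (last?.kind !== "ca_session_ended") problems.push("last event must be ca_session_ended");

  const count = (k: OrderedEvent["kind"]) => events.filter((e) => e.kind === k).length;
  if (count("ca_session_started") !== 1) problems.push("exactly one ca_session_started");
  if (count("ca_session_ended") !== 1) problems.push("exactly one ca_session_ended");

  const requested = new Set<string>();
  for (const e of events) {
    if (e.kind === "ca_permission_request") requested.add(e.callId);
    if (e.kind === "ca_permission_decision" && !requested.has(e.callId)) {
      problems.push(`decision for ${e.callId} arrived before its request`);
    }
  }
  return problems;
}
```

Returning a list instead of throwing on the first failure lets a test report every violation in one run.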

12. OTLP everywhere, vendor nowhere

The harness exports OTel via plain OTEL_EXPORTER_OTLP_ENDPOINT. That's it. The harness doesn't know:

❌ "We use Datadog"
❌ "We use ClickHouse"
❌ "We use Honeycomb"

It knows: "POST traces to this URL." An OTel Collector sitting next to it does the demux. Your vendor of choice is a collector config away — no recompilation, no harness restart, no new code path.
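What "POST traces to this URL" means is fixed by the OpenTelemetry exporter specification, not by any vendor. For OTLP over HTTP, the per-signal path `v1/traces` is appended to `OTEL_EXPORTER_OTLP_ENDPOINT`, while `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`, if set, takes precedence and is used as-is. The default base is `http://localhost:4318`. Here is a minimal sketch of that resolution rule. The function name is hypothetical, and OTel SDKs already do this for you.

```typescript
// OTLP/HTTP trace endpoint resolution per the OpenTelemetry exporter spec.
// (OTLP/gRPC appends no path; that case is not modeled here.)
const DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318";

function resolveOtlpTracesUrl(env: Record<string, string | undefined>): string {
  // A signal-specific endpoint wins and is used exactly as given.
  const tracesEndpoint = env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"];
  if (tracesEndpoint) return tracesEndpoint;

  // Otherwise append "v1/traces" to the base, avoiding a double slash.
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"] || DEFAULT_OTLP_HTTP_ENDPOINT;
  return `${base.replace(/\/+$/, "")}/v1/traces`;
}
```

Because the harness only ever resolves this one URL, pointing it at a local collector and changing where the collector forwards is the entire vendor switch.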

End-to-end flow — a single chat turn

The pieces above tied together, for one agent.chat("hello") call against a remote E2B substrate:

   1. agent.chat("hello")
            │
            ▼ POST {harnessUrl}/v1/sessions
        ┌─────────────────────────────────────┐
        │  Substrate (E2B microVM, remote)    │
        │  ┌───────────────────────────────┐  │
        │  │  Harness server (Hono)        │  │
        │  │  ┌─────────────────────────┐  │  │
        │  │  │  EngineDriver           │  │  │ 2. starts session
        │  │  │  (claude-agent-sdk)     │  │  │ 3. invokes Claude API
        │  │  │  + AuditSink chain      │  │  │
        │  │  └──────────┬──────────────┘  │  │
        │  │             │                 │  │
        │  └─────────────┼─────────────────┘  │
        └────────────────┼────────────────────┘
                         │
                         ▼  SSE: ca_session_started, sdk_message, ca_usage_snapshot, ca_session_ended
            ┌────────────────────┐
            │  ChatHandle        │  5. yields raw events as `for await of handle`
            │  (client SDK)      │  6. drains to ChatResult on `await handle`
            └────┬───────────────┘
                 │
                 ├─→ MongoTelemetry  (agent_logs row)
                 └─→ OtelAuditSink   (gen_ai.* spans → OTel Collector → ClickHouse)

   4. Engine fires AuditSink.emit() on every event, fire-and-forget.

The interesting part is how little of this the agent code has to know. The agent's agent.yaml + SOUL.md files (its GAP manifest) describe what it does. ComputerAgent figures out where to run it, who tracks it, and how its output gets to the dashboard.
