diff --git a/.gitignore b/.gitignore
index cf9381d..ccf99db 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,4 @@ dist/
*.tsbuildinfo
.worktrees/
.superpowers/
+coverage/
diff --git a/.prettierignore b/.prettierignore
index 52af816..c45c1e5 100644
--- a/.prettierignore
+++ b/.prettierignore
@@ -2,3 +2,4 @@ dist/
node_modules/
pnpm-lock.yaml
charts/
+coverage/
diff --git a/README.md b/README.md
index 2b3448b..3bb6657 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
# @copilotkit/llmock [](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [](https://www.npmjs.com/package/@copilotkit/llmock)
-Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies.
+Mock infrastructure for AI application testing — LLM APIs, MCP tools, A2A agents, vector databases, search, and more. Real HTTP server on a real port, fixture-driven, zero runtime dependencies.
## Quick Start
@@ -23,25 +23,106 @@ const url = await mock.start();
await mock.stop();
```
+## Usage Scenarios
+
+### In-process testing
+
+Use the programmatic API to start and stop the mock server in your test setup. Every test framework works — Vitest, Jest, Playwright, Mocha, anything.
+
+```typescript
+import { LLMock } from "@copilotkit/llmock";
+
+const mock = new LLMock({ port: 5555 });
+mock.loadFixtureDir("./fixtures");
+const url = await mock.start();
+process.env.OPENAI_BASE_URL = `${url}/v1`;
+
+// ... run tests ...
+
+await mock.stop();
+```
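
The start/stop lifecycle above is framework-agnostic because nothing is patched in-process. To see the pattern in isolation, here is the same shape with plain `node:http` (the handler and the `MOCK_BASE_URL` name are stand-ins for illustration, not LLMock internals): bind an ephemeral port in setup, publish the URL through the environment, close in teardown.

```typescript
import http from "node:http";
import type { AddressInfo } from "node:net";

// Stand-in server, not LLMock: echoes the request path as JSON.
const server = http.createServer((req, res) => {
  res.setHeader("content-type", "application/json");
  res.end(JSON.stringify({ ok: true, path: req.url }));
});

function start(): Promise<string> {
  return new Promise((resolve) => {
    // Port 0 asks the OS for any free port, avoiding collisions in CI.
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;
      resolve(`http://127.0.0.1:${port}`);
    });
  });
}

function stop(): Promise<void> {
  return new Promise((resolve, reject) =>
    server.close((err) => (err ? reject(err) : resolve())),
  );
}

async function demo() {
  const url = await start();
  process.env.MOCK_BASE_URL = url; // stand-in name for OPENAI_BASE_URL etc.
  const body = await (await fetch(`${url}/v1/ping`)).json();
  await stop();
  return body;
}
```

Any client that reads the base URL from the environment, in this process or a child process, will hit the mock.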
+
+### Running locally
+
+Use the CLI with `--watch` to hot-reload fixtures as you edit them. Point your app at the mock and iterate without touching real APIs.
+
+```bash
+llmock -p 4010 -f ./fixtures --watch
+```
+
+### CI pipelines
+
+Use the Docker image with `--strict` mode and record-and-replay for deterministic, zero-cost CI runs.
+
+```yaml
+# GitHub Actions example
+- name: Start aimock
+ run: |
+ docker run -d --name aimock \
+ -v ./fixtures:/fixtures \
+ -p 4010:4010 \
+ ghcr.io/copilotkit/aimock \
+ llmock --strict -f /fixtures
+
+- name: Run tests
+ env:
+ OPENAI_BASE_URL: http://localhost:4010/v1
+ run: pnpm test
+
+- name: Stop aimock
+ run: docker stop aimock
+```
+
+### Cross-language testing
+
+The Docker image runs as a standalone HTTP server — any language that speaks HTTP can use it. Python, Go, Rust, Ruby, Java, anything.
+
+```bash
+docker run -d -p 4010:4010 -v ./fixtures:/fixtures ghcr.io/copilotkit/aimock llmock -f /fixtures
+
+# Python
+client = openai.OpenAI(base_url="http://localhost:4010/v1", api_key="mock")
+
+# Go
+client := openai.NewClient(option.WithBaseURL("http://localhost:4010/v1"))
+
+# Rust
+let client = Client::new().with_base_url("http://localhost:4010/v1");
+```
+
## Features
-- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [OpenAI Responses](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html) (streaming + Converse), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
+- **[Record-and-replay](https://llmock.copilotkit.dev/record-replay.html)** — VCR-style proxy records real API responses as fixtures for deterministic replay
+- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [Responses API](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
+- **[MCPMock](https://llmock.copilotkit.dev/mcp-mock.html)** — Mock MCP server with tools, resources, prompts, and session management
+- **[A2AMock](https://llmock.copilotkit.dev/a2a-mock.html)** — Mock A2A protocol server with agent cards, message routing, and streaming
+- **[VectorMock](https://llmock.copilotkit.dev/vector-mock.html)** — Mock vector database with Pinecone, Qdrant, and ChromaDB endpoints
+- **[Services](https://llmock.copilotkit.dev/services.html)** — Built-in search (Tavily), rerank (Cohere), and moderation (OpenAI) mocks
+- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
+- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
- **[Embeddings API](https://llmock.copilotkit.dev/embeddings.html)** — OpenAI-compatible embedding responses with configurable dimensions
- **[Structured output / JSON mode](https://llmock.copilotkit.dev/structured-output.html)** — `response_format`, `json_schema`, and function calling
- **[Sequential responses](https://llmock.copilotkit.dev/sequential-responses.html)** — Stateful multi-turn fixtures that return different responses on each call
- **[Streaming physics](https://llmock.copilotkit.dev/streaming-physics.html)** — Configurable `ttft`, `tps`, and `jitter` for realistic timing
- **[WebSocket APIs](https://llmock.copilotkit.dev/websocket.html)** — OpenAI Responses WS, Realtime API, and Gemini Live
- **[Error injection](https://llmock.copilotkit.dev/error-injection.html)** — One-shot errors, rate limiting, and provider-specific error formats
-- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
-- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
- **[Request journal](https://llmock.copilotkit.dev/docs.html)** — Record, inspect, and assert on every request
- **[Fixture validation](https://llmock.copilotkit.dev/fixtures.html)** — Schema validation at load time with `--validate-on-load`
- **CLI with hot-reload** — Standalone server with `--watch` for live fixture editing
- **[Docker + Helm](https://llmock.copilotkit.dev/docker.html)** — Container image and Helm chart for CI/CD pipelines
-- **Record-and-replay** — VCR-style proxy-on-miss records real API responses as fixtures for deterministic replay
- **[Drift detection](https://llmock.copilotkit.dev/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes
- **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly
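
Several of the features above (streaming physics, chaos testing, drift detection) revolve around SSE framing. For orientation, this is what a minimal consumer of an OpenAI-style `chat.completion.chunk` stream does; the framing shown is the standard OpenAI wire format, sketched here for illustration rather than taken from llmock's source: split on blank lines, strip the `data: ` prefix, and stop at the `[DONE]` sentinel.

```typescript
// Collect the delta text from an OpenAI-style SSE stream.
function parseSSE(raw: string): string[] {
  const chunks: string[] = [];
  for (const block of raw.split("\n\n")) {
    const line = block.split("\n").find((l) => l.startsWith("data: "));
    if (!line) continue;
    const data = line.slice("data: ".length);
    if (data === "[DONE]") break; // end-of-stream sentinel
    chunks.push(JSON.parse(data).choices[0].delta.content ?? "");
  }
  return chunks;
}

// Hand-written sample stream in the OpenAI chunk format.
const stream =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n' +
  "data: [DONE]\n\n";
```

Joining the parsed chunks reassembles the streamed text, which is exactly what provider SDKs do under the hood.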
+## aimock CLI (Full-Stack Mock)
+
+For projects that need more than LLM mocking, the `aimock` CLI reads a JSON config file and serves all mock services on one port:
+
+```bash
+aimock --config aimock.json --port 4010
+```
+
+See the [aimock documentation](https://llmock.copilotkit.dev/aimock-cli.html) for config file format and Docker usage.
+
## CLI Quick Reference
```bash
@@ -50,6 +131,7 @@ llmock [options]
| Option | Short | Default | Description |
| -------------------- | ----- | ------------ | ------------------------------------------- |
+| `--config` | | | Config file for aimock CLI |
| `--port` | `-p` | `4010` | Port to listen on |
| `--host` | `-h` | `127.0.0.1` | Host to bind to |
| `--fixtures` | `-f` | `./fixtures` | Path to fixtures directory or file |
@@ -90,6 +172,19 @@ Full API reference, fixture format, E2E patterns, and provider-specific guides:
**[https://llmock.copilotkit.dev/docs.html](https://llmock.copilotkit.dev/docs.html)**
+## llmock vs MSW
+
+[MSW (Mock Service Worker)](https://mswjs.io/) patches `http`/`https`/`fetch` inside a single Node.js process. llmock runs a real HTTP server on a real port that any process can reach — child processes, microservices, agent workers, Docker containers. MSW can't intercept any of those; llmock can. For a detailed comparison including other tools, see the [full comparison on the docs site](https://llmock.copilotkit.dev/#comparison).
+
+| Capability | llmock | MSW |
+| -------------------------- | ---------------------------- | ---------------------- |
+| Cross-process interception | **Yes** (real server) | No (in-process only) |
+| LLM SSE streaming | **Built-in** (13+ providers) | Manual for each format |
+| Fixture files (JSON) | **Yes** | No (code-only) |
+| Record & replay | **Yes** | No |
+| WebSocket APIs | **Yes** | No |
+| Zero dependencies | **Yes** | No (~300KB) |
+
## Real-World Usage
[CopilotKit](https://github.com/CopilotKit/CopilotKit) uses llmock across its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs.
diff --git a/docs/a2a-mock.html b/docs/a2a-mock.html
new file mode 100644
index 0000000..e09bf9a
--- /dev/null
+++ b/docs/a2a-mock.html
@@ -0,0 +1,243 @@
+ A2AMock — aimock
A2AMock
+
+ Mock A2A (Agent-to-Agent) protocol server for testing multi-agent systems. Implements the
+ A2A JSON-RPC protocol with agent card discovery, message routing, task management, and SSE
+ streaming.
+
+ The agent card is served at GET /.well-known/agent-card.json and includes all
+ registered agents' skills and capabilities. The A2A-Version: 1.0 header is
+ included on all responses.
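
As a rough illustration, a card served from that path might look like the following. The field names follow the public A2A agent-card shape and every value is invented for this example; consult the A2AMock docs for the exact output.

```json
{
  "name": "mock-research-agent",
  "description": "Illustrative agent card; all values here are made up",
  "url": "http://localhost:4010/a2a",
  "capabilities": { "streaming": true },
  "skills": [
    { "id": "search", "name": "Search", "description": "Looks things up" }
  ]
}
```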
+
+
+
Inspection
a2a.health(); // { status: "ok", agents: 2, tasks: 5 }
+a2a.reset(); // Clears all agents and tasks
+ aimock is the full-stack mock orchestrator. Where llmock serves
+ LLM endpoints only, aimock reads a JSON config file and serves LLM mocks
+ alongside additional mock services (MCP, A2A, vector stores) on a single port.
+
+ The config file is a JSON object describing which services to run and how to configure
+ them. The llm section configures the core LLMock server. Additional services
+ are mounted at path prefixes.
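
A hypothetical config following that description might look like this. Every key and path below is illustrative only; see the aimock documentation for the real schema.

```json
{
  "llm": { "fixtures": "./fixtures", "strict": true },
  "mcp": { "path": "/mcp" },
  "a2a": { "path": "/a2a" },
  "vector": { "path": "/vector", "provider": "pinecone" }
}
```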
+
- llmock supports the AWS Bedrock Claude invoke and Converse API endpoints — both
- streaming and non-streaming. Point the AWS SDK at your llmock instance and fixtures match
+ aimock supports the AWS Bedrock Claude invoke and Converse API endpoints — both
+ streaming and non-streaming. Point the AWS SDK at your aimock instance and fixtures match
against the Bedrock-format requests, returning responses in the authentic Bedrock format
including AWS Event Stream binary framing for streaming.
@@ -96,13 +67,13 @@ How It Works
model field in the body (the model is in the URL).
- llmock detects the Bedrock URL pattern, extracts the model ID, translates the request to
+ aimock detects the Bedrock URL pattern, extracts the model ID, translates the request to
the internal fixture-matching format, and returns the response in the Anthropic Messages
API format — which is identical to the Bedrock Claude response format. For
streaming, responses use the AWS Event Stream binary framing protocol.
- llmock also supports the Converse API (/model/{modelId}/converse
+ aimock also supports the Converse API (/model/{modelId}/converse
and /model/{modelId}/converse-stream), which uses a different
@@ -205,7 +176,7 @@ Model Resolution
SDK Configuration
- To point the AWS SDK Bedrock Runtime client at llmock, configure the endpoint URL:
+ To point the AWS SDK Bedrock Runtime client at aimock, configure the endpoint URL:
Fixtures are shared across all providers. The same fixture file works for OpenAI, Claude
- Messages, Gemini, Azure, and Bedrock endpoints — llmock translates each provider's
+ Messages, Gemini, Azure, and Bedrock endpoints — aimock translates each provider's
request format to a common internal format before matching.
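
As a toy illustration of that translation step (not aimock's actual internal format), a Bedrock Converse body can be flattened into a provider-neutral message list before matching:

```typescript
// Converse requests carry content as arrays of blocks with camelCase fields.
interface ConverseRequest {
  system?: { text: string }[];
  messages: { role: "user" | "assistant"; content: { text: string }[] }[];
}

// Hypothetical neutral shape used only for this sketch.
interface InternalMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function fromConverse(req: ConverseRequest): InternalMessage[] {
  const out: InternalMessage[] = [];
  for (const s of req.system ?? []) out.push({ role: "system", content: s.text });
  for (const m of req.messages) {
    // Converse content is a block array; join the text blocks into one string.
    out.push({ role: m.role, content: m.content.map((c) => c.text).join("") });
  }
  return out;
}

const msgs = fromConverse({
  system: [{ text: "be brief" }],
  messages: [{ role: "user", content: [{ text: "hi" }] }],
});
```

Once every provider's request is in one shape, a single fixture matcher can serve them all.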
@@ -264,7 +235,7 @@ Fixture Examples
Streaming (invoke-with-response-stream)
The invoke-with-response-stream endpoint returns responses using the
- AWS Event Stream binary protocol. llmock implements this protocol
+ AWS Event Stream binary protocol. aimock implements this protocol
natively — each response chunk is encoded as a binary frame with CRC32 checksums,
headers, and a JSON payload, exactly as the real Bedrock service sends them.
@@ -322,7 +293,7 @@ AWS Event Stream Binary Format
[message_crc32: 4B CRC32 of entire frame minus last 4 bytes]
- llmock encodes these frames with proper CRC32 checksums, so the AWS SDK can decode them
+ aimock encodes these frames with proper CRC32 checksums, so the AWS SDK can decode them
natively. The :event-type header in each frame carries the event name (e.g.
chunk), and the :content-type header is set to
application/json.
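
That framing can be sketched in a few lines. This is illustrative code under the layout just described (two big-endian length words plus a prelude CRC, string headers of type 7, and a trailing CRC over everything before it), not aimock's implementation:

```typescript
// Standard reflected CRC32 (polynomial 0xEDB88320), bitwise variant.
function crc32(buf: Uint8Array): number {
  let crc = 0xffffffff;
  for (const byte of buf) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// One string header: [name_len:1][name][type:1 = 7 for string][value_len:2][value]
function encodeHeader(name: string, value: string): Buffer {
  const n = Buffer.from(name), v = Buffer.from(value);
  const out = Buffer.alloc(1 + n.length + 1 + 2 + v.length);
  let o = 0;
  out.writeUInt8(n.length, o); o += 1;
  n.copy(out, o); o += n.length;
  out.writeUInt8(7, o); o += 1; // header value type 7: string
  out.writeUInt16BE(v.length, o); o += 2;
  v.copy(out, o);
  return out;
}

// Frame: [total_len:4][headers_len:4][prelude_crc:4][headers][payload][message_crc:4]
function encodeFrame(eventType: string, payload: object): Buffer {
  const headers = Buffer.concat([
    encodeHeader(":event-type", eventType),
    encodeHeader(":content-type", "application/json"),
  ]);
  const body = Buffer.from(JSON.stringify(payload));
  const total = 4 + 4 + 4 + headers.length + body.length + 4;
  const frame = Buffer.alloc(total);
  frame.writeUInt32BE(total, 0);
  frame.writeUInt32BE(headers.length, 4);
  frame.writeUInt32BE(crc32(frame.subarray(0, 8)), 8); // CRC of the prelude
  headers.copy(frame, 12);
  body.copy(frame, 12 + headers.length);
  // Message CRC covers the whole frame minus its own 4 bytes.
  frame.writeUInt32BE(crc32(frame.subarray(0, total - 4)), total - 4);
  return frame;
}
```

Decoders verify both CRCs before parsing, which is why the checksums must be computed over exactly these byte ranges.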
@@ -332,7 +303,7 @@ Converse API
The Converse API is AWS Bedrock's provider-agnostic conversation interface. It uses
camelCase field names and a different request structure than the Claude-native invoke
- endpoints. llmock supports both /model/{modelId}/converse (non-streaming) and
+ endpoints. aimock supports both /model/{modelId}/converse (non-streaming) and
/model/{modelId}/converse-stream (streaming via Event Stream binary).
@@ -369,14 +340,14 @@ Converse API
The Converse API also supports tool calls via toolUse and
toolResult content blocks, and tool definitions via the
- toolConfig field. llmock translates all of these to the unified internal
+ toolConfig field. aimock translates all of these to the unified internal
format for fixture matching.