From 5f04ca537fa334127f0afd79f1a411f2a6bd2761 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nathan=20=F0=9F=94=B6=20Tarbert?= <66887028+NathanTarbert@users.noreply.github.com> Date: Tue, 31 Mar 2026 21:06:19 -0400 Subject: [PATCH] docs: update README with WebSocket support and expanded API reference Add video demo, document WebSocket endpoints (OpenAI Responses, Realtime, Gemini Live), expand fixture matching/response reference, and add error injection and request journal sections. --- README.md | 207 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 204 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 2b3448b..231f3d9 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,12 @@ # @copilotkit/llmock [![Unit Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [![Drift Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [![npm version](https://img.shields.io/npm/v/@copilotkit/llmock)](https://www.npmjs.com/package/@copilotkit/llmock) -Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies. +https://github.com/user-attachments/assets/1aa9f81d-7efb-4bd2-8e81-51f466f8a8e3 -## Quick Start +Deterministic multi-provider mock LLM server for testing. Streams SSE and WebSocket responses in real OpenAI, Claude, and Gemini API formats, driven entirely by fixtures. Zero runtime dependencies — built on Node.js builtins only. 
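+
+For example, an OpenAI SDK client can be redirected with a single environment variable (the `localhost:5555` port is assumed from the examples below; official OpenAI SDKs read `OPENAI_BASE_URL`):
+
+```bash
+export OPENAI_BASE_URL="http://localhost:5555/v1"
+```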
+
+Supports streaming (SSE), non-streaming JSON, and WebSocket responses across OpenAI (Chat Completions + Responses + Realtime), Anthropic Claude (Messages), and Google Gemini (GenerateContent + Live) APIs, covering text completions, tool calls, and error injection. Point any process at it via `OPENAI_BASE_URL`, `ANTHROPIC_BASE_URL`, or the Gemini base URL and get reproducible, instant responses.
+
+## Install

```bash
npm install @copilotkit/llmock
@@ -42,7 +46,204 @@ await mock.stop();
- **[Drift detection](https://llmock.copilotkit.dev/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes
- **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly
-## CLI Quick Reference
+### Error Injection
+
+#### `nextRequestError(status, errorBody?)`
+
+Queue a one-shot error for the very next request. The error fires once, then auto-removes itself.
+
+```typescript
+mock.nextRequestError(429, {
+  message: "Rate limited",
+  type: "rate_limit_error",
+});
+
+// Next request → 429 error
+// Subsequent requests → normal fixture matching
+```
+
+### Request Journal
+
+Every request to any API endpoint (`/v1/chat/completions`, `/v1/responses`, `/v1/messages`, the Gemini endpoints, and all WebSocket endpoints) is recorded in a journal.
+
+#### Programmatic Access
+
+| Method             | Returns                | Description                           |
+| ------------------ | ---------------------- | ------------------------------------- |
+| `getRequests()`    | `JournalEntry[]`       | All recorded requests                 |
+| `getLastRequest()` | `JournalEntry \| null` | Most recent request                   |
+| `clearRequests()`  | `void`                 | Clear the journal                     |
+| `journal`          | `Journal`              | Direct access to the journal instance |
+
+```typescript
+await fetch(mock.url + "/v1/chat/completions", { ... 
});
+
+const last = mock.getLastRequest();
+expect(last?.body.messages).toContainEqual({
+  role: "user",
+  content: "hello",
+});
+```
+
+#### HTTP Endpoints
+
+The server also exposes journal data over HTTP (useful in CLI mode):
+
+- `GET /v1/_requests` — returns all journal entries as JSON. Supports `?limit=N`.
+- `DELETE /v1/_requests` — clears the journal. Returns 204.
+
+### Reset
+
+#### `reset()`
+
+Clears all fixtures **and** the journal in one call. Works before or after the server has started.
+
+```typescript
+afterEach(() => {
+  mock.reset();
+});
+```
+
+## Fixture Matching
+
+Fixtures are evaluated in registration order (first match wins). A fixture matches when **all** of its specified fields match the incoming request (AND logic).
+
+| Field         | Type               | Matches on                                    |
+| ------------- | ------------------ | --------------------------------------------- |
+| `userMessage` | `string \| RegExp` | Content of the last `role: "user"` message    |
+| `toolName`    | `string`           | Name of a tool in the request's `tools` array |
+| `toolCallId`  | `string`           | `tool_call_id` on a `role: "tool"` message    |
+| `model`       | `string \| RegExp` | The `model` field in the request              |
+| `predicate`   | `(req) => boolean` | Arbitrary matching function                   |
+
+## Fixture Responses
+
+### Text
+
+```typescript
+{
+  content: "Hello world"
+}
+```
+
+Streams as SSE chunks, splitting `content` by `chunkSize`. With `stream: false`, returns a standard `chat.completion` JSON object.
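+
+For reference, the non-streaming body follows the standard OpenAI `chat.completion` shape — an abridged sketch (field values illustrative):
+
+```jsonc
+{
+  "id": "chatcmpl-...",
+  "object": "chat.completion",
+  "model": "gpt-4o",
+  "choices": [
+    {
+      "index": 0,
+      "message": { "role": "assistant", "content": "Hello world" },
+      "finish_reason": "stop"
+    }
+  ]
+}
+```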
+
+### Tool Calls
+
+```typescript
+{
+  toolCalls: [{ name: "get_weather", arguments: '{"location":"SF"}' }]
+}
+```
+
+### Errors
+
+```typescript
+{
+  error: { message: "Rate limited", type: "rate_limit_error" },
+  status: 429
+}
+```
+
+## API Endpoints
+
+The server handles:
+
+- **POST `/v1/chat/completions`** — OpenAI Chat Completions API (streaming and non-streaming)
+- **POST `/v1/responses`** — OpenAI Responses API (streaming and non-streaming)
+- **POST `/v1/messages`** — Anthropic Claude Messages API (streaming and non-streaming)
+- **POST `/v1beta/models/{model}:generateContent`** — Google Gemini (non-streaming)
+- **POST `/v1beta/models/{model}:streamGenerateContent`** — Google Gemini (streaming)
+
+WebSocket endpoints:
+
+- **WS `/v1/responses`** — OpenAI Responses API over WebSocket
+- **WS `/v1/realtime`** — OpenAI Realtime API (text + tool calls)
+- **WS `/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`** — Gemini Live
+
+All endpoints share the same fixture pool — the same fixtures work across all providers. Requests are translated to a common format internally for fixture matching.
+
+## WebSocket APIs
+
+The same fixtures that drive HTTP responses also work over WebSocket transport. llmock implements RFC 6455 WebSocket framing with zero external dependencies — connect, send events, and receive streaming responses in real provider formats.
+
+Only text and tool call paths are supported over WebSocket. Audio, video, and binary frames are not implemented.
+
+### OpenAI Responses API (WebSocket)
+
+Connect to `ws://localhost:5555/v1/responses` and send a `response.create` event. 
The server streams back the same events as OpenAI's real WebSocket Responses API: + +```jsonc +// → Client sends: +{ + "type": "response.create", + "response": { + "modalities": ["text"], + "instructions": "You are a helpful assistant.", + "input": [ + { "type": "message", "role": "user", "content": [{ "type": "input_text", "text": "Hello" }] }, + ], + }, +} + +// ← Server streams: +// {"type": "response.created", ...} +// {"type": "response.output_item.added", ...} +// {"type": "response.content_part.added", ...} +// {"type": "response.output_item.done", ...} +// {"type": "response.done", ...} +``` + +### OpenAI Realtime API + +Connect to `ws://localhost:5555/v1/realtime`. The Realtime API uses a session-based protocol — configure the session, add conversation items, then request a response: + +```jsonc +// → Configure session: +{ "type": "session.update", "session": { "modalities": ["text"], "model": "gpt-4o-realtime" } } + +// → Add a user message: +{ + "type": "conversation.item.create", + "item": { + "type": "message", + "role": "user", + "content": [{ "type": "input_text", "text": "What is the capital of France?" }] + } +} + +// → Request a response: +{ "type": "response.create" } + +// ← Server streams: +// {"type": "response.created", ...} +// {"type": "response.text.delta", "delta": "The"} +// {"type": "response.text.delta", "delta": " capital"} +// ... +// {"type": "response.text.done", ...} +// {"type": "response.done", ...} +``` + +### Gemini Live (BidiGenerateContent) + +Connect to `ws://localhost:5555/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`. 
Gemini Live uses a setup/content/response flow: + +```jsonc +// → Setup message (must be first): +{ "setup": { "model": "models/gemini-2.0-flash-live", "generationConfig": { "responseModalities": ["TEXT"] } } } + +// → Send user content: +{ "clientContent": { "turns": [{ "role": "user", "parts": [{ "text": "Hello" }] }], "turnComplete": true } } + +// ← Server streams: +// {"setupComplete": {}} +// {"serverContent": {"modelTurnComplete": false, "parts": [{"text": "Hello"}]}} +// {"serverContent": {"modelTurnComplete": true}} +``` + +## CLI + +The package includes a standalone server binary: ```bash llmock [options]