Add Context Swarm Memory (csm) provider + BEAM 100k result from the omb runner by muhamadjawdatsalemalakoum · Pull Request #19 · vectorize-io/agent-memory-benchmark

muhamadjawdatsalemalakoum · 2026-06-10T05:35:38Z

Context

Following up on the exchange with Chris Latimer about our Context Swarm Memory (CSM) BEAM 100k comparison: the guidance was that official publication requires following the AMB repo instructions and demonstrating results with the benchmark runners. This PR does exactly that. It also asks a question: is an external-provider PR like this your preferred submission path? Happy to restructure however you prefer.

What this PR contains (3 commits, no harness changes)

src/memory_bench/memory/csm.py + one registry line — a MemoryProvider backed by the open-source context-swarm-memory TypeScript implementation. initialize() starts a warm localhost bridge service, ingest() streams documents to it (with a durable JSONL log for resume), retrieve() queries it, cleanup() shuts it down. Stdlib-only (urllib/subprocess). Zero changes to AMB scoring, answer prompt, judge prompt, gold data, or any other harness source.
README requirements blurb — Node 22+, CSM_REPO_DIR, shared GEMINI_API_KEY, and a Windows note (uv sync --no-install-package uvloop: uvloop arrives transitively via hindsight-api and does not build on Windows; it is unused by this provider).
outputs/beam/csm/rag/100k.json.gz — the full 400-query BEAM 100k result produced by omb run on unmodified harness code at base 45fa380, compressed with your own omb publish-results flow (manifest regeneration left to you; we ran from a sparse checkout).
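The durable JSONL log that ingest() keeps for resume can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in csm.py: each document is assumed to be a dict with a unique `"id"` key, and `send` is a hypothetical stand-in for whatever call streams a document to the bridge. An ID is appended (and fsynced) only after `send` returns, so a crash can at worst re-send one document, never skip one.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable


def already_ingested(log_path: Path) -> set:
    """Return the document IDs recorded in the JSONL log (empty if there is no log)."""
    done = set()
    if not log_path.exists():
        return done
    with log_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["id"])  # assumed "id" key
            except (json.JSONDecodeError, KeyError, TypeError):
                # A torn last line (crash mid-write) is ignored, so that
                # document is simply sent again on the next run.
                continue
    return done


def _ensure_trailing_newline(log_path: Path) -> None:
    """Terminate a torn last line so new records start on a fresh line."""
    if log_path.exists() and log_path.stat().st_size > 0:
        with log_path.open("rb") as fh:
            fh.seek(-1, os.SEEK_END)
            last = fh.read(1)
        if last != b"\n":
            with log_path.open("ab") as fh:
                fh.write(b"\n")


def ingest_with_resume(docs: Iterable[dict], log_path: Path, send: Callable[[dict], None]) -> int:
    """Send each document not yet logged; log its ID only after send() returns."""
    done = already_ingested(log_path)
    _ensure_trailing_newline(log_path)
    sent = 0
    with log_path.open("a", encoding="utf-8") as fh:
        for doc in docs:
            if doc["id"] in done:
                continue
            send(doc)  # hypothetical transport, e.g. a POST to the local bridge
            fh.write(json.dumps({"id": doc["id"]}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # durable before the next document is sent
            done.add(doc["id"])
            sent += 1
    return sent
```

Because resume relies only on the log, the bridge side should tolerate the occasional duplicate document after a crash.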

Result (your runner, your judge path)

BEAM 100k	score	correct	avg context tokens	avg retrieve time
csm (this PR)	0.743110	337/400	27,026	3.47 s
hindsight (published artifact)	0.733658	326/400	17,654.6	6.38 s
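The relative differences in the table can be checked with a few lines of arithmetic. The values below are copied from the table (3.47 s is the rounded CSM average; the commit message reports 3.467 s).

```python
# Figures copied from the BEAM 100k table (Hindsight row: published artifact).
csm = {"score": 0.743110, "correct": 337, "ctx_tokens": 27_026, "retrieve_s": 3.47}
hindsight = {"score": 0.733658, "correct": 326, "ctx_tokens": 17_654.6, "retrieve_s": 6.38}

score_delta = csm["score"] - hindsight["score"]                  # absolute score gap
extra_correct = csm["correct"] - hindsight["correct"]            # extra correct answers out of 400
ctx_ratio = csm["ctx_tokens"] / hindsight["ctx_tokens"]          # answer-visible context, CSM vs Hindsight
retrieval_speedup = hindsight["retrieve_s"] / csm["retrieve_s"]  # > 1 means CSM retrieves faster

print(f"score delta:          {score_delta:+.6f}")       # +0.009452
print(f"extra correct:        {extra_correct}")           # 11
print(f"context tokens ratio: {ctx_ratio:.2f}x")          # 1.53x
print(f"retrieval speedup:    {retrieval_speedup:.2f}x")  # 1.84x
```

The context ratio covers only the answer-visible context_tokens; CSM's ~8.8k internal tokens/query are tracked separately in the telemetry sidecar.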

Answer gemini:gemini-3.1-pro-preview, judge gemini:gemini-2.5-flash-lite — chosen to match the published Hindsight 100k artifact; if you prefer different defaults for chart entries, we will rerun with whatever you specify.
Single attempt, no resume. Exact command, env, pinned SHAs, sanity captures (omb providers / omb splits), a 1-query smoke, full stdout log, and a 400-row per-query CSM token telemetry sidecar are preserved here: run manifest and result write-up.
Honest accounting: CSM additionally spends ~8.8k internal input tokens/query inside its own probe/recall/synthesis pipeline (reported separately in the telemetry sidecar, because the visible context_tokens figure alone would understate CSM's cost), and its answer-visible context is larger than Hindsight's. Retrieval is nonetheless faster (warm service plus parallel probe/recall stages).
Integrity: CSM retrieval receives only the ingested documents, the query, user_id, and query_timestamp — no gold answers, rubrics, query IDs, or benchmark-specific logic. We do not describe this as an official leaderboard result unless/until you accept it.
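To make the integrity boundary concrete, here is a minimal sketch of a per-query retrieve request that can only carry the query, user_id, and query_timestamp. The `/retrieve` path, the base URL, and the JSON field layout are hypothetical, not the actual bridge protocol in csm.py; the point is that gold answers, rubrics, and query IDs have no field to travel in.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrieveRequest:
    # Hypothetical payload: the only per-query inputs forwarded to the bridge.
    # There is deliberately no field for gold answers, rubrics, or query IDs.
    query: str
    user_id: Optional[str] = None
    query_timestamp: Optional[str] = None


def build_request(base_url: str, req: RetrieveRequest) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a hypothetical /retrieve endpoint."""
    body = json.dumps(asdict(req)).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/retrieve",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a single `urllib.request.urlopen(build_request(...), timeout=...)` call; passing an unexpected keyword such as `gold_answer=` to `RetrieveRequest` raises `TypeError`.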

Asks

Is this provider-PR + committed-result format the submission path you want, or would you rather run it yourselves / receive something else?
If you want different answer/judge models or additional splits (500k/1m/10m), tell us which and we will run them the same way.
Small docs note: the README usage examples (uv run amb ... --domain) are ahead of the shipped CLI (omb, --split) at 45fa380 — happy to include a fix here if useful.

External memory provider backed by the open-source context-swarm-memory TypeScript implementation. initialize() starts a warm localhost bridge service (npm run amb:csm:serve in CSM_REPO_DIR), ingest() streams AMB documents to it (with a durable JSONL log for resume), retrieve() queries it per AMB query, and cleanup() shuts it down. Stdlib-only (urllib/subprocess); no AMB harness, scoring, prompt, or data changes. Requirements: Node 22+, a context-swarm-memory checkout (CSM_REPO_DIR, npm install), and GEMINI_API_KEY for CSM internal retrieval. Optional env (models, return-k, telemetry sidecar) documented in the module docstring. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
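A minimal sketch of the warm-bridge lifecycle described in this commit message, using only subprocess and urllib: start the service once, poll a health URL until it responds, and terminate it at cleanup. The health URL, timeouts, and polling interval are assumptions, not the actual csm.py behavior. The command would presumably be something like `["npm", "run", "amb:csm:serve"]` with `cwd` set to CSM_REPO_DIR (on Windows, possibly `npm.cmd`).

```python
import http.client
import socket
import subprocess
import time
import urllib.request
from pathlib import Path
from typing import Optional, Sequence


def free_port() -> int:
    """Ask the OS for a currently unused localhost TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


class WarmBridge:
    """Start a local service once, wait until it answers HTTP, stop it at cleanup."""

    def __init__(self, cmd: Sequence[str], cwd: Path, health_url: str, timeout_s: float = 60.0):
        self.cmd = list(cmd)
        self.cwd = cwd
        self.health_url = health_url
        self.timeout_s = timeout_s
        self.proc: Optional[subprocess.Popen] = None
        # Bypass any http_proxy env vars: the bridge is always on localhost.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def start(self) -> None:
        self.proc = subprocess.Popen(
            self.cmd, cwd=self.cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        deadline = time.monotonic() + self.timeout_s
        while time.monotonic() < deadline:
            if self.proc.poll() is not None:
                # Fail fast instead of waiting out the timeout on a crashed service.
                raise RuntimeError(f"bridge exited early with code {self.proc.returncode}")
            try:
                with self._opener.open(self.health_url, timeout=2) as resp:
                    if resp.status == 200:
                        return  # healthy: later ingest/retrieve calls reuse this process
            except (OSError, http.client.HTTPException):
                pass  # not listening yet (URLError and HTTPError are OSError subclasses)
            time.sleep(0.2)
        self.stop()
        raise TimeoutError(f"bridge not healthy after {self.timeout_s} s")

    def stop(self) -> None:
        # Terminates only the direct child; a real provider may need to stop the
        # whole process tree (e.g. the node process started by npm).
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            try:
                self.proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()
        self.proc = None
```

Keeping one process alive across all queries is what makes the service "warm": startup cost is paid once in initialize() rather than on every retrieve().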

Node 22+, CSM_REPO_DIR checkout with npm install, GEMINI_API_KEY shared with the harness, optional env in the provider docstring, and the Windows uvloop install note. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Full 400-query run: score 0.743110, 337/400 correct, avg retrieval 3.467s. Answer gemini:gemini-3.1-pro-preview, judge gemini:gemini-2.5-flash-lite (matching the published Hindsight 100k row). Compressed with omb publish-results; results-manifest.json left untouched (sparse checkout - regeneration belongs to maintainers at publish time). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
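For reviewers who want to open the committed artifact without the omb tooling, a minimal sketch follows. It assumes only that the file is gzip-compressed JSON, as the .json.gz suffix suggests, and assumes nothing about field names.

```python
import gzip
import json
from pathlib import Path
from typing import Any


def load_result(path: Path) -> Any:
    """Decompress and parse a .json.gz result file (assumed gzip-compressed JSON)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def describe(obj: Any) -> str:
    """Summarize the top-level shape without assuming any particular schema."""
    if isinstance(obj, dict):
        return "dict with keys: " + ", ".join(sorted(map(str, obj)))
    if isinstance(obj, list):
        return f"list of {len(obj)} items"
    return type(obj).__name__
```

From the repo root, `describe(load_result(Path("outputs/beam/csm/rag/100k.json.gz")))` prints the top-level structure.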

vercel · 2026-06-10T05:35:42Z

@muhamadjawdatsalemalakoum is attempting to deploy a commit to the Vectorize Team on Vercel.

A member of the Team first needs to authorize it.

muhamadjawdatsalemalakoum and others added 3 commits June 10, 2026 03:54

Document csm provider requirements

cf0d47a


muhamadjawdatsalemalakoum wants to merge 3 commits into vectorize-io:main from muhamadjawdatsalemalakoum:csm-provider
