Add Context Swarm Memory (csm) provider + BEAM 100k result from the omb runner#19

Open
muhamadjawdatsalemalakoum wants to merge 3 commits into vectorize-io:main from muhamadjawdatsalemalakoum:csm-provider

@muhamadjawdatsalemalakoum muhamadjawdatsalemalakoum commented Jun 10, 2026

Context

Following up on the exchange with Chris Latimer about our Context Swarm Memory (CSM) BEAM 100k comparison: the guidance was that official publication requires following the AMB repo instructions and demonstrating results with the benchmark runners. This PR does exactly that and also asks a question: is an external-provider PR like this your preferred submission path? Happy to restructure however you prefer.

What this PR contains (3 commits, no harness changes)

  1. src/memory_bench/memory/csm.py + one registry line — a MemoryProvider backed by the open-source context-swarm-memory TypeScript implementation. initialize() starts a warm localhost bridge service, ingest() streams documents to it (with a durable JSONL log for resume), retrieve() queries it, cleanup() shuts it down. Stdlib-only (urllib/subprocess). Zero changes to AMB scoring, answer prompt, judge prompt, gold data, or any other harness source.
  2. README requirements blurb — Node 22+, CSM_REPO_DIR, shared GEMINI_API_KEY, and a Windows note (uv sync --no-install-package uvloop: uvloop arrives transitively via hindsight-api and does not build on Windows; it is unused by this provider).
  3. outputs/beam/csm/rag/100k.json.gz — the full 400-query BEAM 100k result produced by omb run on unmodified harness code at base 45fa380, compressed with your own omb publish-results flow (manifest regeneration left to you; we ran from a sparse checkout).
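The "durable JSONL log for resume" in item 1 can be illustrated with a minimal sketch. This is not the code in csm.py: the `IngestLog` class, the `doc_id` field, and the `send` callable (standing in for the HTTP POST to the bridge) are all hypothetical names chosen for illustration.

```python
import json
import os
from pathlib import Path


class IngestLog:
    """Append-only JSONL record of document IDs already sent to the bridge."""

    def __init__(self, path):
        self.path = Path(path)
        self.done = set()
        if self.path.exists():
            with self.path.open("r", encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        self.done.add(json.loads(line)["doc_id"])
                    except (json.JSONDecodeError, KeyError, TypeError):
                        # An unreadable line (for example one torn by a crash)
                        # is skipped, so that document is re-sent on resume.
                        continue

    def record(self, doc_id):
        """Persist one ID and force it to disk before reporting success."""
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"doc_id": doc_id}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        self.done.add(doc_id)


def ingest_with_resume(docs, log, send):
    """Send each (doc_id, text) pair not yet logged; return how many were sent."""
    sent = 0
    for doc_id, text in docs:
        if doc_id in log.done:
            continue
        send(doc_id, text)  # hypothetical stand-in for the POST to the bridge
        log.record(doc_id)
        sent += 1
    return sent
```

Logging only after `send` returns gives at-least-once delivery: a crash between the two steps re-sends one document rather than silently dropping it.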

Result (your runner, your judge path)

| BEAM 100k | score | correct | avg context tokens | avg retrieve |
| --- | --- | --- | --- | --- |
| csm (this PR) | 0.743110 | 337/400 | 27,026 | 3.47 s |
| hindsight (published artifact) | 0.733658 | 326/400 | 17,654.6 | 6.38 s |
  • Answer gemini:gemini-3.1-pro-preview, judge gemini:gemini-2.5-flash-lite — chosen to match the published Hindsight 100k artifact; if you prefer different defaults for chart entries, we will rerun with whatever you specify.
  • Single attempt, no resume. Exact command, env, pinned SHAs, sanity captures (omb providers / omb splits), a 1-query smoke, full stdout log, and a 400-row per-query CSM token telemetry sidecar are preserved here: run manifest and result write-up.
  • Honest accounting: CSM additionally spends ~8.8k internal input tokens/query inside its own probe/recall/synthesis pipeline (reported separately in the telemetry sidecar so the visible context_tokens never understates CSM cost), and its answer-visible context is larger than Hindsight's. At the same time, retrieval is faster (warm service + parallel probe/recall stages).
  • Integrity: CSM retrieval receives only the ingested documents, the query, user_id, and query_timestamp — no gold answers, rubrics, query IDs, or benchmark-specific logic. We do not describe this as an official leaderboard result unless/until you accept it.
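The accounting above can be made concrete with a short calculation. The sketch below uses only the figures quoted in this description. The `Row` class and its field names are illustrative and are not the schema of the result file. The ~8.8k internal figure is approximate, and Hindsight's internal spend is not stated here, so it is left unknown rather than assumed to be zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    name: str
    score: float
    avg_context_tokens: float  # answer-visible context, from the table
    avg_retrieve_s: float
    internal_tokens: Optional[float] = None  # provider-internal input tokens/query

    def total_input_tokens(self) -> Optional[float]:
        """Visible context plus internal spend, or None if internal spend is unknown."""
        if self.internal_tokens is None:
            return None
        return self.avg_context_tokens + self.internal_tokens


# ~8.8k is the approximate internal figure quoted above.
csm = Row("csm", 0.743110, 27026.0, 3.47, internal_tokens=8800.0)
# Internal spend for Hindsight is not given in this PR, so it stays None.
hindsight = Row("hindsight", 0.733658, 17654.6, 6.38)

score_delta = csm.score - hindsight.score
context_ratio = csm.avg_context_tokens / hindsight.avg_context_tokens
retrieve_speedup = hindsight.avg_retrieve_s / csm.avg_retrieve_s

print(f"score delta:             {score_delta:+.6f}")
print(f"visible context ratio:   {context_ratio:.2f}x")
print(f"retrieval speedup:       {retrieve_speedup:.2f}x")
print(f"csm total input tokens:  {csm.total_input_tokens():,.0f} per query (approx.)")
```

Keeping the internal spend as a separate field mirrors the separation between the visible context_tokens and the telemetry sidecar, so neither number silently absorbs the other.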

Asks

  1. Is this provider-PR + committed-result format the submission path you want, or would you rather run it yourselves / receive something else?
  2. If you want different answer/judge models or additional splits (500k/1m/10m), tell us which and we will run them the same way.
  3. Small docs note: the README usage examples (uv run amb ... --domain) are ahead of the shipped CLI (omb, --split) at 45fa380 — happy to include a fix here if useful.

External memory provider backed by the open-source context-swarm-memory
TypeScript implementation. initialize() starts a warm localhost bridge
service (npm run amb:csm:serve in CSM_REPO_DIR), ingest() streams AMB
documents to it (with a durable JSONL log for resume), retrieve()
queries it per AMB query, and cleanup() shuts it down. Stdlib-only
(urllib/subprocess); no AMB harness, scoring, prompt, or data changes.

Requirements: Node 22+, a context-swarm-memory checkout (CSM_REPO_DIR,
npm install), and GEMINI_API_KEY for CSM internal retrieval. Optional
env (models, return-k, telemetry sidecar) documented in the module
docstring.
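A preflight check for the requirements listed above might look like the following sketch. It is not part of the provider: the function names are hypothetical, and the only assumptions about the environment are the ones stated in this message (Node 22+, CSM_REPO_DIR pointing at a checkout, GEMINI_API_KEY set).

```python
import os
import re
import shutil
import subprocess


def parse_node_major(version_text):
    """Return the major version from `node --version` output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


def check_requirements(env, node_version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    major = parse_node_major(node_version or "")
    if major is None:
        problems.append("node not found or version unreadable")
    elif major < 22:
        problems.append(f"Node 22+ required, found major version {major}")
    repo = env.get("CSM_REPO_DIR")
    if not repo:
        problems.append("CSM_REPO_DIR is not set")
    elif not os.path.isdir(repo):
        problems.append(f"CSM_REPO_DIR is not a directory: {repo}")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


def node_version_text():
    """Run `node --version` if node is on PATH; return '' otherwise."""
    exe = shutil.which("node")
    if exe is None:
        return ""
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    return result.stdout


# Usage: check_requirements(os.environ, node_version_text())
```

Passing the environment and version string in as arguments keeps the check testable without Node installed.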

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Node 22+, CSM_REPO_DIR checkout with npm install, GEMINI_API_KEY
shared with the harness, optional env in the provider docstring, and
the Windows uvloop install note.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Full 400-query run: score 0.743110, 337/400 correct, avg retrieval
3.467s. Answer gemini:gemini-3.1-pro-preview, judge
gemini:gemini-2.5-flash-lite (matching the published Hindsight 100k
row). Compressed with omb publish-results; results-manifest.json left
untouched (sparse checkout - regeneration belongs to maintainers at
publish time).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
vercel Bot commented Jun 10, 2026

@muhamadjawdatsalemalakoum is attempting to deploy a commit to the Vectorize Team on Vercel.

A member of the Team first needs to authorize it.
