Add Context Swarm Memory (csm) provider + BEAM 100k result from the omb runner #19
Open
muhamadjawdatsalemalakoum wants to merge 3 commits into
External memory provider backed by the open-source context-swarm-memory TypeScript implementation. initialize() starts a warm localhost bridge service (npm run amb:csm:serve in CSM_REPO_DIR), ingest() streams AMB documents to it (with a durable JSONL log for resume), retrieve() queries it for each AMB query, and cleanup() shuts it down. Stdlib-only (urllib/subprocess); no AMB harness, scoring, prompt, or data changes. Requirements: Node 22+, a context-swarm-memory checkout (CSM_REPO_DIR, with npm install run), and GEMINI_API_KEY for CSM's internal retrieval. Optional environment variables (models, return-k, telemetry sidecar) are documented in the module docstring.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
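The durable JSONL log mentioned above is what lets an interrupted ingest resume. A minimal sketch of that pattern, assuming each AMB document carries a unique `id` and that the bridge tolerates a document being sent twice (both assumptions; the real `csm.py` may do this differently). An ID is appended and fsynced only after the send succeeds, so a crash can cause a re-send but never a silent skip. `load_ingested_ids`, `ingest_with_log`, and the `send` callback are hypothetical names.

```python
import json
import os
from typing import Callable, Iterable, Set


def load_ingested_ids(log_path: str) -> Set[str]:
    """Return the document IDs already recorded in the JSONL ingest log."""
    done: Set[str] = set()
    if not os.path.exists(log_path):
        return done
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                # A crash mid-write can leave a torn line; skipping it just
                # means that document is re-sent on the next run.
                continue
    return done


def _ends_with_newline(path: str) -> bool:
    """True if the file is missing or empty, or its last byte is a newline."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def ingest_with_log(docs: Iterable[dict], send: Callable[[dict], None],
                    log_path: str) -> int:
    """Send each document not yet in the log, then durably record its ID.

    Returns the number of documents sent in this run.
    """
    done = load_ingested_ids(log_path)
    needs_newline = not _ends_with_newline(log_path)
    sent = 0
    with open(log_path, "a", encoding="utf-8") as log:
        if needs_newline:
            log.write("\n")  # keep a torn last line from merging with new records
        for doc in docs:
            if doc["id"] in done:
                continue  # ingested by an earlier, interrupted run
            send(doc)  # hypothetical transport to the bridge service
            # Record only after send() returns: a crash re-sends, never skips.
            log.write(json.dumps({"id": doc["id"]}) + "\n")
            log.flush()
            os.fsync(log.fileno())
            done.add(doc["id"])
            sent += 1
    return sent
```

This gives at-least-once delivery; exactly-once would additionally require the service to deduplicate on `id`.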
Node 22+, CSM_REPO_DIR checkout with npm install, GEMINI_API_KEY shared with the harness, optional env in the provider docstring, and the Windows uvloop install note.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
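The warm-bridge lifecycle described in these commits (launch `npm run amb:csm:serve` in `CSM_REPO_DIR`, wait until it answers, shut it down in `cleanup()`) can be sketched with only `subprocess` and `urllib`. This is an illustrative sketch, not the provider's code: `start_service`, `stop_service`, and the readiness URL are hypothetical, and the CSM service's actual port and health route are not given here.

```python
import http.client
import subprocess
import time
import urllib.request
from typing import List, Mapping, Optional


def start_service(cmd: List[str], cwd: str, ready_url: str,
                  timeout: float = 60.0,
                  env: Optional[Mapping[str, str]] = None) -> subprocess.Popen:
    """Launch a long-lived local service and block until ready_url returns 200."""
    proc = subprocess.Popen(cmd, cwd=cwd, env=env)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            raise RuntimeError(f"service exited early with code {proc.returncode}")
        try:
            with urllib.request.urlopen(ready_url, timeout=2) as resp:
                if resp.status == 200:
                    return proc
        except (OSError, http.client.HTTPException):
            pass  # not accepting connections yet; URLError is an OSError
        time.sleep(0.25)
    stop_service(proc)
    raise TimeoutError(f"{ready_url} was not ready within {timeout}s")


def stop_service(proc: subprocess.Popen, grace: float = 10.0) -> None:
    """Terminate the service, escalating to kill() if it ignores the request."""
    if proc.poll() is not None:
        return
    proc.terminate()
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

Wired to CSM, a call might look like `start_service([shutil.which("npm") or "npm", "run", "amb:csm:serve"], cwd=os.environ["CSM_REPO_DIR"], ready_url=...)`, with the URL being an assumption; `shutil.which` resolves the `npm.cmd` shim on Windows. Because `npm run` spawns Node as a child process, a production version may also need to stop the whole process group rather than only the npm process.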
Full 400-query run: score 0.743110, 337/400 correct, average retrieval 3.467 s. Answer model gemini:gemini-3.1-pro-preview, judge gemini:gemini-2.5-flash-lite (matching the published Hindsight 100k row). Compressed with omb publish-results; results-manifest.json left untouched (sparse checkout; regenerating it belongs to the maintainers at publish time).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
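The run summary reports three separate aggregates. Note that 337/400 = 0.8425, which differs from the reported score of 0.743110, so the score is not the plain fraction correct. A minimal sketch of computing such figures from per-query rows, using a hypothetical `QueryResult` record (the actual omb result schema is not shown here) and assuming, without confirmation, that the score is a mean of per-query judge scores:

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, Iterable


@dataclass
class QueryResult:
    # Hypothetical per-query record; the actual omb result schema may differ.
    score: float              # per-query judge score (assumed to lie in [0, 1])
    correct: bool             # judge verdict for this query
    retrieval_seconds: float  # wall-clock time spent in retrieve()


def summarize(rows: Iterable[QueryResult]) -> Dict[str, float]:
    """Aggregate per-query rows into run-level figures."""
    rows = list(rows)
    if not rows:
        raise ValueError("no results to summarize")
    return {
        "score": fmean(r.score for r in rows),         # assumed: mean judge score
        "correct": sum(1 for r in rows if r.correct),  # a count, not a fraction
        "total": len(rows),
        "avg_retrieval_s": fmean(r.retrieval_seconds for r in rows),
    }
```

Keeping the score and the correct count as distinct aggregates avoids conflating a graded metric with a binary one.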
@muhamadjawdatsalemalakoum is attempting to deploy a commit to the Vectorize Team on Vercel. A member of the Team first needs to authorize it.
Context
Following up on the exchange with Chris Latimer about our Context Swarm Memory (CSM) BEAM 100k comparison: the guidance was that official publication requires following the AMB repo instructions and demonstrating results with the benchmark runners. This PR does exactly that. It also comes with a question: is an external-provider PR like this your preferred submission path? We are happy to restructure it however you prefer.
What this PR contains (3 commits, no harness changes)
- `src/memory_bench/memory/csm.py` + one registry line: a `MemoryProvider` backed by the open-source context-swarm-memory TypeScript implementation. `initialize()` starts a warm localhost bridge service, `ingest()` streams documents to it (with a durable JSONL log for resume), `retrieve()` queries it, and `cleanup()` shuts it down. Stdlib-only (`urllib`/`subprocess`). Zero changes to AMB scoring, the answer prompt, the judge prompt, gold data, or any other harness source.
- `CSM_REPO_DIR`, shared `GEMINI_API_KEY`, and a Windows note (`uv sync --no-install-package uvloop`: uvloop arrives transitively via hindsight-api and does not build on Windows; it is unused by this provider).
- `outputs/beam/csm/rag/100k.json.gz`: the full 400-query BEAM 100k result produced by `omb run` on unmodified harness code at base `45fa380`, compressed with your own `omb publish-results` flow (manifest regeneration is left to you; we ran from a sparse checkout).
Result (your runner, your judge path)
Answer model `gemini:gemini-3.1-pro-preview`, judge `gemini:gemini-2.5-flash-lite`, chosen to match the published Hindsight 100k artifact. If you prefer different defaults for chart entries, we will rerun with whatever you specify.
`omb providers`/`omb splits`), a 1-query smoke test, the full stdout log, and a 400-row per-query CSM token telemetry sidecar are preserved here: run manifest and result write-up.
`context_tokens` never under-states CSM cost), and its answer-visible context is larger than Hindsight's. At the same time, retrieval is faster (warm service + parallel probe/recall stages).
`user_id`, and `query_timestamp`: no gold answers, rubrics, query IDs, or benchmark-specific logic. We do not describe this as an official leaderboard result unless and until you accept it.
Asks
`uv run amb ... --domain`) are ahead of the shipped CLI (`omb`, `--split`) at `45fa380`; we are happy to include a fix here if useful.
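The Result section lists the inputs CSM receives, ending with `user_id` and `query_timestamp`, and states that no gold answers, rubrics, query IDs, or benchmark-specific logic reach it. One way to make that property structural rather than incidental is an allow-list on the retrieval request. A minimal sketch, assuming hypothetical field names (`query`, `user_id`, `query_timestamp`) and a JSON-over-HTTP bridge; `build_retrieve_payload` and `post_json` are illustrative, not functions from `csm.py`.

```python
import json
import urllib.request
from typing import Any, Dict, Mapping

# Hypothetical keys: the query text is required; user_id and query_timestamp
# are forwarded when present. Every other field is dropped on purpose.
REQUIRED_FIELD = "query"
OPTIONAL_FIELDS = ("user_id", "query_timestamp")


def build_retrieve_payload(amb_query: Mapping[str, Any]) -> Dict[str, Any]:
    """Allow-list the fields sent to the memory service, so gold answers,
    rubrics, or query IDs on the harness object cannot leak into retrieval."""
    if REQUIRED_FIELD not in amb_query:
        raise KeyError(REQUIRED_FIELD)
    payload = {REQUIRED_FIELD: amb_query[REQUIRED_FIELD]}
    for key in OPTIONAL_FIELDS:
        if key in amb_query:
            payload[key] = amb_query[key]
    return payload


def post_json(url: str, payload: Mapping[str, Any], timeout: float = 60.0) -> Any:
    """POST a JSON body with urllib (stdlib only) and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

An allow-list fails closed: a new field added to the harness query object is not forwarded until someone deliberately adds it.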