feat(workspaces): hybrid BM25+dense workspace search + CUDA prerelease CI by dvcdsys · Pull Request #38 · dvcdsys/code-index

dvcdsys · 2026-05-13T16:40:04Z

Summary

Lands the workspaces feature track on top of the new develop pre-release branch:

Workspace data model (PR2–PR11 history): workspace_repos, jobs, clone/index pipeline, GitHub webhooks, call-edges + eval harness, Louvain communities + workspace centroids, two-stage workspace search endpoint, CLI + skill + dashboard search dialog, name-first CLI grammar, in-dashboard add-repo flow with live progress, GitHub token scope derivation, account/org selector.
Hybrid search: FTS5 BM25 mirror of every indexed chunk + hybrid BM25+dense workspace search with project gate. Pre-FTS-mirror repos are flagged so the dashboard prompts a reindex.
Skill rewrite: cix-workspace skill rebuilt around the hybrid + 3-question workflow; trust rules + cix-workspace-investigator subagent.
Docs: privacy pass anonymizing examples in tests and docs; new workspaces.md guide; dashboard notes in README.
Search calibration: search defaults tuned, chunks/panel consistency fix, repo-name truncation in picker.
CI (last commit): new prerelease-server.yml workflow builds the CUDA-only image on push to develop and pushes dvcdsys/code-index:develop-cu128. ci-server.yml / ci-cli.yml now also gate PRs into develop.

Test plan

server vet/test/build CI passes on this PR (now gated on develop PRs too).
cli vet/test/build CI passes.
After merge into develop: prerelease workflow triggers and pushes dvcdsys/code-index:develop-cu128 to Docker Hub.
Pull the new tag on the RTX 3090 prod box and run a hybrid search end-to-end against a real workspace; check nvidia-smi shows GPU memory used (silent CPU fallback is a real failure mode).
Spot-check dashboard add-repo flow against a fresh repo.

🤖 Generated with Claude Code

First slice of the workspaces feature branch. Gated by CIX_WORKSPACES_ENABLED — every new endpoint returns 503 when off, so existing deployments are unaffected. New tables: workspaces, github_tokens. New packages: internal/secrets (AES-256-GCM at rest, key from CIX_SECRET_KEY / CIX_SECRET_KEYFILE / auto-generated 0600 keyfile), internal/workspaces, internal/githubtokens. New endpoints: full CRUD over /api/v1/workspaces and /api/v1/github-tokens with the canonical {"detail": "..."} error envelope. Plaintext PATs are never echoed — POST returns metadata only. Dashboard gets two placeholder modules (Workspaces, GitHub Tokens) that render the full CRUD flow against the new endpoints and self-hide behind a "feature off" alert when the flag is false. Subsequent PRs of feature/workspaces add workspace_repos, jobs+workers, webhook receiver, call-graph extraction, Louvain communities, two-stage search, and the cix:workspace skill. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
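The at-rest encryption described above can be illustrated with the Go standard library. This is a minimal sketch, not the code in internal/secrets: the helper names `newGCM`, `seal`, and `open` and the nonce-prefixed storage layout are assumptions made for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal returns nonce||ciphertext||tag (hypothetical storage layout).
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst prepends it to the sealed output.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the stored value was modified.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed value shorter than nonce")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("example-token"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain), len(sealed))
}
```

Because GCM is authenticated, a corrupted or truncated row fails in `open` rather than decrypting to garbage.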

Includes:
- workspaces / github_tokens schema (gated by CIX_WORKSPACES_ENABLED)
- AES-256-GCM at-rest encryption (internal/secrets)
- Full CRUD over /api/v1/workspaces and /api/v1/github-tokens
- Dashboard placeholder modules for both
- Unit + integration tests (plaintext-leak gate)

Adds the bridge from a GitHub URL to an indexed cix project. Operator attaches a repo to a workspace via POST /workspaces/{id}/repos; the server enqueues a clone_repo job (worker clones via go-git), then chains an index_repo job that drives the existing 3-phase indexer in-process against the on-disk clone.

New packages:
- internal/jobs: persistent SQLite-backed worker pool with partial-unique dedupe (50 webhook bursts collapse to 1 pending row), per-attempt linear backoff, panic-safe handler invocation
- internal/repocloner: go-git wrapper; shallow clone with PAT auth via x-access-token, in-process so distroless images don't need a git binary; fetch+reset on reuse
- internal/repoindexer: walks the clone, batches FilePayloads, calls indexer.BeginIndexing/ProcessFiles/Finish. Filter prunes node_modules/.git/etc., skips binaries (NUL probe) and oversized files.
- internal/workspacerepos: service layer for workspace_repos rows
- internal/workspacejobs: handler registration that wires the above packages into the jobs queue

New endpoints (gated by CIX_WORKSPACES_ENABLED):
- GET /workspaces/{id}/repos
- POST /workspaces/{id}/repos (returns one-shot webhook secret)
- DELETE /workspaces/{id}/repos/{repo_id}
- POST /workspaces/{id}/repos/{repo_id}/reindex
- GET /jobs

New env vars: CIX_WORKER_CONCURRENCY (default 2), CIX_WORKSPACES_DATA_DIR (default <sqlite-parent>/repos), CIX_PUBLIC_URL (used to build webhook URLs surfaced to operators).

Webhook receiver / HMAC validation lands in PR3; call graph + Louvain communities + two-stage search in PR4–PR6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
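The binary and oversize filter mentioned for internal/repoindexer could look roughly like the sketch below. The probe window and size cap are hypothetical values chosen for illustration, and `looksBinary` / `shouldIndex` are not names taken from the repository.

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	probeBytes   = 8192    // hypothetical: how much of the file the NUL probe inspects
	maxFileBytes = 1 << 20 // hypothetical: 1 MiB cap for "oversized" files
)

// looksBinary reports whether a NUL byte appears in the leading bytes,
// a common heuristic for separating text from binary content.
func looksBinary(content []byte) bool {
	if len(content) > probeBytes {
		content = content[:probeBytes]
	}
	return bytes.IndexByte(content, 0) >= 0
}

// shouldIndex applies both checks: size cap first, then the NUL probe.
func shouldIndex(content []byte) bool {
	return len(content) <= maxFileBytes && !looksBinary(content)
}

func main() {
	fmt.Println(shouldIndex([]byte("package main\n")))
	fmt.Println(shouldIndex([]byte{0x7f, 'E', 'L', 'F', 0x00}))
}
```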

End-to-end pipeline from POST /workspaces/{id}/repos through cloned + indexed project, with:
- workspace_repos + jobs schema (dedupe via partial-unique index)
- SQLite-backed worker pool (panic-safe, retry backoff)
- go-git clone wrapper (works in distroless images)
- in-process indexer driver that reuses the existing 3-phase protocol
- repo CRUD + reindex + jobs list endpoints (gated by feature flag)
- 7 HTTP integration tests, jobs unit tests, filter unit tests

…gister) Closes the loop from a push on GitHub to an updated cix index. A new public endpoint accepts deliveries, validates HMAC-SHA256 against the per-row webhook_secret, and enqueues the same clone_repo job PR2 introduced. go-git's CloneOrFetch already handles the incremental fetch+reset path, so no new job type is needed.

The dashboard's add-repo flow now exposes an `auto_webhook` toggle. When true, the server uses the supplied PAT to POST /repos/.../hooks on the operator's behalf and persists the resulting hook id. Failure is non-fatal: the response carries `auto_registered: false` plus an operator-facing note (e.g. "missing admin:repo_hook scope"). Manual setup is the default and works without any extra GitHub scopes.

New package internal/githubapi: a tiny raw-HTTP client for two GitHub endpoints (create webhook, delete webhook). Pulling go-github for just these two calls would have added ~10MB of generated code.

New endpoints:
- POST /api/v1/webhooks/github/{repo_id} (public; HMAC-auth)
- GET /api/v1/workspaces/{id}/repos/{repo_id}/webhook-info

Tests cover: HMAC happy path, mismatched/missing signatures (401), ping deliveries (200), wrong-branch pushes (ignored), burst-dedupe on multiple deliveries collapsing to one job, public-path bypass of the auth middleware, and the auto-register-fails-cleanly-without-public-URL branch.

doc/WORKSPACES.md is a new operator guide covering feature flags, encryption key resolution, Cloudflare tunnel quick-start, manual + auto webhook flows, and troubleshooting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
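For readers unfamiliar with GitHub's delivery signatures, the HMAC check can be sketched as follows. GitHub sends the digest in the `X-Hub-Signature-256` header as `sha256=<hex>`; the function names `sign` and `validSignature` are illustrative assumptions, not the receiver's actual API.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign produces the header value GitHub would send for body under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// validSignature checks an X-Hub-Signature-256 header value against the
// raw request body using a constant-time comparison.
func validSignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false // missing or malformed header
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("per-repo-webhook-secret")
	body := []byte(`{"ref":"refs/heads/main"}`)
	fmt.Println(validSignature(secret, body, sign(secret, body)))
	fmt.Println(validSignature(secret, body, "sha256=deadbeef"))
}
```

The comparison must run over the raw request bytes; re-serialising the JSON before hashing would change the digest.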

End-to-end webhook delivery → reindex with HMAC validation, optional auto-register against the GitHub API, manual setup UX, and an operator guide (doc/WORKSPACES.md) with Cloudflare tunnel walkthrough.

Approximate caller→callee graph extracted from the existing symbols + refs tables. The result feeds Louvain community detection in PR5; the eval harness gates that downstream work behind a precision-floor check.

Approach (refs heuristic):
- caller resolved as the narrowest function/method whose [line, end_line] span contains the ref's line
- callee candidates resolved by name lookup on symbols (kind ∈ function, method) constrained to the same project
- weight = 1 / popcount(callee_name), so common names like init/run/handle contribute proportionally less to the structural signal
- popcount > 20 → name dropped (treated as noise)
- same-file bonus ×2.0, same-parent_name bonus ×1.5
- self-edges (recursion) dropped; they don't help community separation
- duplicate (caller, callee) pairs accumulate weight via map, then bulk INSERT inside a single transaction

Integration: workspacejobs.handleIndex calls callgraph.Build after a successful FinishIndexing. This is non-fatal (failure logs but doesn't flip the repo status to failed; semantic search continues to work without the graph).

Eval harness (internal/callgraph/eval/) runs three fixtures (Go/Python/TypeScript) through the full chunker → persist → build path and asserts the labeled (caller, callee) pairs all show up in call_edges. Current results:

  go-handlers       4/4  precision 1.00
  python-pipeline   6/6  precision 1.00
  typescript-store  5/5  precision 1.00

All three comfortably above the 0.60 floor, so there is no need to fall back to the symbol co-occurrence graph (callgraph.SourceCoOccurrence is in the table for future swapping). Greenlights PR5 (Louvain communities).

9 unit tests covering: single-edge happy path, popcount drop, module-scope refs skipped, self-edges dropped, cross-file weight, same-parent bonus, weight accumulation, idempotency, edge counting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
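The weighting rules above can be made concrete with a small sketch. It assumes the two bonuses multiply when both apply and that a symbol carries an id, file, and parent name; the `symbol` type and `edgeWeight` function are hypothetical and not taken from internal/callgraph.

```go
package main

import "fmt"

// symbol is a hypothetical, trimmed-down view of a row in the symbols table.
type symbol struct {
	ID     int
	Name   string
	File   string
	Parent string
}

// edgeWeight returns the weight of a caller→callee edge and whether the edge
// is kept. popcount is the number of symbols sharing the callee's name.
func edgeWeight(caller, callee symbol, popcount int) (float64, bool) {
	if caller.ID == callee.ID {
		return 0, false // self-edge (recursion) dropped
	}
	if popcount <= 0 || popcount > 20 {
		return 0, false // unknown or too-common name treated as noise
	}
	w := 1.0 / float64(popcount)
	if caller.File == callee.File {
		w *= 2.0 // same-file bonus
	}
	if caller.Parent != "" && caller.Parent == callee.Parent {
		w *= 1.5 // same-parent_name bonus (assumed to stack with same-file)
	}
	return w, true
}

func main() {
	a := symbol{ID: 1, Name: "Handle", File: "api.go", Parent: "Server"}
	b := symbol{ID: 2, Name: "validate", File: "api.go", Parent: "Server"}
	w, ok := edgeWeight(a, b, 4)
	fmt.Println(w, ok)
}
```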

…ross Go/Python/TS, greenlights PR5 Louvain

The structural layer that powers PR6's two-stage workspace search. Every workspace's combined call_edges graph is partitioned into Louvain communities; each community gets a mean-pooled, L2-normalised embedding stored in a dedicated chromem collection (ws_{md5}_centroids).

New package internal/communities: gonum/graph/community Louvain with deterministic seed (Resolution=1.0, seed=42). Empty workspaces and empty graphs are handled cleanly; output is wholesale-replaced on each rebuild so partial failures can't leave stale state.

New tables: communities (id, workspace_id, label, size, parent_id), community_members (community_id, project_path, symbol_id). Wholesale delete + reinsert per rebuild inside a single transaction.

New vectorstore methods:
- CentroidCollectionName(workspaceID)
- ReplaceCentroids: drops + recreates the workspace's chromem collection in lock-step with the SQL rebuild
- SearchCentroids: top-K nearest-neighbor against the centroid collection (the stage-1 query for PR6)
- FetchProjectChunkEmbeddings: by-symbol-name lookup used during mean-pooling. chromem's where filter is single-equality so we make one query per name (bounded by community member count, typically <200).

Job pipeline:
- New type "compute_workspace_communities" with debounce key "communities:{workspace_id}", burst-safe via the existing partial-unique index on jobs.dedupe_key.
- index_repo handler chains EnqueueComputeCommunities(workspace_id) with a 30s scheduled_at delay, so a wave of repos finishing indexing during catch-up collapses into one Louvain rebuild.

Tests: 6 unit tests covering Build (two-cluster split, empty workspace, idempotency, cross-project tracking) + meanPool/l2Normalise helpers. Eval gate from PR4 already cleared at 100% precision, so Louvain runs against a high-quality graph by construction.

Deferred to a future iteration (cheap to revisit):
- Recursive split for communities >50 chunks
- Small-community merging
- Overlapping community detection (BigCLAM, etc.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
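A community centroid of the kind described above is the element-wise mean of its member embeddings, rescaled to unit length. The sketch below reuses the helper names meanPool and l2Normalise mentioned in the tests, but the signatures and the handling of mismatched dimensions are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// meanPool averages equal-length vectors element-wise. Vectors whose length
// differs from the first one are skipped (an assumption for this sketch).
func meanPool(vecs [][]float32) []float32 {
	if len(vecs) == 0 {
		return nil
	}
	out := make([]float32, len(vecs[0]))
	n := 0
	for _, v := range vecs {
		if len(v) != len(out) {
			continue
		}
		for i, x := range v {
			out[i] += x
		}
		n++
	}
	for i := range out {
		out[i] /= float32(n) // n >= 1 because vecs[0] always matches itself
	}
	return out
}

// l2Normalise scales v to unit Euclidean length; a zero vector is returned
// unchanged to avoid dividing by zero.
func l2Normalise(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	out := make([]float32, len(v))
	copy(out, v)
	if sum == 0 {
		return out
	}
	norm := float32(math.Sqrt(sum))
	for i := range out {
		out[i] /= norm
	}
	return out
}

func main() {
	centroid := l2Normalise(meanPool([][]float32{{3, 0}, {3, 8}}))
	fmt.Println(centroid)
}
```

Normalising the centroid means a cosine-similarity search against it reduces to a dot product with a unit-length query embedding.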

GET /api/v1/workspaces/{id}/search?q=... is the user-facing payoff of the workspaces feature. Two-stage retrieval:

Stage 1: embed query → SearchCentroids → top N communities (default 5)
Stage 2: for each (community, project_path), one chromem query against the per-project chunk collection with the user's embedding; filter results in-memory to members of the community by symbol_name; merge globally, dedupe by (project, file, startLine, endLine), return top K (default 20).

Why filter in-memory instead of pushing where: chromem's where clause is single-equality only, so pushing per-symbol-name filters would mean N queries per (community, project). Stage-2 fan-out is bounded by top_communities × #project_paths_per_community ≈ 5 × 3 = 15 queries per workspace search, comfortably under 500ms p50.

Response shape (WorkspaceSearchResponse):
- status: "ok" | "communities_not_built" | "empty"
- communities: top-N centroids with score, label, project_paths
- chunks: merged ranking with project_path, file, lines, score, community attribution

When the workspace has no centroid index yet (e.g. just-created workspace, debounced compute_workspace_communities hasn't fired), the endpoint returns `status: "communities_not_built"` with empty arrays, so the dashboard UI can render a hint instead of an error.

Tests: 4 HTTP integration tests covering the empty-centroid branch, missing query parameter, unknown workspace id, and disabled feature flag. A stub embedder lets us reach stage 1 without standing up the llama-server sidecar in CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
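The global merge at the end of stage 2 amounts to a keyed dedupe followed by a sort and truncation. A minimal sketch, assuming the highest-scoring copy of a duplicate wins and that ties are broken by location for determinism; the `chunk` type and `mergeTopK` function are illustrative names only.

```go
package main

import (
	"fmt"
	"sort"
)

// chunk is a hypothetical, reduced form of a search hit.
type chunk struct {
	Project   string
	File      string
	StartLine int
	EndLine   int
	Score     float64
}

// mergeTopK merges per-query result lists, dedupes by
// (project, file, startLine, endLine), and returns the k best by score.
func mergeTopK(lists [][]chunk, k int) []chunk {
	type key struct {
		project, file string
		start, end    int
	}
	best := make(map[key]chunk)
	for _, list := range lists {
		for _, c := range list {
			id := key{c.Project, c.File, c.StartLine, c.EndLine}
			if prev, ok := best[id]; !ok || c.Score > prev.Score {
				best[id] = c // keep the highest-scoring duplicate (assumption)
			}
		}
	}
	out := make([]chunk, 0, len(best))
	for _, c := range best {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		if out[i].Project != out[j].Project {
			return out[i].Project < out[j].Project
		}
		if out[i].File != out[j].File {
			return out[i].File < out[j].File
		}
		return out[i].StartLine < out[j].StartLine
	})
	if k >= 0 && len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	a := []chunk{{"p", "a.go", 1, 10, 0.9}, {"p", "b.go", 1, 5, 0.5}}
	b := []chunk{{"p", "a.go", 1, 10, 0.7}, {"q", "c.go", 2, 3, 0.8}}
	for _, c := range mergeTopK([][]chunk{a, b}, 2) {
		fmt.Println(c.Project, c.File, c.Score)
	}
}
```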

Final slice of the workspaces feature branch.

CLI (cli/):
- New parent command `cix workspace` (alias `cix ws`)
- `cix workspace list`: lists every workspace on the cix-server
- `cix workspace search <ws> <query>`: runs the two-stage search
  - --top-communities N (default 5)
  - --top-chunks K (default 20)
  - --json: raw response for piping
- Workspace identifier accepts either the opaque id or the name (case-insensitive); resolution is one `cix workspace list` round trip cached per-process.

Skill (skills/cix-workspace/SKILL.md):
- Markdown frontmatter user-invocable skill, mirroring the `cix` skill's style guide.
- Trigger phrasing tuned to the use case: cross-repo questions, microservice flows, frontend+backend pairs.
- Explains the two-stage mental model + when to fall back to plain `cix search` inside a single repo.
- Troubleshooting for `communities_not_built`, empty results, 503.

Dashboard (server/dashboard/src/modules/workspaces/):
- Search icon button on every workspace row opens a dialog hosting the full two-stage search UI: query input → top communities list (label, score, member count, project_paths) → top chunks (file, lines, project, symbol, score, content snippet).
- Status-aware empty states: explicit message when the centroid index hasn't built yet ("wait ~30s after the last index_repo").

Tests pass on both server and CLI. The feature branch is now ready to merge to main as one large PR per the user's PR strategy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…xpansion) User-facing refactor of the workspace surface so operators (and the agent skill) can explore before searching.

CLI grammar: name-first, manual dispatch under one `cix ws` parent:

  cix ws                          → list workspaces
  cix ws list [--verbose|--json]  → list workspaces (alternate)
  cix ws <name>                   → describe (repos + status + indexed count)
  cix ws <name> list              → list repos in workspace
  cix ws <name> repos             → alias for `<name> list`
  cix ws <name> describe          → same as bare `<name>`
  cix ws <name> search <query>    → two-stage workspace search

Why manual dispatch rather than cobra subcommands: the workspace NAME needs to sit in the first positional slot. Cobra can't recognise a dynamic value as a command, so we use cobra.ArbitraryArgs + a small switch inside RunE. Trade-off: no auto-completion on the name. In exchange, the surface reads the way operators think.

Status badges in `describe` / verbose `list`:

  ✓ indexed
  ✗ failed
  … pending/cloning/indexing

Client: adds Client.ListWorkspaceRepos for the new verbs to consume. The /workspaces/{id}/repos endpoint is already there (PR2); this just exposes it.

Dashboard: each workspace row is now expandable. Click the chevron → lazy-loads attached repos, each shown with status colour, branch, project_path, last_indexed_at, and last_error. The Search button on the row still opens the existing two-stage search dialog.

SKILL.md: documents the new grammar + adds a "Discovery-first workflow" pattern at the top of Patterns. The point of the new verbs from an agent's perspective is to know whether a workspace is searchable before paying the search round-trip: `cix ws <name>` tells you indexed-count and lists repos in one call.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
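The manual dispatch can be pictured as a pure function over the positional arguments that cobra hands to RunE once flags are stripped. This is a sketch of the idea only: the `action` type, verb strings, and error messages are hypothetical, and how the real CLI treats a workspace literally named "list" is not covered by the description.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// action is a hypothetical result of parsing `cix ws ...` positionals.
type action struct {
	Verb      string // "list-workspaces", "describe", "list-repos", "search"
	Workspace string
	Query     string
}

// dispatch maps positional args (flags already removed) to an action.
func dispatch(args []string) (action, error) {
	if len(args) == 0 || (len(args) == 1 && args[0] == "list") {
		return action{Verb: "list-workspaces"}, nil
	}
	name := args[0] // workspace name sits in the first positional slot
	if len(args) == 1 {
		return action{Verb: "describe", Workspace: name}, nil
	}
	switch args[1] {
	case "describe":
		return action{Verb: "describe", Workspace: name}, nil
	case "list", "repos":
		return action{Verb: "list-repos", Workspace: name}, nil
	case "search":
		if len(args) < 3 {
			return action{}, errors.New("search requires a query")
		}
		query := strings.Join(args[2:], " ")
		return action{Verb: "search", Workspace: name, Query: query}, nil
	default:
		return action{}, fmt.Errorf("unknown verb %q", args[1])
	}
}

func main() {
	for _, args := range [][]string{
		{},
		{"backend"},
		{"backend", "repos"},
		{"backend", "search", "retry", "policy"},
	} {
		a, err := dispatch(args)
		fmt.Println(a, err)
	}
}
```

Keeping the parsing separate from the cobra command makes the grammar testable without constructing a command tree.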

…po expansion)

The dashboard form was asking users to type the token's scopes by hand, but scopes are an attribute of the PAT set on github.com — typed input is just unverified text that can drift from what GitHub will actually enforce. The codepath was a leftover from a deferred validation step. Now the server validates every newly submitted PAT with GET /user and reads the real scopes from the X-OAuth-Scopes response header. A 401 from GitHub turns into a 422 with the surfaced message, anything else into a 502, so an invalid or unreachable token is rejected at the door rather than persisted and discovered later. Fine-grained PATs (github_pat_*) don't expose scopes via this header — for them Scopes stays empty and the dashboard displays "(fine-grained or none)". The Scopes field on CreateGithubTokenRequest is marked deprecated and ignored on the server; the dashboard's Scopes input is removed. Existing tests are updated and a TestGithubTokens_RejectInvalidToken case asserts the 401-path rejection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
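As an illustration of the validation step, the sketch below parses an `X-OAuth-Scopes` header value (GitHub returns a comma-separated list) and maps GitHub's status to the rejection status described above. Both function names are hypothetical, and treating only 200 as success is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseScopes splits an X-OAuth-Scopes header value into trimmed scopes.
// An empty header (as with fine-grained PATs) yields no scopes.
func parseScopes(header string) []string {
	var scopes []string
	for _, s := range strings.Split(header, ",") {
		if s = strings.TrimSpace(s); s != "" {
			scopes = append(scopes, s)
		}
	}
	return scopes
}

// rejectionStatus maps the status of GitHub's GET /user response to the
// status the server would answer with; rejected is false on success.
func rejectionStatus(githubStatus int) (status int, rejected bool) {
	switch githubStatus {
	case http.StatusOK:
		return 0, false
	case http.StatusUnauthorized:
		return http.StatusUnprocessableEntity, true // invalid token → 422
	default:
		return http.StatusBadGateway, true // anything else → 502
	}
}

func main() {
	fmt.Println(parseScopes("repo, admin:repo_hook"))
	fmt.Println(rejectionStatus(http.StatusUnauthorized))
}
```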

The dashboard's workspace view was a stub that listed repos read-only; attaching a new repo only worked via curl. This wires up the actual UX: a card grid that mirrors the projects page on the list, a per-workspace detail page, and a staged add-repo dialog that walks the operator through token → repo → branch → webhook policy.

Backend changes:
- GET /api/v1/github-tokens/{id}/repos: reveals the PAT server-side, fetches the repos visible to it via /user/repos with Link-header pagination (up to 5 pages = 500 repos), optionally filtered by ?q=. The plaintext never touches the wire.
- POST /api/v1/workspaces/{id}/repos now accepts webhook_mode of {manual, auto, disabled}. A new workspace_repos.webhook_mode column records the operator's intent; the legacy auto_webhook bool remains derived (true iff mode = "auto") so old clients keep working. Existing rows are backfilled to "auto" when auto_webhook=1.

Frontend changes:
- WorkspacesPage is a Routes shell now; list + detail are separate.
- WorkspacesListPage renders Workspaces as cards (counts at-a-glance, in-progress / failed badges), using the same visual language as projects.
- WorkspaceDetailPage drives the per-workspace UX: an Add repo dialog with a staged form (each step unlocks the next), Reindex / Delete actions on each RepoCard, and background polling (3s) while any repo is in pending / cloning / indexing so the operator can watch the progress without F5. Each in-flight badge ticks an elapsed counter so it's visible that the job isn't silently stalled.
- AddRepoDialog picks tokens, lists their visible repos with a client-side text filter, auto-fills branch from default_branch, and surfaces the webhook URL+secret once for manual mode.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ogress)

The repo picker only surfaced what `/user/repos` returned. That endpoint is the affiliations-aggregated view and routinely misses org repos; SAML-protected orgs in particular only appear under `/orgs/{login}/repos`. So a user with access to an org would see their personal account but not the org's repos, which is exactly what was hit in testing.

Add a second selector between Token and Repository:
- New `GET /api/v1/github-tokens/{id}/accounts` lists the PAT owner plus every org from `/user/orgs`. SAML-gated 403 on /user/orgs is swallowed so the personal account still comes through.
- `GET /api/v1/github-tokens/{id}/repos` now accepts `?account=login` + `?account_type=user|org`. When set, the server hits `/users/{login}/repos` or `/orgs/{login}/repos` directly. When not set, it falls back to the original `/user/repos` aggregated view so existing callers keep working.

Dashboard:
- `AddRepoDialog` loads accounts as soon as a token is picked and renders a Select with "(all accessible)" plus each user/org. The repo list refetches whenever the account changes, so typing through the picker now shows the org's repos directly.

Tests:
- Unit: ListAccounts (user + orgs), SAML-403 swallow, account-scoped repo endpoint dispatch (`/users/X` vs `/orgs/X`).
- Integration: round-trips through the HTTP layer including the "no account_type with account" 422 rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Long full_names like atrybulkevychglobalgames/grpc-go-kubernetes-load-balancing-example were pushing the row past the dialog's max-width because Tailwind's truncate only works on a flex child that also has min-w-0. The name span had truncate but no shrink boundary, so it kept its intrinsic width and the branch span on ml-auto ended up off-screen. Wrap the name in min-w-0 flex-1 truncate, pin the icon and branch to shrink-0 so the row stays inside the dialog. Added a title= attribute on the button so hovering still surfaces the full path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

A trigram-tokenized FTS5 virtual table lives alongside chromem-go so workspace search can pair dense vector retrieval with sparse keyword retrieval. The sparse signal recovers two things pure-dense fan-out loses: short-token precision (acronyms like "XYZ" get diffuse cosine scores) and project-relevance gating (chromem returns the N nearest vectors regardless of semantic distance, leaving projects that share zero vocabulary with the query at chunk_score ~0.25 false-positive). chunks_fts can only filter by rowid; chunks_meta is the indexed shadow that lets us delete by (project_path, file_path) and project_path without a full FTS5 scan. The two stay consistent inside the indexer's per-file SQL transaction, and they cascade away on project deletion and on full-reindex wipe. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Each project now runs dense (chromem cosine) and sparse (FTS5 BM25 over the chunks_fts mirror added in the previous commit) in parallel. Per project the two ranked lists are fused via Reciprocal Rank Fusion (k=60). Across projects an α-blended candidacy score (with per-query min-max normalization on both signals) plus a relative threshold (`candidacy >= best * 0.4`) gates the result set so projects that share no semantic and no lexical overlap with the query drop out entirely — pure-dense fan-out leaked every workspace repo at noise-level cosine similarity because chromem returns the N nearest vectors regardless of how far away "nearest" actually is. Live XYZ probe over 8 ACME repos: three repos with literally zero "XYZ" mentions previously surfaced 50 chunks each at dense scores 0.17-0.27. With the gate they drop out; the chunks list is then built by round-robin interleaving across surviving projects so each relevant repo gets its top hit before the dominant repo's tail entries appear. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
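The two scoring steps can be sketched independently of the storage layer. Reciprocal Rank Fusion scores an item as the sum of 1/(k + rank) over the lists it appears in, with 1-based ranks. For the gate, the value of α, which signal it weights, and the handling of a degenerate min-max range are not stated in the description, so the choices below are assumptions; `rrf`, `minMax`, `gate`, and `project` are illustrative names.

```go
package main

import "fmt"

// rrf fuses ranked lists of ids: score(id) = Σ 1/(k + rank), rank starting at 1.
func rrf(k float64, lists ...[]string) map[string]float64 {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	return scores
}

// project carries hypothetical per-project signals for one query.
type project struct {
	Path  string
	Dense float64 // e.g. best cosine similarity
	BM25  float64 // e.g. best BM25 score, 0 when nothing matched
}

// minMax rescales xs to [0, 1]. When all values are equal, positive values
// map to 1 and zeros stay 0 (an assumption for the degenerate case).
func minMax(xs []float64) []float64 {
	out := make([]float64, len(xs))
	if len(xs) == 0 {
		return out
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	for i, x := range xs {
		switch {
		case hi > lo:
			out[i] = (x - lo) / (hi - lo)
		case hi > 0:
			out[i] = 1
		}
	}
	return out
}

// gate keeps projects whose α-blended candidacy is at least rel × the best.
func gate(ps []project, alpha, rel float64) []string {
	dense := make([]float64, len(ps))
	bm25 := make([]float64, len(ps))
	for i, p := range ps {
		dense[i], bm25[i] = p.Dense, p.BM25
	}
	nd, nb := minMax(dense), minMax(bm25)
	cand := make([]float64, len(ps))
	best := 0.0
	for i := range ps {
		cand[i] = alpha*nd[i] + (1-alpha)*nb[i]
		if cand[i] > best {
			best = cand[i]
		}
	}
	var kept []string
	for i, p := range ps {
		if best > 0 && cand[i] >= best*rel {
			kept = append(kept, p.Path)
		}
	}
	return kept
}

func main() {
	fmt.Println(rrf(60, []string{"a", "b"}, []string{"b", "c"}))
	ps := []project{{"A", 0.8, 10}, {"B", 0.5, 0}, {"C", 0.2, 0}}
	fmt.Println(gate(ps, 0.5, 0.4))
}
```

With these inputs, a project that has middling dense similarity but no lexical match falls below 0.4 of the best candidacy and is gated out, which is the behaviour the commit describes for zero-mention repos.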

…a reindex Repos indexed before the chunks_fts mirror landed have file_hashes rows (chromem populated, project marked indexed) but an empty chunks_meta — the BM25 side of hybrid search returns nothing for them and the algorithm degrades to pure dense for those entries. Observable failure mode: live workspace shows the new bm25_score field at 0.000 for every project and the result set looks identical to the old pure-dense fan-out. WorkspaceSearch now probes chunks_meta vs file_hashes per project and bubbles stale repos up via a new stale_fts_repos field on the response. The dashboard renders a banner naming the affected repos with a reindex hint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…workflow Replaces the old centroid-routing playbook (which described tools we ripped out in the workspaces refactor) with a workflow that matches the hybrid BM25+dense algorithm: how to phrase queries so the BM25 gate fires, how to read project_score / bm25_score / dense_score, when to spawn parallel Explore sub-agents over surviving projects, and how to synthesize the per-repo change plan. The skill is goal-driven: every workspace-search interaction has to answer (1) which repos are in scope, (2) which code in those repos is relevant, (3) what changes need to land and in what order. It also names the "primary project" pattern — the agent is usually cd'd into one specific repo and the user's task is rooted there; workspace search defines the surrounding context. Includes a worked retro on the "Add sell flow to XYZ" failure that motivated the hybrid algorithm — pure-dense fan-out routed three zero-mention repos as relevant on noise-level cosine similarity. Aligns the CLI (`cix ws … search`) with the new server API: drops the `--top-communities` flag in favour of `--top-projects`, switches the response renderer to projects + bm25/dense breakdown, surfaces stale_fts_repos as an inline warning. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replaces internal product/repo names that leaked from a real production debugging session into test fixtures, code comments, and the cix-workspace skill doc: - XYZ / XYZOrder / processXYZOrderEvent → XYZ / XYZOrder / processXYZOrderEvent - acme-backend / acme-shared / acme-models / acme-worker / acme-notifier / acme-directory / acme-inventory / acme-platform → acme-backend / acme-shared / acme-models / acme-worker / acme-notifier / acme-directory / acme-inventory / acme-platform - "internal product code" → "internal product code" - "shared-models migration", "shared data models" → generic shared-models / data-model phrasing - README .cixignore example switched from api/generated/ to api/generated/ Working-tree-only sanitization; a follow-up history rewrite will scrub the same strings from older commits. Tests green (chunksfts, db, httpapi, projectconfig). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ency Three changes to workspace and per-project search, plus targeted anonymization of eval-derived references in adjacent comments and test fixtures.

Server behaviour:
- Workspace search default `min_score` raised from 0 to 0.4, matching the per-project SemanticSearch default. Cross-project sweeps that need long-tail recall must now pass `min_score=0` explicitly.
- Per-project SemanticSearch default `min_score` lowered from 0.4 to 0.2. Abstract NL queries (e.g. "end-to-end workflow lifecycle") used to silently return empty even when relevant chunks scored in [0.25, 0.35]. 0.2 keeps a light noise floor.
- Fix: workspace `chunks[]` round-robin now uses only the projects that survived the `top_projects` truncation. Previously a 12-project workspace at default `top_projects=10` could surface chunks from the 11th/12th project that weren't in the `projects[]` panel, so clients had no way to look up the chunk's bm25/dense scores.

Tests added:
- TestWorkspaceSearch_ChunksOnlyFromPanelProjects: 12 surviving repos + top_projects=10; every chunk's project must appear in the panel.
- TestWorkspaceSearch_DefaultMinScoreIs04: geometry calibrated so chunks at cos=0.3 are filtered by default and admitted at min_score=0.
- TestSemanticSearch_DefaultMinScoreIs02: fakeEmbedder geometry producing a cos≈0.25 chunk that the old default would have rejected.

OpenAPI spec descriptions updated for both `min_score` defaults.

Anonymization (carried over from previous workspace-eval analysis): adjacent comments and test fixtures that named specific repos / product acronyms / sell-flow scenarios are replaced with neutral placeholders (WIDGET / ping / generic repo descriptions).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
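The panel-consistency fix comes down to iterating the round-robin over the truncated panel rather than over every surviving project. A minimal sketch, with hits reduced to string ids and `roundRobin` as a hypothetical name:

```go
package main

import "fmt"

// roundRobin interleaves per-project hit lists, taking one hit per project
// per pass, and only considers projects listed in panel (the projects that
// survived top_projects truncation).
func roundRobin(perProject map[string][]string, panel []string, limit int) []string {
	var out []string
	for depth := 0; len(out) < limit; depth++ {
		progressed := false
		for _, p := range panel {
			hits := perProject[p]
			if depth >= len(hits) {
				continue // this project is exhausted
			}
			out = append(out, hits[depth])
			progressed = true
			if len(out) == limit {
				return out
			}
		}
		if !progressed {
			break // every panel project is exhausted
		}
	}
	return out
}

func main() {
	hits := map[string][]string{
		"a": {"a1", "a2", "a3"},
		"b": {"b1"},
		"c": {"c1"}, // not in the panel, so never emitted
	}
	fmt.Println(roundRobin(hits, []string{"a", "b"}, 10))
}
```

Because the loop never touches a project outside `panel`, every emitted chunk has a matching entry in the projects list by construction.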

…subagent Two additions to the cix-workspace skill:

1. Ten "trust rules" for interpreting workspace_search responses, derived from internal calibration testing:
   - chunk.score>=0.4 trust threshold (rule 1)
   - chunk.score==0 = BM25-only literal match, not low confidence (rule 2)
   - top-1 of projects[] is correct ~70% of the time in real tasks (rule 3)
   - drop down to per-project search for depth (rule 4)
   - min_score=0 explicitly for cross-project sweeps (rule 5)
   - careful disambiguator selection: prefer meta-tokens over tech guesses (rule 6)
   - "change X in production" → manifests/config repo, not code repo (rule 7)
   - scan ranks 2-5 before reformulating (rule 8)
   - explicit min_score=0 for per-project NL drill-down (rule 9)
   - words live ≠ change location (rule 10)
2. Dedicated `cix-workspace-investigator` sub-agent at `skills/cix-workspace/agents/cix-workspace-investigator.md`:
   - Thin read-only shell around cix search/def/refs + Read + Grep
   - Scope-isolated: one repo per spawn, no edits, no recursion
   - Methodology + output format are the main agent's call per spawn, not baked into the sub-agent's system prompt
   - System prompt is ~60 lines; main agent's per-spawn prompt handles the actual task

SKILL.md's "Sub-agent fan-out pattern" section rewritten around the new sub-agent with a four-part prompt template (task verbatim, project_path, seed chunks WITH the main agent's commentary, explicit deliverable) and an anti-patterns list. The existing worked example is preserved but rewritten without specific repo composition.

skills/README.md updated with the bundled-subagent description and install command (additional copy into ~/.claude/agents/).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

New workspaces.md at the repo root, a sub-document linked from README.md. Covers everything an operator or agent needs to know about the workspaces feature:
- What a workspace is, what's experimental about it today
- Enabling via CIX_WORKSPACES_ENABLED + supplementary env vars
- Concepts: owned vs linked repos, GitHub tokens, project path format
- Quick start: end-to-end walkthrough with curl examples
- Adding repositories (Dashboard staged dialog + REST + status transitions)
- GitHub tokens lifecycle, AES-256-GCM at-rest encryption, scopes
- Searching: Dashboard / `cix ws` CLI / REST endpoint with response shape
- Search algorithm: pipeline diagram, tunable parameters table, min_score semantics, hybrid BM25+dense rationale, stale-FTS handling
- Webhooks: disabled/manual/auto modes, HMAC signature, delivery endpoint
- Strengths and weaknesses (honest assessment)
- Configuration reference, REST API reference, troubleshooting
- Agent integration pointer to cix-workspace skill

README.md updated:
- "What you get" bullet for Workspaces (experimental) with link to workspaces.md
- Dashboard table gains two new rows: Workspaces and GitHub Tokens (both flagged experimental)
- New "Workspaces and external repositories" subsection after the Disabled-embeddings mode subsection, summarising the feature and linking to workspaces.md
- Agent Integration section adds the cix-workspace skill + bundled investigator subagent install snippet

Feature is marked experimental in every public-facing reference.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- New workflow `.github/workflows/prerelease-server.yml`: on push to `develop`, builds `server/Dockerfile.cuda` (amd64) and pushes the floating `dvcdsys/code-index:develop-cu128` tag. CPU image is intentionally skipped; pre-release stages on RTX 3090 only.
- Extend `ci-server.yml` / `ci-cli.yml` to also run on push and PR against `develop`, so vet/test/build gates pre-release merges the same way they gate main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Commit 33da39b accidentally removed the placeholder that makes the `//go:embed all:dist` directive in dashboard/embed.go resolve on a fresh clone (no `make dashboard-build`). `go vet ./...` then fails with `pattern all:dist: no matching files found`, breaking the CI gate on every PR. The root `.gitignore` already has a negation rule for this exact path; restoring the file is enough. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

dvcdsys and others added 30 commits May 11, 2026 16:57

Merge PR3: GitHub webhooks into feature branch

a5bab9c

End-to-end webhook delivery → reindex with HMAC validation, optional auto-register against the GitHub API, manual setup UX, and an operator guide (doc/WORKSPACES.md) with Cloudflare tunnel walkthrough.

Merge PR4: call_edges + eval harness — heuristic at 100% precision ac…

c207f35

…ross Go/Python/TS, greenlights PR5 Louvain

Merge PR5: Louvain communities + workspace centroids

8704e26

Merge PR6: two-stage workspace search endpoint

906a2ae

Merge PR7: CLI + skill + dashboard search — workspaces feature complete

a7b8812

Merge PR8: workspace discovery (name-first CLI grammar + dashboard re…

2c8984d

…po expansion)

Merge PR9: derive GitHub token scopes from the API, not user input

1072a0e

Merge PR10: in-dashboard add-repo flow (cards + staged form + live pr…

90955fc

…ogress)

Merge PR11: account/org selector in add-repo flow

7ec05db

dvcdsys and others added 3 commits May 13, 2026 17:36

dvcdsys merged commit a2b60f2 into develop May 13, 2026
3 checks passed

dvcdsys merged 33 commits into develop from feature/workspaces
