feat(workspaces): hybrid BM25+dense workspace search + CUDA prerelease CI #38
Merged
First slice of the workspaces feature branch. Gated by
CIX_WORKSPACES_ENABLED — every new endpoint returns 503 when off, so
existing deployments are unaffected.
New tables: workspaces, github_tokens. New packages: internal/secrets
(AES-256-GCM at rest, key from CIX_SECRET_KEY / CIX_SECRET_KEYFILE /
auto-generated 0600 keyfile), internal/workspaces, internal/githubtokens.
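A minimal sketch of what AES-256-GCM at-rest encryption can look like with the Go standard library. The `seal` / `open` names and the nonce-prefixed blob layout are assumptions for illustration, not the actual internal/secrets API.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// seal encrypts plaintext with AES-256-GCM and prepends the random nonce,
// so the stored blob is nonce || ciphertext || tag. (Hypothetical layout.)
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the blob was truncated or tampered with.
func open(key, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // in practice resolved from CIX_SECRET_KEY or the keyfile
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	blob, err := seal(key, []byte("example-token"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, blob)
	fmt.Println(string(plain), err)
}
```

GCM authenticates as well as encrypts, so a modified row fails to decrypt rather than yielding garbage.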
New endpoints: full CRUD over /api/v1/workspaces and /api/v1/github-tokens
with the canonical {"detail": "..."} error envelope. Plaintext PATs are
never echoed — POST returns metadata only.
Dashboard gets two placeholder modules (Workspaces, GitHub Tokens) that
render the full CRUD flow against the new endpoints and self-hide behind
a "feature off" alert when the flag is false.
Subsequent PRs of feature/workspaces add workspace_repos, jobs+workers,
webhook receiver, call-graph extraction, Louvain communities, two-stage
search, and the cix:workspace skill.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Includes:
- workspaces / github_tokens schema (gated by CIX_WORKSPACES_ENABLED)
- AES-256-GCM at-rest encryption (internal/secrets)
- Full CRUD over /api/v1/workspaces and /api/v1/github-tokens
- Dashboard placeholder modules for both
- Unit + integration tests (plaintext-leak gate)
Adds the bridge from a GitHub URL to an indexed cix project. Operator
attaches a repo to a workspace via POST /workspaces/{id}/repos; the
server enqueues a clone_repo job (worker clones via go-git), then
chains an index_repo job that drives the existing 3-phase indexer
in-process against the on-disk clone.
New packages:
- internal/jobs persistent SQLite-backed worker pool with
partial-unique dedupe (50 webhook bursts collapse
to 1 pending row), per-attempt linear backoff,
panic-safe handler invocation
- internal/repocloner go-git wrapper — shallow clone with PAT auth via
x-access-token, in-process so distroless images
don't need a git binary; fetch+reset on reuse
- internal/repoindexer walks the clone, batches FilePayloads, calls
indexer.BeginIndexing/ProcessFiles/Finish.
Filter prunes node_modules/.git/etc., skips
binaries (NUL probe) and oversized files.
- internal/workspacerepos service layer for workspace_repos rows
- internal/workspacejobs handler registration that wires the above
packages into the jobs queue
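The repoindexer filter described above (prune well-known directories, skip binaries via a NUL probe, skip oversized files) could be sketched roughly as follows. The function names, the probe window, and the size cap are assumptions; only node_modules and .git are named in the commit.

```go
package main

import (
	"bytes"
	"fmt"
)

// probeSize is a hypothetical window: only the head of the file is scanned.
const probeSize = 8000

// prunedDirs lists directories that are never walked. The commit names
// node_modules and .git; the real list is longer ("etc.").
var prunedDirs = map[string]bool{"node_modules": true, ".git": true}

// skipDir reports whether a directory name should be pruned from the walk.
func skipDir(name string) bool { return prunedDirs[name] }

// looksBinary applies the NUL probe: a zero byte in the head of the file
// is treated as evidence of binary content.
func looksBinary(content []byte) bool {
	head := content
	if len(head) > probeSize {
		head = head[:probeSize]
	}
	return bytes.IndexByte(head, 0) >= 0
}

// shouldIndex combines the size cap and the binary probe.
func shouldIndex(content []byte, maxBytes int) bool {
	return len(content) <= maxBytes && !looksBinary(content)
}

func main() {
	fmt.Println(skipDir("node_modules"))                       // pruned
	fmt.Println(shouldIndex([]byte("package main\n"), 1<<20))  // text, small
	fmt.Println(shouldIndex([]byte{0x7f, 'E', 'L', 'F', 0}, 1<<20)) // contains NUL
}
```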
New endpoints (gated by CIX_WORKSPACES_ENABLED):
- GET /workspaces/{id}/repos
- POST /workspaces/{id}/repos (returns one-shot webhook secret)
- DELETE /workspaces/{id}/repos/{repo_id}
- POST /workspaces/{id}/repos/{repo_id}/reindex
- GET /jobs
New env vars: CIX_WORKER_CONCURRENCY (default 2),
CIX_WORKSPACES_DATA_DIR (default <sqlite-parent>/repos), CIX_PUBLIC_URL
(used to build webhook URLs surfaced to operators).
Webhook receiver / HMAC validation lands in PR3; call graph + Louvain
communities + two-stage search in PR4–PR6.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end pipeline from POST /workspaces/{id}/repos through cloned +
indexed project, with:
- workspace_repos + jobs schema (dedupe via partial-unique index)
- SQLite-backed worker pool (panic-safe, retry backoff)
- go-git clone wrapper (works in distroless images)
- in-process indexer driver that reuses the existing 3-phase protocol
- repo CRUD + reindex + jobs list endpoints (gated by feature flag)
- 7 HTTP integration tests, jobs unit tests, filter unit tests
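For the "panic-safe, retry backoff" worker behaviour, a small sketch of the two ideas in isolation. `safeInvoke` and `linearBackoff` are illustrative names, and the backoff step is an assumed parameter rather than the value used by internal/jobs.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// safeInvoke runs a job handler and converts a panic into an ordinary error,
// so one bad job cannot take down the worker goroutine.
func safeInvoke(handler func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("handler panicked: %v", r)
		}
	}()
	return handler()
}

// linearBackoff grows the retry delay linearly with the attempt number:
// attempt 1 waits one step, attempt 2 waits two steps, and so on.
func linearBackoff(attempt int, step time.Duration) time.Duration {
	return time.Duration(attempt) * step
}

func main() {
	fmt.Println(safeInvoke(func() error { panic("boom") }))
	fmt.Println(safeInvoke(func() error { return errors.New("plain failure") }))
	fmt.Println(linearBackoff(3, 10*time.Second))
}
```

The named return value is what lets the deferred `recover` overwrite the result after the handler has already unwound.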
…gister)
Closes the loop from a push on GitHub to an updated cix index. A new
public endpoint accepts deliveries, validates HMAC-SHA256 against the
per-row webhook_secret, and enqueues the same clone_repo job PR2
introduced — go-git's CloneOrFetch already handles the incremental
fetch+reset path, so no new job type is needed.
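A sketch of the HMAC-SHA256 check, assuming the signature arrives in GitHub's `X-Hub-Signature-256` header in the form `sha256=<hex digest>`. The helper names are hypothetical; the receiver in this PR may be structured differently.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const sigPrefix = "sha256="

// sign computes the header value GitHub would send for this body and secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return sigPrefix + hex.EncodeToString(mac.Sum(nil))
}

// validSignature checks the header against the per-repo webhook secret.
// hmac.Equal compares in constant time to avoid leaking digest prefixes.
func validSignature(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, sigPrefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, sigPrefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("per-row-webhook-secret")
	body := []byte(`{"ref":"refs/heads/main"}`)
	header := sign(secret, body)
	fmt.Println(validSignature(secret, body, header))          // matching secret
	fmt.Println(validSignature([]byte("wrong"), body, header)) // mismatched secret
}
```

The digest must be computed over the raw request body, before any JSON decoding.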
The dashboard's add-repo flow now exposes an `auto_webhook` toggle.
When true, the server uses the supplied PAT to POST /repos/.../hooks
on the operator's behalf and persists the resulting hook id. Failure
is non-fatal — the response carries `auto_registered: false` plus an
operator-facing note (e.g. "missing admin:repo_hook scope"). Manual
setup is the default and works without any extra GitHub scopes.
New package internal/githubapi: a tiny raw-HTTP client for two GitHub
endpoints (create webhook, delete webhook). Pulling go-github for just
these two calls would have added ~10MB of generated code.
New endpoints:
- POST /api/v1/webhooks/github/{repo_id} (public; HMAC-auth)
- GET /api/v1/workspaces/{id}/repos/{repo_id}/webhook-info
Tests cover: HMAC happy path, mismatched/missing signatures (401),
ping deliveries (200), wrong-branch pushes (ignored), burst-dedupe on
multiple deliveries collapsing to one job, public-path bypass of the
auth middleware, and the auto-register-fails-cleanly-without-public-URL
branch.
doc/WORKSPACES.md is a new operator guide — feature flags, encryption
key resolution, Cloudflare tunnel quick-start, manual + auto webhook
flows, troubleshooting.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end webhook delivery → reindex with HMAC validation, optional auto-register against the GitHub API, manual setup UX, and an operator guide (doc/WORKSPACES.md) with Cloudflare tunnel walkthrough.
Approximate caller→callee graph extracted from the existing symbols + refs tables. The result feeds Louvain community detection in PR5; the eval harness gates that downstream work behind a precision-floor check.
Approach (refs heuristic):
- caller resolved as the narrowest function/method whose [line, end_line] span contains the ref's line
- callee candidates resolved by name lookup on symbols (kind ∈ function, method) constrained to the same project
- weight = 1 / popcount(callee_name), so common names like init/run/handle contribute proportionally less to the structural signal
- popcount > 20 → name dropped (treated as noise)
- same-file bonus ×2.0, same-parent_name bonus ×1.5
- self-edges (recursion) dropped; they don't help community separation
- duplicate (caller, callee) pairs accumulate weight via map then bulk INSERT inside a single transaction
Integration: workspacejobs.handleIndex calls callgraph.Build after a successful FinishIndexing. This is non-fatal (failure logs but doesn't flip the repo status to failed; semantic search continues to work without the graph).
Eval harness (internal/callgraph/eval/) runs three fixtures (Go/Python/TypeScript) through the full chunker → persist → build path and asserts the labeled (caller, callee) pairs all show up in call_edges. Current results:
go-handlers       4/4  precision 1.00
python-pipeline   6/6  precision 1.00
typescript-store  5/5  precision 1.00
All three comfortably above the 0.60 floor, so there is no need to fall back to the symbol co-occurrence graph (callgraph.SourceCoOccurrence is in the table for future swapping). Greenlights PR5 (Louvain communities).
9 unit tests covering: single-edge happy path, popcount drop, module-scope refs skipped, self-edges dropped, cross-file weight, same-parent bonus, weight accumulation, idempotency, edge counting.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
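The weighting rules of the refs heuristic can be sketched as below. This assumes popcount(callee_name) means the number of symbols sharing that name in the project, that the two bonuses multiply when both apply, and that the caller has already been resolved; the types are hypothetical and do not mirror internal/callgraph.

```go
package main

import "fmt"

type symbol struct {
	ID     int
	Name   string
	File   string
	Parent string
}

// ref is a reference whose enclosing caller has already been resolved.
type ref struct {
	Caller     symbol
	CalleeName string
}

type edge struct{ From, To int }

const maxPopcount = 20 // names with more candidates are treated as noise

// buildEdges accumulates weights per (caller, callee) pair.
func buildEdges(refs []ref, byName map[string][]symbol) map[edge]float64 {
	weights := make(map[edge]float64)
	for _, r := range refs {
		candidates := byName[r.CalleeName]
		n := len(candidates)
		if n == 0 || n > maxPopcount {
			continue
		}
		for _, callee := range candidates {
			if callee.ID == r.Caller.ID {
				continue // self-edges (recursion) are dropped
			}
			w := 1.0 / float64(n)
			if callee.File == r.Caller.File {
				w *= 2.0 // same-file bonus
			}
			if callee.Parent != "" && callee.Parent == r.Caller.Parent {
				w *= 1.5 // same-parent bonus
			}
			weights[edge{r.Caller.ID, callee.ID}] += w
		}
	}
	return weights
}

func main() {
	handle := symbol{ID: 1, Name: "handle", File: "a.go", Parent: "Server"}
	run := symbol{ID: 2, Name: "run", File: "a.go", Parent: "Server"}
	byName := map[string][]symbol{"handle": {handle}, "run": {run}}
	refs := []ref{{handle, "run"}, {handle, "run"}, {handle, "handle"}}
	fmt.Println(buildEdges(refs, byName))
}
```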
…ross Go/Python/TS, greenlights PR5 Louvain
The structural layer that powers PR6's two-stage workspace search.
Every workspace's combined call_edges graph is partitioned into Louvain
communities; each community gets a mean-pooled, L2-normalised embedding
stored in a dedicated chromem collection (ws_{md5}_centroids).
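A sketch of the mean-pool + L2-normalise step that produces a community centroid. The helper names follow the ones mentioned in the test list of this commit, but the bodies here are illustrative and assume all member vectors have the same dimension.

```go
package main

import (
	"fmt"
	"math"
)

// meanPool averages the member embeddings component-wise.
// All vectors are assumed to have the same length as the first one.
func meanPool(vectors [][]float32) []float32 {
	if len(vectors) == 0 {
		return nil
	}
	out := make([]float32, len(vectors[0]))
	for _, v := range vectors {
		for i := range out {
			out[i] += v[i]
		}
	}
	for i := range out {
		out[i] /= float32(len(vectors))
	}
	return out
}

// l2Normalise scales v to unit length so cosine similarity reduces to a
// dot product. A zero vector is returned unchanged.
func l2Normalise(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	norm := math.Sqrt(sum)
	if norm == 0 {
		return v
	}
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = float32(float64(x) / norm)
	}
	return out
}

func main() {
	members := [][]float32{{3, 0}, {3, 8}}
	centroid := l2Normalise(meanPool(members))
	fmt.Println(centroid)
}
```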
New package internal/communities — gonum/graph/community Louvain with
deterministic seed (Resolution=1.0, seed=42). Empty workspaces and
empty graphs are handled cleanly; output is wholesale-replaced on each
rebuild so partial failures can't leave stale state.
New tables: communities (id, workspace_id, label, size, parent_id),
community_members (community_id, project_path, symbol_id). Wholesale
delete + reinsert per rebuild inside a single transaction.
New vectorstore methods:
- CentroidCollectionName(workspaceID)
- ReplaceCentroids — drops + recreates the workspace's chromem
collection in lock-step with the SQL rebuild
- SearchCentroids — top-K nearest-neighbor against the centroid
collection (the stage-1 query for PR6)
- FetchProjectChunkEmbeddings — by-symbol-name lookup used during
mean-pooling. chromem's where filter is single-equality so we make
one query per name (bounded by community member count, typically <200).
Job pipeline:
- New type "compute_workspace_communities" with debounce key
"communities:{workspace_id}" — burst-safe via the existing
partial-unique index on jobs.dedupe_key.
- index_repo handler chains EnqueueComputeCommunities(workspace_id)
with a 30s scheduled_at delay, so a wave of repos finishing
indexing during catch-up collapses into one Louvain rebuild.
Tests: 6 unit tests covering Build (two-cluster split, empty workspace,
idempotency, cross-project tracking) + meanPool/l2Normalise helpers.
Eval gate from PR4 already cleared at 100% precision — Louvain runs
against a high-quality graph by construction.
Deferred to a future iteration (cheap to revisit):
- Recursive split for communities >50 chunks
- Small-community merging
- Overlapping community detection (BigCLAM, etc.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GET /api/v1/workspaces/{id}/search?q=... is the user-facing payoff of
the workspaces feature. Two-stage retrieval:
Stage 1: embed query → SearchCentroids → top N communities (default 5)
Stage 2: for each (community, project_path), one chromem query against
the per-project chunk collection with the user's embedding;
filter results in-memory to members of the community by
symbol_name; merge globally, dedupe by (project, file,
startLine, endLine), return top K (default 20).
Why filter in-memory instead of pushing where: chromem's where clause
is single-equality only — pushing per-symbol-name filters would mean N
queries per (community, project). Stage-2 fan-out is bounded by
top_communities × #project_paths_per_community ≈ 5 × 3 = 15 queries
per workspace search, comfortably under 500ms p50.
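The stage-2 merge (in-memory membership filter, dedupe by (project, file, startLine, endLine), global top K) might look roughly like this. The sketch collapses community membership into a single symbol-name set for brevity, and all type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type chunk struct {
	Project   string
	File      string
	StartLine int
	EndLine   int
	Symbol    string
	Score     float32
}

type chunkKey struct {
	Project, File      string
	StartLine, EndLine int
}

// mergeTopK keeps only hits whose symbol is a community member, dedupes by
// location (keeping the best score), and returns the k highest-scoring chunks.
func mergeTopK(hits []chunk, members map[string]bool, k int) []chunk {
	best := make(map[chunkKey]chunk)
	for _, h := range hits {
		if !members[h.Symbol] {
			continue
		}
		key := chunkKey{h.Project, h.File, h.StartLine, h.EndLine}
		if prev, ok := best[key]; !ok || h.Score > prev.Score {
			best[key] = h
		}
	}
	out := make([]chunk, 0, len(best))
	for _, c := range best {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		// Tie-break on location so the order is deterministic.
		if out[i].Project != out[j].Project {
			return out[i].Project < out[j].Project
		}
		if out[i].File != out[j].File {
			return out[i].File < out[j].File
		}
		return out[i].StartLine < out[j].StartLine
	})
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	hits := []chunk{
		{"svc-a", "a.go", 1, 10, "Foo", 0.9},
		{"svc-a", "a.go", 1, 10, "Foo", 0.7}, // same location from another community
		{"svc-b", "b.go", 5, 9, "Bar", 0.8},
		{"svc-b", "c.go", 1, 2, "Unrelated", 0.99}, // not a member, filtered out
	}
	fmt.Println(mergeTopK(hits, map[string]bool{"Foo": true, "Bar": true}, 20))
}
```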
Response shape (WorkspaceSearchResponse):
- status: "ok" | "communities_not_built" | "empty"
- communities: top-N centroids with score, label, project_paths
- chunks: merged ranking with project_path, file, lines, score,
community attribution
When the workspace has no centroid index yet (e.g. just-created
workspace, debounced compute_workspace_communities hasn't fired),
the endpoint returns `status: "communities_not_built"` with empty
arrays — dashboard UI can render a hint instead of an error.
Tests: 4 HTTP integration tests covering the empty-centroid branch,
missing query parameter, unknown workspace id, and disabled feature
flag. A stub embedder lets us reach stage 1 without standing up the
llama-server sidecar in CI.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Final slice of the workspaces feature branch.
CLI (cli/):
- New parent command `cix workspace` (alias `cix ws`)
- `cix workspace list` — lists every workspace on the cix-server
- `cix workspace search <ws> <query>` — runs the two-stage search
- --top-communities N (default 5)
- --top-chunks K (default 20)
- --json — raw response for piping
- Workspace identifier accepts either the opaque id or the name
(case-insensitive); resolution is one `cix workspace list` round
trip cached per-process.
Skill (skills/cix-workspace/SKILL.md):
- Markdown frontmatter user-invocable skill, mirroring the `cix`
skill's style guide.
- Trigger phrasing tuned to the use case: cross-repo questions,
microservice flows, frontend+backend pairs.
- Explains the two-stage mental model + when to fall back to plain
`cix search` inside a single repo.
- Troubleshooting for `communities_not_built`, empty results, 503.
Dashboard (server/dashboard/src/modules/workspaces/):
- Search icon button on every workspace row opens a dialog hosting
the full two-stage search UI: query input → top communities list
(label, score, member count, project_paths) → top chunks (file,
lines, project, symbol, score, content snippet).
- Status-aware empty states: explicit message when the centroid
index hasn't built yet ("wait ~30s after the last index_repo").
Tests pass on both server and CLI. The feature branch is now ready
to merge to main as one large PR per the user's PR strategy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…xpansion)
User-facing refactor of the workspace surface so operators (and the
agent skill) can explore before searching.
CLI grammar — name-first, manual dispatch under one `cix ws` parent:
cix ws → list workspaces
cix ws list [--verbose|--json] → list workspaces (alternate)
cix ws <name> → describe — repos + status + indexed count
cix ws <name> list → list repos in workspace
cix ws <name> repos → alias for `<name> list`
cix ws <name> describe → same as bare `<name>`
cix ws <name> search <query> → two-stage workspace search
Why manual dispatch rather than cobra subcommands: the workspace NAME
needs to sit in the first positional slot. Cobra can't recognise a
dynamic value as a command, so we use cobra.ArbitraryArgs + a small
switch inside RunE. Trade-off: no auto-completion on the name. In
exchange, the surface reads the way operators think.
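Stripped of cobra, the name-first dispatch reduces to a switch over positional arguments. The sketch below ignores flags, and the `action` type, the verb labels, and joining a multi-word query with spaces are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// action is a hypothetical parsed form of a `cix ws ...` invocation.
type action struct {
	Verb      string
	Workspace string
	Query     string
}

// dispatch maps the positional args that follow `cix ws` onto an action.
// Note the built-in ambiguity: a workspace literally named "list" cannot be
// addressed by its bare name.
func dispatch(args []string) (action, error) {
	if len(args) == 0 || (len(args) == 1 && args[0] == "list") {
		return action{Verb: "list-workspaces"}, nil
	}
	name := args[0]
	if len(args) == 1 {
		return action{Verb: "describe", Workspace: name}, nil
	}
	switch args[1] {
	case "describe":
		return action{Verb: "describe", Workspace: name}, nil
	case "list", "repos":
		return action{Verb: "repos", Workspace: name}, nil
	case "search":
		if len(args) < 3 {
			return action{}, fmt.Errorf("usage: cix ws %s search <query>", name)
		}
		return action{Verb: "search", Workspace: name, Query: strings.Join(args[2:], " ")}, nil
	}
	return action{}, fmt.Errorf("unknown verb %q", args[1])
}

func main() {
	for _, argv := range [][]string{nil, {"acme"}, {"acme", "repos"}, {"acme", "search", "sell", "flow"}} {
		a, err := dispatch(argv)
		fmt.Printf("%v -> %+v %v\n", argv, a, err)
	}
}
```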
Status badges in `describe` / verbose `list`:
✓ indexed ✗ failed … pending/cloning/indexing
Client: adds Client.ListWorkspaceRepos for the new verbs to consume.
The /workspaces/{id}/repos endpoint is already there (PR2) — this
just exposes it.
Dashboard: each workspace row is now expandable. Click the chevron
→ lazy-loads attached repos, each shown with status colour, branch,
project_path, last_indexed_at, and last_error. The Search button
on the row still opens the existing two-stage search dialog.
SKILL.md: documents the new grammar + adds a "Discovery-first
workflow" pattern at the top of Patterns. The point of the new
verbs from an agent's perspective is to know whether a workspace
is searchable before paying the search round-trip — `cix ws <name>`
tells you indexed-count and lists repos in one call.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The dashboard form was asking users to type the token's scopes by hand, but scopes are an attribute of the PAT set on github.com — typed input is just unverified text that can drift from what GitHub will actually enforce. The codepath was a leftover from a deferred validation step. Now the server validates every newly submitted PAT with GET /user and reads the real scopes from the X-OAuth-Scopes response header. A 401 from GitHub turns into a 422 with the surfaced message, anything else into a 502, so an invalid or unreachable token is rejected at the door rather than persisted and discovered later. Fine-grained PATs (github_pat_*) don't expose scopes via this header — for them Scopes stays empty and the dashboard displays "(fine-grained or none)". The Scopes field on CreateGithubTokenRequest is marked deprecated and ignored on the server; the dashboard's Scopes input is removed. Existing tests are updated and a TestGithubTokens_RejectInvalidToken case asserts the 401-path rejection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
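Two small pieces of the validation path, sketched in isolation: splitting the comma-separated `X-OAuth-Scopes` header value, and mapping GitHub's response status onto the server's rejection status. The function names are hypothetical, and treating exactly 200 as success is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseScopes splits an X-OAuth-Scopes value such as "repo, admin:repo_hook".
// An empty header (fine-grained PATs) yields no scopes.
func parseScopes(header string) []string {
	var scopes []string
	for _, s := range strings.Split(header, ",") {
		if s = strings.TrimSpace(s); s != "" {
			scopes = append(scopes, s)
		}
	}
	return scopes
}

// rejectionStatus maps the status of GitHub's GET /user response onto the
// status the server returns: 401 becomes 422, any other failure becomes 502.
func rejectionStatus(githubStatus int) (status int, rejected bool) {
	switch githubStatus {
	case http.StatusOK:
		return 0, false
	case http.StatusUnauthorized:
		return http.StatusUnprocessableEntity, true
	default:
		return http.StatusBadGateway, true
	}
}

func main() {
	fmt.Println(parseScopes("repo, admin:repo_hook"))
	fmt.Println(rejectionStatus(http.StatusUnauthorized))
	fmt.Println(rejectionStatus(http.StatusInternalServerError))
}
```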
The dashboard's workspace view was a stub that listed repos read-only;
attaching a new repo only worked via curl. This wires up the actual UX:
a card grid that mirrors the projects page on the list, a per-workspace
detail page, and a staged add-repo dialog that walks the operator
through token → repo → branch → webhook policy.
Backend changes:
- GET /api/v1/github-tokens/{id}/repos — reveals the PAT server-side,
fetches the repos visible to it via /user/repos with Link-header
pagination (up to 5 pages = 500 repos), optionally filtered by ?q=.
The plaintext never touches the wire.
- POST /api/v1/workspaces/{id}/repos now accepts webhook_mode of
{manual, auto, disabled}. A new workspace_repos.webhook_mode column
records the operator's intent; the legacy auto_webhook bool remains
derived (true iff mode = "auto") so old clients keep working.
Existing rows are backfilled to "auto" when auto_webhook=1.
Frontend changes:
- WorkspacesPage is a Routes shell now; list + detail are separate.
- WorkspacesListPage renders Workspaces as cards (counts at-a-glance,
in-progress / failed badges) — same visual language as projects.
- WorkspaceDetailPage drives the per-workspace UX: an Add repo dialog
with a staged form (each step unlocks the next), Reindex / Delete
actions on each RepoCard, and background polling (3s) while any
repo is in pending / cloning / indexing so the operator can watch
the progress without F5. Each in-flight badge ticks an elapsed
counter so it's visible that the job isn't silently stalled.
- AddRepoDialog picks tokens, lists their visible repos with a
client-side text filter, auto-fills branch from default_branch,
and surfaces the webhook URL+secret once for manual mode.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The repo picker only surfaced what `/user/repos` returned. That endpoint
is the affiliations-aggregated view and routinely misses org repos —
SAML-protected orgs in particular only appear under `/orgs/{login}/repos`.
So a user with access to an org would see their personal account but
not the org's repos, which is exactly what was hit in testing.
Add a second selector between Token and Repository:
- New `GET /api/v1/github-tokens/{id}/accounts` lists the PAT owner
plus every org from `/user/orgs`. SAML-gated 403 on /user/orgs is
swallowed so the personal account still comes through.
- `GET /api/v1/github-tokens/{id}/repos` now accepts `?account=login`
+ `?account_type=user|org`. When set, the server hits
`/users/{login}/repos` or `/orgs/{login}/repos` directly. When not
set, it falls back to the original `/user/repos` aggregated view
so existing callers keep working.
Dashboard:
- `AddRepoDialog` loads accounts as soon as a token is picked and
renders a Select with "(all accessible)" plus each user/org. The
repo list refetches whenever the account changes — typing through
the picker now shows the org's repos directly.
Tests:
- Unit: ListAccounts (user + orgs), SAML-403 swallow, account-scoped
repo endpoint dispatch (`/users/X` vs `/orgs/X`).
- Integration: round-trips through the HTTP layer including the
"no account_type with account" 422 rejection.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Long full_names like atrybulkevychglobalgames/grpc-go-kubernetes-load-balancing-example were pushing the row past the dialog's max-width because Tailwind's truncate only works on a flex child that also has min-w-0. The name span had truncate but no shrink boundary, so it kept its intrinsic width and the branch span on ml-auto ended up off-screen. Wrap the name in min-w-0 flex-1 truncate and pin the icon and branch to shrink-0 so the row stays inside the dialog. Added a title= attribute on the button so hovering still surfaces the full path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A trigram-tokenized FTS5 virtual table lives alongside chromem-go so workspace search can pair dense vector retrieval with sparse keyword retrieval. The sparse signal recovers two things pure-dense fan-out loses: short-token precision (acronyms like "XYZ" get diffuse cosine scores) and project-relevance gating (chromem returns the N nearest vectors regardless of semantic distance, leaving projects that share zero vocabulary with the query at chunk_score ~0.25 false-positive). chunks_fts can only filter by rowid; chunks_meta is the indexed shadow that lets us delete by (project_path, file_path) and project_path without a full FTS5 scan. The two stay consistent inside the indexer's per-file SQL transaction, and they cascade away on project deletion and on full-reindex wipe. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Each project now runs dense (chromem cosine) and sparse (FTS5 BM25 over the chunks_fts mirror added in the previous commit) in parallel. Per project the two ranked lists are fused via Reciprocal Rank Fusion (k=60). Across projects an α-blended candidacy score (with per-query min-max normalization on both signals) plus a relative threshold (`candidacy >= best * 0.4`) gates the result set so projects that share no semantic and no lexical overlap with the query drop out entirely — pure-dense fan-out leaked every workspace repo at noise-level cosine similarity because chromem returns the N nearest vectors regardless of how far away "nearest" actually is. Live XYZ probe over 8 ACME repos: three repos with literally zero "XYZ" mentions previously surfaced 50 chunks each at dense scores 0.17-0.27. With the gate they drop out; the chunks list is then built by round-robin interleaving across surviving projects so each relevant repo gets its top hit before the dominant repo's tail entries appear. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
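A sketch of the two scoring steps named above: Reciprocal Rank Fusion with k=60 over ranked ID lists, and the relative gate `candidacy >= best * 0.4`. The α-blend and min-max normalisation that produce the candidacy values are not shown; `gate` takes them as precomputed input, and all names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

const rrfK = 60

// fuseRRF merges ranked lists: each appearance of an ID at 1-based rank r
// contributes 1/(k+r) to its fused score.
func fuseRRF(lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / float64(rrfK+i+1)
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j] // deterministic tie-break
	})
	return ids
}

// gate keeps projects whose candidacy is at least ratio times the best one.
// Names are returned sorted for stable output.
func gate(candidacy map[string]float64, ratio float64) []string {
	best := 0.0
	for _, c := range candidacy {
		if c > best {
			best = c
		}
	}
	var kept []string
	for p, c := range candidacy {
		if c >= best*ratio {
			kept = append(kept, p)
		}
	}
	sort.Strings(kept)
	return kept
}

func main() {
	dense := []string{"a", "b", "c"}
	sparse := []string{"b", "d"}
	fmt.Println(fuseRRF(dense, sparse))
	fmt.Println(gate(map[string]float64{"p1": 1.0, "p2": 0.5, "p3": 0.1}, 0.4))
}
```

RRF only looks at ranks, so the cosine and BM25 scales never need to be reconciled for the per-project fusion; normalisation matters only for the cross-project candidacy score.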
…a reindex
Repos indexed before the chunks_fts mirror landed have file_hashes rows (chromem populated, project marked indexed) but an empty chunks_meta: the BM25 side of hybrid search returns nothing for them and the algorithm degrades to pure dense for those entries. Observable failure mode: live workspace shows the new bm25_score field at 0.000 for every project and the result set looks identical to the old pure-dense fan-out.
WorkspaceSearch now probes chunks_meta vs file_hashes per project and bubbles stale repos up via a new stale_fts_repos field on the response. The dashboard renders a banner naming the affected repos with a reindex hint.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…workflow
Replaces the old centroid-routing playbook (which described tools we ripped out in the workspaces refactor) with a workflow that matches the hybrid BM25+dense algorithm: how to phrase queries so the BM25 gate fires, how to read project_score / bm25_score / dense_score, when to spawn parallel Explore sub-agents over surviving projects, and how to synthesize the per-repo change plan.
The skill is goal-driven: every workspace-search interaction has to answer (1) which repos are in scope, (2) which code in those repos is relevant, (3) what changes need to land and in what order. It also names the "primary project" pattern: the agent is usually cd'd into one specific repo and the user's task is rooted there; workspace search defines the surrounding context.
Includes a worked retro on the "Add sell flow to XYZ" failure that motivated the hybrid algorithm: pure-dense fan-out routed three zero-mention repos as relevant on noise-level cosine similarity.
Aligns the CLI (`cix ws … search`) with the new server API: drops the `--top-communities` flag in favour of `--top-projects`, switches the response renderer to projects + bm25/dense breakdown, surfaces stale_fts_repos as an inline warning.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces internal product/repo names that leaked from a real production
debugging session into test fixtures, code comments, and the
cix-workspace skill doc:
- XYZ / XYZOrder / processXYZOrderEvent → XYZ / XYZOrder / processXYZOrderEvent
- acme-backend / acme-shared / acme-models / acme-worker /
acme-notifier / acme-directory / acme-inventory / acme-platform
→ acme-backend / acme-shared / acme-models / acme-worker /
acme-notifier / acme-directory / acme-inventory / acme-platform
- "internal product code" → "internal product code"
- "shared-models migration", "shared data models" → generic
shared-models / data-model phrasing
- README .cixignore example switched from api/generated/ to
api/generated/
Working-tree-only sanitization; a follow-up history rewrite will scrub
the same strings from older commits. Tests green (chunksfts, db,
httpapi, projectconfig).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ency
Three changes to workspace and per-project search, plus targeted anonymization of eval-derived references in adjacent comments and test fixtures.
Server behaviour:
- Workspace search default `min_score` raised from 0 to 0.4, matching the per-project SemanticSearch default. Cross-project sweeps that need long-tail recall must now pass `min_score=0` explicitly.
- Per-project SemanticSearch default `min_score` lowered from 0.4 to 0.2; abstract NL queries (e.g. "end-to-end workflow lifecycle") used to silently return empty even when relevant chunks scored in [0.25, 0.35]. 0.2 keeps a light noise floor.
- Fix: workspace `chunks[]` round-robin now uses only the projects that survived the `top_projects` truncation. Previously a 12-project workspace at default `top_projects=10` could surface chunks from the 11th/12th project that weren't in the `projects[]` panel, so clients had no way to look up the chunk's bm25/dense scores.
Tests added:
- TestWorkspaceSearch_ChunksOnlyFromPanelProjects: 12 surviving repos + top_projects=10; every chunk's project must appear in the panel.
- TestWorkspaceSearch_DefaultMinScoreIs04: geometry calibrated so chunks at cos=0.3 are filtered by default and admitted at min_score=0.
- TestSemanticSearch_DefaultMinScoreIs02: fakeEmbedder geometry producing a cos≈0.25 chunk that the old default would have rejected.
OpenAPI spec descriptions updated for both `min_score` defaults.
Anonymization (carried over from previous workspace-eval analysis): adjacent comments and test fixtures that named specific repos / product acronyms / sell-flow scenarios are replaced with neutral placeholders (WIDGET / ping / generic repo descriptions).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
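The round-robin fix can be illustrated with a small interleaver that only draws from the projects in the panel. This is a sketch under assumed types (chunk IDs as strings, per-project lists already ranked), not the server's implementation.

```go
package main

import "fmt"

// interleave builds the chunk list round-robin: first every panel project's
// top hit, then every second hit, and so on, until limit is reached.
// Projects absent from panel never contribute, even if they have hits.
func interleave(perProject map[string][]string, panel []string, limit int) []string {
	var out []string
	for depth := 0; len(out) < limit; depth++ {
		progressed := false
		for _, p := range panel {
			list := perProject[p]
			if depth >= len(list) {
				continue
			}
			out = append(out, list[depth])
			progressed = true
			if len(out) == limit {
				return out
			}
		}
		if !progressed {
			break // every panel project is exhausted
		}
	}
	return out
}

func main() {
	perProject := map[string][]string{
		"backend": {"b1", "b2", "b3"},
		"worker":  {"w1"},
		"dropped": {"d1"}, // survived the gate but fell outside top_projects
	}
	fmt.Println(interleave(perProject, []string{"backend", "worker"}, 10))
}
```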
…subagent
Two additions to the cix-workspace skill:
1. Ten "trust rules" for interpreting workspace_search responses,
derived from internal calibration testing:
- chunk.score>=0.4 trust threshold (rule 1)
- chunk.score==0 = BM25-only literal match, not low confidence (rule 2)
- top-1 of projects[] is correct ~70% of the time in real tasks (rule 3)
- drop down to per-project search for depth (rule 4)
- min_score=0 explicitly for cross-project sweeps (rule 5)
- careful disambiguator selection — prefer meta-tokens over tech
guesses (rule 6)
- "change X in production" → manifests/config repo, not code repo
(rule 7)
- scan ranks 2-5 before reformulating (rule 8)
- explicit min_score=0 for per-project NL drill-down (rule 9)
- words live ≠ change location (rule 10)
2. Dedicated `cix-workspace-investigator` sub-agent at
`skills/cix-workspace/agents/cix-workspace-investigator.md`:
- Thin read-only shell around cix search/def/refs + Read + Grep
- Scope-isolated: one repo per spawn, no edits, no recursion
- Methodology + output format are the main agent's call per spawn,
not baked into the sub-agent's system prompt
- System prompt is ~60 lines; main agent's per-spawn prompt
handles the actual task
SKILL.md's "Sub-agent fan-out pattern" section rewritten around the
new sub-agent with a four-part prompt template (task verbatim,
project_path, seed chunks WITH the main agent's commentary, explicit
deliverable) and an anti-patterns list. The existing worked example is
preserved but rewritten without specific repo composition.
skills/README.md updated with the bundled-subagent description and
install command (additional copy into ~/.claude/agents/).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New workspaces.md at the repo root, a sub-document linked from README.md. Covers everything an operator or agent needs to know about the workspaces feature:
- What a workspace is, what's experimental about it today
- Enabling via CIX_WORKSPACES_ENABLED + supplementary env vars
- Concepts: owned vs linked repos, GitHub tokens, project path format
- Quick start: end-to-end walkthrough with curl examples
- Adding repositories (Dashboard staged dialog + REST + status transitions)
- GitHub tokens lifecycle, AES-256-GCM at-rest encryption, scopes
- Searching: Dashboard / `cix ws` CLI / REST endpoint with response shape
- Search algorithm: pipeline diagram, tunable parameters table, min_score semantics, hybrid BM25+dense rationale, stale-FTS handling
- Webhooks: disabled/manual/auto modes, HMAC signature, delivery endpoint
- Strengths and weaknesses (honest assessment)
- Configuration reference, REST API reference, troubleshooting
- Agent integration pointer to cix-workspace skill
README.md updated:
- "What you get" bullet for Workspaces (experimental) with link to workspaces.md
- Dashboard table gains two new rows: Workspaces and GitHub Tokens (both flagged experimental)
- New "Workspaces and external repositories" subsection after the Disabled-embeddings mode subsection, summarising the feature and linking to workspaces.md
- Agent Integration section adds the cix-workspace skill + bundled investigator subagent install snippet
Feature is marked experimental in every public-facing reference.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- New workflow `.github/workflows/prerelease-server.yml`: on push to `develop`, builds `server/Dockerfile.cuda` (amd64) and pushes the floating `dvcdsys/code-index:develop-cu128` tag. CPU image is intentionally skipped; pre-release stages on RTX 3090 only.
- Extend `ci-server.yml` / `ci-cli.yml` to also run on push and PR against `develop`, so vet/test/build gates pre-release merges the same way they gate main.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commit 33da39b accidentally removed the placeholder that makes the `//go:embed all:dist` directive in dashboard/embed.go resolve on a fresh clone (no `make dashboard-build`). `go vet ./...` then fails with `pattern all:dist: no matching files found`, breaking the CI gate on every PR. The root `.gitignore` already has a negation rule for this exact path; restoring the file is enough. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Lands the workspaces feature track on top of the new `develop` pre-release branch:
- workspace_repos, jobs, clone/index pipeline, GitHub webhooks, call-edges + eval harness, Louvain communities + workspace centroids, two-stage workspace search endpoint, CLI + skill + dashboard search dialog, name-first CLI grammar, in-dashboard add-repo flow with live progress, GitHub token scope derivation, account/org selector.
- `cix-workspace` skill rebuilt around the hybrid + 3-question workflow; trust rules + `cix-workspace-investigator` subagent.
- `workspaces.md` guide; dashboard notes in README.
- `prerelease-server.yml` workflow builds the CUDA-only image on push to `develop` and pushes `dvcdsys/code-index:develop-cu128`.
- `ci-server.yml` / `ci-cli.yml` now also gate PRs into `develop`.
Test plan
- `server` vet/test/build CI passes on this PR (now gated on `develop` PRs too).
- `cli` vet/test/build CI passes.
- `develop`: prerelease workflow triggers and pushes `dvcdsys/code-index:develop-cu128` to Docker Hub.
- `nvidia-smi` shows GPU memory used (silent CPU fallback is a real failure mode).
🤖 Generated with Claude Code