diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 0000000..3d3a9a5
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,18 @@
+## Linked issue
+
+Closes #
+
+
+
+## Summary
+
+
+
+## Checklist
+
+- [ ] I have read [CONTRIBUTING.md](../blob/main/CONTRIBUTING.md)
+- [ ] This PR is linked to an existing issue (above)
+- [ ] `make test` passes
+- [ ] `make lint` and `make typecheck` pass (or `make pre-commit`)
+- [ ] Added or updated tests for any behaviour changes
+- [ ] Updated docstrings / docs for any public API changes
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 547cac2..61da255 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -4,27 +4,28 @@ Thanks for your interest in semble. This document explains how contributions wor
 
 ## tl;dr
 
-- **Bug fix or typo?** Open a PR directly.
-- **New feature or behaviour change?** Open an issue first to discuss with us.
+- **Every PR must link to an existing issue.** Open an issue to discuss before writing code, then link it from your PR (e.g. `Closes #123`).
 - **AI-generated PRs** will be closed without review if they weren't discussed beforehand.
 
 ---
 
 ## Discuss before building
 
-Our libraries are small and focused by design. We care a lot about keeping it that way. Before you invest time writing code for a new feature, please open an issue describing:
+Our libraries are small and focused by design. We care a lot about keeping it that way. Before you invest time writing code, please open an issue describing:
 
 - What problem you're solving
 - Why it belongs in semble (as opposed to a wrapper or separate tool)
-- What API or behaviour change it would involve
+- What API or behaviour change it would involve, if any
 - A minimal (code) example of how it would work
 
-**PRs that add features without a prior issue will be closed.**
+This applies to small PRs (e.g. bug fixes and documentation updates) as well. A quick issue lets us confirm the fix is wanted and aligned with how we'd want to solve it, so you don't waste time on a PR we'd need to reject or rework.
 
-## What we welcome
+**PRs without a linked issue will be closed.**
 
-- Bug fixes (with a test that reproduces the issue)
-- Documentation improvements and example fixes
+## What we generally welcome
+
+- Bug fixes (with a linked issue and a test that reproduces the issue)
+- Documentation improvements and example fixes (with a linked issue)
 
 ## What we generally won't accept
 
@@ -47,6 +48,7 @@ If you want a feature, include the things listed under "Discuss before building"
 
 Before opening a PR:
 
+- [ ] Link to an existing issue (e.g. `Closes #123`). PRs without one will be closed
 - [ ] Run `make test` and confirm all tests pass
 - [ ] Run `make lint` and `make typecheck`
 - [ ] Run `make fix` to auto-fix any lint issues
diff --git a/README.md b/README.md
index e6e1b5a..6a583c2 100644
--- a/README.md
+++ b/README.md
@@ -18,35 +18,41 @@
 [Quickstart](#quickstart) •
 [MCP Server](#mcp-server) •
-[Bash / AGENTS.md](#bash-agentsmd) •
+[AGENTS.md](#agentsmd) •
 [CLI](#cli) •
 [Benchmarks](#benchmarks)
 
-Semble is a code search library built for agents. It returns the exact code snippets they need instantly, using ~98% fewer tokens than grep+read. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality (see [benchmarks](#benchmarks)). Everything runs on CPU with no API keys, GPU, or external services. Run it as an [MCP server](#mcp-server) or call it from the shell via [AGENTS.md](#bash-agentsmd) and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo.
+Semble is a code search library built for agents. It returns the exact code snippets they need instantly, using ~98% fewer tokens than grep+read. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality (see [benchmarks](#benchmarks)). Everything runs on CPU with no API keys, GPU, or external services. Run it as an [MCP server](#mcp-server) or call it from the shell via [AGENTS.md](#agentsmd) and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo.
 
 ## Quickstart
 
-Your agent queries Semble in natural language (e.g. `"How is authentication handled?"`) and gets back only the relevant code snippets, without grepping or reading full files. Set it up as an MCP server or via AGENTS.md:
+Your agent queries Semble in natural language (e.g. `"How is authentication handled?"`) and gets back only the relevant code snippets, without grepping or reading full files.
 
-### MCP (Claude Code)
+Semble has three complementary setup paths. The recommended setup is using all three (but you can pick and choose based on your needs):
 
-Add Semble to Claude Code (requires [uv](https://docs.astral.sh/uv/getting-started/installation/)):
+- **[MCP server](#mcp-server)**: an MCP server for your agent.
+- **[AGENTS.md](#agentsmd)**: an AGENTS.md snippet with instructions for calling Semble via the CLI.
+- **[Sub-agent](#sub-agent-setup)**: a dedicated `semble-search` sub-agent for harnesses that support it.
+
+### MCP
+
+Expose Semble as a native tool via MCP so your agent can call it directly. Add it to Claude Code (requires [uv](https://docs.astral.sh/uv/getting-started/installation/)):
 
 ```bash
 claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
 ```
 
-Using another agent harness? See [MCP Server](#mcp-server) below for per-agent setup.
+See [MCP Server](#mcp-server) below for other harnesses (Cursor, Codex, OpenCode, etc.).
 
-### Bash / AGENTS.md
+### AGENTS.md
 
-Install Semble, then add the snippet below to your `AGENTS.md` or `CLAUDE.md`:
+Add Semble usage instructions to your agent's context so it knows when and how to call the CLI. Install the Semble CLI, then add the snippet below to your `AGENTS.md` or `CLAUDE.md`:
 
 ```bash
-pip install semble # Install with pip
-uv tool install semble # Or install with uv
+uv tool install semble # Install with uv (recommended)
+pip install semble # Or with pip
 ```
@@ -109,15 +115,23 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
-Note that sub-agents cannot call MCP tools directly, see [Bash / AGENTS.md](#bash-agentsmd) and [sub-agent setup](#sub-agent-setup) below for details.
+### Sub-agent
+
+For harnesses that support sub-agents, install a dedicated `semble-search` sub-agent so search runs in its own context (requires the CLI):
+
+```bash
+semble init # Claude Code → .claude/agents/semble-search.md
+```
+
+See [Sub-agent setup](#sub-agent-setup) below for other harnesses (Cursor, Codex, OpenCode, etc.).
 
 Updating Semble
 
 ```bash
-pip install --upgrade semble # with pip
 uv tool upgrade semble # with uv
 uv cache clean semble # for MCP users (restart your MCP client after)
+pip install --upgrade semble # with pip
 ```
@@ -316,70 +330,7 @@ Add to `~/.config/zed/settings.json` (or `.zed/settings.json` in your project):
 By default the MCP server indexes only code files. To also index documentation, config, or everything, append `--content docs`, `--content config`, or `--content all` to the server command, or a combination, e.g. `--content code docs`. For example, in Claude Code: `claude mcp add semble -s user -- uvx --from "semble[mcp]" semble --content all`.
-
-
-## Bash / AGENTS.md
-
-An alternative to MCP is to invoke Semble via Bash. Sub-agents cannot call MCP tools directly, so this is the only option for sub-agent support; it can also be used alongside MCP for the top-level agent.
-
-To add Bash support, append the following to your `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, or equivalent:
-
-```markdown
-## Code Search
-
-Use `semble search` to find code by describing what it does or naming a symbol/identifier, instead of grep:
-
-​```bash
-semble search "authentication flow" ./my-project
-semble search "save_pretrained" ./my-project
-semble search "save model to disk" ./my-project --top-k 10
-​```
-
-If you anticipate doing more than one search, use `semble index` to create an index.
-
-​```bash
-semble index ./my-project -o my_index
-​```
-
-You can then reuse this index later on:
-
-​```bash
-semble search "save_pretrained" --index my_index
-​```
-
-An index is not automatically updated, so if the code changes significantly, reindex. If you notice stale results while resolving searches to files, reindex.
-
-Use `--content docs` to search documentation and prose, `--content config` for config files (yaml, toml, etc.), or `--content all` to search code, docs, and config:
-
-​```bash
-semble search "deployment guide" ./my-project --content docs
-semble search "database host port" ./my-project --content config
-semble search "authentication" ./my-project --content all
-​```
-
-Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):
-
-​```bash
-semble find-related src/auth.py 42 ./my-project
-​```
-
-Like search, `find-related` also accepts an `--index` argument.
-
-`path` defaults to the current directory when omitted; git URLs are accepted.
-
-If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its place.
-
-### Workflow
-
-1. Index the repo using `semble index -o cached_index`.
-2. Start with `semble search` to find relevant chunks. Pass the index to achieve results faster.
-3. Use `--content docs` for documentation, `--content config` for config files, or `--content all` for everything.
-4. Inspect full files only when the returned chunk does not give enough context.
-5. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-6. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
-```
-
-### Sub-agent setup
+## Sub-agent setup
 
 Claude Code, Gemini CLI, Cursor, OpenCode, GitHub Copilot CLI, and Kiro all support a dedicated semble search sub-agent. Run `semble init` once in your project root:
 
@@ -399,17 +350,12 @@ If semble is not on `$PATH`, prefix the command with `uvx --from "semble[mcp]"`.
 Semble also ships as a standalone CLI. This is useful in scripts or anywhere you want search results without an MCP session.
 
 ```bash
-# Index a local repository
-semble index ./my-project -o my-index
-
 # Search a local repo
 semble search "authentication flow" ./my-project
-# Or with index (significantly faster)
-# the index flag applies to all commands below.
-semble search "authentication flow" --index my-index
 
-# Search for a symbol or identifier
-semble search "save_pretrained" ./my-project
+# Index first for faster repeated searches (--index works with any command below)
+semble index ./my-project -o my-index
+semble search "authentication flow" --index my-index
 
 # Search a remote repo (cloned on demand)
 semble search "save model to disk" https://github.com/MinishLab/model2vec
@@ -417,14 +363,8 @@ semble search "save model to disk" https://github.com/MinishLab/model2vec
 # Limit results
 semble search "save model to disk" ./my-project --top-k 10
 
-# Search docs and prose (markdown, rst, etc.) instead of code
-semble search "deployment guide" ./my-project --content docs
-
-# Search config files (yaml, toml, terraform, etc.)
-semble search "database host port" ./my-project --content config
-
-# Search everything (code, docs, and config)
-semble search "authentication" ./my-project --content all
+# Search docs/config/everything instead of just code
+semble search "deployment guide" ./my-project --content docs # or: config, all
 
 # Find code similar to a known location
 semble find-related src/auth.py 42 ./my-project