Merged
18 changes: 18 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,18 @@
## Linked issue

Closes #

<!-- Every PR must link to an existing issue. PRs without a linked issue will be closed. See CONTRIBUTING.md. -->

## Summary

<!-- What does this PR change, and why? -->

## Checklist

- [ ] I have read [CONTRIBUTING.md](../blob/main/CONTRIBUTING.md)
- [ ] This PR is linked to an existing issue (above)
- [ ] `make test` passes
- [ ] `make lint` and `make typecheck` pass (or `make pre-commit`)
- [ ] Added or updated tests for any behaviour changes
- [ ] Updated docstrings / docs for any public API changes
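
The linked-issue rule in this template could also be checked mechanically. The following is a minimal sketch, not something this PR adds: `has_linked_issue` is a hypothetical helper name, and it assumes the PR body is available as a string. It looks for one of GitHub's closing keywords (`close`, `fix`, `resolve`, and their variants) followed by an issue number, so the untouched `Closes #` placeholder does not count as a link.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not part of this PR): succeeds only if
# the given PR body contains a GitHub closing keyword followed by "#<number>".
has_linked_issue() {
  printf '%s\n' "$1" |
    grep -Eiq '(^|[^[:alnum:]])(close[sd]?|fix(e[sd])?|resolve[sd]?):? #[0-9]+'
}

# A filled-in template passes the check.
if has_linked_issue "Closes #123"; then
  echo "linked"
fi

# The unedited placeholder has no issue number, so the check fails.
if ! has_linked_issue "Closes #"; then
  echo "missing issue number"
fi
```

A check like this only confirms that the body has the right shape. Whether the referenced issue exists and was actually discussed still needs a maintainer's judgement.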
18 changes: 10 additions & 8 deletions CONTRIBUTING.md
@@ -4,27 +4,28 @@ Thanks for your interest in semble. This document explains how contributions wor

## tl;dr

- **Bug fix or typo?** Open a PR directly.
- **New feature or behaviour change?** Open an issue first to discuss with us.
- **Every PR must link to an existing issue.** Open an issue to discuss before writing code, then link it from your PR (e.g. `Closes #123`).
- **AI-generated PRs** will be closed without review if they weren't discussed beforehand.

---

## Discuss before building

Our libraries are small and focused by design. We care a lot about keeping it that way. Before you invest time writing code for a new feature, please open an issue describing:
Our libraries are small and focused by design. We care a lot about keeping it that way. Before you invest time writing code, please open an issue describing:

- What problem you're solving
- Why it belongs in semble (as opposed to a wrapper or separate tool)
- What API or behaviour change it would involve
- What API or behaviour change it would involve, if any
- A minimal (code) example of how it would work

**PRs that add features without a prior issue will be closed.**
This applies to small PRs (e.g. bug fixes and documentation updates) as well. A quick issue lets us confirm the fix is wanted and aligned with how we'd want to solve it, so you don't waste time on a PR we'd need to reject or rework.

## What we welcome
**PRs without a linked issue will be closed.**

- Bug fixes (with a test that reproduces the issue)
- Documentation improvements and example fixes
## What we generally welcome

- Bug fixes (with a linked issue and a test that reproduces the issue)
- Documentation improvements and example fixes (with a linked issue)

## What we generally won't accept

@@ -47,6 +48,7 @@ If you want a feature, include the things listed under "Discuss before building"

Before opening a PR:

- [ ] Link to an existing issue (e.g. `Closes #123`). PRs without one will be closed
- [ ] Run `make test` and confirm all tests pass
- [ ] Run `make lint` and `make typecheck`
- [ ] Run `make fix` to auto-fix any lint issues
124 changes: 32 additions & 92 deletions README.md
@@ -18,35 +18,41 @@

[Quickstart](#quickstart) •
[MCP Server](#mcp-server) •
[Bash / AGENTS.md](#bash-agentsmd) •
[AGENTS.md](#agentsmd) •
[CLI](#cli) •
[Benchmarks](#benchmarks)

</div>

Semble is a code search library built for agents. It returns the exact code snippets they need instantly, using ~98% fewer tokens than grep+read. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality (see [benchmarks](#benchmarks)). Everything runs on CPU with no API keys, GPU, or external services. Run it as an [MCP server](#mcp-server) or call it from the shell via [AGENTS.md](#bash-agentsmd) and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo.
Semble is a code search library built for agents. It returns the exact code snippets they need instantly, using ~98% fewer tokens than grep+read. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality (see [benchmarks](#benchmarks)). Everything runs on CPU with no API keys, GPU, or external services. Run it as an [MCP server](#mcp-server) or call it from the shell via [AGENTS.md](#agentsmd) and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo.

## Quickstart

Your agent queries Semble in natural language (e.g. `"How is authentication handled?"`) and gets back only the relevant code snippets, without grepping or reading full files. Set it up as an MCP server or via AGENTS.md:
Your agent queries Semble in natural language (e.g. `"How is authentication handled?"`) and gets back only the relevant code snippets, without grepping or reading full files.

### MCP (Claude Code)
Semble has three complementary setup paths. The recommended setup is using all three (but you can pick and choose based on your needs):

Add Semble to Claude Code (requires [uv](https://docs.astral.sh/uv/getting-started/installation/)):
- **[MCP server](#mcp-server)**: an MCP server for your agent.
- **[AGENTS.md](#agentsmd)**: an AGENTS.md snippet with instructions for calling Semble via the CLI.
- **[Sub-agent](#sub-agent-setup)**: a dedicated `semble-search` sub-agent for harnesses that support it.

### MCP

Expose Semble as a native tool via MCP so your agent can call it directly. Add it to Claude Code (requires [uv](https://docs.astral.sh/uv/getting-started/installation/)):

```bash
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
```

Using another agent harness? See [MCP Server](#mcp-server) below for per-agent setup.
See [MCP Server](#mcp-server) below for other harnesses (Cursor, Codex, OpenCode, etc.).

### Bash / AGENTS.md
### AGENTS.md

Install Semble, then add the snippet below to your `AGENTS.md` or `CLAUDE.md`:
Add Semble usage instructions to your agent's context so it knows when and how to call the CLI. Install the Semble CLI, then add the snippet below to your `AGENTS.md` or `CLAUDE.md`:

```bash
pip install semble # Install with pip
uv tool install semble # Or install with uv
uv tool install semble # Install with uv (recommended)
pip install semble # Or with pip
```

<details>
@@ -109,15 +115,23 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac

</details>

Note that sub-agents cannot call MCP tools directly, see [Bash / AGENTS.md](#bash-agentsmd) and [sub-agent setup](#sub-agent-setup) below for details.
### Sub-agent

For harnesses that support sub-agents, install a dedicated `semble-search` sub-agent so search runs in its own context (requires the CLI):

```bash
semble init # Claude Code → .claude/agents/semble-search.md
```

See [Sub-agent setup](#sub-agent-setup) below for other harnesses (Cursor, Codex, OpenCode, etc.).

<details>
<summary>Updating Semble</summary>

```bash
pip install --upgrade semble # with pip
uv tool upgrade semble # with uv
uv cache clean semble # for MCP users (restart your MCP client after)
pip install --upgrade semble # with pip
```

</details>
@@ -316,70 +330,7 @@ Add to `~/.config/zed/settings.json` (or `.zed/settings.json` in your project):
By default the MCP server indexes only code files. To also index documentation, config, or everything, append `--content docs`, `--content config`, or `--content all` to the server command, or a combination, e.g. `--content code docs`. For example, in Claude Code: `claude mcp add semble -s user -- uvx --from "semble[mcp]" semble --content all`.


<a id="bash-agentsmd"></a>

## Bash / AGENTS.md

An alternative to MCP is to invoke Semble via Bash. Sub-agents cannot call MCP tools directly, so this is the only option for sub-agent support; it can also be used alongside MCP for the top-level agent.

To add Bash support, append the following to your `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, or equivalent:

```markdown
## Code Search

Use `semble search` to find code by describing what it does or naming a symbol/identifier, instead of grep:

​```bash
semble search "authentication flow" ./my-project
semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
​```

If you anticipate doing more than one search, use `semble index` to create an index.

​```bash
semble index ./my-project -o my_index
​```

You can then reuse this index later on:

​```bash
semble search "save_pretrained" --index my_index
​```

An index is not automatically updated, so if the code changes significantly, reindex. If you notice stale results while resolving searches to files, reindex.

Use `--content docs` to search documentation and prose, `--content config` for config files (yaml, toml, etc.), or `--content all` to search code, docs, and config:

​```bash
semble search "deployment guide" ./my-project --content docs
semble search "database host port" ./my-project --content config
semble search "authentication" ./my-project --content all
​```

Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

​```bash
semble find-related src/auth.py 42 ./my-project
​```

Like search, `find-related` also accepts an `--index` argument.

`path` defaults to the current directory when omitted; git URLs are accepted.

If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its place.

### Workflow

1. Index the repo using `semble index -o cached_index`.
2. Start with `semble search` to find relevant chunks. Pass the index to achieve results faster.
3. Use `--content docs` for documentation, `--content config` for config files, or `--content all` for everything.
4. Inspect full files only when the returned chunk does not give enough context.
5. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
6. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
```

### Sub-agent setup
## Sub-agent setup

Claude Code, Gemini CLI, Cursor, OpenCode, GitHub Copilot CLI, and Kiro all support a dedicated semble search sub-agent. Run `semble init` once in your project root:

@@ -399,32 +350,21 @@ If semble is not on `$PATH`, prefix the command with `uvx --from "semble[mcp]"`.
Semble also ships as a standalone CLI. This is useful in scripts or anywhere you want search results without an MCP session.

```bash
# Index a local repository
semble index ./my-project -o my-index

# Search a local repo
semble search "authentication flow" ./my-project
# Or with index (significantly faster)
# the index flag applies to all commands below.
semble search "authentication flow" --index my-index

# Search for a symbol or identifier
semble search "save_pretrained" ./my-project
# Index first for faster repeated searches (--index works with any command below)
semble index ./my-project -o my-index
semble search "authentication flow" --index my-index

# Search a remote repo (cloned on demand)
semble search "save model to disk" https://github.com/MinishLab/model2vec

# Limit results
semble search "save model to disk" ./my-project --top-k 10

# Search docs and prose (markdown, rst, etc.) instead of code
semble search "deployment guide" ./my-project --content docs

# Search config files (yaml, toml, terraform, etc.)
semble search "database host port" ./my-project --content config

# Search everything (code, docs, and config)
semble search "authentication" ./my-project --content all
# Search docs/config/everything instead of just code
semble search "deployment guide" ./my-project --content docs # or: config, all

# Find code similar to a known location
semble find-related src/auth.py 42 ./my-project