Vector similarity library. Core package is zero-dependency pure Go; optional pkg/onnx runs local neural models.
pkg/vector/ ← core library (zero-dependency, pure Go)
vector.go Vector type, Dot, Norm, Normalize, Add, Sub, Scale, Equal, EqualEps, Clone, Dims
similarity.go Metric enum, Cosine, CosineDist, Euclidean, Manhattan, Distance
store.go Store: NN search + Gob/JSON persistence (Save/Load/SaveJSON/LoadJSON)
embedder.go Embedder interface
http_embedder.go HTTPEmbedder: OpenAI-compatible embeddings API adapter (stdlib net/http)
random_projections.go RandomProjections: sparse JL projection + tokenizer
pkg/onnx/ ← local neural embeddings (depends on onnxruntime_go + x/text, CGo)
embedder.go Embedder: ONNX session, mean pooling, L2 normalization
tokenizer.go Pure-Go BERT WordPiece tokenizer (vocab.txt)
testdata/ Model files for tests (gitignored; fetch with `make model`)
cmd/go-vector/ ← minimal CLI demo
docs/ ← GitHub Pages landing page
index.html Dark-themed single-page site
.nojekyll GitHub Pages raw HTML flag
pkg/vectorstays zero-dependency — it must never import anything beyond stdlib:math,sort,encoding/gob,encoding/json,os,strings,unicode,math/rand,net/http(HTTPEmbedder). Heavyweight integrations (CGo, third-party) live in sibling packages likepkg/onnxso users who don't import them pay nothing.pkg/onnxcarries the only third-party deps —github.com/yalue/onnxruntime_go(CGo binding) andgolang.org/x/text(NFD for accent stripping). It also needs the ONNX Runtime shared library at runtime (brew install onnxruntime).- Vector = []float32 — no struct, no interface, just a slice.
- Mismatched lengths → zero — return zero/nil rather than panicking.
- Clone on output — Get() and Search() return copies. Store.Add() clones on insert.
- Zero-alloc distance — Cosine and Euclidean compute in a single pass.
- Gob persistence —
storeDatainternal struct bridges unexported fields to encoder. - Deterministic embeddings — RandomProjections uses fixed seed 42.
- Tests in
_test.gofiles — packagevector.
Sparse random projection (Achlioptas 2003):
- Entries: {-1, 0, +1} × sqrt(3/D) with probabilities {1/6, 2/3, 1/6}
- Vocabulary built from corpus via
Fit() - Tokenizer: split on non-letter/digit, lowercase, min 2 chars
- Output always L2-normalized
// storeData bridges unexported Store fields to encoder
type storeData struct {
Vectors []Vector
IDs []string
Metric Metric
}init() registers gob.Register(Vector{}) so []float32 serializes correctly.
docker exec -i projects-dev bash -c 'export PATH=$PATH:/usr/local/go/bin && cd /workspace/go-vector && make ci'
# Benchmark
docker exec -i projects-dev bash -c 'export PATH=$PATH:/usr/local/go/bin && cd /workspace/go-vector && go test ./pkg/vector/ -bench=. -benchmem'Target: >95% coverage. 40 tests, 96.8% currently.
- Implement
Embedderinterface (Embed(text) (Vector, error),Dims() int) - Add constructor function
- Add test file with Fit/Embed/determinism/similarity tests
- Document in README under Text Embedding section
- Add constant to
Metricenum insimilarity.go - Add case to
Distance()switch, updateAscending()if needed - Add direct function — zero-alloc, single-pass
- Add test + benchmark
- No panics. Return zero/nil for invalid input.
- No unsafe. Pure Go, no CGo, no syscalls.
- Clone hygiene. Internal state never exposed.
- Persistence:
Loadreplaces all data. Caller handles atomicity.