feat(vectordb): add Qdrant backend support by mclamee · Pull Request #232 · volcengine/OpenViking

mclamee · 2026-02-20T09:28:58Z

Summary

Add Qdrant as an alternative open-source vector database backend, giving users a self-hosted option alongside VikingDB/Volcengine.

QdrantCollection: Full ICollection implementation — dense/sparse hybrid search, scalar search with native order_by, full-text keyword search via payload indexes, multimodal search, aggregation with pagination, and proper filter translation
QdrantProject: Collection lifecycle management with configurable distance metrics and vector dimensions
QdrantConfig: Configuration model (URL, API key, gRPC, timeout)
Backend factory: Register qdrant backend type in viking_vector_index_backend.py
Optional dependency: pip install openviking[qdrant] (qdrant-client >= 1.9.0)

Design Decisions

1. ID System — Dual-Tracking (String ↔ UUID)

Problem: VikingDB uses arbitrary string IDs (e.g. "doc_123"), but Qdrant requires UUID or uint64 point IDs.

Solution: Deterministic UUID5 mapping with original ID preservation.

string_to_qdrant_id(id) converts string IDs to UUID5 using a fixed namespace (f47ac10b-58cc-4372-a567-0e02b2c3d479), ensuring the same string always maps to the same UUID — stable across processes and restarts.
The original string ID is stored in _original_id payload field for round-trip fidelity.
All query results reconstruct the original string ID from the payload, so callers never see UUIDs.
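
The mapping above can be sketched with the standard library's uuid module. This is a minimal illustration rather than the PR's actual code: the function name string_to_qdrant_id, the namespace, and the _original_id field come from the description, while build_point and the point layout are assumptions made for the example.

```python
import uuid

# Fixed namespace quoted in the PR description.
QDRANT_ID_NAMESPACE = uuid.UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")


def string_to_qdrant_id(original_id: str) -> str:
    """Map an arbitrary string ID to a deterministic UUID5 string."""
    return str(uuid.uuid5(QDRANT_ID_NAMESPACE, original_id))


def build_point(original_id: str, fields: dict) -> dict:
    """Hypothetical helper: pair the UUID with a payload that keeps the original ID."""
    payload = dict(fields)
    payload["_original_id"] = original_id
    return {"id": string_to_qdrant_id(original_id), "payload": payload}


point = build_point("doc_123", {"title": "hello"})
# Callers recover the string ID from the payload, never from the UUID.
restored_id = point["payload"]["_original_id"]
```

Because UUID5 is a pure function of the namespace and the name, no lookup table is needed to go from string ID to point ID; only the reverse direction relies on the stored payload field.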

2. Vector Storage Model — Named Vectors

Problem: VikingDB separates "Index" (search config) from "Store" (data container). Each index has its own vector field. Qdrant uses a flat point model with named vectors.

Solution: Map VikingDB index names to Qdrant named vectors.

Dense vector → "default" named vector
Sparse vector → "sparse" named vector
The vector_field_name from schema's VectorIndex maps to named vectors at collection creation time.
Qdrant collection is created with VectorParams per dense index and SparseVectorParams per sparse index.
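
As a rough illustration of this mapping, the sketch below builds the body of a Qdrant create-collection request from a list of index definitions. The index dictionaries (name, kind, dimension) and the function name build_collection_config are hypothetical and are not the PR's schema types; only the vectors / sparse_vectors layout is meant to follow Qdrant's collection API.

```python
def build_collection_config(indexes: list[dict], distance: str = "Cosine") -> dict:
    """Translate (hypothetical) index definitions into named-vector config."""
    vectors: dict[str, dict] = {}
    sparse_vectors: dict[str, dict] = {}
    for index in indexes:
        if index["kind"] == "dense":
            # One named dense vector per dense index.
            vectors[index["name"]] = {"size": index["dimension"], "distance": distance}
        elif index["kind"] == "sparse":
            # Sparse vectors have no fixed dimension, so the params are empty.
            sparse_vectors[index["name"]] = {}
        else:
            raise ValueError(f"unknown index kind: {index['kind']!r}")
    return {"vectors": vectors, "sparse_vectors": sparse_vectors}


config = build_collection_config(
    [
        {"name": "default", "kind": "dense", "dimension": 1024},
        {"name": "sparse", "kind": "sparse"},
    ]
)
```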

3. Filter DSL Translation

Problem: OpenViking uses a JSON-based filter DSL ({"op": "must", "field_name": ..., "conds": [...]}), while Qdrant uses typed Pydantic models (Filter, FieldCondition, MatchValue, Range, etc.).

Solution: Recursive _build_qdrant_filter() translates the full DSL:

must / must_not → Filter(must=[...]) / Filter(must_not=[...])
and / or → Nested Filter with must / should
range (dict with gt/gte/lt/lte) → Range(...)
in (list of values) → MatchAny(any=[...])
Scalar equality → MatchValue(value=...)
Full-text match → MatchText(text=...)
must_not conditions at any nesting level are properly collected and applied.
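
The recursion can be sketched as follows. To stay runnable without qdrant-client, this version emits Qdrant's JSON filter form (plain dicts) instead of the Pydantic models, and it omits the full-text case. The function name build_filter and the exact shape of the range node (bounds stored directly on the node) are assumptions; the PR's _build_qdrant_filter() may differ.

```python
from typing import Any

RANGE_KEYS = ("gt", "gte", "lt", "lte")


def build_filter(node: dict[str, Any]) -> dict[str, Any]:
    """Recursively translate a DSL node into a Qdrant-style filter dict."""
    op = node["op"]
    if op in ("and", "or"):
        # Logical nodes become nested filters: and -> must, or -> should.
        clause = "must" if op == "and" else "should"
        return {clause: [build_filter(child) for child in node["conds"]]}
    if op in ("must", "must_not"):
        values = node["conds"]
        # A single value is an equality match; several values are an "any" match.
        match = {"value": values[0]} if len(values) == 1 else {"any": list(values)}
        return {op: [{"key": node["field_name"], "match": match}]}
    if op == "range":
        bounds = {key: node[key] for key in RANGE_KEYS if key in node}
        return {"must": [{"key": node["field_name"], "range": bounds}]}
    raise ValueError(f"unsupported filter op: {op!r}")


dsl = {
    "op": "and",
    "conds": [
        {"op": "must", "field_name": "lang", "conds": ["en", "de"]},
        {"op": "must_not", "field_name": "status", "conds": ["deleted"]},
        {"op": "range", "field_name": "score", "gte": 0.5, "lt": 1.0},
    ],
}
qdrant_filter = build_filter(dsl)
```

Each leaf returns its own small filter, so a must_not nested several levels deep stays attached to the branch it came from.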

4. Hybrid Search — RRF Fusion

Problem: VikingDB has built-in hybrid search (dense + sparse + rerank in one call). Qdrant requires explicit orchestration.

Solution: Qdrant's prefetch + Fusion.RRF pattern:

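from qdrant_client.models import Fusion, FusionQuery, Prefetch, SparseVector

client.query_points(
    collection_name=collection_name,
    prefetch=[
        # Dense and sparse candidates are retrieved separately, then fused.
        Prefetch(query=dense_vector, using="default", limit=limit),
        Prefetch(query=SparseVector(...), using="sparse", limit=limit),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=limit,
)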

This achieves Reciprocal Rank Fusion natively in Qdrant without an external reranker.
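
For readers unfamiliar with RRF, the scoring rule can be shown in a few lines of plain Python. This is an illustration of the general technique only: the function name is hypothetical, and k = 60 is the constant commonly used in the literature, which is not necessarily the value Qdrant applies internally.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_hits = ["a", "b", "c"]   # ranked results from the dense prefetch
sparse_hits = ["b", "d", "a"]  # ranked results from the sparse prefetch
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Only ranks enter the formula, so dense and sparse scores never need to be normalized against each other.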

5. Sparse Vector Key Hashing

Problem: VikingDB sparse vectors use string term keys (e.g. {"hello": 0.5, "world": 0.3}), but Qdrant sparse vectors require integer indices.

Solution: Stable hashing via hashlib.md5:

import hashlib

def _stable_sparse_index(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16) % (2**31)

Using MD5 (truncated to 8 hex chars) ensures:

Cross-process stability (unlike Python's hash() which is randomized per process via PYTHONHASHSEED)
Deterministic mapping across restarts
A 2^31 index range, which keeps collisions rare (though not impossible) for typical vocabulary sizes
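
A possible way to apply this hash to a whole sparse vector is sketched below. The helper to_qdrant_sparse is hypothetical, and summing the weights of terms that hash to the same index is an assumption of this sketch, not something the PR description specifies.

```python
import hashlib


def _stable_sparse_index(key: str) -> int:
    # Same hashing rule as in the PR description.
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16) % (2**31)


def to_qdrant_sparse(term_weights: dict[str, float]) -> tuple[list[int], list[float]]:
    """Convert a string-keyed sparse vector into parallel index/value lists."""
    merged: dict[int, float] = {}
    for term, weight in term_weights.items():
        index = _stable_sparse_index(term)
        # Assumption: if two terms collide, their weights are added together
        # so that every index appears only once.
        merged[index] = merged.get(index, 0.0) + weight
    indices = sorted(merged)
    return indices, [merged[i] for i in indices]


indices, values = to_qdrant_sparse({"hello": 0.5, "world": 0.3})
```

The resulting lists can be passed as the indices and values of a sparse vector; the same term maps to the same index in every process.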

6. Schema Flexibility

Problem: VikingDB enforces strict field schemas. Qdrant is schema-less by default (any payload field can be stored).

Solution:

Field schema from create_collection is used to configure vector dimensions and named vectors, but payload fields are stored freely.
TextIndex payload indexes are auto-created for string fields to enable full-text MatchText search.
Index tracking via _created_indexes set to avoid redundant index creation.

7. Sorting & Aggregation

Problem: VikingDB supports order_by in fetch and aggregation natively. Qdrant added order_by in v1.9.0.

Solution:

fetch_data_by_sort → Qdrant's native scroll(order_by=...) with OrderBy(key, direction).
aggregate_data → Paginated scroll() with client-side grouping and counting (Qdrant lacks server-side aggregation).
search_by_random → Random unit vector search (matching LocalCollection approach), since Qdrant has no native random sampling.
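
The client-side aggregation loop might look like the sketch below. It assumes a page-fetching callable that returns a list of payloads plus a next offset, with None meaning the last page, which mirrors the shape of Qdrant's scroll result. The names aggregate_counts and fake_scroll are hypothetical, and the in-memory list stands in for a real collection.

```python
from collections import Counter
from typing import Callable, Optional

Page = tuple[list[dict], Optional[int]]


def aggregate_counts(
    scroll_page: Callable[[Optional[int], int], Page],
    field: str,
    page_size: int = 100,
) -> Counter:
    """Count payload values for `field` by walking every page of a scroll."""
    counts: Counter = Counter()
    offset: Optional[int] = None
    while True:
        payloads, offset = scroll_page(offset, page_size)
        # Points without the field are skipped rather than counted.
        counts.update(p[field] for p in payloads if field in p)
        if offset is None:  # no further pages
            break
    return counts


# In-memory stand-in for a collection, used only for this example.
PAYLOADS = [{"lang": "en"}, {"lang": "de"}, {"lang": "en"}, {"title": "x"}, {"lang": "en"}]


def fake_scroll(offset: Optional[int], limit: int) -> Page:
    start = offset or 0
    end = start + limit
    return PAYLOADS[start:end], (end if end < len(PAYLOADS) else None)


counts = aggregate_counts(fake_scroll, "lang", page_size=2)
```

Memory use is bounded by the number of distinct values, not the collection size, but every point is still transferred to the client, which is why large collections may be slow.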

8. search_by_id — Self-Exclusion

VikingDB's search_by_id returns neighbors excluding the query point itself. Our implementation:

Retrieves the vector of the given ID
Performs a query_points search with limit + 1
Filters out the query ID from results and trims to limit
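
These three steps can be sketched independently of any client library. The callables fetch_vector and search are hypothetical stand-ins for the Qdrant retrieve and query calls, and the in-memory dot-product search exists only to make the example runnable.

```python
from typing import Callable

Hit = tuple[str, float]


def search_by_id(
    query_id: str,
    fetch_vector: Callable[[str], list[float]],
    search: Callable[[list[float], int], list[Hit]],
    limit: int,
) -> list[Hit]:
    """Return up to `limit` neighbours of `query_id`, excluding the point itself."""
    vector = fetch_vector(query_id)
    # Ask for one extra hit because the query point normally matches itself.
    hits = search(vector, limit + 1)
    neighbours = [(pid, score) for pid, score in hits if pid != query_id]
    return neighbours[:limit]


# In-memory stand-ins, used only for this example.
VECTORS = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}


def dot_search(vector: list[float], top_k: int) -> list[Hit]:
    scored = [(pid, sum(x * y for x, y in zip(vector, v))) for pid, v in VECTORS.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)[:top_k]


neighbours = search_by_id("a", VECTORS.__getitem__, dot_search, limit=2)
```

The final trim matters when the query point does not appear in the results, in which case all limit + 1 hits survive the filter.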

Known Limitations

| Feature | Status | Notes |
| --- | --- | --- |
| Path/nested field filters | Not supported | Qdrant payloads are flat; nested access requires flattening at ingest |
| Geo/datetime filters | Not supported | Qdrant supports `GeoRadius`/`DatetimeRange` but no DSL mapping yet |
| TTL (auto-expiry) | Not supported | Qdrant has no built-in TTL; would need external cron/scheduler |
| Group-by aggregation | Client-side | Qdrant lacks server-side GROUP BY; large collections may be slow |
| Reranking | Via RRF only | No external reranker integration; RRF fusion handles hybrid ranking |

Type of Change

New feature (feat)

Testing

Comprehensive unit tests for QdrantCollection (44 tests, ~770 lines)
Unit tests for QdrantProject (7 tests, 266 lines)
All 119 existing vectordb tests continue to pass

# Run Qdrant tests (requires Qdrant server on localhost:6333)
pytest tests/vectordb/test_qdrant_collection.py tests/vectordb/test_qdrant_project.py -v

# Run all vectordb tests
pytest tests/vectordb/ -v

Usage Example

{
  "vectordb": {
    "backend": "qdrant",
    "name": "context",
    "dimension": 1024,
    "qdrant": {
      "url": "http://localhost:6333"
    }
  }
}

Checklist

Code follows project style guidelines
Tests added for new functionality (44 + 7 = 51 tests)
All existing interfaces preserved (backward compatible)
Optional dependency — no impact on existing installations
Stable cross-process hashing for sparse vectors
Proper filter DSL translation with must_not support
search_by_id excludes self from results

CLAassistant · 2026-02-20T09:42:41Z

All committers have signed the CLA.

ZaynJarvis · 2026-02-22T13:04:37Z

Looks good. Please help resolve the uv.lock conflicts and ruff lint issues.

We should test this before merging it.

Add Qdrant as an alternative open-source vector database backend alongside existing VikingDB/Volcengine backends.

Key changes:
- QdrantCollection: full VectorDBCollection implementation with support for dense/sparse hybrid search, scalar search with native order_by, full-text keyword search via payload indexes, multimodal search, aggregate data with pagination, and proper filter translation
- QdrantProject: VectorDBProject implementation for collection lifecycle management with configurable distance metrics and vector dimensions
- QdrantConfig: configuration model with URL, API key, gRPC, and timeout settings
- Backend factory: register 'qdrant' backend type in viking_vector_index_backend.py
- Auto-create TextIndex for text fields in create_index
- Use FilterSelector for delete_all_data instead of drop/recreate
- Paginate aggregate_data scroll to handle large collections
- Track created indexes properly in has_index

Dependencies:
- qdrant-client >= 1.9.0 (optional extra: `pip install openviking[qdrant]`)

Tests:
- Comprehensive unit tests for QdrantCollection (752 lines)
- Unit tests for QdrantProject (266 lines)

mclamee · 2026-02-23T13:49:34Z

Thanks for the review @ZaynJarvis!

Both issues addressed:

uv.lock conflict — Rebased onto latest main (3d2d05a) and regenerated uv.lock.
ruff lint — Fixed 4 issues (unused imports in qdrant_project.py and test_qdrant_collection.py, import sorting).

All Qdrant tests pass locally (51 tests: 44 collection + 7 project). Happy to help set up a test environment or add integration test instructions if needed.

MaojiaSheng · 2026-02-24T07:32:07Z

@mclamee We will offer a plugin mechanism for vector databases. Would you like to help review it when our code is released?

kkkwjx07 · 2026-02-24T13:39:24Z

I think the path and datetime field types need to be supported. You can try converting them to string and float types, respectively.
By the way, I'm planning to revise the API code soon, as the integration cost is currently a bit high.

github-project-automation bot moved this to Backlog in OpenViking project Feb 20, 2026

github-project-automation bot added this to OpenViking project Feb 20, 2026

mclamee force-pushed the feature/qdrant-backend-support branch from 58a3b88 to e883acc Compare February 20, 2026 09:44

ZaynJarvis requested a review from kkkwjx07 February 22, 2026 13:02

mclamee force-pushed the feature/qdrant-backend-support branch from e883acc to 78a351f Compare February 23, 2026 13:46

mclamee force-pushed the feature/qdrant-backend-support branch from 78a351f to db38e45 Compare February 23, 2026 13:49

mclamee wants to merge 1 commit into volcengine:main from mclamee:feature/qdrant-backend-support
