feat(vectordb): add Qdrant backend support #232
Open
mclamee wants to merge 1 commit into volcengine:main from
58a3b88 to e883acc
Collaborator
Looks good. Please help resolve the `uv.lock` conflicts and ruff lint issues. We should test this before merging it.
e883acc to 78a351f
Add Qdrant as an alternative open-source vector database backend alongside existing VikingDB/Volcengine backends.

Key changes:
- QdrantCollection: full VectorDBCollection implementation with support for dense/sparse hybrid search, scalar search with native order_by, full-text keyword search via payload indexes, multimodal search, aggregate data with pagination, and proper filter translation
- QdrantProject: VectorDBProject implementation for collection lifecycle management with configurable distance metrics and vector dimensions
- QdrantConfig: configuration model with URL, API key, gRPC, and timeout settings
- Backend factory: register 'qdrant' backend type in viking_vector_index_backend.py
- Auto-create TextIndex for text fields in create_index
- Use FilterSelector for delete_all_data instead of drop/recreate
- Paginate aggregate_data scroll to handle large collections
- Track created indexes properly in has_index

Dependencies:
- qdrant-client >= 1.9.0 (optional extra: `pip install openviking[qdrant]`)

Tests:
- Comprehensive unit tests for QdrantCollection (752 lines)
- Unit tests for QdrantProject (266 lines)
78a351f to db38e45
Author
Thanks for the review @ZaynJarvis! Both issues have been addressed.
All Qdrant tests pass locally (51 tests: 44 collection + 7 project). Happy to help set up a test environment or add integration test instructions if needed.
Collaborator
@mclamee We will offer a plugin mechanism for vector databases. Would you like to help review it once our code is released?
Collaborator
I think the path and datetime field types need to be supported. You can try converting them to string and float types, respectively.
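A minimal sketch of that suggestion, assuming a hypothetical helper `to_qdrant_payload_value` that receives the schema field type as a string: paths are stored in their string form and datetimes as POSIX timestamps (floats), which keeps datetime fields filterable with ordinary numeric `Range` conditions. Treating naive datetimes as UTC is an assumption of this sketch, not something stated in the review.

```python
from datetime import datetime, timezone
from typing import Any


def to_qdrant_payload_value(field_type: str, value: Any) -> Any:
    """Hypothetical coercion: path -> str, datetime -> POSIX timestamp (float)."""
    if field_type == "path":
        # Works for pathlib.Path objects and plain strings alike.
        return str(value)
    if field_type == "datetime":
        if isinstance(value, (int, float)):
            # Already a timestamp; normalize to float.
            return float(value)
        if isinstance(value, str):
            value = datetime.fromisoformat(value)
        if value.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value.timestamp()
    # Other field types pass through unchanged.
    return value
```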
Summary
Add Qdrant as an alternative open-source vector database backend, giving users a self-hosted option alongside VikingDB/Volcengine.
- `ICollection` implementation: dense/sparse hybrid search, scalar search with native `order_by`, full-text keyword search via payload indexes, multimodal search, aggregation with pagination, and proper filter translation
- `qdrant` backend type registered in `viking_vector_index_backend.py`
- Optional install via `pip install openviking[qdrant]` (qdrant-client >= 1.9.0)
Design Decisions
1. ID System: Dual-Tracking (String ↔ UUID)
Problem: VikingDB uses arbitrary string IDs (e.g. `"doc_123"`), but Qdrant requires UUID or uint64 point IDs.
Solution: Deterministic UUID5 mapping with original ID preservation.
- `string_to_qdrant_id(id)` converts string IDs to UUID5 using a fixed namespace (`f47ac10b-58cc-4372-a567-0e02b2c3d479`), ensuring the same string always maps to the same UUID, stable across processes and restarts.
- The original string ID is kept in the `_original_id` payload field for round-trip fidelity.
2. Vector Storage Model: Named Vectors
Problem: VikingDB separates "Index" (search config) from "Store" (data container). Each index has its own vector field. Qdrant uses a flat point model with named vectors.
Solution: Map VikingDB index names to Qdrant named vectors.
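As a sketch of the resulting layout, the helpers below build plain dicts in the shape of Qdrant's REST API: a create-collection body with one dense named vector (`"default"`) and one sparse named vector (`"sparse"`), and a point that fills both. The PR itself goes through `qdrant-client` models (`VectorParams`, `SparseVectorParams`), so all helper names here are hypothetical. Sparse term keys are converted with the MD5 scheme described under Sparse Vector Key Hashing, assuming the 8-hex-char prefix is parsed as a base-16 integer; summing the weights of colliding terms is also an assumption.

```python
import hashlib
from typing import Dict, List, Optional

DENSE_NAME = "default"  # named dense vector
SPARSE_NAME = "sparse"  # named sparse vector


def build_collection_config(dim: int, distance: str = "Cosine") -> dict:
    """REST-style create-collection body with one dense and one sparse named vector."""
    return {
        "vectors": {DENSE_NAME: {"size": dim, "distance": distance}},
        "sparse_vectors": {SPARSE_NAME: {}},
    }


def term_to_index(term: str) -> int:
    """Stable 32-bit index from the first 8 hex chars of the MD5 digest (assumed scheme)."""
    return int(hashlib.md5(term.encode("utf-8")).hexdigest()[:8], 16)


def build_point(
    point_id: str,
    dense: Optional[List[float]] = None,
    sparse_terms: Optional[Dict[str, float]] = None,
    payload: Optional[dict] = None,
) -> dict:
    """REST-style point carrying both named vectors (hypothetical helper)."""
    vectors: dict = {}
    if dense is not None:
        vectors[DENSE_NAME] = list(dense)
    if sparse_terms:
        weights: Dict[int, float] = {}
        for term, weight in sparse_terms.items():
            idx = term_to_index(term)
            # Assumption: colliding terms have their weights summed.
            weights[idx] = weights.get(idx, 0.0) + weight
        # Qdrant sparse vectors are parallel lists; indices are sorted for a stable layout.
        indices = sorted(weights)
        vectors[SPARSE_NAME] = {"indices": indices, "values": [weights[i] for i in indices]}
    return {"id": point_id, "vector": vectors, "payload": payload or {}}
```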
"default"named vector"sparse"named vectorvector_field_namefrom schema'sVectorIndexmaps to named vectors at collection creation time.VectorParamsper dense index andSparseVectorParamsper sparse index.3. Filter DSL Translation
Problem: OpenViking uses a JSON-based filter DSL (`{"op": "must", "field_name": ..., "conds": [...]}`), while Qdrant uses typed Pydantic models (`Filter`, `FieldCondition`, `MatchValue`, `Range`, etc.).
Solution: A recursive `_build_qdrant_filter()` translates the full DSL:
- `must` / `must_not` → `Filter(must=[...])` / `Filter(must_not=[...])`
- `and` / `or` → nested `Filter` with `must` / `should`
- `range` (dict with `gt`/`gte`/`lt`/`lte`) → `Range(...)`
- `in` (list of values) → `MatchAny(any=[...])`
- `MatchValue(value=...)`
- `MatchText(text=...)`
`must_not` conditions at any nesting level are properly collected and applied.
4. Hybrid Search: RRF Fusion
Problem: VikingDB has built-in hybrid search (dense + sparse + rerank in one call). Qdrant requires explicit orchestration.
Solution: Qdrant's `prefetch` + `Fusion.RRF` pattern, which achieves Reciprocal Rank Fusion natively in Qdrant without an external reranker.
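`Fusion.RRF` runs server-side. In qdrant-client the hybrid query is expressed roughly as `client.query_points(collection_name, prefetch=[models.Prefetch(query=..., using="default"), models.Prefetch(query=models.SparseVector(...), using="sparse")], query=models.FusionQuery(fusion=models.Fusion.RRF), limit=...)`. To show what the fusion step computes, here is a pure-Python sketch of RRF over ranked ID lists (e.g. the dense and sparse prefetch results). It uses the constant k = 60 common in the RRF literature; Qdrant's internal constant may differ, so the scores are illustrative, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, limit: int = 10) -> List[str]:
    """Fuse several ranked ID lists with Reciprocal Rank Fusion (illustrative sketch)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); IDs ranked high in several lists win.
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    ordered = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ordered[:limit]
```

For example, fusing a dense ranking `["a", "b", "c"]` with a sparse ranking `["c", "a", "d"]` puts `"a"` first, because it ranks near the top of both lists. Only ranks are used, so the dense and sparse raw scores never need to be normalized against each other.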
5. Sparse Vector Key Hashing
Problem: VikingDB sparse vectors use string term keys (e.g.
{"hello": 0.5, "world": 0.3}), but Qdrant sparse vectors require integer indices.Solution: Stable hashing via
hashlib.md5:Using MD5 (truncated to 8 hex chars) ensures:
hash()which is randomized per process viaPYTHONHASHSEED)6. Schema Flexibility
Problem: VikingDB enforces strict field schemas. Qdrant is schema-less by default (any payload field can be stored).
Solution:
- `create_collection` is used to configure vector dimensions and named vectors, but payload fields are stored freely.
- `TextIndex` payload indexes are auto-created for `string` fields to enable full-text `MatchText` search.
- Created indexes are tracked in a `_created_indexes` set to avoid redundant index creation.
7. Sorting & Aggregation
Problem: VikingDB supports `order_by` in fetch and aggregation natively. Qdrant added `order_by` in v1.9.0.
Solution:
- `fetch_data_by_sort` → Qdrant's native `scroll(order_by=...)` with `OrderBy(key, direction)`.
- `aggregate_data` → paginated `scroll()` with client-side grouping and counting (Qdrant lacks server-side aggregation).
- `search_by_random` → random unit-vector search (matching the `LocalCollection` approach), since Qdrant has no native random sampling.
8. search_by_id: Self-Exclusion
VikingDB's `search_by_id` returns neighbors excluding the query point itself. Our implementation:
- runs a `query_points` search with `limit + 1`
- drops the query point from the results and keeps at most `limit`
Known Limitations
- `GeoRadius` / `DatetimeRange`: supported by Qdrant, but no DSL mapping yet
Type of Change
Testing
Usage Example
```json
{
  "vectordb": {
    "backend": "qdrant",
    "name": "context",
    "dimension": 1024,
    "qdrant": {
      "url": "http://localhost:6333"
    }
  }
}
```
Checklist