refactor: SVS shared thread pool with rental model (MOD-9881)#925
refactor: SVS shared thread pool with rental model (MOD-9881)#925
Conversation
…l terminology** Rename `setNumThreads`/`getNumThreads` → `setParallelism`/`getParallelism` and `getThreadPoolCapacity` → `getPoolSize` across VectorSimilarity. Public info API fields (`numThreads`, `lastReservedThreads`, `NUM_THREADS`, `LAST_RESERVED_NUM_THREADS`) remain unchanged. No behavioral changes.
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #925 +/- ##
==========================================
- Coverage 96.98% 96.91% -0.08%
==========================================
Files 129 129
Lines 7574 7651 +77
==========================================
+ Hits 7346 7415 +69
- Misses 228 236 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
thpool tests fix fp16 tests
align bm
rfsaliev
left a comment
There was a problem hiding this comment.
LGFM, but some suggestions.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit df969a1. Configure here.
| size_t num_threads = VecSimSVSThreadPoolImpl::instance()->beginScheduledJob(); | ||
| return createJobs(allocator, jobType, callback, index, num_threads, threads_wait_timeout, | ||
| registry, /*scheduled=*/true); | ||
| } |
There was a problem hiding this comment.
Leaked pending_jobs_ counter if createJobs allocation fails
Medium Severity
createScheduledJobs calls beginScheduledJob() (incrementing pending_jobs_) before createJobs. If createJobs throws (e.g., std::bad_alloc from vector or new (allocator) for job objects), no SVSMultiThreadJob is created, so endScheduledJob() is never called. The pending_jobs_ counter stays permanently elevated, causing all future pool shrink operations to be deferred indefinitely — the pool can never shrink again for the lifetime of the process.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit df969a1. Configure here.
| // Resize the shared SVS thread pool. resize() clamps to minimum size 1. | ||
| // new_size == 0 → pool size 1 (only calling thread, write-in-place). | ||
| // new_size > 0 → pool size new_size (new_size - 1 worker threads). | ||
| VecSimSVSThreadPool::resize(std::max(new_size, size_t{1})); |
There was a problem hiding this comment.
Redundant std::max clamping before resize call
Low Severity
VecSim_UpdateThreadPoolSize applies std::max(new_size, size_t{1}) before calling VecSimSVSThreadPool::resize(), but VecSimSVSThreadPoolImpl::resize() already applies the identical clamp internally. The outer clamp is redundant and slightly obscures which layer is responsible for the minimum-size-1 invariant.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit df969a1. Configure here.


Overview
Replaces per-index SVS thread pools with a single shared thread pool that can be physically resized at runtime, and introduces a rental-based execution model for
parallel_for. This eliminates over-provisioning (previously K indexes × N threads = K×N OS threads) and enables the SVS pool to actually grow/shrink when the external pool capacity changes — something the old design could not do.Also adds a new C API
VecSim_UpdateThreadPoolSize()that combines write-mode toggling with pool resizing, and deprecatesSVSParams::num_threads.Key Guarantees and Assumptions
The fundamental scheduling invariant
The shared SVS thread pool has the same size as the RediSearch worker thread pool (
search-workers). Every SVS multi-threaded operation (update, GC) is dispatched viaSVSMultiThreadJob, which submits N reserve jobs to the RediSearch worker pool — one per worker thread. Only the threads that actually check in (are not busy with other work) participate in the SVS operation. This means:rent()requests can never exceed the pool capacity, because each rented SVS thread corresponds to a RediSearch worker that checked in via a reserve job. If a worker is busy, it doesn't check in, so it's never counted inavailableThreadsand never requested from the SVS pool.setParallelism(n)is always called withn = min(availableThreads, problemSize), whereavailableThreads ≤ poolSize(). This is why the assertionn <= pool->size()is safe.ThreadSlots are held viashared_ptr; shrinking drops the pool's reference but the renter keeps the slot (and its OS thread) alive until~RentedThreadsreleases it. Resize never blocks on in-flight operations.Other invariants
VecSimSVSThreadPoolImpl::instance()). There is nonullptrstate; write-in-place mode uses a pool of size 1 (0 worker threads, calling thread only).parallelism_ >= 1— the calling thread always participates, so parallelism can never be 0.ThreadSlots are held viashared_ptr; shrinking drops the pool's reference but the renter keeps the slot (and OS thread) alive until~RentedThreads.shared_ptr<atomic<size_t>> parallelism_;setParallelism()on one index never affects another.Relationship with RediSearch
VecSim_UpdateThreadPoolSize(N)when the worker count changes, so VecSim can create/destroy the matching number of OS threads.Changes from Old Implementation
VecSimSVSThreadPoolImplVecSimSVSThreadPoolImplshared across all indexes viashared_ptrresize()clamped a logical counterresize()spawns/destroys OS threadsparallel_forthreadingatomic<bool>setNumThreads(n)/resize(n)— mutated the poolsetParallelism(n)— sets a per-indexatomic<size_t>, pool unchangedcapacity()(max pre-allocated)poolSize()(current shared pool size)manage_exception_during_run()threw unconditionallymanage_workers_after_run()only throws if an error occurredAPI Changes
Renamed methods
setNumThreads(n)setParallelism(n)SVSIndex,SVSIndexBasevirtual interfacegetNumThreads()getParallelism()SVSIndex,SVSIndexBasevirtual interfacegetThreadPoolCapacity()getPoolSize()SVSIndex,SVSIndexBasevirtual interfaceRemoved methods
VecSimSVSThreadPool::capacity()poolSize()for shared pool sizeVecSimSVSThreadPool::resize()setParallelism()for per-index control,VecSimSVSThreadPool::resize(size_t)(static) for global poolNew C API
New internal APIs
VecSimSVSThreadPoolImpl::instance()shared_ptr<VecSimSVSThreadPoolImpl>by valueVecSimSVSThreadPool(void* log_ctx)VecSimSVSThreadPool::resize(size_t)VecSimSVSThreadPool::poolSize()VecSimSVSThreadPoolImpl::rent(count, log_ctx)RentedThreads)Deprecated
VecSim_SetWriteMode()VecSim_UpdateThreadPoolSize()VecSim_UpdateThreadPoolSize) and unit tests. Not called by RediSearch — RediSearch will use onlyVecSim_UpdateThreadPoolSizeSVSParams::num_threadsVecSim_UpdateThreadPoolSize()New types
ThreadSlot— wrapssvs::threads::Thread+atomic<bool> occupied; non-copyable, non-movableRentedThreads— RAII guard; move-only; releases slots via lock-free atomic stores in destructorbool isScheduledflag onSVSMultiThreadJob; its destructor callsendScheduledJob()on the shared pool (no separateScheduledJobTokenclass)Behavioral Changes
setParallelism(0)resize(0)silently clamped to 1setParallelism(n > poolSize)search-workersn <= pool->size()— scheduling bugn > parallelisminparallel_forThreadingExceptionifn > size_parallelism_initial valuelogCallback, asserts in debug, usesrented.count()for graceful degradation in releaseparallel_formanage_exception_during_run()always threwmanage_workers_after_run()only throws if an error actually occurredupdateSVSIndexWrapperlabels_to_moveis emptySVSParams::num_threadsset to non-zeroPublic info API fields — unchanged
The debug info field names
numThreadsandlastReservedThreads(and their string keysNUM_THREADS,LAST_RESERVED_NUM_THREADS) are unchanged. Their semantics shift slightly:numThreads→ now reportsgetPoolSize()(shared pool size, not per-index capacity)lastReservedThreads→ now reportsgetParallelism()(per-index parallelism for last operation)What Does NOT Change
deps/ScalableVectorSearch/) — no modificationsSVSMultiThreadJobpattern — still submits N reserve jobs to the external worker poolThreadstate machine — used as-is; lifecycle managed viashared_ptrFiles Changed
src/VecSim/algorithms/svs/svs_utils.hThreadSlot,RentedThreads, rental-basedVecSimSVSThreadPoolImpl, per-indexVecSimSVSThreadPoolwrappersrc/VecSim/algorithms/svs/svs.hnum_threadsdeprecation warningsrc/VecSim/algorithms/svs/svs_tiered.hupdateSVSIndexWrapper,SVSMultiThreadJobusesbool isScheduled+ destructor-basedendScheduledJob()instead ofScheduledJobToken,createJobsImplmerged intocreateJobssrc/VecSim/vec_sim.h/vec_sim.cppVecSim_UpdateThreadPoolSize()C APIsrc/VecSim/vec_sim_common.hSVSParams::num_threadsmarked as deprecated in commentstests/unit/test_svs_tiered.cppVecSim_UpdateThreadPoolSizein setup, removednum_threadsfrom params, updated assertionstests/unit/test_svs_fp16.cpptests/unit/test_svs.cppnumThreadsinfo field expects default; newNumThreadsParamIgnoredtesttests/benchmark/bm_utils.h/bm_vecsim_svs.hTesting
testThreadPoolrewritten to cover: default parallelism,setParallelismboundary assertions (ASSERT_DEATH),parallel_forwithn < parallelism, write-in-place mode, exception handlingTestDebugInfoThreadCount/TestDebugInfoThreadCountWriteInPlaceupdated:lastReservedThreadsnow starts at 1 (notnum_threads)ThreadsReservationtest usestieredIndexMock(num_threads)to properly size the shared poolNumThreadsParamIgnored— verifies that settingSVSParams::num_threadsemits a deprecation warning, does not affect the shared pool size, and that omitting it produces no warningnum_threads = 1inSVSParamsnow usetieredIndexMock(1)to resize the shared pool insteadMark if applicable
Note
High Risk
High risk because it rewrites SVS threading and job scheduling, introducing shared mutable concurrency primitives (rental, deferred shrink, per-index parallelism) that can affect correctness and stability under load.
Overview
SVS threading is refactored to a shared, physically resizable thread pool with a rental-based
parallel_for, allowing multiple indexes/jobs to concurrently rent disjoint worker threads and making shrink/grow safe during active operations.The SVS public/virtual threading knobs are renamed from threads to parallelism (e.g.,
get/setParallelism,getPoolSize),SVSParams::num_threadsis deprecated/ignored (with warning log when set), and SVS debug info now reports shared pool size asnumThreadsand last-used per-index parallelism aslastReservedThreads.Adds
VecSim_UpdateThreadPoolSize()C API to toggle write mode (0→ in-place,>0→ async) while resizing the shared SVS pool, updates tiered SVS update/GC scheduling to snapshot pool size and defer pool shrink until scheduled jobs complete, and expands/adjusts benchmarks and unit tests (including newtest_svs_threadpool.cpp) to cover resize/rental/deferred-shrink behavior and the deprecated param handling.Reviewed by Cursor Bugbot for commit df969a1. Bugbot is set up for automated code reviews on this repo. Configure here.