Skip to content

MOD-14826 Use try_lock in debugInfo() to avoid blocking main thread during SVS training#931

Merged
meiravgri merged 2 commits intomainfrom
meiravg_fix_debug_info_lock_svs
Apr 14, 2026
Merged

MOD-14826 Use try_lock in debugInfo() to avoid blocking main thread during SVS training#931
meiravgri merged 2 commits intomainfrom
meiravg_fix_debug_info_lock_svs

Conversation

@meiravgri
Copy link
Copy Markdown
Collaborator

@meiravgri meiravgri commented Apr 14, 2026

TieredSVSIndex::debugInfo() used a blocking std::lock_guard on updateJobMutex. Since updateSVSIndexWrapper holds that mutex for the entire duration of SVS training (which can take tens of seconds), any call to debugInfo() — including via svs_label_count() in the Python bindings — would block until training completed.

This means that reading debug info (e.g., BACKGROUND_INDEXING, INDEX_UPDATE_SCHEDULED) could stall the calling thread for the full training duration, defeating the purpose of non-blocking status queries.

Fix

Replace std::lock_guard with std::unique_lock using std::try_to_lock:

  • If the lock is acquired: read indexUpdateScheduled normally.
  • If the lock is not acquired (training is in progress): report indexUpdateScheduled = true, since the mutex being held by updateSVSIndexWrapper means an update is actively running.

Flow test update

The search_insert flow test was implicitly relying on the old blocking behavior — svs_label_count() (which calls debugInfo()) acted as an accidental synchronization point, blocking until background indexing finished. With the fix, debugInfo() returns immediately, so the test could observe intermediate states (e.g., flat buffer not yet drained while SVS already has vectors).

Updated the test to remove the per-iteration assert bf_curr_size < prev_bf_size check and instead assert bf_size == 0 after wait_for_index() completes.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Low Risk
Low risk: changes only debug/observability behavior and a flow test assertion, without affecting indexing or query logic. Main risk is slightly altered BACKGROUND_INDEXING reporting while training is in progress.

Overview
Prevents TieredSVSIndex::debugInfo() from blocking for the full SVS training duration by switching the updateJobMutex acquisition to try_to_lock; when the mutex is held (training running), it reports indexUpdateScheduled=true to keep BACKGROUND_INDEXING on.

Updates test_search_insert to stop asserting the flat-buffer size monotonically decreases during background indexing, and instead asserts the buffer is fully drained (get_curr_bf_size()==0) after wait_for_index() completes.

Reviewed by Cursor Bugbot for commit d9edf21. Bugbot is set up for automated code reviews on this repo. Configure here.

@jit-ci
Copy link
Copy Markdown

jit-ci bot commented Apr 14, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

@meiravgri meiravgri changed the title use try lock MOD-14826 Use try_lock in debugInfo() to avoid blocking main thread during SVS training Apr 14, 2026
@meiravgri meiravgri requested a review from ofiryanai April 14, 2026 08:18
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.99%. Comparing base (32cb7f2) to head (d9edf21).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #931      +/-   ##
==========================================
- Coverage   97.01%   96.99%   -0.03%     
==========================================
  Files         129      129              
  Lines        7574     7576       +2     
==========================================
  Hits         7348     7348              
- Misses        226      228       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ofiryanai
ofiryanai previously approved these changes Apr 14, 2026
@meiravgri meiravgri enabled auto-merge April 14, 2026 08:48
Copy link
Copy Markdown
Collaborator

@ofiryanai ofiryanai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please un-flak

@meiravgri meiravgri added this pull request to the merge queue Apr 14, 2026
Merged via the queue into main with commit d7a5afb Apr 14, 2026
17 checks passed
@meiravgri meiravgri deleted the meiravg_fix_debug_info_lock_svs branch April 14, 2026 15:24
github-actions bot pushed a commit that referenced this pull request Apr 14, 2026
…uring SVS training (#931)

* use try lock

* fix test

(cherry picked from commit d7a5afb)
@github-actions
Copy link
Copy Markdown

github-actions bot pushed a commit that referenced this pull request Apr 14, 2026
…uring SVS training (#931)

* use try lock

* fix test

(cherry picked from commit d7a5afb)
@github-actions
Copy link
Copy Markdown

meiravgri added a commit that referenced this pull request Apr 14, 2026
…read during SVS training (#933)

MOD-14826 Use try_lock in debugInfo() to avoid blocking main thread during SVS training (#931)

* use try lock

* fix test

(cherry picked from commit d7a5afb)

Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Apr 15, 2026
…read during SVS training (#932)

* MOD-14826 Use try_lock in debugInfo() to avoid blocking main thread during SVS training (#931)

* use try lock

* fix test

(cherry picked from commit d7a5afb)

* remove assert

---------

Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>
Co-authored-by: meiravgri <meirav.grimberg@redis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants