[BUG] Filtered Search Recall Collapse on WIKI-1M under Single, Identical Query Labels

**Describe the bug**
While performing filtered ANNS on the `WIKI-1M` dataset with uniform query labels (i.e., every query carries the same single label) to evaluate query performance at a given level of specificity, the system returns identical neighbor lists for every query. As a result, recall collapses to nearly **0**. This behavior is observed on both the IVF-Graph (high-specificity) and IVF-BFS (low-specificity) indexes.

**Steps/Code to reproduce bug**
1. Load the `WIKI-1M` dataset.
2. Generate a `query_labels.txt` where each line is identical (e.g., `lbls[0] = lbls[1] = ... = lbls[n-1] = [0]`).
3. Run the search with the following configuration (set `spec_threshold` to `2000` to evaluate the IVF-BFS path for label `3079`):
```json
{
 "data_dir": "/data/ann/wiki_1M/",
 "data_fname": "base.fbin",
 "query_fname": "query.fbin",
 "data_label_fname": "base_labels.txt",
 "query_label_fname": "query_labels1.txt",
 "itopk_size": 32,
 "spec_threshold": 1500,
 "graph_degree": 32,
 "topk": 10,
 "num_runs": 1000,
 "warmup_runs": 10,
 "force_rebuild": true,
 "ivf_graph_fname": "ivf_graph.bin",
 "ivf_bfs_fname": "ivf_bfs.bin",
 "ground_truth_fname": "ground_truth_k10.ibin"
}
```
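
Step 2 can be sketched as follows. This is a minimal illustration, not the script used for this report: it assumes the label file holds one integer label per line, and the helper names `uniform_label_lines` and `write_uniform_query_labels` are hypothetical.

```python
from pathlib import Path


def uniform_label_lines(num_queries, label):
    """Return one text line per query, each holding the same single label."""
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    return [str(label)] * num_queries


def write_uniform_query_labels(path, num_queries, label):
    """Write the uniform label file (assumed format: one label per line)."""
    lines = uniform_label_lines(num_queries, label)
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(path).write_text(text)
    return len(lines)
```

For example, `write_uniform_query_labels("query_labels.txt", n, 0)` would assign label `0` to all `n` queries, where `n` is the number of vectors in the query file.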

**Environment details**
- Dataset: [WIKI-1M](https://huggingface.co/2024annonymous/wiki-ann)
- CPU: AMD Ryzen 7 5700G
- GPU: NVIDIA RTX 2080 Ti
- OS: Ubuntu 22.04 LTS
- NVIDIA Driver: 12.2
- CUDA Compiler: 12.9.86
- Host Compiler: g++ 11.4.0

**Observed behavior**
Host-side validation confirms that the first two query vectors are distinct ($L_2$ distance $\approx 0.62$), yet the returned neighbor lists are identical for all queries (`nbhs[0] = nbhs[1] = ... = nbhs[n-1]`).
Cross-path confirmation:
- IVF-Graph: labels `0` and `1` (`spec=15.33%` and `spec=13.05%`), specificity > threshold: `FAILED`
- IVF-BFS: label `3079` (`spec=0.20%`), specificity < threshold: `FAILED`
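
The host-side check described above can be sketched roughly as below. It is an assumption-laden illustration rather than the actual validation code: queries and neighbor lists are taken as plain Python sequences already copied to the host, and the function names are hypothetical. The pairwise loop is quadratic, so it is meant for the first few queries only.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_duplicate_result_pairs(queries, neighbors, eps=1e-6):
    """Return (i, j) pairs whose query vectors differ by more than eps
    but whose neighbor lists are exactly identical."""
    suspicious = []
    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            distinct_queries = l2_distance(queries[i], queries[j]) > eps
            same_results = list(neighbors[i]) == list(neighbors[j])
            if distinct_queries and same_results:
                suspicious.append((i, j))
    return suspicious
```

A non-empty return value for the first handful of queries matches the symptom reported here: distinct inputs, identical outputs.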

**Supporting evidence**
IVF-Graph (`WIKI-1M` label `1`, `spec=13.05%`) query results snippet:
```
IVF-Graph Index Stats:
 Total vectors:  980312
 Number of labels: 3814
 Graph size:     [21768457 × 32]
 Graph degree:   32

IVF-BFS Index Stats:
 Number of labels: 186
 Number of rows:  269772

QPS: 380420.79
Recall: 0.0000

=== Search Results Monitor (first 3 queries) ===
Query 0 (label=0):
 neighbors: 541004 316361 314017 331946 604415 448044 539479 291316 344698 103825
 gt:       370208 251743 484555 597579 386190 368896 860059 781968 802148 401846
 recall@10: 0/10
Query 1 (label=0):
 neighbors: 541004 316361 314017 331946 604415 448044 539479 291316 344698 103825
 gt:       341406 160363 370208 251743 484555 874429 712954 785765 457944 517840
 recall@10: 0/10
Query 2 (label=0):
 neighbors: 541004 316361 314017 331946 604415 448044 539479 291316 344698 103825
 gt:       549821 533776 915603 251743 484555 597579 573700 882094 276817 331536
 recall@10: 0/10
```

IVF-BFS (`WIKI-1M` label `3079`, `spec=0.20%`) query results snippet:
```
IVF-Graph Index Stats:
  Total vectors:  980312
  Number of labels: 2976
  Graph size:     [20314630 × 32]
  Graph degree:   32

IVF-BFS Index Stats:
  Number of labels: 1024
  Number of rows:  1723599

QPS: 302732.20
Recall: 0.0023

=== Search Results Monitor (first 3 queries) ===
Query 0 (label=3079):
  neighbors: 430417 279428 926084 439948 62295 368557 266782 439498 559943 547068 
  gt:       430417 279428 926084 439948 62295 368557 266782 439498 559943 547068 
  recall@10: 10/10
Query 1 (label=3079):
  neighbors: 430417 279428 926084 439948 62295 368557 266782 439498 559943 547068 
  gt:       751135 773048 427405 742655 944674 345688 398253 943112 272594 565167 
  recall@10: 0/10
Query 2 (label=3079):
  neighbors: 430417 279428 926084 439948 62295 368557 266782 439498 559943 547068 
  gt:       226262 778697 417094 104588 281525 47437 862004 267955 913929 809383 
  recall@10: 0/10
```
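
For readers unfamiliar with the `spec` figures, the sketch below shows one way they could be computed and used for routing. It rests on assumptions that the report does not confirm: that specificity is the fraction of base vectors carrying the label, and that `spec_threshold` is compared against the matching vector count (which would explain why raising it from `1500` to `2000` moves label `3079`, at roughly 0.20% of 980312 vectors, onto the IVF-BFS path). Both function names are hypothetical.

```python
def label_specificity(base_labels, label):
    """Fraction of base vectors whose label list contains `label`."""
    if not base_labels:
        return 0.0
    matches = sum(1 for labels in base_labels if label in labels)
    return matches / len(base_labels)


def choose_index(match_count, spec_threshold):
    """ASSUMED routing rule: counts above the threshold use IVF-Graph,
    everything else (including equality, which is a guess) uses IVF-BFS."""
    return "IVF-Graph" if match_count > spec_threshold else "IVF-BFS"


# Illustrative count derived from the rounded figures in this report:
# 0.20% of 980312 base vectors is roughly 1961 (not an exact measured value).
approx_count = round(0.0020 * 980312)
```

Under these assumptions, `choose_index(approx_count, 1500)` selects IVF-Graph and `choose_index(approx_count, 2000)` selects IVF-BFS.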

**Additional context**
For comparison, I also generated synthetic labels following a Zipfian distribution for the `SIFT-1M` dataset and ran the same uniform filtered query with label `1` (`spec=75%`). This yielded normal results.
The IVF-Graph results snippet:
```
IVF-Graph Index Stats:
 Total vectors:  1000000
 Number of labels: 51
 Graph size:     [3385661 × 32]
 Graph degree:   32

IVF-BFS Index Stats:
 Number of labels: 0
 Number of rows:  0

QPS: 1068508.05
Recall: 0.9292

=== Search Results Monitor (first 3 queries) ===
Query 0 (label=1):
  neighbors: 932085 934876 561813 695756 701258 455537 562594 908244 600499 893601 
  gt:       932085 934876 561813 708177 706771 695756 435345 701258 455537 562594 
  recall@10: 7/10
Query 1 (label=1):
  neighbors: 413071 880592 249062 400194 942339 880462 941776 586780 248426 849742 
  gt:       413071 706838 880592 249062 400194 942339 880462 941776 420802 586780 
  recall@10: 8/10
Query 2 (label=1):
  neighbors: 408764 408462 861882 406273 406324 551743 861530 402106 239766 823095 
  gt:       408764 408462 861882 406273 406324 551743 861530 402106 239766 823095 
  recall@10: 10/10
```
Interestingly, the issue does NOT reproduce on `SIFT-1M` (128D), even with uniform query labels.
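
The per-query `recall@10` values in the monitor output can be cross-checked on the host with a small helper. This is a sketch assuming recall is the size of the intersection between the returned IDs and the ground-truth IDs; `hits_at_k` is a hypothetical name, and the ID lists are copied from queries 0 and 1 of the `SIFT-1M` snippet above.

```python
def hits_at_k(neighbors, ground_truth, k=10):
    """Count how many of the first k returned IDs appear in the first k ground-truth IDs."""
    return len(set(neighbors[:k]) & set(ground_truth[:k]))


# SIFT-1M, Query 0 (label=1)
q0_neighbors = [932085, 934876, 561813, 695756, 701258,
                455537, 562594, 908244, 600499, 893601]
q0_gt = [932085, 934876, 561813, 708177, 706771,
         695756, 435345, 701258, 455537, 562594]

# SIFT-1M, Query 1 (label=1)
q1_neighbors = [413071, 880592, 249062, 400194, 942339,
                880462, 941776, 586780, 248426, 849742]
q1_gt = [413071, 706838, 880592, 249062, 400194,
         942339, 880462, 941776, 420802, 586780]

print(hits_at_k(q0_neighbors, q0_gt))  # 7, matching "recall@10: 7/10"
print(hits_at_k(q1_neighbors, q1_gt))  # 8, matching "recall@10: 8/10"
```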
