Add offheap memory tracking to Bench YAML reports by MarkWolters · Pull Request #671 · datastax/jvector

MarkWolters · 2026-06-02T17:26:14Z

This PR changes how offheap memory usage is tracked: maximum usage is now calculated by comparing the pre-phase and post-phase snapshots rather than using only the final snapshot. It also adds offheap memory usage to the tabular results generated by BenchYAML.
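
The snapshot comparison can be sketched roughly as follows. This is an illustration only, not the code in this PR: `PhasePeakTracker`, `runPhase`, and the `LongSupplier` probe are hypothetical names, and the sketch assumes the tracked maximum is the larger of the readings taken before and after each phase.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of jvector): tracks peak usage across benchmark phases. */
final class PhasePeakTracker {
    private final LongSupplier probe; // returns current usage in bytes
    private long maxBytes = 0;

    PhasePeakTracker(LongSupplier probe) {
        this.probe = probe;
    }

    /** Runs one phase and folds both the pre and post snapshots into the running max. */
    void runPhase(Runnable phase) {
        long before = probe.getAsLong();
        phase.run();
        long after = probe.getAsLong();
        maxBytes = Math.max(maxBytes, Math.max(before, after));
    }

    long maxBytes() {
        return maxBytes;
    }
}
```

Taking only the final snapshot would under-report whenever usage drops during a phase; including the pre-phase reading keeps that earlier, higher value.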

Note that the reported offheap usage will appear static across configurations for a given dataset in this test. This is because off-heap memory is allocated during index loading (once per dataset); it includes memory-mapped index files and direct ByteBuffers for vector storage. All query configurations reuse the same loaded index, so off-heap usage remains constant across them. Different datasets have different index sizes and therefore different off-heap usage.
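
For readers unfamiliar with how mapped files and direct buffers can be observed from inside the JVM, here is a minimal sketch using the standard `BufferPoolMXBean` API. It is an assumption that this resembles what the benchmark measures; `OffHeapProbe` is a hypothetical name, and memory mapped through other mechanisms may not show up in these pools.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical probe (not part of jvector): sums the JVM's direct and mapped buffer pools. */
final class OffHeapProbe {
    /** Returns the estimated bytes used by the "direct" and "mapped" buffer pools. */
    static long usedBytes() {
        long total = 0;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            String name = pool.getName();
            if (name.equals("direct") || name.equals("mapped")) {
                long used = pool.getMemoryUsed();
                if (used > 0) {
                    total += used; // getMemoryUsed() returns -1 when no estimate is available
                }
            }
        }
        return total;
    }

    /** Converts bytes to MB (1 MB = 1024 * 1024 bytes) for display. */
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

Because these pools only change when buffers are allocated, mapped, or released, a value read after index loading would stay flat across query configurations that reuse the same index.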

Example output:

Disk Usage Summary Graph Index Build:
  [testDirectory]:
    Total Disk Used: 115.99 MB
    Total Files: 1
    Net Change: 115.99 MB, +1 files
  [indexCache]:
    Total Disk Used: 0 B
    Total Files: 0
    Net Change: 0 B, +0 files
  [Overall Total]:
    Total Disk Used: 115.99 MB
    Total Files: 1
    Net Change: 115.99 MB, +1 files
Index build time: 26.442781 seconds

cohere-english-v3-100k: ProductQuantization(M=128, clusters=256, centered=false) codebooks loaded from PQ_cohere-english-v3-100k_128_256_false_-1.0
cohere-english-v3-100k: ProductQuantization(M=128, clusters=256, centered=false) encoded 99685 vectors [13.17 MB] in 1.75s
cohere-english-v3-100k: Using OnDiskGraphIndex(layers=[LayerInfo{size=99685, degree=32}, LayerInfo{size=3224, degree=32}, LayerInfo{size=95, degree=32}, LayerInfo{size=2, degree=32}], entryPoint=NodeAtLevel(level=3, node=75059), features=NVQ_VECTORS):

Index configuration:
  featureSetForIndex   [NVQ_VECTORS]
  M                    32
  efConstruction       100
  neighborOverflow     1.2
  addHierarchy         true
  refineFinalGraph     true

Query configuration:
  usePruning           true

Overquery    Avg QPS         ± Std Dev    CV %        Mean Latency    Avg Visited    Recall@10    Max heap        Max offheap    
             (of 3)                                   (ms)                                        usage (MB)      usage (MB)     
---------------------------------------------------------------------------------------------------------------------------------
1.00         52315.1         378.7        0.7         0.163           355.3          0.78         755.2           116.4          
2.00         44155.5         619.7        1.4         0.201           515.2          0.92         834.3           116.4          
5.00         28832.6         264.1        0.9         0.306           951.1          0.98         834.3           116.4          
10.00        17835.6         1134.1       6.4         0.504           1536.7         0.99         786.6           116.4          


Overquery    Avg QPS         ± Std Dev    CV %        Mean Latency    Avg Visited    Recall@100    Max heap        Max offheap    
             (of 3)                                   (ms)                                         usage (MB)      usage (MB)     
----------------------------------------------------------------------------------------------------------------------------------
1.00         18204.6         119.4        0.7         0.522           1536.7         0.85          803.5           116.4          
2.00         11269.1         83.9         0.7         0.846           2495.2         0.97          803.5           116.4

github-actions · 2026-06-02T17:26:32Z

Before you submit for review:

Does your PR follow guidelines from CONTRIBUTIONS.md?
Did you summarize what this PR does clearly and concisely?
Did you include performance data for changes which may be performance impacting?
Did you include useful docs for any user-facing changes or features?
Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
Did you trigger and review regression testing results against the base branch via Run Bench Main?
Did you adhere to the code formatting guidelines (TBD)?
Did you group your changes for easy review, providing meaningful descriptions for each commit?
Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.

corrected offheap calculation and add to reported statistics

a53e4fb

MarkWolters requested review from ashkrisk, jshook and tlwillke as code owners June 2, 2026 17:26

MarkWolters wants to merge 1 commit into main from bench_offheap_tracking
