⚡ Bolt: Optimize TripletDataGenerator sampling #12

Draft
google-labs-jules[bot] wants to merge 1 commit into main from
bolt-optimize-triplet-generator-7723859636208202121

Conversation

@google-labs-jules

⚡ Bolt: Optimize TripletDataGenerator sampling

💡 What:
Replaced the O(N) linear scan for positive and negative samples with an O(1) dictionary lookup. Image paths are now grouped by label once, in __init__.
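
A minimal sketch of the label-grouped lookup described above, not the code from this PR. The class name LabelIndex, its method names, and the rejection loop for negatives are hypothetical, and raising ValueError for a single-class dataset is only an assumption about how such a case could be guarded.

```python
import random
from collections import defaultdict


class LabelIndex:
    """Hypothetical helper: groups image paths by label for fast sampling."""

    def __init__(self, paths, labels):
        # One O(N) pass builds the label -> list-of-paths map.
        self.paths_by_label = defaultdict(list)
        for path, label in zip(paths, labels):
            self.paths_by_label[label].append(path)
        self.labels = list(self.paths_by_label)
        # Assumed guard: a negative sample needs at least one other class.
        if len(self.labels) < 2:
            raise ValueError("need at least two classes to sample negatives")

    def sample_positive(self, anchor_label):
        # O(1): index into the pre-built list for the anchor's label.
        # Note: this may return the anchor itself; a real generator
        # might want to exclude it.
        return random.choice(self.paths_by_label[anchor_label])

    def sample_negative(self, anchor_label):
        # Draw labels until one differs from the anchor's label.
        # With two or more classes this terminates in O(1) expected draws.
        other = random.choice(self.labels)
        while other == anchor_label:
            other = random.choice(self.labels)
        return random.choice(self.paths_by_label[other])
```

With an index like this, __getitem__ would only need the anchor's label to draw both samples, so the per-anchor cost no longer depends on the dataset size.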

🎯 Why:
The original implementation iterated over the entire dataset (zip(paths, labels)) for every anchor image in a batch to find matching and non-matching samples. For a dataset of size N, that is O(N * Batch_Size) operations per batch; with N / Batch_Size batches per epoch, the total is O(N^2) per epoch. This was a major bottleneck for large datasets.

📊 Impact:

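  • Reduces sampling cost to O(1) per sample, after a one-time O(N) grouping pass in __init__.
  • In the benchmark, batch generation time dropped from ~0.166s to ~0.021s (excluding I/O) on a 10k-item dataset.
  • The speedup should grow with dataset size, since the old per-batch cost scaled with N and the new one does not.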

🔬 Measurement:
Ran a benchmark script that mocked file I/O and measured __getitem__ latency. Confirmed speedup and verified logic correctness with unit tests.
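
The benchmark script itself is not included in this PR description. The sketch below shows one way such a comparison could be set up, under stated assumptions: all function names, the synthetic paths, and the default sizes (100 classes, batch size 32) are hypothetical. It never opens a file, which stands in for mocked I/O, and it only times positive sampling.

```python
import random
import time
from collections import defaultdict


def linear_scan_positive(paths, labels, anchor_label):
    # Old-style approach: walk the whole dataset for every anchor.
    candidates = [p for p, lbl in zip(paths, labels) if lbl == anchor_label]
    return random.choice(candidates)


def build_index(paths, labels):
    # One-time O(N) grouping of paths by label.
    index = defaultdict(list)
    for p, lbl in zip(paths, labels):
        index[lbl].append(p)
    return index


def indexed_positive(index, anchor_label):
    # O(1) lookup into the pre-grouped paths.
    return random.choice(index[anchor_label])


def time_batch(sample_fn, anchor_labels):
    # Wall-clock time to draw one sample per anchor in the batch.
    start = time.perf_counter()
    for label in anchor_labels:
        sample_fn(label)
    return time.perf_counter() - start


def run_benchmark(n_items=10_000, n_classes=100, batch_size=32, seed=0):
    random.seed(seed)
    # Synthetic paths only; no image is ever loaded.
    paths = [f"img_{i}.jpg" for i in range(n_items)]
    labels = [i % n_classes for i in range(n_items)]
    index = build_index(paths, labels)
    anchors = [random.choice(labels) for _ in range(batch_size)]
    scan_s = time_batch(lambda lbl: linear_scan_positive(paths, labels, lbl), anchors)
    lookup_s = time_batch(lambda lbl: indexed_positive(index, lbl), anchors)
    return scan_s, lookup_s
```

Calling run_benchmark() returns the two timings in seconds. Absolute numbers depend on the machine, so they are not expected to match the figures quoted above.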


PR created automatically by Jules for task 7723859636208202121 started by @Devasy23

- Replaced linear dataset scan with hash map lookup for positive/negative sampling
- Reduces batch generation time from ~0.16s to ~0.02s for 10k items (~8x speedup)
- Added safety check for single-class datasets
- Complexity reduction: O(N^2) -> O(N) per epoch
@google-labs-jules
Author

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!


For security, I will only act on instructions from the user who triggered this task.

New to Jules? Learn more at jules.google/docs.

@coderabbitai

coderabbitai bot commented Dec 20, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.
