⚡ Bolt: Optimize TripletDataGenerator sampling #15
google-labs-jules[bot] wants to merge 1 commit into main
- Pre-compute `label_to_paths` index in `on_epoch_end`
- Replace O(N) list comprehension with O(1) dictionary lookup in `_generate_triplet_batch`
- Benchmarks show reduction from ~0.17s to ~0.03s per batch (5.6x speedup) on dummy dataset (5k images)
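The indexing idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_label_index` and `sample_triplet` are hypothetical helper names, and the only identifier taken from the PR is `label_to_paths`.

```python
import random
from collections import defaultdict


def build_label_index(image_paths, labels):
    """Group image paths by label in a single pass over the dataset.

    Hypothetical helper; in the PR this work is described as happening
    in on_epoch_end.
    """
    label_to_paths = defaultdict(list)
    for path, label in zip(image_paths, labels):
        label_to_paths[label].append(path)
    return dict(label_to_paths)


def sample_triplet(anchor_path, anchor_label, label_to_paths, rng=random):
    """Pick a positive (same label) and a negative (different label).

    Hypothetical helper; assumes at least two distinct labels exist.
    """
    same_class = label_to_paths[anchor_label]  # dictionary lookup, no dataset scan
    # Prefer a positive other than the anchor; fall back to the anchor
    # itself when its class contains a single image.
    candidates = [p for p in same_class if p != anchor_path] or same_class
    positive = rng.choice(candidates)

    other_labels = [l for l in label_to_paths if l != anchor_label]
    negative = rng.choice(label_to_paths[rng.choice(other_labels)])
    return anchor_path, positive, negative
```

In this sketch the per-anchor cost depends on the size of one class and the number of classes, not on the total number of images.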
💡 What: Optimized the triplet selection logic in `TripletDataGenerator`.
🎯 Why: The previous implementation iterated over the entire dataset twice for every image in a batch to find positive and negative samples, leading to O(N * BatchSize) complexity per batch.
📊 Impact: Reduces batch generation time by ~5.6x on a dataset of 5,000 images. The gain should grow with dataset size, because sampling a triplet no longer requires a linear scan over the whole dataset.
🔬 Measurement: Ran a benchmark script simulating 5000 images and 100 classes. Time per batch dropped from 0.1687s to 0.0287s.
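The benchmark script itself is not included here. A comparable measurement could be set up along these lines, assuming a batch size of 32 and synthetic path strings in place of real images; `batch_with_scan` and `batch_with_index` are hypothetical stand-ins for the old and new sampling logic. Absolute timings will differ from the figures above, since the real generator also does other work per batch.

```python
import random
import timeit
from collections import defaultdict

NUM_IMAGES = 5000   # matches the dataset size quoted above
NUM_CLASSES = 100   # matches the class count quoted above
BATCH_SIZE = 32     # assumption: the batch size is not stated in the PR

paths = [f"img_{i}.jpg" for i in range(NUM_IMAGES)]
labels = [i % NUM_CLASSES for i in range(NUM_IMAGES)]


def batch_with_scan(rng):
    """Baseline: scan the full dataset twice for every anchor."""
    batch = []
    for i in rng.sample(range(NUM_IMAGES), BATCH_SIZE):
        label = labels[i]
        positives = [p for p, l in zip(paths, labels) if l == label]
        negatives = [p for p, l in zip(paths, labels) if l != label]
        batch.append((paths[i], rng.choice(positives), rng.choice(negatives)))
    return batch


# Build the label index once, up front.
label_to_paths = defaultdict(list)
for p, l in zip(paths, labels):
    label_to_paths[l].append(p)
all_labels = list(label_to_paths)


def batch_with_index(rng):
    """Indexed: look up candidates by label instead of scanning."""
    batch = []
    for i in rng.sample(range(NUM_IMAGES), BATCH_SIZE):
        label = labels[i]
        # Redraw until the negative label differs from the anchor's label.
        negative_label = rng.choice(all_labels)
        while negative_label == label:
            negative_label = rng.choice(all_labels)
        positive = rng.choice(label_to_paths[label])
        negative = rng.choice(label_to_paths[negative_label])
        batch.append((paths[i], positive, negative))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    for fn in (batch_with_scan, batch_with_index):
        seconds = timeit.timeit(lambda: fn(rng), number=20) / 20
        print(f"{fn.__name__}: {seconds:.4f}s per batch")
```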
PR created automatically by Jules for task 4829227161692139817 started by @Devasy23