
Batch Open-vocabulary Detection with Grounding Models #21

Open
NetZissou wants to merge 15 commits into main from feature/detection_grounding

@NetZissou (Collaborator) commented Nov 21, 2025

Add a batch pipeline that takes

  • (a) an image corpus (a folder, or a Parquet file of binary images or image URIs), and
  • (b) one or more text labels,

and returns detection boxes (with scores and optional masks) for each image/label pair, using an open-vocabulary grounding model such as OWLv2.
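For reviewers, here is a minimal sketch of the input/output contract described above. It is not the code in this PR: `Detection`, `batched`, `run_batch_detection`, the `detect_fn` signature, and the default threshold are all hypothetical names and values chosen for illustration. The model call is abstracted behind a callable so the batching and per-label flattening logic can be read on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]
# Hypothetical raw hit returned by the model wrapper: (label index, score, box).
RawHit = Tuple[int, float, Box]
# Hypothetical detector signature: (batch of image ids, labels) -> hits per image.
DetectFn = Callable[[Sequence[str], Sequence[str]], List[List[RawHit]]]


@dataclass
class Detection:
    """One output row: a single box for a single image/label pair."""
    image_id: str
    label: str
    score: float
    box_xyxy: Box
    mask: Optional[object] = None  # masks are optional, so default to None


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, chunk
        yield batch


def run_batch_detection(
    image_ids: Iterable[str],
    labels: Sequence[str],
    detect_fn: DetectFn,
    batch_size: int = 8,
    score_threshold: float = 0.1,  # assumed default, not taken from the PR
) -> Iterator[Detection]:
    """Run `detect_fn` batch by batch and flatten the hits into Detection rows."""
    for batch in batched(image_ids, batch_size):
        per_image_hits = detect_fn(batch, labels)
        for image_id, hits in zip(batch, per_image_hits):
            for label_idx, score, box in hits:
                if score >= score_threshold:
                    yield Detection(image_id, labels[label_idx], score, box)
```

In this sketch each yielded `Detection` maps naturally to one row of an output table, keyed by image and label.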

Closes #18

NetZissou and others added 15 commits July 18, 2025 14:53
…ated authors list; updated project URL sections
- Added initial implementation using OWLv2 for zero-shot batch detection using
  text labels
- Added SLURM scripts and config templates
- modified `ParquetImageDataset` & `ImageFolderDataset` to add an option for
  returning image size, which will later be used to parse OWLv2 detections
- replaced PIL transform with Owlv2Processor wrapper, offloading
  preprocessing from the main thread to the sub-workers
- added `owlv2_collate` to collate output from Dataset object
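
To illustrate why the datasets need to return the original image size, here is a rough sketch of how normalized boxes might be mapped back to pixel coordinates. It is not the parsing code in this PR: `cxcywh_to_xyxy` and `to_pixel_boxes` are hypothetical helpers. It also assumes that the model emits `(cx, cy, w, h)` boxes normalized to a square obtained by padding the image to its longer side, which is our understanding of OWLv2 preprocessing and should be checked against the processor actually used.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Box) -> Box:
    """Convert (center x, center y, width, height) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def to_pixel_boxes(
    norm_boxes: Sequence[Box],
    image_size: Tuple[int, int],  # (height, width) as returned by the dataset
    padded_square: bool = True,   # assumption: boxes refer to the padded square
) -> List[Box]:
    """Scale normalized cxcywh boxes to pixel xyxy boxes, clamped to the image."""
    height, width = image_size
    if padded_square:
        # Both axes share one scale: the side of the padded square.
        scale_x = scale_y = float(max(height, width))
    else:
        scale_x, scale_y = float(width), float(height)

    pixel_boxes: List[Box] = []
    for box in norm_boxes:
        x0, y0, x1, y1 = cxcywh_to_xyxy(box)
        pixel_boxes.append((
            min(max(x0 * scale_x, 0.0), float(width)),
            min(max(y0 * scale_y, 0.0), float(height)),
            min(max(x1 * scale_x, 0.0), float(width)),
            min(max(y1 * scale_y, 0.0), float(height)),
        ))
    return pixel_boxes
```

Without the per-image `(height, width)` carried through the collate step, this rescaling cannot be done after batching, since all images in a batch share the same preprocessed shape.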
@NetZissou NetZissou self-assigned this Nov 21, 2025
@NetZissou NetZissou added the enhancement New feature or request label Nov 21, 2025
