Clarification on how OCR annotations are used during training #105

@JerryPW

Hi, thank you for releasing this excellent work.

While reading the paper, I found one point that is still unclear: how the OCR annotations are actually incorporated into training.

From the paper, I understand the following:

- PaddleOCR is applied to images from OBELICS and Zero250M
- the recognized text is tokenized
- 100 fine-grained tags are constructed for each image
- OCR data is introduced in Stage 2 together with video supervision

However, the paper does not seem to explicitly describe how these OCR-derived tags are used in the training objective.
