Logical conflict between data loading and collation.

In `_load_ultrachat_conversations`, the messages are constructed using only the user role:
`msgs = [{"role": "user", "content": prompt}]`

However, `LanguageDataCollator.__call__` implements a mandatory check that skips any sample missing an assistant turn:
```python
if not any(m.get("role") == "assistant" for m in messages):
    continue
```

As far as I can tell, this means every sample is skipped during training, because the pre-processed data contains no assistant responses. Is that right?
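
To illustrate the suspected interaction, here is a minimal sketch using simplified stand-ins. `load_user_only` and `collate` are hypothetical names that only mimic the two snippets quoted above; they are not the project's actual loader or collator, and the real code may take other paths I have not checked.

```python
# Hypothetical stand-in for the loader: every conversation has a single user turn.
def load_user_only(prompts):
    return [[{"role": "user", "content": prompt}] for prompt in prompts]


# Hypothetical stand-in for the collator: drops samples without an assistant turn,
# using the same check quoted from LanguageDataCollator.__call__.
def collate(batch):
    kept = []
    for messages in batch:
        if not any(m.get("role") == "assistant" for m in messages):
            continue
        kept.append(messages)
    return kept


samples = load_user_only(["Hello", "What is 2 + 2?"])
kept = collate(samples)
print(len(samples), len(kept))  # 2 0

# A conversation that does include an assistant turn passes the check.
with_reply = [[
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there"},
]]
kept_with_reply = collate(with_reply)
print(len(kept_with_reply))  # 1
```

If the real pipeline behaves like this sketch, the batch would end up empty unless assistant turns are added somewhere between loading and collation.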

Logical conflict between data loading and collation. #1319
