
Add Question 189: Compute Direct Preference Optimization Loss #583

Open

zhenhuan-yang wants to merge 1 commit into Open-Deep-ML:main from zhenhuan-yang:zhy-dpo

Conversation

@zhenhuan-yang

Summary

This PR adds a new medium-difficulty Deep Learning question on computing Direct Preference Optimization (DPO) loss for language model alignment.

Question Details

  • ID: 189
  • Title: Compute Direct Preference Optimization Loss
  • Difficulty: Medium
  • Category: Deep Learning

Implementation

  • ✅ Complete solution with proper numerical stability using np.log1p
  • ✅ Comprehensive educational content covering DPO theory and Bradley-Terry model
  • ✅ Mathematical formulation with LaTeX
  • ✅ 4 diverse test cases with varying parameters
  • ✅ Example with detailed reasoning

Validation

  • ✅ Build successful
  • ✅ Schema validation passed
  • ✅ All test cases pass

Educational Value

Covers an important modern technique for LLM alignment that's simpler and more stable than traditional RLHF, making it highly relevant for current ML practitioners.
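To make the question's target concrete, here is a minimal sketch of the per-example DPO loss the PR computes: loss = -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]). The function name and signature below are assumptions for illustration, not the PR's actual interface:

```python
import numpy as np

def dpo_loss(chosen_logps, rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Per-example DPO loss (hypothetical signature).

    Each argument is a log-probability (scalar or array) of the chosen
    or rejected response under the policy or the frozen reference model.
    """
    # Implicit reward margin, scaled by the temperature beta
    logits = beta * ((chosen_logps - ref_chosen_logps)
                     - (rejected_logps - ref_rejected_logps))
    # -log(sigmoid(logits)), computed stably as log(1 + exp(-logits))
    return np.logaddexp(0.0, -logits)
```

With all four log-probabilities equal, the margin is zero and the loss is log 2, which is a handy sanity check for a test case.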


# Compute loss using log-sigmoid for numerical stability
# Loss = -log(sigmoid(logits)) = log(1 + exp(-logits))
losses = np.log1p(np.exp(-logits))
Collaborator
This overflows to inf when logits is large and negative, since np.exp(-logits) overflows before log1p is applied.
losses = np.logaddexp(0, -logits)
This might be more stable.

Collaborator

I would also add a test case to exploit this
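The overflow the reviewer describes is easy to reproduce. This sketch (standalone, not the PR's test code) compares the two formulations on an extreme input:

```python
import numpy as np

logits = np.array([-1000.0, 0.0, 1000.0])

# Naive form: np.exp(-logits) overflows to inf when logits = -1000,
# so the resulting loss is inf even though the true value is finite.
with np.errstate(over="ignore"):
    naive = np.log1p(np.exp(-logits))

# logaddexp computes log(exp(0) + exp(-logits)) = log(1 + exp(-logits))
# in a shifted form that never materializes exp(1000).
stable = np.logaddexp(0.0, -logits)
```

For logits = -1000 the true loss is essentially 1000 (the loss approaches -logits for very negative logits); the naive form returns inf while the logaddexp form returns 1000 exactly, which is the kind of edge case a new test could pin down.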
