Is your feature request related to a problem? Please describe.
I need to compare responses from two models (or two prompts) on the same question, but Chainlit has no way to show them side-by-side.
Describe the solution you'd like
A/B testing mode: one user message → two responses displayed in split view, with two variants running in parallel on the same conversation. The user picks the better one (A, B, or tie, with optional comment). Preference stored in the data layer.
The pair of variants can be:
- Two different Chat Profiles (model A vs model B, or prompt A vs B)
- Same Chat Profile with different Chat Settings (e.g., temperature 0.3 vs 0.9)
- Two independent runs of the same config (to measure variance)
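To make the requested flow concrete, here is a minimal sketch in plain asyncio. It deliberately does not use any Chainlit API: `Variant`, `Preference`, `run_ab`, and `make_variant` are hypothetical names invented for illustration, and the variants return canned text instead of calling a model. It only shows the shape of the idea: one message fanned out to two variants in parallel, then a preference record that a data layer could persist.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Literal, Optional

# Hypothetical type: a variant is any async callable that maps a user
# message to a reply (a chat profile, a settings preset, or a rerun).
Variant = Callable[[str], Awaitable[str]]


@dataclass
class Preference:
    """Hypothetical record of the user's choice, as it might be stored."""
    message: str
    response_a: str
    response_b: str
    choice: Literal["A", "B", "tie"]
    comment: Optional[str] = None


async def run_ab(message: str, variant_a: Variant, variant_b: Variant) -> tuple[str, str]:
    """Send one user message to both variants concurrently."""
    response_a, response_b = await asyncio.gather(
        variant_a(message), variant_b(message)
    )
    return response_a, response_b


def make_variant(label: str, temperature: float) -> Variant:
    """Stand-in for a real model call; returns a canned reply."""
    async def variant(message: str) -> str:
        await asyncio.sleep(0)  # placeholder for the actual model request
        return f"[{label} t={temperature}] reply to: {message}"
    return variant


async def demo() -> Preference:
    # Same configuration, different temperature (second case in the list above).
    variant_a = make_variant("A", 0.3)
    variant_b = make_variant("B", 0.9)
    question = "What is RAG?"
    response_a, response_b = await run_ab(question, variant_a, variant_b)
    # In the real feature the choice would come from the split-view UI.
    return Preference(question, response_a, response_b, choice="A", comment="more concise")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In an actual implementation, the two callables would wrap whatever Chainlit uses to resolve a Chat Profile or Chat Settings, and the `Preference` record would be written through the data layer rather than returned.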
Describe alternatives you've considered
- Chat Profiles + manual switching between two separate conversations — no side-by-side, no structured preference capture.
- External A/B tooling (LangSmith, PromptFoo…) — works offline but doesn't capture real in-app user preferences.
Prerequisite
This feature depends on the ability to run two Chat Profiles in parallel on the same conversation. As a first step, hot Chat Profile swapping (changing profile without creating a new chat) is needed — tracked in #2899.
Example :
