A/B testing — side-by-side model/prompt comparison in chat #2895

@Thomas-Fernandes

Is your feature request related to a problem? Please describe.

I need to compare responses from two models (or two prompts) on the same question, but Chainlit has no way to show them side-by-side.

Describe the solution you'd like

A/B testing mode: one user message → two responses displayed in a split view, with the two variants running in parallel on the same conversation. The user picks the better one (A, B, or tie, with an optional comment). The preference is stored in the data layer.
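To make the requested flow concrete, here is a rough sketch in plain `asyncio`. It is not Chainlit API: `Responder`, `run_ab`, `Preference`, and the two stand-in models are hypothetical names used only for illustration, and storage in the data layer is left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Literal, Optional, Tuple

# Hypothetical type: any async function that turns a user message into a reply.
Responder = Callable[[str], Awaitable[str]]


@dataclass
class Preference:
    """Hypothetical record of one A/B comparison and the user's pick."""
    message: str
    response_a: str
    response_b: str
    choice: Literal["A", "B", "tie"]
    comment: Optional[str] = None


async def run_ab(message: str, variant_a: Responder, variant_b: Responder) -> Tuple[str, str]:
    # Both variants receive the same user message and run concurrently.
    response_a, response_b = await asyncio.gather(variant_a(message), variant_b(message))
    return response_a, response_b


async def _demo() -> Preference:
    # Stand-ins for two real model calls.
    async def model_a(msg: str) -> str:
        await asyncio.sleep(0.01)
        return f"A says: {msg}"

    async def model_b(msg: str) -> str:
        await asyncio.sleep(0.01)
        return f"B says: {msg}"

    a, b = await run_ab("hello", model_a, model_b)
    # In the real feature the choice would come from the split-view UI.
    return Preference("hello", a, b, choice="A", comment="more concise")


pref = asyncio.run(_demo())
```

A real implementation would presumably stream both responses into the split view and persist the `Preference` through the data layer; this sketch only shows the fan-out and the shape of the stored record.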

A variant can be:

  • Two different Chat Profiles (model A vs model B, or prompt A vs B)
  • Same Chat Profile with different Chat Settings (e.g., temperature 0.3 vs 0.9)
  • Two independent runs of the same config (to measure variance)
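The three kinds of variant pair listed above could be described by one small structure. This is only a sketch under assumptions: `Variant`, its fields, `comparison_kind`, and the profile names are hypothetical and do not correspond to existing Chainlit classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Variant:
    """Hypothetical description of one side of an A/B comparison."""
    label: str                      # "A" or "B"
    chat_profile: str               # name of the Chat Profile to run
    chat_settings: Dict[str, Any] = field(default_factory=dict)


def comparison_kind(a: Variant, b: Variant) -> str:
    # Classify the pair according to the three cases in the list above.
    if a.chat_profile != b.chat_profile:
        return "different profiles"
    if a.chat_settings != b.chat_settings:
        return "different settings"
    return "repeated run"


profiles = comparison_kind(Variant("A", "model-a"), Variant("B", "model-b"))
settings = comparison_kind(
    Variant("A", "model-a", {"temperature": 0.3}),
    Variant("B", "model-a", {"temperature": 0.9}),
)
repeat = comparison_kind(Variant("A", "model-a"), Variant("B", "model-a"))
```

Keeping the profile and the settings as separate fields means the same comparison logic covers model-vs-model, setting-vs-setting, and variance runs without special cases.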

Describe alternatives you've considered

  • Chat Profiles + manual switching between two separate conversations — no side-by-side, no structured preference capture.
  • External A/B tooling (LangSmith, PromptFoo…) — works offline but doesn't capture real in-app user preferences.

Prerequisite

This feature depends on the ability to run two Chat Profiles in parallel on the same conversation. As a first step, hot Chat Profile swapping (changing profile without creating a new chat) is needed — tracked in #2899.

Example: [image]
