[OAI] Allow forcing Responses API for non-gpt-5 model names by wong-codaio · Pull Request #190 · braintrustdata/autoevals

Kenny Wong (wong-codaio) · 2026-05-29T11:59:22Z

Summary

[OAI] Allow forcing Responses API for non-gpt-5 model names

Adds a per-call use_responses_api (py) / useResponsesApi (js) flag that forces the Responses API. Routing becomes isGPT5Model(model) || useResponsesApi; the flag is stripped before the request is sent.
Motivation: internal proxies may rewrite the model name for routing (e.g. with a service-tier prefix), so a model that requires the Responses API can arrive under a name that doesn't start with gpt-5. The name check then sends it to Chat Completions and the call fails, with no way to override. This flag lets such a model work regardless of its name.
Per-call, not global: the model is chosen per call, so a global switch can't say "this model yes, that model no". The flag sits next to model, like temperature/maxTokens.
Also fixes a Responses API bug found while testing: reasoning_effort was sent top-level (the API wants reasoning.effort), so any reasoning call routed to Responses returned a 400.
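
The routing rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: is_gpt5_model is assumed to be a plain prefix check (the description only says the name "doesn't start with gpt-5"), and route_request and the example model name are hypothetical.

```python
from typing import Any, Dict, Tuple


def is_gpt5_model(model: str) -> bool:
    # Assumption: the name-based check is a simple prefix test.
    return model.startswith("gpt-5")


def route_request(params: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Pick an endpoint and return the params with the routing flag removed."""
    request = dict(params)  # copy so the caller's dict is not mutated
    # Strip the flag: it steers routing and is never sent to the API.
    force = bool(request.pop("use_responses_api", False))
    if is_gpt5_model(request["model"]) or force:
        return "/responses", request
    return "/chat/completions", request


# A renamed model (hypothetical name) only reaches /responses when the flag is set.
print(route_request({"model": "gpt-4.1"}))
print(route_request({"model": "tier-a/renamed-model", "use_responses_api": True}))
```

Because the flag travels with the other per-call parameters, two scorers in the same process can take different routes without any shared state.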

Test plan

unit tests (js + py, incl. built-in named scorers and reasoning.effort)
manual smoke test — scratch scripts below, each runs a scorer 3 ways and prints the endpoint hit:

OPENAI_API_KEY=sk-... [OPENAI_BASE_URL=https://us.api.openai.com/v1] python test.py
OPENAI_API_KEY=sk-... [OPENAI_BASE_URL=https://us.api.openai.com/v1] node test.mjs   # after `pnpm run build`

test.py

"""Scratch check: gpt-4.1 supports both Chat Completions and Responses APIs.
Run with OPENAI_API_KEY set. The request hook prints which endpoint each call hits.
If your org is region-pinned, also set OPENAI_BASE_URL (e.g. https://us.api.openai.com/v1):
  OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://us.api.openai.com/v1 python test.py
"""

import os

import httpx
from openai import OpenAI

from autoevals import Factuality, LLMClassifier, init

init(
    OpenAI(
        base_url=os.environ.get("OPENAI_BASE_URL"),  # None → SDK default (api.openai.com)
        http_client=httpx.Client(event_hooks={"request": [lambda r: print("  request →", r.url.path)]}),
    )
)

data = dict(output="6", expected="6", input="Add the numbers 1, 2, 3")

print("gpt-4.1 (default → expect /chat/completions):")
print("  score =", Factuality(model="gpt-4.1").eval(**data).score)

print("gpt-4.1 + use_responses_api=True (→ expect /responses):")
print("  score =", Factuality(model="gpt-4.1", use_responses_api=True).eval(**data).score)

# Built-in named scorers don't forward reasoning_effort yet, so use LLMClassifier here.
print("gpt-5.4 + medium reasoning (gpt-5 family → expect /responses):")
clf = LLMClassifier(
    name="match",
    prompt_template="Is the submission {{output}} equal to {{expected}}? Answer Y or N.",
    choice_scores={"Y": 1, "N": 0},
    model="gpt-5.4",
    reasoning_effort="medium",
)
print("  score =", clf.eval(**data).score)

test.mjs

// Scratch check: gpt-4.1 supports both Chat Completions and Responses APIs.
// Run with OPENAI_API_KEY set. The fetch wrapper prints which endpoint each call hits.
// If your org is region-pinned, also set OPENAI_BASE_URL (e.g. https://us.api.openai.com/v1):
//   OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://us.api.openai.com/v1 node test.mjs
import { OpenAI } from "openai";
import { Factuality, LLMClassifierFromTemplate, init } from "./jsdist/index.mjs";

const client = new OpenAI({
  baseURL: process.env.OPENAI_BASE_URL, // undefined → SDK default (api.openai.com)
  fetch: (url, opts) => {
    // url may be a string, a Request (has .url), or a URL (stringifies to its href).
    const u = typeof url === "string" ? url : (url.url ?? String(url));
    console.log("  request →", new URL(u).pathname);
    return fetch(url, opts);
  },
});
init({ client });

const data = { output: "6", expected: "6", input: "Add the numbers 1, 2, 3" };

console.log("gpt-4.1 (default → expect /chat/completions):");
console.log("  score =", (await Factuality({ ...data, model: "gpt-4.1" })).score);

console.log("gpt-4.1 + useResponsesApi:true (→ expect /responses):");
console.log(
  "  score =",
  (await Factuality({ ...data, model: "gpt-4.1", useResponsesApi: true })).score,
);

// Built-in named scorers don't forward reasoningEffort yet, so use LLMClassifierFromTemplate here.
console.log("gpt-5.4 + medium reasoning (gpt-5 family → expect /responses):");
const clf = LLMClassifierFromTemplate({
  name: "match",
  promptTemplate: "Is the submission {{output}} equal to {{expected}}? Answer Y or N.",
  choiceScores: { Y: 1, N: 0 },
  model: "gpt-5.4",
  reasoningEffort: "medium",
});
console.log("  score =", (await clf({ ...data })).score);

Proxy/internal setups can serve a GPT-5 model under a name that doesn't start with "gpt-5", so the name-based isGPT5Model() check alone can't route them to the Responses API. Add a per-call use_responses_api / useResponsesApi flag (camelCase at the scorer layer, snake_case in CachedLLMParams) so callers can force it; the flag is stripped before the request is sent.

Kenny Wong (wong-codaio) added 3 commits May 29, 2026 07:56

[OAI] Thread use_responses_api through built-in named scorers

ad9d5a5

SpecFileClassifier.__new__ has a fixed kwarg list, so Factuality(use_responses_api=True) and the other named scorers raised TypeError. Forward the flag like the other model knobs.
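
To illustrate why a fixed kwarg list causes this failure, here is a minimal sketch. NamedScorerSketch is a hypothetical stand-in, not the real SpecFileClassifier, and the extra_args attribute is invented for the example; the point is only that a constructor with an explicit parameter list rejects any flag it does not name, so the new flag has to be added and forwarded alongside the existing knobs.

```python
from typing import Optional


class NamedScorerSketch:
    """Hypothetical scorer whose __new__ names every accepted kwarg."""

    def __new__(
        cls,
        model: Optional[str] = None,
        temperature: Optional[float] = None,
        use_responses_api: Optional[bool] = None,  # the newly threaded flag
    ):
        self = super().__new__(cls)
        knobs = {
            "model": model,
            "temperature": temperature,
            "use_responses_api": use_responses_api,
        }
        # Forward only the knobs the caller actually set.
        self.extra_args = {k: v for k, v in knobs.items() if v is not None}
        return self


scorer = NamedScorerSketch(model="gpt-4.1", use_responses_api=True)
print(scorer.extra_args)

try:
    NamedScorerSketch(some_unlisted_flag=True)  # not in the parameter list
except TypeError as exc:
    print("TypeError:", exc)
```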

[OAI] Map reasoning_effort to reasoning.effort for the Responses API

a780f0a

The Responses API rejects a top-level reasoning_effort param ("moved to reasoning.effort"), so reasoning calls routed to it 400'd. Nest it correctly in both languages.
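
The nesting described in this commit message might look like the sketch below. The helper name to_responses_params is hypothetical and the PR's actual implementation may differ; the only behavior taken from the source is that a top-level reasoning_effort becomes reasoning.effort.

```python
from typing import Any, Dict


def to_responses_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Move a top-level reasoning_effort under reasoning.effort."""
    out = dict(params)  # shallow copy; the input is left untouched
    effort = out.pop("reasoning_effort", None)
    if effort is not None:
        # Preserve any reasoning options that are already present.
        reasoning = dict(out.get("reasoning") or {})
        reasoning["effort"] = effort
        out["reasoning"] = reasoning
    return out


print(to_responses_params({"model": "gpt-5.4", "reasoning_effort": "medium"}))
```

Calls that never set reasoning_effort pass through unchanged, so the mapping can be applied to every request routed to the Responses API.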

Kenny Wong (wong-codaio) marked this pull request as ready for review May 29, 2026 12:28

ekeith (evanmkeith) requested review from Erin McNulty (erin2722) and Olmo Maldonado (ibolmo) and removed request for Olmo Maldonado (ibolmo) May 29, 2026 13:56

Kenny Wong (wong-codaio) wants to merge 3 commits into braintrustdata:main from wong-codaio:wong/oai/force-responses-api
