## Summary
- **Fix false-positive verification**: The old verify tool ran
`promptfoo eval` against a config with a `redteam` section but no `tests`
array, so zero tests ran, zero failures were recorded, and the run was
reported as a success. It is now replaced with a 3-step process: direct
provider smoke test → session test → eval with 2 real test cases.
- **Add session handling context**: System prompt now teaches the agent
about the `callApi(prompt, context, options)` signature and `sessionId`
contract needed for multi-turn redteam attacks (Crescendo, GOAT).
- **Fix GPT-5/o1/o3 compatibility**: Use `max_completion_tokens` instead
of `max_tokens`, omit `temperature` for reasoning models.
- **Fix Node.js module caching**: `await import(url)` returns the cached
module even after provider.js is rewritten. Added a `?t=timestamp` cache buster.
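The cache-buster fix can be illustrated with a minimal sketch. The helper names `cacheBustedUrl` and `loadFresh` are hypothetical and not taken from this PR; the sketch assumes the provider is loaded as an ES module, where Node caches modules by their full URL, so a changed query string yields a fresh load.

```typescript
import { pathToFileURL } from "node:url";

// Hypothetical helper: turn a file path into a file:// URL with a `t` query
// parameter. A different timestamp gives a different module specifier, so the
// ESM loader does not reuse the previously cached module.
function cacheBustedUrl(filePath: string, now: number = Date.now()): string {
  const url = pathToFileURL(filePath);
  url.searchParams.set("t", String(now));
  return url.href;
}

// Hypothetical loader: re-imports the file on every call (assumes ESM output).
async function loadFresh(filePath: string): Promise<unknown> {
  return import(cacheBustedUrl(filePath));
}
```

Each distinct URL stays in the module cache for the life of the process, so this pattern fits a short-lived verification loop better than a long-running server.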
## Changed files
| File | What changed |
|------|-------------|
| `generator/config.ts` | Replace `redteam` section with `prompts` + `tests` + `defaultTest.assert` |
| `agent/loop.ts` | Rewrite `verify` tool: smoke test + session test + eval with proper parsing |
| `agent/tools.ts` | Update verify description, remove unused `numTests` param |
| `agent/system-prompt.ts` | Add session handling section, update `callApi` signature in example |
| `agent/providers.ts` | GPT-5/o1/o3 compat (`max_completion_tokens`, no `temperature`) |
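As a rough illustration of the `agent/providers.ts` change, the request parameters could be built along the following lines. This is a sketch only: `isReasoningModel`, `buildChatParams`, and the prefix-based model check are assumptions for illustration, not the code in this PR.

```typescript
interface ChatParams {
  model: string;
  max_tokens?: number;
  max_completion_tokens?: number;
  temperature?: number;
}

// Assumption: reasoning models are recognised by name prefix (gpt-5, o1, o3).
function isReasoningModel(model: string): boolean {
  return /^(gpt-5|o1|o3)/.test(model);
}

// Hypothetical builder: reasoning models get `max_completion_tokens` and no
// `temperature`; other models keep `max_tokens` and `temperature`.
function buildChatParams(model: string, maxTokens: number, temperature: number): ChatParams {
  if (isReasoningModel(model)) {
    return { model, max_completion_tokens: maxTokens };
  }
  return { model, max_tokens: maxTokens, temperature };
}
```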
## Test plan
- [ ] Run `crab pf` against a simple HTTP target — verify eval shows `2
passed, 0 failed`
- [ ] Run against a session-based target — verify provider gets correct
`callApi` signature and returns `sessionId`
- [ ] Run with `--provider openai:gpt-5` — verify no API errors
- [ ] Break provider.js intentionally — verify smoke test catches it
(not false-positive)
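The false-positive case this plan guards against (zero tests ran, so zero failed) can be expressed as a small check. The `EvalCounts` shape and `isVerified` name are hypothetical; how the counts are parsed from the `promptfoo eval` output is not shown and is left as an assumption.

```typescript
// Hypothetical summary of an eval run; not promptfoo's actual output schema.
interface EvalCounts {
  passed: number;
  failed: number;
}

// A run counts as verified only if the expected number of tests actually ran
// and none failed. "0 passed, 0 failed" is therefore rejected.
function isVerified(counts: EvalCounts, expectedTests: number): boolean {
  const ran = counts.passed + counts.failed;
  return expectedTests > 0 && ran === expectedTests && counts.failed === 0;
}
```

With the 2 test cases described above, `isVerified({ passed: 2, failed: 0 }, 2)` is the only outcome treated as success.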
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+4. Promptfoo stores the sessionId and passes it back on turn 2+ via \`context.vars.sessionId\`
+5. Provider reads \`context.vars.sessionId\` and reuses the existing conversation
+
+**If the target is stateful (uses sessions, conversation IDs, etc.), the provider MUST support this flow.** Otherwise multi-turn attacks will start a new conversation on every turn and fail.
+
+For **custom providers**: Accept the \`context\` parameter, check \`context.vars.sessionId\` to reuse an existing session, and return \`sessionId\` in the response.
+
+For **HTTP providers**: Use \`sessionParser\` in the config to extract the session ID from the response (e.g. \`sessionParser: json.session_id\`). Promptfoo handles the rest automatically.
+
 ## Workflow
 
 1. Read the target spec to understand the API
 2. Probe to verify connectivity and response format
 3. Decide: HTTP provider (simple) or custom provider (complex)
 4. Write config (and provider.js if needed)
-5. Verify with promptfoo eval
+5. Verify — runs provider smoke test + session test, then promptfoo eval with 2 test cases
 6. Call done() with results
 
 Be intelligent. Figure out the target's protocol, auth, request/response format from probing. Generate configs that work.`;
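The session contract described in the prompt text above can be sketched for a custom provider as follows. This is a minimal illustration under stated assumptions: the class name `SessionAwareProvider`, the `turn` helper, and the in-memory conversation map are hypothetical stand-ins for calls to a real stateful target, and only the `callApi(prompt, context, options)` signature, `context.vars.sessionId`, and the returned `sessionId` come from the source.

```typescript
interface CallContext {
  vars?: Record<string, unknown>;
}

interface ProviderResult {
  output: string;
  sessionId: string;
}

class SessionAwareProvider {
  // Stand-in for the target's server-side conversations (assumption).
  private sessions = new Map<string, string[]>();
  private nextId = 1;

  id(): string {
    return "session-aware-sketch";
  }

  // Reuse the conversation if a known sessionId is supplied, else start one.
  turn(prompt: string, incoming?: unknown): ProviderResult {
    const known = typeof incoming === "string" && this.sessions.has(incoming);
    const sessionId = known ? (incoming as string) : `s-${this.nextId++}`;
    const history = this.sessions.get(sessionId) ?? [];
    history.push(prompt);
    this.sessions.set(sessionId, history);
    return { output: `turn ${history.length}: ${prompt}`, sessionId };
  }

  // Signature from the PR description; `_options` is unused in this sketch.
  async callApi(prompt: string, context?: CallContext, _options?: unknown): Promise<ProviderResult> {
    return this.turn(prompt, context?.vars?.sessionId);
  }
}
```

On the first turn no `sessionId` is present, so a new conversation is created and its ID returned; on later turns the ID arrives in `context.vars.sessionId` and the same history is extended.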