
feat(traces): opt-in consent-gated server-side conversation trace storage#18

Merged
albanm merged 50 commits into main from feat-store-traces on Jun 9, 2026

Conversation

albanm (Member) commented Jun 9, 2026

Add opt-in, consent-gated server-side storage of conversation traces, with admin review tooling, replacing the old in-browser session recorder and trace upload/handoff.

What changed:

  • New trace-requests collection storing each physical gateway request/response; written fire-and-forget only when the org enables storeTraces and the user consents (x-trace-consent). Auto-expires after 30 days via a TTL index.
  • Gateway tags and records turn / sub-agent / compaction / moderation requests; the trace is reconstructed into a reviewable session on the client.
  • Admin API under /api/traces: paginated conversation list, fetch-by-conversation, delete-conversation, and GDPR per-user erasure (DELETE /:type/:id?userId=).
  • New read-only activity page (config + usage + paginated traces) and per-trace review page; two-tab chat debug dialog with an admin review link and a consent sheet/toggle.
  • Removed the in-browser session recorder, trace upload, and the old trace-review page/handoff.
  • Merges main: adopts the usage/enforce.ts quota refactor (#16, "feat(limits): add untrusted pool quota for anonymous + external usage") and the parallel-tool-call streaming fix (#17, "fix(gateway): preserve parallel tool calls in the streaming path"), keeping trace capture in the same stream loop.

Why: give admins a way to review real conversations for debugging/quality, while keeping it strictly opt-in per org and per user, with bounded retention.

Regression risks (reviewer focus):

  • settings PUT now omits models when the form sends none (documented form-diff fix); because the write uses replaceOne, a previously stored empty models object is dropped. Confirm no consumer reads settings.models/settings.quotas without optional chaining, since both are now optional in the schema.
  • Streaming path now interleaves the parallel-tool-call indexing (from main) with trace tool-call accumulation — covered by the parallel-tool-calls API test.
  • New TTL index means stored traces self-delete after 30 days; five new indexes are created on the trace-requests collection at startup.
  • GET /traces/conversation/:id looks up across owners before authorizing (returns 403 with no data to non-admins of the owner).
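
For the first risk above, a minimal sketch of what a safe consumer could look like. The `Settings` and `Quotas` shapes, the `daily` field, and the `resolveQuotas`/`hasModel` helpers are hypothetical and only illustrate the optional-chaining pattern; the real types and `defaultQuotas` live in the repository.

```typescript
// Hypothetical shapes for illustration; not the repository's real types.
interface Quotas { daily: number }
interface Settings {
  storeTraces?: boolean
  models?: Record<string, unknown>
  quotas?: Quotas
}

// Stand-in for the centralized defaultQuotas (the value is made up).
const defaultQuotas: Quotas = { daily: 100 }

// quotas may be absent, so always fall back instead of dereferencing directly.
function resolveQuotas (settings: Settings | undefined): Quotas {
  return settings?.quotas ?? defaultQuotas
}

// models may be absent too, so chain through every level.
function hasModel (settings: Settings | undefined, name: string): boolean {
  return settings?.models?.[name] !== undefined
}
```

Any read of the form `settings.models.x` or `settings.quotas.y` without the `?.` is the pattern reviewers should look for.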

albanm and others added 30 commits June 8, 2026 11:46
Gateway-side recording of physical requests, gated by org setting +
client consent cookie; 30-day TTL; reconstruct SessionTrace at view
time; OTel GenAI vocabulary alignment; admin read/delete + GDPR erasure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add boolean storeTraces to settings schema for server-side trace storage opt-in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add storeTraces: false to emptySettings so the GET endpoint returns a
consistent value before the first PUT. Rewrite the "persists the storeTraces
flag" test to use the shared mockModel/defaultQuotas helpers and add a second
assertion that verifies the false default path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Write one trace document per model call when storeTraces is on AND the request
carries x-trace-consent: yes. Recording is strictly fire-and-forget (service
swallows all errors). Also advertises availability via x-trace-storage: available
response header. Covers both streaming and non-streaming completion paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
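
The double gate and the fire-and-forget write described in this commit can be sketched as follows. This is an illustration under assumptions, not the gateway's code: `OrgSettings`, `shouldRecordTrace`, and `recordTrace` are hypothetical names, and the header map is assumed to use lower-cased keys.

```typescript
// Hypothetical types; header names are assumed to be lower-cased by the server.
type RequestHeaders = Record<string, string | undefined>
interface OrgSettings { storeTraces?: boolean }

// Both conditions must hold: org-level opt-in AND per-request user consent.
function shouldRecordTrace (settings: OrgSettings, headers: RequestHeaders): boolean {
  return settings.storeTraces === true && headers['x-trace-consent'] === 'yes'
}

// Fire-and-forget: the write is started but never awaited by the caller, and
// both synchronous throws and rejections are swallowed.
function recordTrace (write: () => Promise<void>): void {
  void Promise.resolve().then(write).catch(() => { /* intentionally ignored */ })
}
```

Wrapping the call in `Promise.resolve().then(write)` means a write function that throws before returning a promise is swallowed as well, so a storage failure cannot surface in the completion response.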
…ing in the response path

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds GET/DELETE endpoints for trace-requests under /api/traces/:type/:id
(conversation list + per-conversation detail/delete + per-user erasure).
Mounts the router in app.ts and extends the test-env cleanup to clear
trace-requests between test runs. Adds api spec verifying gateway recording
end-to-end (consent gate, storeTraces toggle, auth guard, delete).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lper

add a compound index on (owner.type, owner.id, userId) to support the
per-user erasure query without a collection scan. add two API tests:
one asserting the 400 guard when ?userId is omitted, and a real end-to-end
erasure test using an org owner where trackPerUser=true stores the member's
userId. also make waitForConversations() throw instead of silently returning
an empty result on timeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds pure `reconstructTrace(requests)` function that rebuilds a `SessionTrace`
from an array of stored `StoredTraceRequest` documents so the existing
trace viewer can display server-side traces without modification.
Includes three unit tests covering system-prompt extraction, tool-call/result
pairing, and sub-agent grouping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
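
One of the behaviours the unit tests cover, tool-call/result pairing, can be illustrated with a small pure function. This is not the repository's `reconstructTrace`; the `ToolCall`, `ToolResult`, and `PairedCall` shapes and the `pairToolCalls` name are assumptions made up for the sketch.

```typescript
// Hypothetical, simplified shapes; the stored documents are richer than this.
interface ToolCall { id: string, name: string, args: string }
interface ToolResult { toolCallId: string, output: string }
interface PairedCall { name: string, args: string, output: string | undefined }

// Pair each tool call with the result that references its id.
// Calls without a result keep output undefined instead of being dropped.
function pairToolCalls (calls: ToolCall[], results: ToolResult[]): PairedCall[] {
  const outputs = new Map<string, string>()
  for (const result of results) outputs.set(result.toolCallId, result.output)
  return calls.map(call => ({
    name: call.name,
    args: call.args,
    output: outputs.get(call.id)
  }))
}
```

Keeping the function pure (input array in, new array out) is what makes it testable without a database or a running gateway.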
…ble-quoted previews

safeJson keeps non-JSON string outputs as-is. Make the tool-result test
assertion robust to object/string output forms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…to gateway

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add TraceConsentSheet component that shows a bottom-sheet asking the user
to accept or decline server-side trace storage when the gateway advertises
x-trace-storage: available and no consent cookie exists. Mount it in
AgentChat.vue. Add a consent toggle in the Settings tab of AgentChatDebugDialog
for users to revisit their choice. Backed by e2e tests covering accept/persist,
settings-toggle visibility, and no-sheet-when-disabled scenarios.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce a module-level `consentRef` in trace-consent.ts, updated by
`writeConsent`, so accepting/declining the bottom-sheet is immediately
reflected in the Settings-tab toggle without a page reload. Also switch
the bottom-sheet from `v-model` to `:model-value` on a read-only
computed, and strengthen the e2e test with an immediate toBeHidden()
assertion after Accept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route fetchStored/loadStored/deleteStored failures through the page's
existing loadError ref so admins see feedback on 4xx/5xx/network errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…se UI

Fix A: strip leading `subagent_` from tool-call names before matching against
`subByKey` keys (which are built from the display name, e.g. `Researcher:0`),
so that two distinct sub-agents firing in the same step each map to the correct
sub-agent block by name rather than falling back to arrival order.

Fix B: add a per-user GDPR erase action (mdiAccountRemove icon button) on each
stored-conversation row that has a userId.  Calls DELETE /api/traces/:type/:id?userId=…
after window.confirm, then re-fetches the list.  Bilingual i18n keys added.
Updated the trace-review e2e to use `.first()` on the button locator now that
rows can have two buttons.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
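
Fix A amounts to normalizing the tool-call name before the lookup. A minimal sketch, assuming keys of the form `<display name>:<index>` as in the `Researcher:0` example; the `subAgentKey` helper and the meaning of the numeric suffix are assumptions, not taken from the code.

```typescript
const SUBAGENT_PREFIX = 'subagent_'

// Build the lookup key from a tool-call name by stripping the leading prefix,
// so 'subagent_Researcher' and the display name 'Researcher' agree.
function subAgentKey (toolCallName: string, index: number): string {
  const displayName = toolCallName.startsWith(SUBAGENT_PREFIX)
    ? toolCallName.slice(SUBAGENT_PREFIX.length)
    : toolCallName
  return `${displayName}:${index}`
}

// Example lookup against a map keyed by display name (contents are made up).
const subByKey = new Map<string, string>([
  ['Researcher:0', 'researcher block'],
  ['Writer:0', 'writer block']
])
const matched = subByKey.get(subAgentKey('subagent_Writer', 0))
```

With name-based keys, two sub-agents started in the same step resolve to their own blocks regardless of which one responds first.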
…gs form

The settings vjsf form mishandled the conditionally-hidden model/quota
sections and persisted-data round-tripping:

- models/quotas are required at the root but their sections are hidden
  until a provider exists; vjsf prunes the hidden empty data on first
  edit, so toggling the new "store traces" switch on an empty config
  raised a global "information obligatoire" error. Drop models/quotas
  from the root required array (both have defaults).
- the server kept re-injecting models: {} (emptySettings + PUT
  body.models ?? {}) which vjsf then strips from the hidden section,
  producing a permanent diff that made Save reappear on every reload.
  Round-trip models exactly instead: emptySettings omits it and PUT
  stores it only when present. The API already preserves submitted data.
- extract the duplicated defaultQuotas into settings/service.ts and use
  it (with optional chaining) in gateway/summary/usage now that
  models/quotas are optional on the Settings type.

Regression tests cover: no required error when toggling store-traces on
an empty config, empty config converges after save, and a saved config
converges after one save.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cation

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mplification

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the fixed $limit: 200 with a $facet-based pagination.
Honor ?page and ?size query params (size clamped 1-200, default 20).
Response now includes a count field alongside results.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
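
The clamping rules above can be sketched as a pure helper that also builds the `$facet` stage. This is an assumption-laden illustration: `parsePagination` and `facetStage` are hypothetical names, pages are assumed to be 1-based, and the actual pipeline in the router may be shaped differently.

```typescript
interface Pagination { page: number, size: number, skip: number }

// size: default 20, clamped to 1..200. page: assumed 1-based, minimum 1.
function parsePagination (query: { page?: string, size?: string }): Pagination {
  const rawSize = Number.parseInt(query.size ?? '', 10)
  const rawPage = Number.parseInt(query.page ?? '', 10)
  const size = Number.isNaN(rawSize) ? 20 : Math.min(200, Math.max(1, rawSize))
  const page = Number.isNaN(rawPage) ? 1 : Math.max(1, rawPage)
  return { page, size, skip: (page - 1) * size }
}

// One $facet stage returns the requested page and the total count together.
function facetStage (p: Pagination) {
  return {
    $facet: {
      results: [{ $skip: p.skip }, { $limit: p.size }],
      count: [{ $count: 'count' }]
    }
  }
}
```

For example, `?page=3&size=10` yields a skip of 20, and `?size=500` is clamped to 200.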
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t recorder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove all live-capture methods (reset, setSystemPrompt, snapshotTools,
startTurn, startToolCall, finishToolCall, startSubAgent, recordSubAgentStep,
recordCompaction, recordModerationDecision, recordPhysicalRequest, finishStep,
addStepMessages) and the transient private state they used (currentTurn,
currentStep, pendingToolCalls). The class now exposes only fromTrace(),
getTrace(), getTraceOverview(), getTraceEntry(), and getTraceEntries().

Rewrite the unit spec to use fromTrace() with hardcoded SessionTrace fixtures
instead of the removed recording API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
albanm and others added 20 commits June 9, 2026 11:32
…ted traces

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eview link

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…w page

The handoff bridge and the /:type/:id/trace-review page are superseded
by server-stored traces (activity page + /traces/:id/review). Removed
their unit/e2e test files and updated typed-router.d.ts accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tests for compaction, moderation, hidden-context and tools-changed
overview entries that were lost in the read-only surface rewrite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation-in-trace (out of scope)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… trace

Tag the moderator gateway call with a moderation:<turnId> context so it is
stored, derive contextKind 'moderation' server-side, and reconstruct it into a
moderation trace entry (re-parsing the stored verdict). Also fixes systemPrompt
reconstruction to use a turn request, not the moderation/compaction prompt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The consent bottom-sheet (shown when trace storage is available but consent is
undefined) overlays the chat and blocks message sends. Move storeTraces:true out
of the shared settings into the review-page tests that grant consent, so the
functionality tests (multi-message / compaction) aren't blocked. Also retry once
locally to absorb dev-server load flakes during the long sequential e2e run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The review-page assertions duplicated the dedicated 'Subagent trace appears on
the review page' test, making :77 a double-flow test prone to timeout. It now
only verifies the live UI chain (no trace storage / consent needed).
A turn blocked by moderation aborts its assistant stream before the gateway
records the turn request, leaving only the moderation request stored. Reconstruct
now surfaces such orphaned moderation/compaction requests as their own turn, so the
moderation entry appears on the review page regardless of whether the turn was
stored. Fixes the intermittent moderation review-page test.
…resAt

Retention is a single fixed 30-day policy, so store createdAt as a BSON Date and
put the TTL index on it (expireAfterSeconds: 30d) instead of computing a separate
expiresAt at write time. mongoLib.configure drops+recreates the ttl-keys index.
GET /traces/conversation/:id queries by conversation.id with no owner, which no
existing index covered as a prefix (list-keys has it 3rd) — it was a collection
scan. Add { conversation.id: 1, createdAt: 1 }.
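
The two index changes above can be written down as plain definitions. The object shape below is an assumption for illustration (how `mongoLib.configure` expects index definitions is not shown here); the 30-day constant and the field names come from the commit messages.

```typescript
// 30 days expressed in seconds, as required by expireAfterSeconds.
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60 // 2,592,000

// TTL index: createdAt must be stored as a BSON Date for expiry to apply.
const ttlIndex = {
  key: { createdAt: 1 },
  expireAfterSeconds: THIRTY_DAYS_SECONDS
}

// Covers lookups by conversation id alone, ordered by creation time.
const conversationIndex = {
  key: { 'conversation.id': 1, createdAt: 1 }
}

// Earliest moment a document becomes eligible for deletion (hypothetical helper).
function expiryOf (createdAt: Date): Date {
  return new Date(createdAt.getTime() + THIRTY_DAYS_SECONDS * 1000)
}
```

MongoDB removes expired documents with a background task that runs periodically, so a trace can outlive its expiry time by a short interval rather than disappearing at the exact second.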
A physical request and the semantic entries reconstructed from its
response share one timestamp, so the timestamp-only sort left the
request after its derived info. Add a tie-break rank: inputs, then the
physical request, then the extracted entries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
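
The tie-break can be sketched as a two-level comparator: timestamp first, then a fixed rank per entry kind. The `EntryKind` values, the `TraceEntry` shape, and `sortEntries` are hypothetical names chosen for the example; only the ordering (inputs, physical request, extracted entries) comes from the commit message.

```typescript
type EntryKind = 'input' | 'physical-request' | 'extracted'
interface TraceEntry { timestamp: number, kind: EntryKind, label: string }

// Lower rank sorts first when timestamps are equal.
const rank: Record<EntryKind, number> = {
  input: 0,
  'physical-request': 1,
  extracted: 2
}

// Sort by timestamp, falling back to the rank only on ties.
// A copy is sorted so the caller's array is left untouched.
function sortEntries (entries: TraceEntry[]): TraceEntry[] {
  return [...entries].sort(
    (a, b) => a.timestamp - b.timestamp || rank[a.kind] - rank[b.kind]
  )
}
```

The `||` works because a timestamp difference of 0 is falsy, so the rank comparison only decides the order for entries that share a timestamp.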
Reconcile the trace-storage work with main's quota refactor (#16) and
streaming parallel-tool-call fix (#17):
- gateway/summary routers adopt resolveUsageIdentity/enforceQuotas from
  usage/enforce.ts while keeping trace recording and the streamed
  tool-call capture; merged the per-id toolCallIndex with streamedToolCalls
- defaultQuotas (now incl. untrusted) stays centralized in settings/service.ts;
  routers fall back to it, enforce.ts uses NonNullable<Settings['quotas']>
- settings.quotas/models remain optional (storeTraces form work)
- regenerated put-req validate.js and vjsf components from the merged schema

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@albanm albanm merged commit f5a599c into main Jun 9, 2026
3 checks passed
@albanm albanm deleted the feat-store-traces branch June 9, 2026 16:34