You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Follow-up from the v0.8.3 multi-model code review (GPT 5.4 + Gemini 3.1 Pro + Kimi K2.5 + MiniMax M2.7 + Claude). All reviewers verdict: ship — these are non-blocking hardening items found post-merge.
Findings to address
Nested // altimate_change markers break findMarkers tracking (Gemini, verified against script/upstream/analyze.ts:499-520). The v0.8.3 marker-fix wrapped the reworded processor.ts warning in an inner marker block nested inside the existing outer plan-refusal block. findMarkers uses a single openBlock with no nesting stack, so the inner start clobbers the outer block and the outer end is dropped — the whole outer block falls out of marker tracking. Passes the strict CI gate (hunk-based) and count balance, so it didn't fail the release, but it degrades upstream-bridge accuracy. Fix: remove the nested markers (the warning text is already inside the outer block, so the strict gate still passes on pure deletion).
Behavioral trust-boundary tests use session: {} as any + depend on the default plan-mode path (consensus: all 5 reviewers; Gemini rated MAJOR). If OPENCODE_EXPERIMENTAL_PLAN_MODE ever defaults true, Session.plan(session) throws on {} and the exact-string reminder assertion wouldn't match the experimental prompt — tests fail confusingly or stop covering the intended path. Fix: structurally valid session ({ slug, time: { created } }) + an explicit expect(Flag.OPENCODE_EXPERIMENTAL_PLAN_MODE).toBe(false) precondition that fails loudly on a flag flip.
Tests stop one layer short of the model-input sink (GPT). They prove the intermediate trustedReminderParts, not the final system array + MessageV2.toModelMessages user-role payload. Fix: add an end-to-end sink test asserting attacker text never reaches system and the hoisted reminder is not duplicated into the user role.
Exact-phrase wording assertions are fragile (MiniMax). "too thin to act on" negative guard is good, but the positive checks pin exact copy that a legitimate improvement would break. Fix: concept/synonym-tolerant assertions.
Mislabeled source-regex test (MiniMax). The insertReminders return shape test actually asserts the InsertRemindersResult type alias exists. Fix: rename for honesty.
Pre-existing: // altimate_change markers in plan.txt are LLM-visible (imported raw). Flagged by GPT/Gemini/Kimi as a NIT — needs a build-time strip or lint rule. Will file separately.
Follow-up from the v0.8.3 multi-model code review (GPT 5.4 + Gemini 3.1 Pro + Kimi K2.5 + MiniMax M2.7 + Claude). All reviewers verdict: ship — these are non-blocking hardening items found post-merge.
Findings to address
Nested
// altimate_changemarkers breakfindMarkerstracking (Gemini, verified againstscript/upstream/analyze.ts:499-520). The v0.8.3 marker-fix wrapped the rewordedprocessor.tswarning in an inner marker block nested inside the existing outer plan-refusal block.findMarkersuses a singleopenBlockwith no nesting stack, so the innerstartclobbers the outer block and the outerendis dropped — the whole outer block falls out of marker tracking. Passes the strict CI gate (hunk-based) and count balance, so it didn't fail the release, but it degrades upstream-bridge accuracy. Fix: remove the nested markers (the warning text is already inside the outer block, so the strict gate still passes on pure deletion).Behavioral trust-boundary tests use
session: {} as any+ depend on the default plan-mode path (consensus: all 5 reviewers; Gemini rated MAJOR). IfOPENCODE_EXPERIMENTAL_PLAN_MODEever defaults true,Session.plan(session)throws on{}and the exact-string reminder assertion wouldn't match the experimental prompt — tests fail confusingly or stop covering the intended path. Fix: structurally valid session ({ slug, time: { created } }) + an explicitexpect(Flag.OPENCODE_EXPERIMENTAL_PLAN_MODE).toBe(false)precondition that fails loudly on a flag flip.Tests stop one layer short of the model-input sink (GPT). They prove the intermediate
trustedReminderParts, not the finalsystemarray +MessageV2.toModelMessagesuser-role payload. Fix: add an end-to-end sink test asserting attacker text never reachessystemand the hoisted reminder is not duplicated into the user role.Exact-phrase wording assertions are fragile (MiniMax).
"too thin to act on"negative guard is good, but the positive checks pin exact copy that a legitimate improvement would break. Fix: concept/synonym-tolerant assertions.Mislabeled source-regex test (MiniMax). The
insertReminders return shapetest actually asserts theInsertRemindersResulttype alias exists. Fix: rename for honesty.plan.txt marker description was edited (Gemini chore(deps): Bump @gitlab/gitlab-ai-provider from 3.6.0 to 4.1.0 #5, minor). Reverted the 4-word addition since plan.txt is imported raw into the LLM prompt; the prose already documents the escape hatch.
Not in scope (separate tracking)
// altimate_changemarkers inplan.txtare LLM-visible (imported raw). Flagged by GPT/Gemini/Kimi as a NIT — needs a build-time strip or lint rule. Will file separately.