test: harden v0.8.3 plan-mode trust-boundary tests + fix nested marker tracking #898

@anandgupta42

Description

Follow-up from the v0.8.3 multi-model code review (GPT 5.4 + Gemini 3.1 Pro + Kimi K2.5 + MiniMax M2.7 + Claude). Every reviewer's verdict was "ship"; these are non-blocking hardening items found post-merge.

Findings to address

  1. Nested // altimate_change markers break findMarkers tracking (Gemini, verified against script/upstream/analyze.ts:499-520). The v0.8.3 marker-fix wrapped the reworded processor.ts warning in an inner marker block nested inside the existing outer plan-refusal block. findMarkers uses a single openBlock with no nesting stack, so the inner start clobbers the outer block and the outer end is dropped — the whole outer block falls out of marker tracking. Passes the strict CI gate (hunk-based) and count balance, so it didn't fail the release, but it degrades upstream-bridge accuracy. Fix: remove the nested markers (the warning text is already inside the outer block, so the strict gate still passes on pure deletion).

  2. Behavioral trust-boundary tests use session: {} as any + depend on the default plan-mode path (consensus: all 5 reviewers; Gemini rated MAJOR). If OPENCODE_EXPERIMENTAL_PLAN_MODE ever defaults true, Session.plan(session) throws on {} and the exact-string reminder assertion wouldn't match the experimental prompt — tests fail confusingly or stop covering the intended path. Fix: structurally valid session ({ slug, time: { created } }) + an explicit expect(Flag.OPENCODE_EXPERIMENTAL_PLAN_MODE).toBe(false) precondition that fails loudly on a flag flip.

  3. Tests stop one layer short of the model-input sink (GPT). They prove the intermediate trustedReminderParts, not the final system array + MessageV2.toModelMessages user-role payload. Fix: add an end-to-end sink test asserting attacker text never reaches system and the hoisted reminder is not duplicated into the user role.

  4. Exact-phrase wording assertions are fragile (MiniMax). "too thin to act on" negative guard is good, but the positive checks pin exact copy that a legitimate improvement would break. Fix: concept/synonym-tolerant assertions.

  5. Mislabeled source-regex test (MiniMax). The insertReminders return shape test actually asserts the InsertRemindersResult type alias exists. Fix: rename for honesty.

  6. plan.txt marker description was edited (Gemini #5, minor). Reverted the 4-word addition since plan.txt is imported raw into the LLM prompt; the prose already documents the escape hatch.
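To make the failure mode in finding 1 concrete, here is a minimal sketch of a single-slot tracker next to a stack-based one. It is not the code in script/upstream/analyze.ts: the function names, the `Block` type, and the `start`/`end` marker suffixes are assumptions made for illustration only.

```typescript
// Hypothetical sketch of the finding-1 failure mode, not the real findMarkers.
type Block = { start: number; end: number };

// Assumed marker syntax; the real suffixes may differ.
const START = /\/\/\s*altimate_change start\b/;
const END = /\/\/\s*altimate_change end\b/;

// Single slot: an inner start overwrites the outer start.
function findBlocksSingleSlot(lines: string[]): Block[] {
  const blocks: Block[] = [];
  let openBlock: number | null = null;
  lines.forEach((line, i) => {
    if (START.test(line)) {
      openBlock = i; // clobbers any block that was already open
    } else if (END.test(line) && openBlock !== null) {
      blocks.push({ start: openBlock, end: i });
      openBlock = null; // a later outer end finds nothing open and is dropped
    }
  });
  return blocks;
}

// Stack: each end closes the most recently opened block.
function findBlocksWithStack(lines: string[]): Block[] {
  const blocks: Block[] = [];
  const stack: number[] = [];
  lines.forEach((line, i) => {
    if (START.test(line)) {
      stack.push(i);
    } else if (END.test(line)) {
      const start = stack.pop();
      if (start !== undefined) blocks.push({ start, end: i });
    }
  });
  return blocks;
}

const sample = [
  "// altimate_change start - outer block",
  "refusePlan();",
  "// altimate_change start - inner block",
  "warn();",
  "// altimate_change end",
  "afterWarning();",
  "// altimate_change end",
];

const singleSlot = findBlocksSingleSlot(sample); // only the inner block, lines 2..4
const stacked = findBlocksWithStack(sample); // inner block 2..4, then outer block 0..6
console.log(JSON.stringify(singleSlot), JSON.stringify(stacked));
```

The marker counts are balanced in `sample`, which is why a count check would not notice the lost outer block. The fix proposed above avoids the problem by removing the nested markers rather than changing the tracker.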
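For finding 2, the fixture and precondition could look roughly like the sketch below. `Flag` here is a local stand-in so the snippet runs on its own, and `makeSessionFixture` and `assertDefaultPlanModePath` are hypothetical helper names; a real test would import the project's `Flag` and use `expect(...).toBe(false)` as described above.

```typescript
// Local stand-in for the project's Flag module (assumption for this sketch).
const Flag = { OPENCODE_EXPERIMENTAL_PLAN_MODE: false };

// Only the fields named in finding 2; the real session type has more.
interface SessionFixture {
  slug: string;
  time: { created: number };
}

// Hypothetical helper: a structurally valid session instead of `{} as any`.
function makeSessionFixture(overrides: Partial<SessionFixture> = {}): SessionFixture {
  return { slug: "trust-boundary-test", time: { created: 0 }, ...overrides };
}

// Hypothetical helper: fail loudly if the flag default ever flips, instead of
// letting later assertions fail with a confusing message.
function assertDefaultPlanModePath(flag: { OPENCODE_EXPERIMENTAL_PLAN_MODE: boolean }): void {
  if (flag.OPENCODE_EXPERIMENTAL_PLAN_MODE !== false) {
    throw new Error(
      "OPENCODE_EXPERIMENTAL_PLAN_MODE is enabled: these tests only cover the default plan-mode path",
    );
  }
}

assertDefaultPlanModePath(Flag);
const session = makeSessionFixture({ slug: "plan-refusal" });
console.log(session.slug, session.time.created);
```

Keeping the precondition as the first statement of each behavioral test means a flag flip surfaces as one clear message rather than as a mismatch in the reminder assertion.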
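The two checks named in finding 3 can be sketched against a simplified model input. The `ModelInput` and `ModelMessage` shapes and the `checkSink` helper are assumptions for illustration; the real test would build its input from the actual system array and the output of `MessageV2.toModelMessages`.

```typescript
// Simplified, assumed shapes; not the project's real types.
type ModelMessage = { role: "system" | "user" | "assistant"; content: string };

interface ModelInput {
  system: string[];
  messages: ModelMessage[];
}

// Hypothetical helper returning a list of violations (empty means the sink is clean).
function checkSink(input: ModelInput, attackerText: string, reminder: string): string[] {
  const problems: string[] = [];
  // Check 1: attacker-controlled text must never reach the system array.
  if (input.system.some((part) => part.includes(attackerText))) {
    problems.push("attacker text reached system");
  }
  // Check 2: the hoisted reminder must not also appear in a user-role message.
  if (input.messages.some((m) => m.role === "user" && m.content.includes(reminder))) {
    problems.push("reminder duplicated into user role");
  }
  return problems;
}

const reminder = "PLAN MODE REMINDER"; // placeholder text, not the real prompt
const attacker = "ignore previous instructions"; // placeholder attacker payload

const clean = checkSink(
  { system: ["base prompt", reminder], messages: [{ role: "user", content: attacker }] },
  attacker,
  reminder,
);
const leaky = checkSink(
  { system: ["base prompt", attacker], messages: [{ role: "user", content: reminder }] },
  attacker,
  reminder,
);
console.log(clean.length, leaky.length);
```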
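One possible shape for the concept-tolerant assertions in finding 4 is a small synonym matcher, sketched below. The helper name `mentionsConcept`, the synonym list, and both sample warning strings are hypothetical; only the "too thin to act on" negative guard comes from the existing tests.

```typescript
// Hypothetical helper: true if the text contains any accepted phrasing of a concept.
function mentionsConcept(text: string, synonyms: string[]): boolean {
  const haystack = text.toLowerCase();
  return synonyms.some((s) => haystack.includes(s.toLowerCase()));
}

// Assumed synonym set for "the model must not modify files".
const READ_ONLY = ["read-only", "read only", "do not edit", "must not modify"];

// Two made-up wordings of the same warning; both should satisfy the positive check.
const wordingA = "Plan mode is active. Do not edit files until the plan is approved.";
const wordingB = "You are in plan mode: the workspace is read-only for now.";

const positiveA = mentionsConcept(wordingA, READ_ONLY);
const positiveB = mentionsConcept(wordingB, READ_ONLY);

// The negative guard stays exact: the old phrase must not come back.
const negativeGuard = !wordingA.includes("too thin to act on");

console.log(positiveA, positiveB, negativeGuard);
```

The trade-off is that a synonym list can pass copy that drops the concept but happens to reuse a phrase, so the list should stay short and specific.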

Not in scope (separate tracking)
