test: harden v0.8.3 plan-mode trust-boundary tests + fix nested marker tracking by anandgupta42 · Pull Request #899 · AltimateAI/altimate-code

anandgupta42 · 2026-06-05T23:54:13Z

What does this PR do?

Fast-follow hardening from the v0.8.3 multi-model code review (GPT 5.4 + Gemini 3.1 Pro + Kimi K2.5 + MiniMax M2.7 + Claude). All reviewers' verdict on the shipped v0.8.3 was ship — would not have blocked; these are the non-blocking items they surfaced.

Fix nested-marker tracking (processor.ts). The v0.8.3 marker fix wrapped the reworded plan-no-tool warning in an inner // altimate_change block nested inside the outer plan-refusal block. script/upstream/analyze.ts findMarkers (lines 499–520) uses a single openBlock with no nesting stack, so the inner start clobbered the outer block and its end was dropped — the outer block fell out of marker tracking. It passed the strict CI gate (hunk-based) and the count balance, so it never failed the release, but it degraded upstream-bridge accuracy. The warning text is already inside the outer block, so removing the inner markers keeps the strict gate green (pure deletion) and restores correct tracking. (Gemini; verified against analyze.ts.)
Harden the behavioral trust-boundary tests. Replaced session: {} as any with a structurally valid { slug, time: { created } } and added an explicit Flag.OPENCODE_EXPERIMENTAL_PLAN_MODE === false precondition, so a future flag flip fails loudly instead of throwing an opaque TypeError in Session.plan (and silently dropping the intended coverage). (Consensus of all 5 reviewers; Gemini rated MAJOR.)
Add an end-to-end SINK test. The existing tests proved the intermediate trustedReminderParts; this drives the result through the real system-array hoist and MessageV2.toModelMessages, asserting attacker text never reaches system and the hoisted reminder is not duplicated into the user role. (GPT.)
Make the wording-fix guards concept/synonym-tolerant (keeping the load-bearing negative too thin to act on guard) so legitimate copy improvements don't break the suite. (MiniMax.)
Rename a mislabeled source-regex test — it asserts the InsertRemindersResult type alias, not runtime behaviour. (MiniMax.)
Revert the plan.txt start-marker description edit — plan.txt is imported raw into the LLM prompt, so the added words were model-visible; the prose already documents the escape hatch. (Gemini.)

Type of change

Bug fix (non-breaking change which fixes an issue)
Test improvement

Issue for this PR

Closes #898

How did you verify your code works?

bun test test/session/plan-layer-e2e.test.ts test/skill/release-v0.8.3-adversarial.test.ts test/session/processor.test.ts test/session/system.test.ts test/provider/family.test.ts — 192 pass, 0 fail (incl. the new end-to-end sink test and flag preconditions).
bun test test/session/ test/skill/ test/provider/ — 1619 pass, 0 fail.
bun run typecheck — clean.
bun run script/upstream/analyze.ts --markers --base origin/main --strict — All custom code properly marked (confirms the nested-marker removal keeps the gate green and the outer block is no longer clobbered).
Verified the nesting is gone: processor.ts now has a single outer block (345→411), no inner markers.

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (n/a — test/tooling only)
My changes generate no new warnings
I have added tests that prove my fix is effective
New and existing unit tests pass locally with my changes

Summary by cubic

Hardened v0.8.3 plan‑mode trust boundary and added end‑to‑end sink coverage, while fixing nested marker tracking in processor.ts to restore accurate upstream markers. Also reverted the model‑visible plan.txt marker description to keep the prompt unchanged. Closes #898.

Bug Fixes
- Removed nested inner // altimate_change markers around the plan‑no‑tool warning so script/upstream/analyze.ts correctly tracks the outer block.
- Reverted plan.txt start‑marker description since the file is imported raw into the LLM prompt.
Test Improvements
- Added an end‑to‑end sink test to ensure attacker text never reaches system and reminders aren’t duplicated into the user role.
- Replaced session: {} as any with a valid { slug, time: { created } } and added a Flag.OPENCODE_EXPERIMENTAL_PLAN_MODE === false precondition to fail loudly on future flips.
- Made wording‑fix guards concept/synonym‑tolerant and renamed the source‑regex test to reflect it asserts the InsertRemindersResult type alias.

^{Written for commit d1e1a4c. Summary will update on new commits.}

Summary by CodeRabbit

Tests
- Expanded safety and trust-boundary test coverage for plan-layer functionality, including edge cases and adversarial scenarios.
Chores
- Removed internal directive comments and cleaned up code organization.
- Updated prompt configuration to streamline internal directives.

…r tracking Follow-up from the v0.8.3 multi-model code review (GPT 5.4, Gemini 3.1 Pro, Kimi K2.5, MiniMax M2.7, Claude). All non-blocking; the release is correct. - processor.ts: remove the nested `// altimate_change` block around the reworded plan-no-tool warning. `script/upstream/analyze.ts` `findMarkers` uses a single `openBlock` with no nesting stack, so the inner `start` clobbered the outer plan-refusal block and dropped it from marker tracking. The warning text is already inside the outer block, so removing the inner markers keeps the strict gate green (pure deletion) and restores correct block tracking. (Gemini, verified against analyze.ts:499-520.) - plan.txt: revert the start-marker description edit. plan.txt is imported raw into the LLM prompt, so the added words were model-visible; the prose itself already documents the trivial-task escape hatch. (Gemini #5.) - plan-layer-e2e.test.ts / release-v0.8.3-adversarial.test.ts: replace `session: {} as any` with a structurally valid `{ slug, time: { created } }` and add an explicit `Flag.OPENCODE_EXPERIMENTAL_PLAN_MODE === false` precondition, so a future flag flip fails loudly instead of throwing an opaque TypeError in `Session.plan`. (Consensus; Gemini MAJOR.) - plan-layer-e2e.test.ts: add an end-to-end SINK test — drive insertReminders' output through the real `system`-array hoist and `MessageV2.toModelMessages`, asserting attacker text never reaches `system` and the hoisted reminder is not duplicated into the user role. (GPT.) - release-v0.8.3-adversarial.test.ts: make the wording-fix guards concept/synonym-tolerant (keep the load-bearing negative "too thin to act on" guard) so legitimate copy improvements don't break the suite. (MiniMax.) - plan-layer-e2e.test.ts: rename the mislabeled `insertReminders return shape` test — it asserts the `InsertRemindersResult` type alias, not runtime behaviour (which is covered by the behavioral describe). (MiniMax.) Closes #898 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

coderabbitai · 2026-06-05T23:54:26Z

Ready to act? Review this PR in Change Stack to turn feedback into patch suggestions you can inspect and refine.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3edeffc7-037e-480a-b1ef-d3ad1fc0aad0

📥 Commits

Reviewing files that changed from the base of the PR and between daa0351 and d1e1a4c.

📒 Files selected for processing (4)

packages/opencode/src/session/processor.ts
packages/opencode/src/session/prompt/plan.txt
packages/opencode/test/session/plan-layer-e2e.test.ts
packages/opencode/test/skill/release-v0.8.3-adversarial.test.ts

💤 Files with no reviewable changes (1)

packages/opencode/src/session/processor.ts

📝 Walkthrough

Walkthrough

The PR removes altimate_change marker infrastructure from plan-layer code and prompts, consolidates test file-reading patterns, and significantly expands trust-boundary assertions around reminder hoisting and synthetic warning behavior in both plan-layer e2e and adversarial regression test suites.

Changes

Plan-layer marker cleanup and test hardening

Layer / File(s)	Summary
Remove altimate_change marker infrastructure `packages/opencode/src/session/processor.ts`, `packages/opencode/src/session/prompt/plan.txt`	Removes altimate_change directive comment markers from the plan-no-tool warning and plan.txt escape-hatch clause without changing runtime logic or user-facing content.
Update test imports for new assertions `packages/opencode/test/session/plan-layer-e2e.test.ts`, `packages/opencode/test/skill/release-v0.8.3-adversarial.test.ts`	Adds MessageV2 and Flag imports to support telemetry sink assertions and experimental plan mode precondition checks.
Consolidate test file-reading patterns `packages/opencode/test/session/plan-layer-e2e.test.ts`	Refactors nine test suites to use simplified single-line fs.readFile format while preserving invariants about agent initialization, sessionAgentName ordering, normalizeAgentName logic, revision-cap guards, synthetic message persistence, telemetry routing, family vendor usage, and import safety.
Extend plan-action classification test cases `packages/opencode/test/session/plan-layer-e2e.test.ts`	Adds edge case assertions for empty input and whitespace-only input mapping to the "refine" action classification.
Expand plan.txt safety and content checks `packages/opencode/test/session/plan-layer-e2e.test.ts`	Updates plan.txt safety tests to verify required investigative and tool-call guidance is present, confirms absence of debug/TODO markers, and validates two-step approach and reasonable length constraints.
Strengthen trust-boundary source-level contracts `packages/opencode/test/session/plan-layer-e2e.test.ts`	Adds assertions that reminderResult hoisting consumes trustedReminderParts (not re-implemented via part.synthetic scanning) and that InsertRemindersResult and insertReminders type signatures declare the trustedReminderParts contract.
Harden behavioral trust-boundary assertions `packages/opencode/test/session/plan-layer-e2e.test.ts`	Refactors trust-boundary behavioral tests with structurally valid session shapes, explicit precondition asserting OPENCODE_EXPERIMENTAL_PLAN_MODE is false, and sink assertions ensuring malicious system-reminder parts are never promoted to trustedReminderParts, attacker text never reaches model system array, and reminders are not duplicated into user payload.
Update v0.8.3 adversarial regression test suite `packages/opencode/test/skill/release-v0.8.3-adversarial.test.ts`	Replaces placeholder dummySession with structurally valid session object, adds explicit precondition asserting OPENCODE_EXPERIMENTAL_PLAN_MODE is false, and reworks wording assertions to be synonym-tolerant while targeting specific negative phrasing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

test: harden v0.8.3 plan-mode trust-boundary tests + fix nested marker tracking #898: Both changes remove or modify the same altimate_change marker blocks in processor.ts and plan.txt, addressing nested-marker tracking.

Possibly related PRs

AltimateAI/altimate-code#888: Both PRs target plan-mode warning and reminder trust-boundary areas, updating processor.ts warning detection and plan-layer assertions around escape-hatch concepts and synthetic message handling.

Suggested labels

contributor, needs-review:blocked

Poem

🐰 A rabbit hops through test files with care,
Markers removed, trust-boundaries laid bare,
Hardened assertions on every code path,
Synthetic warnings and reminders bath,
Safety checks nested—no escape hatch here! 🔒

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description lacks the required 'PINEAPPLE' identifier at the top and is not organized per the specified template sections (Summary, Test Plan, Checklist).	Add 'PINEAPPLE' at the very top of the description, then reorganize content into Summary, Test Plan, and Checklist sections as specified in the template.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately captures the main changes: hardening plan-mode trust-boundary tests and fixing nested marker tracking.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/v0.8.3-review-followup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-05T23:54:48Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

github-actions · 2026-06-05T23:55:51Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

cubic-dev-ai

No issues found across 4 files

_{Re-trigger cubic}

dev-punia-altimate

Multi-Persona Review — Verdict: skipped

Multi-persona review completed.

0/0 agents completed · 2s · 0 findings (0 critical, 0 high, 0 medium)

_{Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·}

claude Bot reviewed Jun 5, 2026

View reviewed changes

github-actions Bot added the contributor label Jun 5, 2026

github-actions Bot added the needs-review:blocked label Jun 5, 2026

cubic-dev-ai Bot reviewed Jun 5, 2026

View reviewed changes

dev-punia-altimate reviewed Jun 6, 2026

View reviewed changes

anandgupta42 merged commit 7e909a9 into main Jun 6, 2026
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: harden v0.8.3 plan-mode trust-boundary tests + fix nested marker tracking#899

test: harden v0.8.3 plan-mode trust-boundary tests + fix nested marker tracking#899
anandgupta42 merged 1 commit into
mainfrom
fix/v0.8.3-review-followup

anandgupta42 commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

claude Bot left a comment

Uh oh!

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (2 warnings)

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

dev-punia-altimate left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

anandgupta42 commented Jun 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Type of change

Issue for this PR

How did you verify your code works?

Checklist

Summary by cubic

Summary by CodeRabbit

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (2 warnings)

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

dev-punia-altimate left a comment

Choose a reason for hiding this comment

Multi-Persona Review — Verdict: skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

anandgupta42 commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading