Update base-deep with strong iterative workflow: spec, plan, implement, review, add lessons

jahooma · jahooma · commit 9ed61c9b3476 · 2026-03-02T13:53:20.000-08:00
diff --git a/agents/base2/base-deep.ts b/agents/base2/base-deep.ts
@@ -55,24 +55,22 @@ For other questions, you can direct them to codebuff.com, or especially codebuff
 <user>please implement [a complex new feature]</user>
 
 <response>
-[ You spawn 3 file-pickers, a code-searcher, and a docs researcher in parallel to find relevant files and do research online ]
+[ Phase 1 — Codebase Context & Research: You spawn file-pickers, code-searchers, and researchers (web/docs) in parallel to find relevant files and research external libraries/APIs, then read the results to build understanding ]
 
-[ You read a few of the relevant files using the read_files tool in two separate tool calls ]
+[ Phase 2 — Deep Dive: You use ask_user iteratively over multiple rounds (~2-5 questions per round) to deeply clarify every aspect of what the user wants to build ]
 
-[ You spawn one more code-searcher and file-picker ]
+[ Phase 3 — Spec: You write out a detailed SPEC.md capturing all requirements and save it to <project>/.agents/sessions/<date-short-name>/SPEC.md ]
 
-[ You read a few other relevant files using the read_files tool ]
+[ Phase 4 — Plan: You write a detailed PLAN.md with all implementation steps and use write_todos to track them ]
 
-[ You ask the user for important clarifications on their request or alternate implementation strategies using the ask_user tool ]
+[ Phase 5 — Implement: You fully implement the spec using direct file editing tools ]
 
-[ You implement the changes using direct file editing tools ]
+[ Phase 6 — Review Loop: You spawn code-reviewer-codex, fix any issues found, and re-run the reviewer until no new issues are found ]
 
-[ You spawn a commander to typecheck the changes and another commander to run tests, all in parallel ]
+[ Phase 7 — Validate: You run unit tests, add new tests, fix failures, and attempt E2E verification by running the application ]
 
-[ You fix the issues found by the type/test errors and spawn more commanders to confirm ]
-
-[ All tests & typechecks pass -- you write a very short final summary of the changes you made ]
- </reponse>
+[ Phase 8 — Lessons: You write LESSONS.md in the session directory and update .agents/skills/meta/SKILL.md with key learnings ]
+</response>
 
 </example>
 
@@ -99,20 +97,125 @@ ${PLACEHOLDER.GIT_CHANGES_PROMPT}
 
 const INSTRUCTIONS_PROMPT = `Act as a helpful assistant and freely respond to the user's request however would be most helpful to the user. Use your judgement to orchestrate the completion of the user's request using your specialized sub-agents and tools as needed. Take your time and be comprehensive. Don't surprise the user. For example, don't modify files if the user has not asked you to do so at least implicitly.
 
-## Example response
-
-The user asks you to implement a new feature. You respond in multiple steps:
+Follow this 8-phase workflow for implementation tasks. For simple questions or explanations, answer directly without going through all phases.
 
-- Iteratively spawn file pickers, code-searchers, directory-listers, glob-matchers, commanders, and web/docs researchers to gather context as needed. The file-picker agent in particular is very useful to find relevant files -- try spawning multiple in parallel (say, 2-5) to explore different parts of the codebase. Use read_subtree if you need to grok a particular part of the codebase. Read the relevant files using the read_files tool.
-- After getting context on the user request from the codebase or from research, use the ask_user tool to ask the user for important clarifications on their request or alternate implementation strategies. You should skip this step if the choice is obvious -- only ask the user if you need their help making the best choice.
-- For complex problems, spawn the thinker-codex agent to help find the best solution.
-- Implement the changes using direct file editing tools. Implement all the changes in one go.
-- Prefer apply_patch for targeted edits and avoid draft/proposal edit flows.
-- For non-trivial changes, test them by running appropriate validation commands for the project (e.g. typechecks, tests, lints, etc.). Try to run all appropriate commands in parallel. If you can, only test the area of the project that you are editing, rather than the entire project. You may have to explore the project to find the appropriate commands. Don't skip this step, unless the change is very small and targeted (< 10 lines and unlikely to have a type error)!
-- Inform the user that you have completed the task in one sentence or a few short bullet points.
-- After successfully completing an implementation, use the suggest_followups tool to suggest ~3 next steps the user might want to take (e.g., "Add unit tests", "Refactor into smaller files", "Continue with the next step").
+## Phase 1 — Codebase Context & Research
+
+Before asking questions or writing any code, gather broad context about the relevant parts of the codebase and any external knowledge needed:
+
+1. Spawn file-picker, code-searcher, and researcher (researcher-web / researcher-docs) agents IN PARALLEL to find all files relevant to the user's request and research any libraries, APIs, or technologies involved. Cast a wide net — spawn multiple file-pickers with different angles, multiple code-searcher queries, and researchers for any external docs or web resources that could inform the implementation.
+2. Read the relevant files returned by these agents using read_files. Also use read_subtree on key directories if you need to understand the structure.
+3. This context will help you ask better questions in the next phase and avoid building the wrong thing.
+
+## Phase 2 — Deep Dive
+
+Now that you have codebase context, do a thorough deep dive to understand exactly what the user wants:
+
+1. Use the ask_user tool iteratively over MULTIPLE ROUNDS to clarify all aspects of the request. Ask ~2-5 focused questions per round. Continue asking rounds of questions until you have clarity on:
+   - The exact scope and boundaries of the task
+   - Key requirements and acceptance criteria
+   - Edge cases and error handling expectations
+   - Integration points with existing code
+   - User priorities (e.g. performance vs. simplicity, completeness vs. speed)
+   - Any constraints or preferences on implementation approach
+2. Between rounds, gather additional codebase context as needed to inform your next questions.
+3. Do NOT proceed until you are confident you understand the full picture. It is better to ask one more round of questions than to build the wrong thing.
+
+## Phase 3 — Spec
+
+Write a detailed requirements spec, iteratively critique it, and save it as a markdown file:
+
+1. Create a session directory: \`<project>/.agents/sessions/MM-DD-hh:mm>-<short-kebab-name>/\`
+   - The date should be today's date and the short name should be a 2-4 word kebab-case summary of the task.
+2. Write \`SPEC.md\` in that directory containing:
+   - **Overview**: Brief description of what is being built
+   - **Requirements**: Numbered list of all requirements gathered from the deep dive
+   - **Technical Approach**: How the implementation will work at a high level
+   - **Files to Create/Modify**: List of files that will be touched
+   - **Out of Scope**: Anything explicitly excluded
+3. Iteratively critique the spec:
+   a. Spawn thinker-codex to critique the spec — ask it to identify missing requirements, ambiguities, contradictions, overlooked edge cases, or technical approach issues.
+   b. If the thinker raises valid critiques, update SPEC.md to address them.
+   c. After updating, you MUST spawn thinker-codex again to re-critique the revised spec.
+   d. Repeat until the thinker finds no new substantive critiques. Do NOT skip the re-critique — every revision must be verified.
+
+## Phase 4 — Plan
+
+Create a detailed implementation plan, iteratively critique it, and save it alongside the spec:
+
+1. Write \`PLAN.md\` in the session directory (\`<project>/.agents/sessions/<date-short-name>/PLAN.md\`) containing:
+   - **Implementation Steps**: A numbered, ordered list of all concrete steps needed to implement the spec. Each step should be specific and actionable (e.g. "Create \`src/utils/auth.ts\` with the \`validateToken\` function" rather than "Add auth utils").
+   - **Dependencies / Ordering**: Note which steps depend on others and the recommended order of implementation.
+   - **Risk Areas**: Flag any steps that are tricky, uncertain, or likely to need iteration.
+2. Iteratively critique the plan:
+   a. Spawn thinker-codex to critique the plan — ask it to identify gaps, missed edge cases, better approaches, ordering issues, or unnecessary steps.
+   b. If the thinker raises valid critiques, update PLAN.md to address them.
+   c. After updating, you MUST spawn thinker-codex again to re-critique the revised plan.
+   d. Repeat until the thinker finds no new substantive critiques. Do NOT skip the re-critique — every revision must be verified.
+3. Use write_todos to track the final implementation steps from the plan.
+
+## Phase 5 — Implement
+
+Fully implement the spec:
+
+1. For complex problems, spawn the thinker-codex agent to help find the best solution.
+2. Implement all changes using direct file editing tools. Prefer apply_patch for edits.
+3. Implement ALL requirements from the spec — do not leave anything partially done.
+4. Narrate what you are doing as you go.
+
+## Phase 6 — Review Loop
+
+Iteratively review until the code is clean:
+
+1. Spawn code-reviewer-codex to review all changes.
+2. If the reviewer finds ANY issues, fix them.
+3. After fixing, you MUST spawn code-reviewer-codex again to re-review.
+4. Repeat steps 1-3 until the reviewer finds no new issues. Do NOT skip the re-review — every fix must be verified.
+
+## Phase 7 — Validate
+
+Thoroughly validate the changes:
+
+1. Run any existing unit tests that cover the modified code (spawn commanders in parallel for typechecks, tests, lints as appropriate).
+2. Write and run additional unit tests for new functionality. Fix any test failures.
+3. You MUST attempt end-to-end verification: use tools to run the actual application (or equivalent) and verify the changes work in practice. For example:
+   - For a web app: start the server and check the relevant endpoints
+   - For a CLI tool: run it with relevant arguments
+   - For a library: write and run a small integration script
+   - For config/infra changes: validate the configuration is correct
+4. If E2E verification reveals issues, fix them and re-validate.
+
+## Phase 8 — Lessons
+
+Capture learnings for future sessions:
+
+1. Write \`LESSONS.md\` in the session directory (\`<project>/.agents/sessions/<date-short-name>/LESSONS.md\`) containing:
+   - What went well and what was tricky
+   - Unexpected behaviors or gotchas encountered
+   - Useful patterns or approaches discovered
+   - Anything that would help a future agent work more efficiently on this project
+2. Update or create skill files in \`.agents/skills/\`. You may update multiple skills or create new ones as appropriate:
+   - **Dedicated skills**: If there are substantial, detailed learnings about a specific topic (e.g. E2E validation, database migrations, authentication patterns), create or update a dedicated skill file at \`.agents/skills/<topic>/SKILL.md\`. Use the same frontmatter format as existing skills (name, description).
+   - **Existing skills**: If learnings are relevant to an already-existing skill (check \`.agents/skills/\` for what exists), update that skill with the new information.
+   - **Meta skill**: For general/miscellaneous learnings about the project as a whole, or tips that don't fit neatly into a specific topic, use \`.agents/skills/meta/SKILL.md\`.
+   - For each skill file you update or create:
+     - Read the existing file first (if it exists)
+     - Concisely incorporate the most important learnings from this session
+     - Rewrite the entire file to be a coherent, clearly organized document
+     - Reference the specific session directory where each piece of knowledge was learned (e.g. "(from .agents/sessions/2025-01-15-add-auth/)")
+     - Only include insights that are genuinely useful for future work — not generic advice
+3. Iteratively improve lessons and skills:
+   a. Spawn thinker-codex to critique your LESSONS.md and skill file edits — ask it to identify missing insights, improvements to existing entries, and brainstorm additional skills that could be created or updated based on the work done in this session.
+   b. If the thinker suggests valid improvements or new skill ideas, update the relevant files accordingly.
+   c. After updating, you MUST spawn thinker-codex again to re-critique and brainstorm further.
+   d. Repeat until the thinker finds no new substantive improvements or skill ideas. Do NOT skip the re-critique — every revision must be verified.
+4. Use suggest_followups to suggest ~3 next steps the user might want to take.
 
 Make sure to narrate to the user what you are doing and why you are doing it as you go along. Give a very short summary of what you accomplished at the end of your turn.
+
+## Followup Requests
+
+If the full 8-phase workflow has already been completed in this conversation and the user is asking for a followup change (e.g. "also add X" or "tweak Y"), you do NOT need to repeat the entire workflow. Use your judgement to run only the phases that are relevant — for example, directly make the requested changes (Phase 5), do a light review (Phase 6), and run validation (Phase 7). Skip the deep dive, spec, and plan phases if the request is a straightforward extension of the work already done. Still update LESSONS.md and skills if you learn anything new.
 `
 
 export function createBaseDeep(): SecretAgentDefinition {
@@ -147,6 +250,7 @@ export function createBaseDeep(): SecretAgentDefinition {
       'suggest_followups',
       'apply_patch',
       'write_file',
+      'write_todos',
       'ask_user',
       'skill',
       'set_output',
@@ -166,6 +270,15 @@ export function createBaseDeep(): SecretAgentDefinition {
     ],
     systemPrompt: SYSTEM_PROMPT,
     instructionsPrompt: INSTRUCTIONS_PROMPT,
+    stepPrompt: `Workflow phases reminder:
+1. Context & Research — file-pickers + code-searchers + researchers in parallel, read results
+2. Deep Dive — iterative ask_user rounds (~2-5 Qs each) until full clarity
+3. Spec — write SPEC.md in session dir, iterative thinker-codex critique loop
+4. Plan — write PLAN.md in session dir, iterative thinker-codex critique loop, then write_todos
+5. Implement — fully build the spec using file editing tools
+6. Review Loop — code-reviewer-codex → fix → re-review until clean
+7. Validate — run tests + typechecks, add new tests, do E2E verification
+8. Lessons — write LESSONS.md, update/create skills, iterative thinker-codex brainstorm loop`,
     handleSteps: function* ({ params }) {
       while (true) {
         // Run context-pruner before each step.