diff --git a/.prettierignore b/.prettierignore
index 1b21864..27a3237 100644
--- a/.prettierignore
+++ b/.prettierignore
@@ -11,6 +11,7 @@ build
 
 # Cache
 .cache
+.mypy_cache
 *.tsbuildinfo
 
 # Logs
diff --git a/CLAUDE.md b/CLAUDE.md
index f5ec408..8f40506 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -268,8 +268,6 @@ export default async function PostPage({ params }) {
 
 Keep this document up to date as the source of truth for how this blog is structured and extended.
 
-
-
 
 # Thoughts:
-All thoughs (even in sub-projects) should be in thoughts/shared/ (root), not in the sub projects.
\ No newline at end of file
+All thoughts (even in sub-projects) should be in thoughts/shared/ (root), not in the sub-projects.
diff --git a/TONE_OF_VOICE.md b/TONE_OF_VOICE.md
new file mode 100644
index 0000000..9fc8db3
--- /dev/null
+++ b/TONE_OF_VOICE.md
@@ -0,0 +1,294 @@
+# Tone of Voice Guide — addcommitpush.io
+
+This document captures the writing voice and style for blog posts on addcommitpush.io. Use it as a reference when writing new content.
+
+---
+
+## Core Voice Attributes
+
+### 1. Personal & First-Person
+
+Write from direct experience. Use "I" liberally. Share what you've actually built, failed at, and learned.
+
+**Do:**
+
+> "As someone who's worked as a data lead or head of engineering for the past years, I've learned a lot about finding and hiring great engineers."
+
+> "In September 2018 I joined Debricked as the very interesting combination of a Machine Learning Engineer and Sales with the perception that the SaaS product was 'done' and ready to take on the world. We just needed to add some more ML-enabled capabilities and off we go! We were wrong.. so so wrong."
+
+**Don't:**
+
+> "Organizations should consider the following best practices when hiring engineers..."
+
+### 2. Honest About Mistakes
+
+Openly admit when you were wrong. This builds credibility and makes lessons more impactful.
+
+**Do:**
+
+> "We were wrong.. so so wrong."
+ +> "If I got to re-live the beginning of our endeavor, I would have pruned these dimensions A LOT from the start." + +> "In the beginning, we did not do this right." + +**Don't:** + +> "After careful analysis, we pivoted our strategy to better align with market conditions." + +### 3. Humble but Opinionated + +Take strong stances, but acknowledge you're still learning. Never position yourself as having all the answers. + +**Do:** + +> "I feel like Emil 2.0... It also feels like I'm level 2 out of 100, I wonder what the next thing could be ;)" + +> "In my opinion, the best indicator of a good hire is a successful result from the case..." + +> "This is at least the state late 2025, in a year or so... who knows." + +**Don't:** + +> "As industry experts, we recommend the following definitive approach..." + +### 4. Give Credit Openly + +When ideas aren't yours, say so explicitly. Link to sources. This shows intellectual honesty. + +**Do:** + +> "But first, these are not my ideas! I stole with pride! This approach was pioneered by Dex and Vaibhav from AI That Works..." + +**Don't:** + +> Take credit for frameworks or ideas developed by others. + +--- + +## Structural Patterns + +### Start with Context + +Open with who you are and why you're qualified to write about this. Ground the post in real experience. + +``` +As someone who's worked as a data lead or head of engineering for the past years, I've learned a lot about finding and hiring great engineers. +``` + +### State What You'll Cover + +Give readers a preview. Use a simple list. + +``` +I'll try to reason about: +- Knowing that the unknowns are unknown +- Capability to deliver vs. delivering +- Measure, Talk, Build, Repeat +``` + +### TL;DR and Key Takeaways + +Include summary sections for scanners. Place them at the start OR end (not both). + +``` +TL;DR: Start with a noisy draft from model knowledge only. Denoise through iterative web research. Stop when findings—not the draft—are complete. 
+``` + +### "Who is this for?" Section + +Define your audience explicitly. Be specific about their experience level. + +``` +Who is this article for? +You have already spent more on AI coding tools the last 6 months than any other tools in your 10-year coding career. +``` + +--- + +## Language & Vocabulary + +### Use Metaphors and Vivid Language + +Make abstract concepts tangible through imagery. + +**Examples:** + +- "throwing spaghetti [and my head] against the wall to see what sticks" +- "pushing the Great Wall of China 0.01 centimeters a day" +- "polishing a stone" +- "spinning plates" + +### Casual Markers + +Sprinkle in conversational phrases that feel natural, not forced: + +- "so so wrong" +- "do it!" +- "that's ideal!" +- "who knows" +- "My point is..." +- "I stole with pride!" + +### Technical Precision Where It Matters + +When explaining technical concepts, be precise. Use code blocks, terminal examples, and exact terminology. + +``` +D₁ = U(D₀, R(D₀)) +D₂ = U(D₁, R(D₁)) +``` + +### Avoid Corporate Speak + +Never use: + +- "leverage synergies" +- "best-in-class" +- "thought leadership" +- "at the end of the day" +- "moving forward" +- "stakeholders" + +--- + +## Sentence & Paragraph Structure + +### Short Paragraphs + +2-4 sentences max. One idea per paragraph. Let readers breathe. + +### Rhetorical Questions + +Use sparingly to invite reflection: + +> "What if your innovation is not incremental?" + +> "Do they get excited when I say we use graph databases?" + +### Direct Address + +Speak to the reader occasionally: + +> "I urge you to click the links, and read through these commands." + +> "If you got this far in my post, I'm inclined to believe that you want to start your own company someday... do it!" 
+ +### Parenthetical Asides + +Add context or wit in parentheses: + +> "the very interesting combination of a Machine Learning Engineer and Sales" + +> "(Prevent excessive searching)" + +--- + +## Formatting Conventions + +### Lists for Actionable Items + +Use bullet lists for: + +- Things to look for +- Steps in a process +- Comparisons + +### Tables for Comparisons + +When comparing concepts, tools, or approaches, use tables: + +| Classical Diffusion | Research Diffusion | +| ------------------- | ------------------ | +| Random noise | Initial draft | + +### Blockquotes for External Voices + +Use blockquotes when citing others or emphasizing key insights: + +> "The iterative nature of diffusion models naturally mirrors how humans actually conduct research—cycles of searching, reasoning, and revision." +> — Google Research + +### Code Blocks for Commands and Code + +Use terminal/code blocks liberally for anything executable: + +``` +/research_codebase +``` + +--- + +## Emotional Register + +### Enthusiasm Without Excess + +Show genuine excitement, but don't overdo punctuation or superlatives. + +**Good:** + +> "It was the most thrilling experience in my professional life so far" + +**Too much:** + +> "This is AMAZING!!! The BEST thing EVER!!!" + +### Vulnerability + +Share struggles and uncertainties: + +> "This is something that we are still struggling with, as it's hard to ask for (and get) time from someone that had a bad experience with your tool." + +### Encouragement + +End on forward-looking, encouraging notes: + +> "If you got this far in my post, I'm inclined to believe that you want to start your own company someday... do it!" + +--- + +## What NOT to Do + +1. **Don't write in third person** — "The author believes..." Never. + +2. **Don't hedge excessively** — Say what you think. "I think X" not "It could potentially be argued that X might be..." + +3. **Don't be preachy** — Share experiences, don't lecture. + +4. 
**Don't use emojis** — Unless explicitly requested. + +5. **Don't pad content** — If you've made your point, stop. + +6. **Don't over-explain** — Trust the reader's intelligence. + +7. **Don't use "we" when you mean "I"** — Be specific about whose experience this is. + +--- + +## Reference: Characteristic Phrases + +These phrases capture the voice. Use similar constructions: + +- "I look for three things..." +- "My point is..." +- "I've learned a thing or two about..." +- "With the hindsight hat on..." +- "It turned out that..." +- "To make X a part of our culture, we implemented..." +- "This maybe takes out a bit of the old 'fun' about developing, but I'm mostly excited about..." +- "In my experience..." +- "The key insight is that..." +- "I'll update this blog post when I have more to give in this area!" + +--- + +## Quick Checklist Before Publishing + +- [ ] Does it open with personal context or experience? +- [ ] Is there a clear "what you'll learn" section? +- [ ] Have I admitted at least one mistake or uncertainty? +- [ ] Are technical concepts explained with concrete examples? +- [ ] Have I credited all external sources and ideas? +- [ ] Are paragraphs short (2-4 sentences)? +- [ ] Does it end with forward-looking encouragement or practical next steps? +- [ ] Would I actually say this out loud to a colleague? 
diff --git a/app/blog/[slug]/page.tsx b/app/blog/[slug]/page.tsx
index 70dde88..3172297 100644
--- a/app/blog/[slug]/page.tsx
+++ b/app/blog/[slug]/page.tsx
@@ -65,6 +65,11 @@ export default async function BlogPostPage({ params }: { params: Promise<{ slug:
       await import('@/components/blog-posts/saas-zero-to-one-hindsight');
       return <SaasZeroToOneHindsightContent />;
     }
+    case 'diffusion-deep-research': {
+      const { DiffusionDeepResearchContent } =
+        await import('@/components/blog-posts/diffusion-deep-research');
+      return <DiffusionDeepResearchContent />;
+    }
     case 'context-engineering-claude-code': {
       const { ContextEngineeringClaudeCodeContent } =
         await import('@/components/blog-posts/context-engineering-claude-code');
diff --git a/components/animations/diffusion/diffusion-loop-step.tsx b/components/animations/diffusion/diffusion-loop-step.tsx
new file mode 100644
index 0000000..e40d815
--- /dev/null
+++ b/components/animations/diffusion/diffusion-loop-step.tsx
@@ -0,0 +1,79 @@
+'use client';
+
+import { cn } from '@/lib/utils';
+import { Terminal } from '@/components/custom';
+
+interface DiffusionLoopStepProps {
+  className?: string;
+}
+
+export function DiffusionLoopStep({ className }: DiffusionLoopStepProps) {
+  return (
+

Diffusion Loop

+

One iteration (practical walkthrough)

+ +

Step 1: Generate research questions

+

+ Tool: think. Identify draft gaps and propose diverse research questions tied to + those gaps. +

+ + {`think: + reflection: | + Uptime claims are vague; need Cloudflare outage history and SLA terms (2023–2025). + Compare public incident reports vs. status page claims.`} + +

Expected: 3–5 targeted questions, each mapped to a draft gap with scope/priority notes.

+ +

Step 2: ConductResearch (parallel)

+

+ Tool: ConductResearch. Delegate distinct questions to sub-agents with explicit + instructions and expected returns. +

+ + {`ConductResearch: + research_topic: | + Collect primary sources on Cloudflare outages and SLA/uptime guarantees (2023–2025). + Return: URLs, outage timelines, SLA terms, and any compensations offered.`} + +

Expected: cited findings (URLs + quotes) per sub-agent, deduped URLs, short summaries.

+ +

Step 3: refine_draft_report

+

+ Tool: refine_draft_report. Fold new findings into the draft; keep structure + concise to conserve context. +

+ + {`refine_draft_report: + research_brief: "" + findings: "" + draft_report: ""`} + +

+ Expected: draft updated with citations/quotes; bullets or short paragraphs retained for + clarity and context efficiency. +

+ +

Step 4: Assess completeness

+

Heuristic: stop when diverse new searches no longer yield new facts. If they still do, loop again.

+ + {`Checklist: +- New queries tried? (global + section-specific) +- Any new sources or facts? If yes, continue loop +- If no, call ResearchComplete`} + +

+ Expected: a clear decision to continue or call ResearchComplete, with rationale + noted. +

+
+ ); +} diff --git a/components/animations/diffusion/diffusion-overview.tsx b/components/animations/diffusion/diffusion-overview.tsx new file mode 100644 index 0000000..8293e54 --- /dev/null +++ b/components/animations/diffusion/diffusion-overview.tsx @@ -0,0 +1,160 @@ +'use client'; + +import { useEffect, useRef, useState } from 'react'; +import { motion, useInView } from 'framer-motion'; +import { cn } from '@/lib/utils'; +import { FileText, FilePenLine, Repeat2, FileCheck2, type LucideIcon } from 'lucide-react'; + +interface DiffusionOverviewProps { + className?: string; +} + +const diffusionLoopStages = [ + 'Identify Gaps → ask research questions', + 'Conduct Research in parallel + citations', + 'Refine Draft Report → assess completeness', +]; + +// Per-phase dwell times (ms): brief, initial draft, diffusion loop (slower), final report (faster) +const phaseDurations = [3200, 3200, 8000, 2400]; +const loopStageDuration = 2500; // slower so all three loop stages are visible + +const phases: { label: string; icon: LucideIcon; text: string; isLoop?: boolean }[] = [ + { + label: 'Brief Generation', + icon: FileText, + text: 'Expands the user query into a structured research brief with sources, constraints, and scope.', + }, + { + label: 'Initial Draft', + icon: FilePenLine, + text: 'Creates a noisy draft from model knowledge only—no external facts yet, just structure and placeholders.', + }, + { + label: 'Diffusion Loop', + icon: Repeat2, + text: diffusionLoopStages[0], + isLoop: true, + }, + { + label: 'Final Report', + icon: FileCheck2, + text: 'Apply Insightfulness + Helpfulness rules, clean citations, and finalize into a benchmark-ready report.', + }, +]; + +export function DiffusionOverview({ className }: DiffusionOverviewProps) { + const ref = useRef(null); + const isInView = useInView(ref, { margin: '-20% 0px -20% 0px', amount: 0.3 }); + const [index, setIndex] = useState(0); + const [charIndex, setCharIndex] = useState(0); + const [loopStep, setLoopStep] = 
useState(0); + + // Phase advance with custom dwell times per phase + useEffect(() => { + if (!isInView) return; + const duration = phaseDurations[index] ?? 3200; + const id = setTimeout(() => { + setIndex((prev) => (prev + 1) % phases.length); + setCharIndex(0); + setLoopStep(0); + }, duration); + return () => clearTimeout(id); + }, [isInView, index]); + + const isLoopPhase = phases[index]?.isLoop; + + // Loop sub-steps advance slower than phase change so all three are visible + useEffect(() => { + if (!isLoopPhase) { + // eslint-disable-next-line react-hooks/set-state-in-effect + setLoopStep(0); + } + }, [isLoopPhase]); + + useEffect(() => { + if (!isInView || !isLoopPhase) return; + const id = setInterval(() => { + setLoopStep((prev) => { + const next = (prev + 1) % diffusionLoopStages.length; + setCharIndex(0); + return next; + }); + }, loopStageDuration); + return () => clearInterval(id); + }, [isInView, isLoopPhase]); + + useEffect(() => { + const active = phases[index]; + const activeText = active.isLoop ? diffusionLoopStages[loopStep] : active.text; + const id = setInterval(() => { + setCharIndex((p) => (p >= activeText.length ? activeText.length : p + 3)); + }, 35); + return () => clearInterval(id); + }, [index, loopStep]); + + return ( +
+
+
+

Diffusion Overview

+

4-phase pipeline

+
+
+ +
+ {phases.map(({ label, icon: Icon, text, isLoop }, i) => { + const active = i === index; + const content = isLoop ? diffusionLoopStages[loopStep] : text; + const streamed = + active && charIndex > 0 + ? content.slice(0, charIndex) + : content.slice(0, 100) + (content.length > 100 ? '…' : ''); + return ( + +
+ + {label} +
+
{streamed}
{active && charIndex < content.length && }
+
+ ); + })} +
+ + + progress + +
+ ); +} diff --git a/components/animations/diffusion/draft-denoising.tsx b/components/animations/diffusion/draft-denoising.tsx new file mode 100644 index 0000000..76d74b9 --- /dev/null +++ b/components/animations/diffusion/draft-denoising.tsx @@ -0,0 +1,138 @@ +'use client'; + +import { useEffect, useRef, useState } from 'react'; +import { motion, useInView } from 'framer-motion'; +import { cn } from '@/lib/utils'; +import { FilePenLine, FileCheck2 } from 'lucide-react'; + +const stages = [ + { + label: 'Bullets', + render: ( +
    +
  • Compare OpenAI, Anthropic, DeepMind safety pillars
  • +
  • Pull 3–5 primary sources (2023–2025)
  • +
+ ), + }, + { + label: 'Masked draft', + render: ( +

+ The report covers [pillars] across labs, + highlighting [methods] with citations to + [sources]. +

+ ), + }, + { + label: 'Refined text', + render: ( +

+ OpenAI: RLHF + eval gates. Anthropic: Constitutional AI + red-team. DeepMind: + interpretability + strict evals. Cited incidents and mitigations mapped to primary URLs. +

+ ), + }, +]; + +interface DraftDenoisingProps { + className?: string; +} + +export function DraftDenoising({ className }: DraftDenoisingProps) { + const ref = useRef(null); + const isInView = useInView(ref, { amount: 0.35 }); + const [iteration, setIteration] = useState(1); + + useEffect(() => { + if (!isInView) return; + const isAtEnd = iteration >= 15; + const delay = isAtEnd ? 5000 : 700; // 5s hold on 15/15 before restarting + + const id = setTimeout(() => setIteration((prev) => (prev >= 15 ? 1 : prev + 1)), delay); + return () => clearTimeout(id); + }, [isInView, iteration]); + + const progress = Math.min(iteration / 15, 1); + const stageIndex = Math.min(2, Math.floor((iteration - 1) / 5)); // 1-5, 6-10, 11-15 + const stage = stages[stageIndex]; + + return ( +
+
+
+

Draft Denoising

+

Noisy → clean report

+
+
+ +
+ +

+ + Draft (iteration {iteration || 1}) +

+
+
+ {stage.label} +
+ + {stage.render} + +
+
+ + +

+ + Refined report +

+
+

+ The report converges toward a comprehensive, insight-rich, and readable deliverable + with clean citations that pass the FACT evaluation. +

+
+
+
+ +
+
+ Iteration {iteration || 1} / 15 + {Math.round(progress * 100)}% denoised +
+
+ +
+
+
+ ); +} diff --git a/components/animations/diffusion/index.ts b/components/animations/diffusion/index.ts new file mode 100644 index 0000000..e050531 --- /dev/null +++ b/components/animations/diffusion/index.ts @@ -0,0 +1,6 @@ +export { DiffusionOverview } from './diffusion-overview'; +export { DraftDenoising } from './draft-denoising'; +export { ParallelAgents } from './parallel-agents'; +export { TwoStageGap } from './two-stage-gap'; +export { RACEMetrics } from './race-metrics'; +export { DiffusionLoopStep } from './diffusion-loop-step'; diff --git a/components/animations/diffusion/parallel-agents.tsx b/components/animations/diffusion/parallel-agents.tsx new file mode 100644 index 0000000..2ef8bba --- /dev/null +++ b/components/animations/diffusion/parallel-agents.tsx @@ -0,0 +1,374 @@ +'use client'; + +import { useEffect, useRef, useState } from 'react'; +import { motion, useInView } from 'framer-motion'; +import { cn } from '@/lib/utils'; +import { Users, Search, ArrowDown, FilePenLine, CheckCircle2, Repeat2 } from 'lucide-react'; + +interface ParallelAgentsProps { + className?: string; +} + +const agents = [ + { + name: 'Sub-Agent 1', + focus: 'Global or section-level query', + topic: 'Topic A', + }, + { + name: 'Sub-Agent 2', + focus: 'Section-specific deep dive', + topic: 'Topic B', + }, + { + name: 'Sub-Agent 3', + focus: 'Comparative or incident-focused', + topic: 'Topic C', + }, +]; + +export function ParallelAgents({ className }: ParallelAgentsProps) { + const ref = useRef(null); + const isInView = useInView(ref, { margin: '-10% 0px -10% 0px', amount: 0.2 }); + const [stage, setStage] = useState<'assign' | 'research' | 'collect' | 'refine' | 'decide'>( + 'assign' + ); + const lastStageRef = useRef('assign'); + const startRef = useRef(null); + const rafRef = useRef(null); + + // Fixed-duration animation timeline using requestAnimationFrame to avoid drift/jitter. 
+ useEffect(() => { + const timeline: Array<{ stage: typeof stage; duration: number }> = [ + { stage: 'assign', duration: 2000 }, + { stage: 'research', duration: 2800 }, + { stage: 'collect', duration: 1800 }, + { stage: 'refine', duration: 2200 }, + { stage: 'decide', duration: 1800 }, + ]; + const totalDuration = timeline.reduce((acc, item) => acc + item.duration, 0); + + const stop = () => { + if (rafRef.current !== null) { + cancelAnimationFrame(rafRef.current); + rafRef.current = null; + } + }; + + if (!isInView) { + startRef.current = null; + stop(); + return; + } + + startRef.current = performance.now(); + + const tick = () => { + if (startRef.current === null) return; + const now = performance.now(); + const elapsed = (now - startRef.current) % totalDuration; + + let accumulated = 0; + let nextStage: typeof stage = timeline[timeline.length - 1].stage; + for (const item of timeline) { + accumulated += item.duration; + if (elapsed < accumulated) { + nextStage = item.stage; + break; + } + } + + if (nextStage !== lastStageRef.current) { + lastStageRef.current = nextStage; + setStage(nextStage); + } + + rafRef.current = requestAnimationFrame(tick); + }; + + rafRef.current = requestAnimationFrame(tick); + + return () => { + stop(); + }; + }, [isInView]); + + // Helper functions for opacity/visibility states + const getSupervisorOpacity = () => { + if (stage === 'assign') return 1; + if (stage === 'research') return 0.8; + return 0.7; + }; + + const getSupervisorScale = () => { + if (stage === 'assign') return 1; + return 0.98; + }; + + const getSubAgentOpacity = () => { + if (stage === 'assign') return 0.4; + if (stage === 'research') return 1; + if (stage === 'collect' || stage === 'refine' || stage === 'decide') return 0.8; + return 0.4; + }; + + const getSubAgentScale = () => { + if (stage === 'research') return 1.02; + return 1; + }; + + const getSubAgentBorder = () => { + if (stage === 'research') return 'border-primary/80 bg-primary/5 shadow-lg 
shadow-primary/20'; + if (stage === 'collect' || stage === 'refine' || stage === 'decide') + return 'border-green-500/40 bg-green-500/5'; + return 'border-border/70 bg-background/70'; + }; + + const getSubAgentIconColor = () => { + if (stage === 'research') return 'text-primary'; + if (stage === 'collect' || stage === 'refine' || stage === 'decide') return 'text-green-500'; + return 'text-muted-foreground'; + }; + + const getResearchingOpacity = () => { + return stage === 'research' ? 1 : 0; + }; + + const getFindingsReturnedOpacity = () => { + return stage === 'collect' || stage === 'refine' || stage === 'decide' ? 1 : 0; + }; + + const getConvergingArrowsOpacity = () => { + if (stage === 'assign' || stage === 'research') return 0.3; + if (stage === 'collect' || stage === 'refine' || stage === 'decide') return 1; + return 0.3; + }; + + const getRefineBoxOpacity = () => { + if (stage === 'assign' || stage === 'research' || stage === 'collect') return 0.3; + if (stage === 'refine' || stage === 'decide') return 1; + return 0.3; + }; + + const getRefineBoxScale = () => { + if (stage === 'refine') return 1.02; + return 1; + }; + + const getRefineBoxBorder = () => { + if (stage === 'refine') return 'border-primary/60 bg-primary/5 shadow-lg shadow-primary/20'; + return 'border-border/70 bg-background/70'; + }; + + const getDecisionOpacity = () => { + if (stage === 'decide') return 1; + return 0.3; + }; + + return ( +
+
+
+

Parallel Sub-Agents

+

Supervisor coordinates up to 3 research threads

+
+
+ + {/* Flow Diagram */} +
+ {/* Supervisor */} +
+ + +
Supervisor
+

+ {stage === 'assign' && 'Assigning distinct questions...'} + {stage === 'research' && 'Monitoring parallel research...'} + {stage === 'collect' && 'Collecting findings...'} + {stage === 'refine' && 'Refining draft...'} + {stage === 'decide' && 'Assessing completeness...'} +

+
+
+ + {/* Parallel Sub-Agents */} +
+ {agents.map((agent, idx) => { + return ( +
+ {/* Sub-agent box */} + +

+ + {agent.name} +

+

+ Topic: {agent.topic} +

+

+ Focus: {agent.focus} +

+ + {/* Status messages container - always render, control opacity */} +
+ {/* Researching status */} + 0 ? 'auto' : 'none' }} + > +
+ + + + Researching... +
+
+ + {/* Findings returned status */} + 0 ? 'auto' : 'none' }} + > +
+ + Findings returned +
+
+
+
+ + {/* Arrow below center sub-agent only - always render */} + {idx === 1 && ( + + + + )} +
+ ); + })} +
+ + {/* Refine Draft Report - always render */} + 0.5 ? 0 : 10, + }} + transition={{ duration: 0.4 }} + className="flex justify-center" + > + + +
refine_draft_report
+

+ {stage === 'refine' + ? 'Incorporating findings with citations...' + : 'Draft updated with citations'} +

+
+
+ + {/* Decision point - always render */} + 0.5 ? 0 : 10, + }} + transition={{ duration: 0.4 }} + className="flex justify-center items-center" + > +
+
+ + Continue loop or + + ResearchComplete? +
+
+
+
+ + {/* Legend */} +
+

How it works:

+
+
+
+
+ 1. Assign: Supervisor generates research + questions and delegates to sub-agents (max 3 parallel) +
+
+
+
+
+ 2. Research: Sub-agents work independently with + isolated contexts, return compressed findings +
+
+
+
+
+ 3. Refine: Findings converge, draft updated with + citations, completeness assessed +
+
+
+
+
+ ); +} diff --git a/components/animations/diffusion/race-metrics.tsx b/components/animations/diffusion/race-metrics.tsx new file mode 100644 index 0000000..8a9f1ca --- /dev/null +++ b/components/animations/diffusion/race-metrics.tsx @@ -0,0 +1,78 @@ +'use client'; + +import { cn } from '@/lib/utils'; +import { Trophy } from 'lucide-react'; + +interface RACEMetricsProps { + className?: string; +} + +const metrics = [ + { key: 'Comprehensiveness', think: 52.03, gemini: 50.5, openai: 49.29, claude: 48.36 }, + { key: 'Insight', think: 53.94, gemini: 51.62, openai: 48.94, claude: 48.79 }, + { key: 'Instruction Following', think: 52.07, gemini: 51.07, openai: 50.67, claude: 49.67 }, + { key: 'Readability', think: 50.44, gemini: 50.22, openai: 48.82, claude: 48.31 }, +]; + +const maxScore = 55; + +export function RACEMetrics({ className }: RACEMetricsProps) { + return ( +
+
+
+

+ + RACE Metrics +

+

ThinkDepth.ai vs peers

+
+
+ +
+ {metrics.map(({ key, think, gemini, openai, claude }) => ( +
+
+ {key} + Score +
+
+ {[ + { label: 'ThinkDepth.ai', value: think, color: 'bg-primary' }, + { label: 'Gemini 2.5 Pro Deep Research', value: gemini, color: 'bg-secondary' }, + { label: 'OpenAI Deep Research', value: openai, color: 'bg-foreground/70' }, + { label: 'Claude Research', value: claude, color: 'bg-border' }, + ].map((row) => ( +
+ + {row.label} + +
+
+
+ + {row.value.toFixed(2)} + +
+ ))} +
+
+ ))} +
+ +
+

+ Note: At this time, Trivy tops this benchmark with a secret model. +

+
+
+ ); +} diff --git a/components/animations/diffusion/two-stage-gap.tsx b/components/animations/diffusion/two-stage-gap.tsx new file mode 100644 index 0000000..8e43e8b --- /dev/null +++ b/components/animations/diffusion/two-stage-gap.tsx @@ -0,0 +1,71 @@ +'use client'; + +import { cn } from '@/lib/utils'; +import { Search, Type } from 'lucide-react'; + +interface TwoStageGapProps { + className?: string; +} + +export function TwoStageGap({ className }: TwoStageGapProps) { + return ( +
+
+
+

Self-Balancing

+

Information gap → Generation gap

+
+
+ +
+
+
+ + Stage 1: Information Gap (what we collect) +
+
+

Outputs

+
    +
  • + Top sources (3–5): OpenAI system card, Anthropic Constitutional AI, DeepMind eval + blogs. +
  • +
  • Extracted facts: eval gates, red-team cadence, 2023–2025 incident summaries.
  • +
  • Inline quotes + URLs; duplicates removed.
  • +
+

+ Goal: close evidence gaps with primary sources before any polish. +

+
+
+ +
+
+ + Stage 2: Generation Gap (what you read) +
+
+

Outputs

+
    +
  • + Narrative: safety pillars per lab with inline citations; incidents + mitigations. +
  • +
  • Table: Lab vs eval gates vs red-team cadence vs interpretability depth.
  • +
  • + Clarity pass: removes repetition, smooth flow, instruction-following guaranteed. +
  • +
+

+ Goal: readable, insightful synthesis once facts are locked. +

+
+
+
+
+ ); +} diff --git a/components/blog-posts/context-engineering-claude-code.tsx b/components/blog-posts/context-engineering-claude-code.tsx index d8a2729..aac502c 100644 --- a/components/blog-posts/context-engineering-claude-code.tsx +++ b/components/blog-posts/context-engineering-claude-code.tsx @@ -376,8 +376,8 @@ export function ContextEngineeringClaudeCodeContent() { {`/research_codebase`}

- The command will prompt you with "what would you like to research?" Provide a detailed - prompt like: + The command will prompt you with "what would you like to research?" Provide a + detailed prompt like:

@@ -392,15 +392,15 @@ Follow @frontend/ARCHITECTURE.md guidelines and patterns.`}

This generates a research document with file references, target architecture, and - critically—a "What not to do" section that helps guide Claude in the right direction - without detours. + critically—a "What not to do" section that helps guide Claude in the right + direction without detours.

Important: Review the research document closely. Check if it found all relevant files, if the target architecture looks reasonable, and if you agree with the - "what not to do" section. In about 50% of cases, I edit these sections manually using - Cursor with a powerful model or by editing the file directly. + "what not to do" section. In about 50% of cases, I edit these sections manually + using Cursor with a powerful model or by editing the file directly.

@@ -420,9 +420,10 @@ Follow @frontend/ARCHITECTURE.md guidelines and patterns.`}

- Additional instructions might include: "make it 4 phases", "make sure to add e2e tests in - the frontend to the plan", etc. You can also add "think deeply" for higher accuracy (but - avoid "ultrathink"—it's a token burner that uses the main context to explore). + Additional instructions might include: "make it 4 phases", "make sure to + add e2e tests in the frontend to the plan", etc. You can also add "think + deeply" for higher accuracy (but avoid "ultrathink"—it's a token + burner that uses the main context to explore).

@@ -462,9 +463,9 @@ Follow @frontend/ARCHITECTURE.md guidelines and patterns.`}

Repeat this loop for each phase until all phases are complete, then run a final validation on the full plan. I typically review the code between iterations to ensure it makes sense - and guide the AI if needed. Aim for "working software" in each phase—tests should pass and - there should be no lint errors. The validation step will catch missing interface - implementations and run your linters. + and guide the AI if needed. Aim for "working software" in each phase—tests + should pass and there should be no lint errors. The validation step will catch missing + interface implementations and run your linters.

Git Management @@ -482,36 +483,36 @@ Follow @frontend/ARCHITECTURE.md guidelines and patterns.`}

In my experience, this flow completely 1-shots (after research/plan refinements) 2-5 such - features per day. I run up to 3 in parallel—one "big hairy" problem and two simpler, more - straightforward ones. + features per day. I run up to 3 in parallel—one "big hairy" problem and two + simpler, more straightforward ones.

- In the future, I want to make this a "linear workflow" where humans gather information - into Linear issues (the initial research prompts), and moving issues into different phases - would auto-trigger different steps, creating PRs with research docs, etc. + In the future, I want to make this a "linear workflow" where humans gather + information into Linear issues (the initial research prompts), and moving issues into + different phases would auto-trigger different steps, creating PRs with research docs, etc.

Codebase Requirements

I don't think this will work well in all settings and codebases. The right type of - "mid/mid+" size problems is the right fit. The better your codebase is, the better code AI - will write. Just like in boomer-coding, quality compounds into velocity over time, and - tech debt snowballs to a turd, but with AI the effects of this have increased. Prioritize - solving your tech debt! + "mid/mid+" size problems is the right fit. The better your codebase is, the + better code AI will write. Just like in boomer-coding, quality compounds into velocity + over time, and tech debt snowballs to a turd, but with AI the effects of this have + increased. Prioritize solving your tech debt!


Also, in my experience, language matters... in TS/JS you can loop in 20+ different ways or chain useEffects in magical ways to create foot-cannons... if Cloudflare can't properly use useEffect... are you sure our PhD-level next token predictors can? I actually like a lot of things about TS, but too many variations confuse the AI. In the "big" codebase I'm working on, our backend is built in Go, and Claude/Cursor are simply fantastic there! Simplicity = clarity = less hallucination = higher velocity. This is at least the state late 2025; in a year or so... who knows.

TL;DR @@ -521,9 +522,9 @@ Follow @frontend/ARCHITECTURE.md guidelines and patterns.`} proper commands and guidance. Starting with a research and planning phase to create .md files that clearly set HIGH-value context is a great way to get more accurate results from Claude Code. Running multiple SpecDD flows at the same time... like spinning plates, is the new name of the game in some codebases. This maybe takes a bit of the old "fun" out of developing, but I'm mostly excited about user value and winning in the market, which is more fun than polishing a stone.

diff --git a/components/blog-posts/diffusion-deep-research.tsx b/components/blog-posts/diffusion-deep-research.tsx new file mode 100644 index 0000000..637c19f --- /dev/null +++ b/components/blog-posts/diffusion-deep-research.tsx @@ -0,0 +1,1512 @@ +'use client'; + +import { + BlogHeading, + BlogList, + BlogListItem, + BlogLink, + Figure, + Callout, + Terminal, + Prompt, +} from '@/components/custom'; +import { + DiffusionOverview, + DraftDenoising, + ParallelAgents, + TwoStageGap, + RACEMetrics, + DiffusionLoopStep, +} from '@/components/animations/diffusion'; +import { Highlight, themes } from 'prism-react-renderer'; + +function GoCode({ code }: { code: string }) { + return ( + + {({ className, style, tokens, getLineProps, getTokenProps }) => ( +
+          {tokens.map((line, lineIndex) => {
+            const lineProps = getLineProps({ line });
+            return (
+              
+ {line.map((token, tokenIndex) => { + const tokenProps = getTokenProps({ token }); + return ; + })} +
+ ); + })} +
+ )} +
+ ); +} + +export function DiffusionDeepResearchContent() { + return ( +
+
+ +
+

+ Remember back in school when you had one of those infamous and dreaded group projects (I + kinda liked them)... +

+

+ At least a few times you probably tried the “parallel” way of working, + optimizing for a bit less collaboration and each participant owning one segment of the + report. Off you go! Each person for themselves writing extensive backgrounds, history, + theory, or whatever segments you decided on. Then you meet up 3 hours before the deadline + to “glue the report” together—how did that turn out? +

+
+

The result was probably:

+ + Repetitive + Inconsistent + Different tone of voice per segment + Vastly different quality per segment + Not the grade you hoped for + +
+

+ It turns out, when we construct our AI research agents like this (plan -> parallel + research -> glue research into report), we get the same problem! When no context of the + “evolving report” is shared across sub-agents, we get a fragmented ball of + mud. +

+

+ These sequential and isolated group projects/research agents have their perks, like high + level of autonomy and parallelism... but there are probably better ways to do it. +

+ + Diffusion Deep Research +

+ Think of diffusion agent models like brainstorming, but instead of everyone writing their own part in isolation and building a Frankenstein report, the research spreads and overlaps as it evolves within the team. Ideas for each segment are not isolated, as no one person owns each segment.

+

+ The team starts off by writing a draft, only based on their internal knowledge. Typically + in bullet point format with clear notes about missing references, knowledge gaps, outdated + information, and uncertainty. +

+

+ The students prioritize these knowledge gaps together and research different perspectives + of those gaps (in parallel isolation) and add them back to the report in an iterative + manner as a group. Gradually the draft evolves into an iteratively better report, filling + gaps and enriching knowledge. The draft grows to become the final report. In each writing + step, the students have a clear process for transforming rich knowledge into a concise + report that fits into the whole story they are trying to tell. +

+

+ To me, this makes a lot more sense! I'll explore the implementation details of text + diffusion in this blog post. Enjoy! +

+
+ + Why diffusion for research? + + The problem with single-pass research +

+ Traditional AI research agents follow a linear paradigm:{' '} + Query → Search → Synthesize → Report. This suffers from fundamental + limitations: +

+ + + Information Loss: Important context discovered late cannot influence + earlier decisions. + + + No Self-Correction: Errors or gaps in early research propagate to the + final output. + + + Static Search Strategy: The search strategy is fixed at the start and + cannot adapt. + + + Coherence Degradation: Long reports lose coherence when sections are + generated independently. + + + + The diffusion paradigm +

+ Diffusion models, originally developed for image generation, provide an elegant solution. + Instead of generating content in one pass, they start with a{' '} + noisy initial state (random noise for images, rough draft for research) and{' '} + iteratively refine through multiple denoising steps, using{' '} + guidance signals to steer the refinement. +

+ +
+ “The iterative nature of diffusion models naturally mirrors how humans actually + conduct research—cycles of searching, reasoning, and revision.” +
+ + — Google Research, Deep Researcher with Test-Time Diffusion, 2025 + +
+ + + + Core architecture: four phases +

+ The implementation consists of four primary phases, orchestrated through a state machine: +

+ + Phase 1: Research brief generation +

+ Transform the user query into a detailed research brief with sources, constraints, and + scope. This ensures all downstream research is grounded in explicit requirements. +

+ + Phase 2: Initial draft generation +

+ Generate a draft from the LLM's internal knowledge only—no external + information retrieval yet. This is the “noisy” initial state that provides + structure to guide subsequent research. It may contain outdated or incomplete information, + and that's intentional. +

+ + Phase 3: Diffusion loop (supervisor subgraph) +

The core innovation. Each iteration follows four steps:

+ + + Generate research questions to address gaps in the draft + + + Conduct Research: Retrieve external info for “denoising” + + + Refine Draft: Remove “noise” (imprecision, incompleteness) from draft + + + Assess: Are findings comprehensive? (NOT draft appearance, readability vs correctness) + + + + Phase 4: Final report generation +

+ Apply quality optimization with Insightfulness + Helpfulness rules. Deduplicate findings by + URL, add granular breakdowns, detailed mapping tables, nuanced discussion, and proper + citations. +
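The deduplicate-by-URL step is small but worth pinning down. A sketch, with a hypothetical Finding shape (the repo's real finding format is richer):

```go
package main

import "fmt"

// Finding pairs a claim with its source URL. Hypothetical shape for
// illustration only.
type Finding struct {
	Claim string
	URL   string
}

// DedupByURL keeps the first finding per URL, preserving order, so the
// final synthesis never cites the same source twice under different wording.
func DedupByURL(in []Finding) []Finding {
	seen := make(map[string]bool)
	out := make([]Finding, 0, len(in))
	for _, f := range in {
		if !seen[f.URL] {
			seen[f.URL] = true
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fs := []Finding{
		{"A", "https://a.example"},
		{"A (reworded)", "https://a.example"},
		{"B", "https://b.example"},
	}
	fmt.Println(len(DedupByURL(fs))) // 2
}
```

Keeping the first occurrence preserves the citation order established in earlier iterations.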

+ + Core algorithm overview +

+ The core innovation is the Self-Balancing Test-Time Diffusion algorithm, + encoded directly in the supervisor's system prompt. Here is the exact algorithm from + the Go implementation: +

+ +

+ The full diffusion algorithm prompt is available as a collapsible block in the code + walkthrough below. +

+ + + + Theoretical foundations + + Classical diffusion models +

+ In classical diffusion models (DDPM, Stable Diffusion), the process consists of two phases: +

+

+ Forward Diffusion: Gradually add noise to data:{' '} + x₀ → x₁ → x₂ → ... → xₜ (pure noise) +

+

+ Reverse Diffusion: Learn to denoise step by step:{' '} + xₜ → xₜ₋₁ → ... → x₁ → x₀ (clean data) +
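For readers who want the equations, the standard DDPM formulation (with noise schedule β_t, α_t = 1 − β_t, and ᾱ_t the running product) is:

```latex
% Forward (noising) process: a fixed Markov chain with schedule \beta_t
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)

% Closed form with \alpha_t = 1-\beta_t,\ \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\, \mathbf{I}\right)

% Learned reverse (denoising) step
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```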

+ +
+ +

+ For the readers who have walked the fields of Machine Learning, this feels like an autoencoder, but one that goes to complete noise instead of a low-dimensional latent-space representation (which still actually means something). With key differences of course... (for another blog post)

+ + Adaptation to research +

For research report generation, we reinterpret this process:

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + +
Classical DiffusionResearch Diffusion
Random noise (xₜ)Initial draft from model knowledge
Denoising stepResearch iteration + draft refinement
Guidance signalRetrieved information from web search
Clean output (x₀)Comprehensive, accurate research report
+
+ +

+ The key insight is that the initial draft generated purely from the + LLM's training data represents the “noisy” starting state. Each iteration + of identifying gaps, searching for information, and incorporating findings acts as a{' '} + denoising step that brings the report closer to ground truth. +
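One informal way to write this adaptation down (my notation, not the paper's):

```latex
% The draft from model knowledge alone plays the role of the noisy state x_T
D_0 = \mathrm{Draft}(\text{brief};\ \text{model knowledge only})

% One "denoising" iteration: gaps -> retrieval -> refinement
Q_t = \mathrm{Gaps}(D_t), \qquad F_t = \mathrm{Retrieve}(Q_t), \qquad D_{t+1} = \mathrm{Refine}(D_t,\ F_t)

% Terminate on evidence coverage, not draft polish:
% diverse queries Q_t yield nothing outside the accumulated findings
\text{stop when } F_t \setminus \textstyle\bigcup_{s < t} F_s = \varnothing
```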

+ +

The process terminates when (in priority order):

+ + Gap-closed: Diverse queries yield no new findings. + Iteration cap: Hard stop at 15 supervisor iterations. + + Supervisor override: Allowed only with rationale tied to evidence coverage. + + + + + Guardrails: Require citations for new facts (drop uncited claims), retry + failed tool calls once then mark as a gap, and deduplicate by URL before synthesis. + Completion is about evidence coverage, not draft polish. + + + + + Diffusion loop (core) +

+ A walkthrough of how the supervisor and sub-agents iterate, including prompts, parallel fan-out, and final synthesis. The code complements the free-text explanations; skip it or deep-dive into the details as you see fit.

+ + Phase 1 & 2: Brief and initial draft generation +

+ The entry point is AgentLoop.Research. This function orchestrates all four + phases of the diffusion algorithm. The first two phases are straightforward LLM calls: +

+ + +
+ + Show prompt: Transform to research brief (Phase 1) + +
+ +%s + + +Today's date is %s. + +You will return a single research question that will be used to guide the research. + +Guidelines: +1. Maximize Specificity and Detail +- Include all known user preferences and explicitly list key attributes or dimensions to consider. +- It is important that all details from the user are included in the instructions. + +2. Handle Unstated Dimensions Carefully +- When research quality requires considering additional dimensions that the user hasn't specified, acknowledge them as open considerations rather than assumed preferences. +- Example: Instead of assuming "budget-friendly options," say "consider all price ranges unless cost constraints are specified." +- Only mention dimensions that are genuinely necessary for comprehensive research in that domain. + +3. Avoid Unwarranted Assumptions +- Never invent specific user preferences, constraints, or requirements that weren't stated. +- If the user hasn't provided a particular detail, explicitly note this lack of specification. +- Guide the researcher to treat unspecified aspects as flexible rather than making assumptions. + +4. Distinguish Between Research Scope and User Preferences +- Research scope: What topics/dimensions should be investigated (can be broader than user's explicit mentions) +- User preferences: Specific constraints, requirements, or preferences (must only include what user stated) +- Example: "Research coffee quality factors (including bean sourcing, roasting methods, brewing techniques) for San Francisco coffee shops, with primary focus on taste as specified by the user." + +5. Use the First Person +- Phrase the request from the perspective of the user. + +6. Sources +- If specific sources should be prioritized, specify them in the research question. 
+- For product and travel research, prefer linking directly to official or primary websites (e.g., official brand sites, manufacturer pages, or reputable e-commerce platforms like Amazon for user reviews) rather than aggregator sites or SEO-heavy blogs. +- For academic or scientific queries, prefer linking directly to the original paper or official journal publication rather than survey papers or secondary summaries. +- For people, try linking directly to their LinkedIn profile, or their personal website if they have one.`} + /> +
+
+ +
+ + Show prompt: Initial draft generation (Phase 2) + +
+ +%s + + +Today's date is %s. + +Please create a detailed answer to the overall research brief that: +1. Is well-organized with proper headings (# for title, ## for sections, ### for subsections) +2. Includes specific facts and insights from the research +3. References relevant sources using [Title](URL) format +4. Provides a balanced, thorough analysis. Be as comprehensive as possible, and include all information that is relevant to the overall research question. People are using you for deep research and will expect detailed, comprehensive answers. +5. Includes a "Sources" section at the end with all referenced links + +You can structure your report in a number of different ways. Here are some examples: + +To answer a question that asks you to compare two things, you might structure your report like this: +1/ intro +2/ overview of topic A +3/ overview of topic B +4/ comparison between A and B +5/ conclusion + +To answer a question that asks you to return a list of things, you might only need a single section which is the entire list. +1/ list of things or table of things +Or, you could choose to make each item in the list a separate section in the report. When asked for lists, you don't need an introduction or conclusion. +1/ item 1 +2/ item 2 +3/ item 3 + +To answer a question that asks you to summarize a topic, give a report, or give an overview, you might structure your report like this: +1/ overview of topic +2/ concept 1 +3/ concept 2 +4/ concept 3 +5/ conclusion + +If you think you can answer the question with a single section, you can do that too! +1/ answer + +REMEMBER: Section is a VERY fluid and loose concept. You can structure your report however you think is best, including in ways that are not listed above! +Make sure that your sections are cohesive, and make sense for the reader. 
+ +For each section of the report, do the following: +- Use simple, clear language +- Use ## for section title (Markdown format) for each section of the report +- Do NOT ever refer to yourself as the writer of the report. This should be a professional report without any self-referential language. +- Do not say what you are doing in the report. Just write the report without any commentary from yourself. +- Each section should be as long as necessary to deeply answer the question with the information you have gathered. It is expected that sections will be fairly long and verbose. You are writing a deep research report, and users will expect a thorough answer. +- Use bullet points to list out information when appropriate, but by default, write in paragraph form. + +Format the report in clear markdown with proper structure and include source references where appropriate. + + +- Assign each unique URL a single citation number in your text +- End with ### Sources that lists each source with corresponding numbers +- IMPORTANT: Number sources sequentially without gaps (1,2,3,4...) in the final list regardless of which sources you choose +- Each source should be a separate line item in a list, so that in markdown it is rendered as a list. +- Example format: + [1] Source Title: URL + [2] Source Title: URL +- Citations are extremely important. Make sure to include these, and pay a lot of attention to getting these right. Users will often use these citations to look into more information. +`} + /> +
+
+ + Phase 3: Supervisor diffusion loop +

This is the heart of the algorithm. The supervisor runs an iterative loop that:

+ + Analyzes gaps in the current draft + Delegates research tasks to sub-agents (in parallel) + Incorporates findings back into the draft + Repeats until research is complete + + + +
+ + Show prompt: Diffusion algorithm (supervisor loop) + +
+ +1. generate the next research questions to address gaps in the draft report +2. **conduct_research**: retrieve external information to provide concrete delta for denoising +3. **refine_draft**: remove "noise" (imprecision, incompleteness) from the draft report +4. **research_complete**: complete research only based on conduct_research tool's findings' + completeness. it should not be based on the draft report. even if the draft report looks + complete, you should continue doing the research until all the research findings are + collected. You know the research findings are complete by running conduct_research tool + to generate diverse research questions to see if you cannot find any new findings. +`} + /> +
+
+ + Inside the supervisor: the actual diffusion iteration +

+ Now let's look inside SupervisorAgent.Coordinate to see the actual loop. + This is where tool calls are parsed, parallelism is handled, and the draft evolves: +

+ {"research_topic": "..."} + // {} + // {} + + toolCalls := runtime.ParseToolCalls(content) + + // Check for research completion FIRST + if s.hasResearchComplete(toolCalls) { + break // Exit the loop - research is done! + } + + // If no tool calls, the model decided to stop + if len(toolCalls) == 0 { + break + } + + // ==================================================================== + // STEP 4: SPLIT TOOL CALLS BY TYPE + // ==================================================================== + // This is where parallelism is set up: + // - conduct_research calls → run in PARALLEL (separate goroutines) + // - think/refine_draft calls → run SEQUENTIALLY + + var conductResearchCalls []runtime.ToolCallParsed + var otherCalls []runtime.ToolCallParsed + + for _, tc := range toolCalls { + if tc.Tool == "conduct_research" { + conductResearchCalls = append(conductResearchCalls, tc) + } else { + otherCalls = append(otherCalls, tc) + } + } + + // Execute sequential tools first (think, refine_draft) + var toolResults []string + for _, tc := range otherCalls { + result, _ := s.executeToolCall(ctx, tc, state, ...) + toolResults = append(toolResults, result) + } + + // ==================================================================== + // STEP 5: EXECUTE RESEARCH IN PARALLEL + // ==================================================================== + // This is the parallel fan-out! Each conduct_research call spawns + // a separate sub-agent in its own goroutine. + + if len(conductResearchCalls) > 0 { + researchResults, err := s.executeParallelResearch( + ctx, + conductResearchCalls, + state, + subResearcher, // <-- The callback that creates sub-agents + &researcherNum, + &totalCost, + ) + toolResults = append(toolResults, researchResults...) 
+ } + + // ==================================================================== + // STEP 6: ADD RESULTS TO CONVERSATION HISTORY + // ==================================================================== + // Tool results are added as a "user" message so the LLM sees them + // in the next iteration. This is how context accumulates! + + state.AddMessage(llm.Message{ + Role: "user", + Content: strings.Join(toolResults, "\\n\\n---\\n\\n"), + }) + + // Loop continues to next iteration... + } + + // Return the final state after all iterations + return &SupervisorResult{ + Notes: state.Notes, + DraftReport: state.DraftReport, + IterationsUsed: state.Iterations, + SubInsights: state.GetSubInsights(), + Cost: totalCost, + }, nil +} + `} + /> + +
+ + Show prompt: Lead researcher (supervisor) + +
+ +1. generate the next research questions to address gaps in the draft report +2. **conduct_research**: retrieve external information to provide concrete delta for denoising +3. **refine_draft**: remove "noise" (imprecision, incompleteness) from the draft report +4. **research_complete**: complete research only based on conduct_research tool's findings' + completeness. it should not be based on the draft report. + + + +- **Bias towards single agent** - Use single agent unless clear parallelization opportunity +- **Stop when you can answer confidently** - Don't keep delegating for perfection +- **Limit tool calls** - Always stop after {maxIterations} tool calls + + + +**Simple fact-finding**: Use 1 sub-agent +**Comparisons**: Use sub-agent per element (max 3 parallel) +**Data Analysis**: Delegate with clear file path AND analysis objective + +**Important**: When calling conduct_research, provide complete standalone instructions - +sub-agents can't see other agents' work +`} + /> +
+
+ + Parallel research fan-out +

+ When the supervisor receives multiple conduct_research calls in one response, + they execute in parallel (maxConcurrent defaults to 3). If a batch is still + running, avoid issuing a new fan-out to reduce thrash/backpressure. +
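A counting semaphore over a buffered channel is the idiomatic Go way to cap a fan-out. Here is a self-contained sketch assuming the cap of 3 mentioned above; the real executeParallelResearch also tracks cost, errors, and streaming, all omitted here, and researchTopic is a stand-in for a full sub-researcher run:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// researchTopic stands in for a full sub-researcher run (search + compress).
func researchTopic(topic string) string {
	return "findings for " + topic
}

// FanOut runs one sub-researcher per topic, at most maxConcurrent at a time.
func FanOut(topics []string, maxConcurrent int) []string {
	results := make([]string, len(topics))
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	for i, t := range topics {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			// Isolated context: the sub-researcher sees only its topic.
			results[i] = researchTopic(t)
		}(i, t)
	}
	wg.Wait()
	return results // order preserved for deterministic merging
}

func main() {
	out := FanOut([]string{"RACE", "FACT", "DDPM", "context"}, 3)
	fmt.Println(strings.Join(out, " | "))
}
```

Writing each result to its own slice index avoids locking, and returning in input order makes the supervisor's merge step deterministic.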

+ + + + What each sub-researcher actually does +

+ Each sub-researcher is a complete agent with its own tool loop. It receives only the topic + (no visibility into other agents' work) and runs its own search/analysis cycle: +

+ + +
+ + Show prompt: Sub-researcher (tool loop) + +
+ +**Tool Call Budgets** (Prevent excessive searching): +- **Simple queries**: Use 2-3 search tool calls maximum +- **Complex queries**: Use up to 5 search tool calls maximum +- **Always stop**: After 5 search tool calls if you cannot find the right sources + +**Stop Immediately When**: +- You can answer the user's question comprehensively +- You have 3+ relevant examples/sources for the question +- Your last 2 searches returned similar information + + + +After each search tool call, use think to analyze the results: +- What key information did I find? +- What's missing? +- Do I have enough to answer the question comprehensively? +- Should I search more or provide my answer? +`} + /> +
+
+ +
+ + Show prompt: Research compression (sub-agent → supervisor) + +
+ +**IMPORTANT**: Focus only on substantive research content: +- **Include**: All search results and findings from web searches +- **Exclude**: think tool calls and responses - these are internal agent reflections +- **Focus on**: Actual information gathered from external sources + + + +1. Output findings should be fully comprehensive and include ALL information verbatim +2. Include inline citations for each source +3. Include a "Sources" section at the end with all sources +4. Make sure to include ALL sources - a later LLM will merge this with others + +Critical: Any information even remotely relevant must be preserved verbatim +(don't rewrite, summarize, or paraphrase it). +`} + /> +
+
+ + Phase 4: Final report synthesis +

+ After the diffusion loop completes, the final phase synthesizes everything into a polished + report. This applies the Insightfulness + Helpfulness quality rules: +

+ + +
+ + Show prompt: Final report (quality rules) + +
+ +- Granular breakdown - Does the response have granular breakdown of topics + and their specific causes and specific impacts? +- Detailed mapping table - Does the response have a detailed table mapping + causes and effects? +- Nuanced discussion - Does the response have detailed exploration and + explicit discussion? + + + +- Satisfying user intent - Does the response directly address the user's request? +- Ease of understanding - Is the response fluent, coherent, and logically structured? +- Accuracy - Are the facts, reasoning, and explanations correct? +- Appropriate language - Is the tone suitable and professional? + + + +- Assign each unique URL a single citation number in your text +- End with ### Sources that lists each source with corresponding numbers +- IMPORTANT: Number sources sequentially without gaps (1,2,3,4...) +- Citations are extremely important - users rely on these +`} + /> +
+
+ + + The key insight: Unlike traditional pipelines that generate content in one + pass, diffusion builds the report iteratively. Each supervisor iteration sees the{' '} + current draft and accumulated research, allowing the system to + self-correct and fill gaps it discovers along the way. This is why the loop checks findings + completeness, not draft polish. + + + Gap closing & context +

+ The algorithm explicitly separates information gap closing from{' '} + generation gap closing: +

+ + + + Why separate the stages? + +
+ “There is a trade-off between the two gaps. We cannot optimize the generation gap too + early when the system is still optimizing the information gap because the generation gap + tends to bring more verbose and stylistic content that can distract from finding missing + information.” +
+ — Paichun Lin, ThinkDepth.ai +
+ +

+ Stage 1 characteristics: +

+ + + Focus on what information exists, not how to present it + + Draft updates are functional, not polished + Prioritizes breadth of coverage + + Uses global-context OR section-specific queries based on gap analysis + + + +

+ Stage 2 characteristics: +

+ + All information is available + Focus on presentation, coherence, and user satisfaction + Applies full Insightfulness + Helpfulness rules + Generates final deliverable with proper citations + + + Context engineering considerations +

+ Long-horizon research tasks face several context challenges. The diffusion approach + addresses each systematically: +

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ProblemDescriptionDiffusion Solution
Context PoisoningHallucinations enter contextDraft serves as verified state
Context DistractionToo much context overwhelms focusParallel sub-agents with isolated contexts
Context ConfusionSuperfluous context influences outputStructured finding format with compression
Context ClashParts of context disagreeSupervisor resolves conflicts during refinement
+
+ + Draft as context anchor +

+ The draft serves as a persistent, verified context that: +

+ + + Evolves incrementally: Each refine_draft call is validated + + + Structures information: Prevents disorganized accumulation + + + Guides research: Makes gaps explicit + + + Maintains coherence: Narrative thread across iterations + + + + + {`Traditional RAG: Diffusion Approach: + +Query → Search → Response Query → Brief → Draft → [Research → Refine] × N → Report + +Context grows unboundedly Draft stays ~constant size +No structure Structured by sections +Can contradict itself Conflicts resolved each iteration`} + + + Multi-agent context isolation +

+ Sub-researchers operate with isolated contexts—they cannot see each + other's work. This prevents topic A's findings from biasing topic B's + research, keeps context from growing unboundedly during parallel work, and avoids confusion + from interleaved search results. +

+ + Benchmark performance (RACE + FACT) +

+ RACE (report quality) and FACT (citation quality) are the primary DeepResearch Bench lenses: + RACE judges coverage, insight, instruction-following, and readability; FACT scores citation + accuracy and effective citations. +

+

+ + DeepResearch Bench + {' '} + is the comprehensive benchmark for evaluating Deep Research Agents. It consists of{' '} + 100 PhD-level research tasks designed by domain experts across Science + & Technology, Finance & Business, Software, and other fields. +

+ + RACE framework (report quality) +

RACE evaluates report generation quality through four dimensions:

+ + + Comprehensiveness: Coverage breadth and depth (measures information gap + closing) + + + Insight / Depth: Quality, originality, logic, and value of analysis + + + Instruction Following: Adherence to task requirements and constraints + + + Readability: Clarity of structure, fluency, ease of understanding + (measures generation gap closing) + + + + FACT framework (citation quality) +

FACT evaluates information retrieval and grounding capabilities:

+ + Automatically extract statement-URL pairs from the report + Deduplicate redundant pairs + Web scrape + LLM judgment to verify support + + Calculate Citation Accuracy (% correctly supported) and Effective Citations (avg verified + per task) + + + + + + Why diffusion outperforms + + + Iterative refinement catches gaps → Higher Comprehensiveness. Each + iteration identifies and fills missing information. Traditional single-pass cannot + self-correct. + + + Parallel execution is efficient → Better Coverage. Up to 3 + sub-researchers gather diverse perspectives simultaneously with isolated contexts. + + + Explicit completion criteria → Validated Comprehensiveness. Research ends + based on findings comprehensiveness, not draft appearance. + + + Self-balancing adaptivity → Right-Sized Research. Simple topics: 2-3 + iterations. Complex topics: 10+ iterations as needed. + + + Draft as context anchor → Higher Readability. Draft serves as persistent + context across iterations, reducing the “lost in the middle” problem. + + + Quality rules in final generation → Higher Insight. Insightfulness Rules + (granular breakdown, detailed tables, nuanced discussion) applied systematically. + + + + Test it out? + +

+ I implemented a version of this in Go in the blog repository, in: /go-research. It expects API keys and runs a REPL with multiple architectures, including{' '} /think_deep (diffusion), /storm, and /fast (just a simple ReAct agent). It is not finished software and can execute AI-generated code in your environment... :) At your own risk!

+
+

+ Browse the code:{' '} + + Go implementation (think_deep) + +
+ CLI with multiple architectures:{' '} + + Go Research + +

+ + + {`# Env (required) +OPENROUTER_API_KEY=sk-or-... +BRAVE_API_KEY=sk-brave-... + +# Optional +RESEARCH_VAULT=~/research-vault # Obsidian-compatible vault path +RESEARCH_VERBOSE=true # Verbose REPL logs + +# Run (from repo root) +cd go-research +cp .env.example .env # optional template; env vars still required +OPENROUTER_API_KEY=... BRAVE_API_KEY=... go run ./cmd/research + +# In the REPL (architectures are commands): +# /think_deep # diffusion/self-balancing +# /storm # STORM multi-perspective +# /fast # single-worker quick pass +# /architectures # list available +# /help # all commands`} + + + Practical takeaways + + + Start with a draft. It reveals gaps faster than a blank page and provides + structure for subsequent research. + + + Deduplicate by URL before synthesis. Keeps signal high and prevents the + same source from being cited multiple times with different wordings. + + + Completion is about evidence coverage, not aesthetics. Run diverse + queries and only stop when they yield no new facts. + + + Cap iterations and concurrency. 15 loops max, 3 agents max. Prevents + thrash and keeps costs predictable. + + + Separate information gap from generation gap. Don't polish until the + facts are locked—otherwise you're polishing hallucinations. + + + Isolate sub-agent contexts. Each sub-researcher should have complete, + standalone instructions. They can't see other agents' work. + + + Compress findings, preserve everything. When returning to the supervisor, + remove only obvious duplicates—never summarize or paraphrase. + + + + References and further reading +

+ Acknowledgment: Paichun Lin — seminal work on self-balancing agentic AI and text-time + diffusion directly inspired this implementation. Well done! +

+ + + + Google Research: Deep Researcher with Test-Time Diffusion (2025) + + + + + Paichun Lin: Self-Balancing Agentic AI: Test-Time Diffusion and Context Engineering + Re-imagined + + + + + DeepResearch Bench Leaderboard + + + + + DeepResearch Bench Paper and Documentation + + + + + ThinkDepth.ai Open Source Reference Implementation - Python. (Thanks for the open source + and innovations! I learned a lot from it.) + + + + + Richard Sutton: The Bitter Lesson (2019) + + + + + My implementation of the ThinkDepth.ai architecture in Go + + + +
+ );
} diff --git a/components/blog-posts/diffusion.txt b/components/blog-posts/diffusion.txt new file mode 100644 index 0000000..f74dfc1 --- /dev/null +++ b/components/blog-posts/diffusion.txt @@ -0,0 +1,34 @@ +# Intro
+
+Remember back in school when you had one of those infamous and dreaded group projects (I kinda liked them)...
+at least a few times you probably tried the "parallel" way of working, optimizing for a bit less collaboration and each participant owning one segment of the report.
+Off you go! Each person for themselves, writing extensive backgrounds, history, theory, or whatever segments you decided on. Then you meet up 3 hours before the deadline to
+"glue the report" together... how did that turn out?
+The result was probably:
+- Repetitive
+- Inconsistent
+- Different tone of voice per segment
+- Vastly different quality per segment
+- Not the grade you hoped for
+
+
+It turns out, when we construct our AI research agents like this (plan -> parallel research -> glue research into report), we get the same problem! When no context of the "evolving report" is shared across sub-agents,
+we get a fragmented ball of mud.
+
+These parallel-but-isolated group projects/research agents have their perks, like a high level of autonomy and parallelism... but there are probably better ways to do it.
+
+# Diffusion Deep Research
+
+Think of diffusion agent models like brainstorming: instead of everyone writing their own part in isolation and building a Frankenstein report,
+the research spreads and overlaps as it evolves within the team. Ideas for each segment are not isolated, since no one person owns a segment.
+
+The team starts off by writing a draft, based only on their internal knowledge.
+Typically in bullet-point format with clear notes about missing references, knowledge gaps, outdated information, and uncertainty.
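+
+The whole group-project loop, sketched as rough pseudocode (the names here are illustrative, not the actual implementation):
+
+```
+draft = write_draft(internal_knowledge)         # noisy, gap-annotated first pass
+while has_gaps(draft) and iteration < max_iter:
+    gaps     = prioritize_gaps(draft)           # the group decides what to chase
+    findings = research_in_parallel(gaps)       # isolated sub-researchers
+    draft    = refine(draft, findings)          # the "denoising" step
+report = draft                                  # the draft becomes the report
+```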
+
+The students prioritize these knowledge gaps together and research different perspectives of those gaps (in parallel isolation), adding the results back to the report in an iterative manner as a group.
+The draft gradually evolves into a better report with each iteration, filling gaps and enriching knowledge. The draft grows to become the final report. In each writing step, the students have a clear process
+for transforming rich knowledge into a concise report that fits into the whole story they are trying to tell.
+
+To me, this makes a lot more sense! I'll explore the implementation details of text diffusion in this blog post. Enjoy!
+
+ diff --git a/components/custom/blog-heading.tsx b/components/custom/blog-heading.tsx index 0ed57e9..c752cf3 100644 --- a/components/custom/blog-heading.tsx +++ b/components/custom/blog-heading.tsx @@ -19,8 +19,7 @@ const headingVariants = cva('font-bold text-balance scroll-mt-20', { }); export interface BlogHeadingProps - extends React.HTMLAttributes, - VariantProps { + extends React.HTMLAttributes, VariantProps { level: 1 | 2 | 3 | 4 | 5 | 6; } diff --git a/components/custom/blog-list.tsx b/components/custom/blog-list.tsx index 8bec980..69d73a3 100644 --- a/components/custom/blog-list.tsx +++ b/components/custom/blog-list.tsx @@ -16,7 +16,8 @@ const listVariants = cva('my-6 space-y-2', { }); export interface BlogListProps - extends React.HTMLAttributes, + extends + React.HTMLAttributes, VariantProps { variant?: 'unordered' | 'ordered' | 'checklist'; } diff --git a/components/custom/callout.tsx b/components/custom/callout.tsx index 8026943..4e7afdf 100644 --- a/components/custom/callout.tsx +++ b/components/custom/callout.tsx @@ -1,3 +1,5 @@ +'use client'; + import * as React from 'react'; import { cva, type VariantProps } from 'class-variance-authority'; import { cn } from '@/lib/utils'; @@ -25,8 +27,7 @@ const iconMap = { }; export interface CalloutProps - extends React.HTMLAttributes, - VariantProps { + extends React.HTMLAttributes, 
VariantProps { variant?: 'info' | 'warning' | 'tip' | 'note'; title?: string; icon?: React.ReactNode; @@ -40,13 +41,13 @@ export function Callout({ children, ...props }: CalloutProps) { - const Icon = icon || iconMap[variant]; + const IconComponent = iconMap[variant]; return (
{title &&

{title}

} diff --git a/components/custom/index.ts b/components/custom/index.ts index bfd230c..ede8f36 100644 --- a/components/custom/index.ts +++ b/components/custom/index.ts @@ -7,3 +7,4 @@ export { Callout } from './callout'; export { Figure } from './figure'; export { Terminal } from './terminal'; export { TerminalCommand } from './terminal-command'; +export { Prompt } from './prompt'; diff --git a/components/custom/prompt.tsx b/components/custom/prompt.tsx new file mode 100644 index 0000000..a37b538 --- /dev/null +++ b/components/custom/prompt.tsx @@ -0,0 +1,65 @@ +'use client'; + +import { useState } from 'react'; +import { Check, Copy } from 'lucide-react'; +import { Button } from '@/components/ui/button'; +import { cn } from '@/lib/utils'; + +type PromptProps = { + value: string; + label?: string; + className?: string; +}; + +export function Prompt({ value, label = 'Prompt', className }: PromptProps) { + const [copied, setCopied] = useState(false); + + const handleCopy = async () => { + try { + await navigator.clipboard.writeText(value); + setCopied(true); + setTimeout(() => setCopied(false), 1500); + } catch (error) { + console.error('Copy failed', error); + } + }; + + return ( +
+
+ + {label} + + +
+
+
+          {value}
+        
+
+
+ ); +} diff --git a/feedback.md b/feedback.md new file mode 100644 index 0000000..3557ec6 --- /dev/null +++ b/feedback.md @@ -0,0 +1,65 @@ +# Diffusion Deep Research blog feedback → action TODOs + +Scope + +- Page: `http://localhost:3000/blog/diffusion-deep-research` +- Audience: agent/LLM practitioners + +✅ TODO 2 — Remove stray Pause/Play controls in reading flow + +- Removed all Pause/Play controls and related state from the diffusion animations: + - `components/animations/diffusion/draft-denoising.tsx` + - `components/animations/diffusion/diffusion-overview.tsx` + - `components/animations/diffusion/parallel-agents.tsx` +- Components now autoplay when in view; no controls are rendered in the blog post. + +TODO 4 — Restructure the spine for clarity + +- Current order: problem → diffusion → theory/math → phases → diffusion algorithm → code → benchmarks → takeaways. +- Target order inside `components/blog-posts/diffusion-deep-research.tsx`: + 1. TL;DR (TODO 3) + Why diffusion (keep) + 2. Problem with single-pass (keep) + 3. Diffusion intuition (keep) + 4. Core algorithm overview: state machine/pseudocode (brief, before math) + 5. Phases 1–4 walkthrough (existing content) + 6. Detailed code walkthrough (Go excerpts) + 7. Benchmarks (RACE/FACT) to appear before or right after the code walkthrough intro to establish credibility earlier + 8. Practical takeaways (surface earlier and also keep at end) +- Move the “Core architecture: four phases” block above “Theoretical foundations / Mathematical formulation” so readers see the shape before symbols. + +TODO 5 — Tighten headers and reduce redundancy + +- Merge overlapping headers: “The diffusion algorithm”, “How the diffusion loop actually runs”, and “Phase 3: The diffusion loop (where the magic happens)” into a single parent header like “Diffusion loop (core)” with subsections for pseudocode and code walkthrough. 
+- Group “Self-balancing: two-stage gap closing” and “Context engineering considerations” under one section (“Gap closing & context”) to cut header sprawl. +- Make headers promise content: e.g., “Phase 3: Diffusion loop — supervisor + parallel sub-researchers”. + +TODO 6 — Clarify completion criteria and safety/quality guardrails + +- In the termination list around lines ~234-239 of `diffusion-deep-research.tsx`, spell out priority: (1) no new findings from diverse queries, (2) hard cap 15 iterations, (3) supervisor override with rationale. +- Add a short “Guardrails” callout near the termination section: + - Hallucination control: require citation for new facts; drop uncited claims. + - Tool failure/empty SERP: backoff/retry once, then mark as gap. + - Dedup by URL before synthesis (already mentioned later—cross-link it). + +TODO 7 — Parallelism defaults and backpressure + +- Near the parallelism explanation (`executeParallelResearch` excerpt), add explicit defaults: maxConcurrent = 3, per-sub-agent search max = 5, timeouts if available. +- Brief note on backpressure: supervisor should skip new fan-out if previous conduct_research calls are still running; if not implemented, note as a TODO in code or text. + +TODO 8 — Add quickstart/config and run snippet + +- In the “Configuration reference” section, add a minimal “run it” example (e.g., `GO_BRAVE_API_KEY=... go run ./cmd/... "research prompt"` or the Node/CLI equivalent if exposed). +- List env/flags that matter for diffusion behavior: max iterations, concurrency, search limits, citation dedupe toggle. + +TODO 9 — Micro-UX and readability + +- Break a few long paragraphs (intro and diffusion intuition) into shorter blocks; add one-line micro-summaries before long bullet lists. +- Add a tiny glossary when RACE/FACT first appear; link the benchmark URL inline where it’s first mentioned. + +Execution order suggestion + +1. Fix missing-"s" rendering (TODO 1) so content is readable. +2. 
Hide Pause/Play controls (TODO 2) to clean UX. +3. Insert TL;DR + example and reorder sections (TODO 3 + 4 + 5). +4. Clarify completion/guardrails and parallelism defaults (TODO 6 + 7). +5. Add quickstart/config snippet and polish readability/glossary (TODO 8 + 9). diff --git a/go-research/README.md b/go-research/README.md index d661e0c..621473b 100644 --- a/go-research/README.md +++ b/go-research/README.md @@ -18,43 +18,43 @@ go run ./cmd/research ### Research Agents -| Command | Description | -|---------|-------------| -| `/fast ` | Quick single-worker research | +| Command | Description | +| ---------------- | -------------------------------------------------------------------------- | +| `/fast ` | Quick single-worker research | | `/storm ` | STORM: Multi-perspective conversations with cross-validation and synthesis | ### Active Session -| Command | Description | -|---------|-------------| -| `/expand ` | Expand on current research | -| `/workers` | Show worker/conversation status | +| Command | Description | +| ---------------- | ------------------------------- | +| `/expand ` | Expand on current research | +| `/workers` | Show worker/conversation status | ### Sessions & History -| Command | Description | -|---------|-------------| -| `/sessions` | List all sessions | -| `/load ` | Load a previous session | -| `/new` | Clear session and start fresh | -| `/rerun ` | Rerun a previous query | +| Command | Description | +| ------------- | ----------------------------- | +| `/sessions` | List all sessions | +| `/load ` | Load a previous session | +| `/new` | Clear session and start fresh | +| `/rerun ` | Rerun a previous query | ### Settings & Controls -| Command | Description | -|---------|-------------| -| `/recompile` | Hot reload the agent | -| `/verbose on\|off` | Toggle verbose mode | -| `/model ` | Switch LLM model | +| Command | Description | +| ------------------ | -------------------- | +| `/recompile` | Hot reload the agent | +| `/verbose on\|off` | 
Toggle verbose mode | +| `/model ` | Switch LLM model | ### Meta -| Command | Description | -|---------|-------------| -| `/help` | Show all commands | -| `/quit` | Exit the REPL | -| `/architectures` | List available research architectures | -| `/benchmark ` | Compare architecture results | +| Command | Description | +| -------------------- | ------------------------------------- | +| `/help` | Show all commands | +| `/quit` | Exit the REPL | +| `/architectures` | List available research architectures | +| `/benchmark ` | Compare architecture results | **Tip:** Just type your question to start STORM research. After research, type follow-ups to expand. @@ -225,16 +225,16 @@ When you run `/storm `, you'll see: ### Key Components -| Component | Description | -|-----------|-------------| -| **REPL** | Interactive shell with readline, command parsing, and colored output | -| **STORM Orchestrator** | Coordinates the 4-phase research flow | -| **Perspective Discovery** | Surveys topics and generates expert perspectives | -| **Conversation Simulation** | Parallel WikiWriter↔TopicExpert dialogues | -| **Analysis Agent** | Validates facts, detects contradictions, fills gaps | -| **Synthesis Agent** | Two-phase outline generation and report writing | -| **Event Bus** | Pub/sub system for real-time progress updates | -| **Session Store** | JSON persistence with Obsidian markdown export | +| Component | Description | +| --------------------------- | -------------------------------------------------------------------- | +| **REPL** | Interactive shell with readline, command parsing, and colored output | +| **STORM Orchestrator** | Coordinates the 4-phase research flow | +| **Perspective Discovery** | Surveys topics and generates expert perspectives | +| **Conversation Simulation** | Parallel WikiWriter↔TopicExpert dialogues | +| **Analysis Agent** | Validates facts, detects contradictions, fills gaps | +| **Synthesis Agent** | Two-phase outline generation and report writing | 
+| **Event Bus** | Pub/sub system for real-time progress updates | +| **Session Store** | JSON persistence with Obsidian markdown export | ### Data Flow @@ -398,11 +398,11 @@ Each perspective runs a simulated conversation: ### Perspective-Based Research -| Complexity | Perspectives | Use Case | -|------------|--------------|----------| -| Simple | 2-3 | Factual queries with limited scope | -| Moderate | 3-4 | Multi-aspect topics needing diverse views | -| Complex | 5-6 | Deep research requiring comprehensive coverage | +| Complexity | Perspectives | Use Case | +| ---------- | ------------ | ---------------------------------------------- | +| Simple | 2-3 | Factual queries with limited scope | +| Moderate | 3-4 | Multi-aspect topics needing diverse views | +| Complex | 5-6 | Deep research requiring comprehensive coverage | ### Session Persistence @@ -412,10 +412,11 @@ Sessions are saved in two formats: 2. **Obsidian** (`~/research-vault//`) - Markdown files with frontmatter Obsidian structure: + ``` / ├── session.md # Session overview with wiki-links ├── conversations/ # Per-perspective conversation logs └── reports/ └── report_v1.md -``` \ No newline at end of file +``` diff --git a/go-research/internal/agents/sub_researcher.go b/go-research/internal/agents/sub_researcher.go index 8120dd2..e2ea158 100644 --- a/go-research/internal/agents/sub_researcher.go +++ b/go-research/internal/agents/sub_researcher.go @@ -8,10 +8,10 @@ import ( "strings" "time" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/events" "go-research/internal/llm" "go-research/internal/session" - "go-research/internal/think_deep" "go-research/internal/tools" ) @@ -30,15 +30,16 @@ type SubResearcherAgent struct { // SubResearcherConfig configures the sub-researcher agent behavior. type SubResearcherConfig struct { - // MaxIterations is the maximum number of search iterations. 
- // Simple queries: 2-3, complex queries: up to 5 + // MaxIterations is a safety limit for the maximum number of search iterations. + // The actual iteration limit is controlled by the prompt (2-3 for simple, 5 for complex). + // This is a fallback to prevent runaway loops if the LLM ignores prompt instructions. MaxIterations int } // DefaultSubResearcherConfig returns sensible defaults for sub-researcher configuration. func DefaultSubResearcherConfig() SubResearcherConfig { return SubResearcherConfig{ - MaxIterations: 5, + MaxIterations: 20, } } @@ -50,7 +51,7 @@ func NewSubResearcherAgent( cfg SubResearcherConfig, ) *SubResearcherAgent { if cfg.MaxIterations == 0 { - cfg.MaxIterations = 5 + cfg.MaxIterations = 20 } return &SubResearcherAgent{ client: client, @@ -76,7 +77,7 @@ type SubResearcherResult struct { VisitedURLs []string // Insights contains structured insights extracted from search results - Insights []think_deep.SubInsight + Insights []runtime.SubInsight // Cost tracks token usage for this sub-researcher run Cost session.CostBreakdown @@ -94,12 +95,12 @@ func (r *SubResearcherAgent) Research(ctx context.Context, topic string, researc // researchWithIteration performs research with a specific diffusion iteration context. 
func (r *SubResearcherAgent) researchWithIteration(ctx context.Context, topic string, researcherNum int, diffusionIteration int) (*SubResearcherResult, error) { - state := think_deep.NewResearcherState(topic) + state := runtime.NewResearcherState(topic) var totalCost session.CostBreakdown // Build system prompt date := time.Now().Format("2006-01-02") - systemPrompt := think_deep.ResearchAgentPrompt(date) + systemPrompt := runtime.ResearchAgentPrompt(date) // Initialize conversation messages := []llm.Message{ @@ -134,7 +135,7 @@ func (r *SubResearcherAgent) researchWithIteration(ctx context.Context, topic st messages = append(messages, llm.Message{Role: "assistant", Content: content}) // Parse tool calls - toolCalls := think_deep.ParseToolCalls(content) + toolCalls := runtime.ParseToolCalls(content) // If no tool calls, research is complete if len(toolCalls) == 0 { @@ -228,7 +229,7 @@ func (r *SubResearcherAgent) researchWithIteration(ctx context.Context, topic st // It preserves ALL search results verbatim while filtering out think_tool calls. func (r *SubResearcherAgent) compressResearch(ctx context.Context, topic string, messages []llm.Message) (string, session.CostBreakdown, error) { date := time.Now().Format("2006-01-02") - compressPrompt := think_deep.CompressResearchPrompt(date, topic) + compressPrompt := runtime.CompressResearchPrompt(date, topic) // Build compression context from messages var researchContent strings.Builder @@ -241,7 +242,7 @@ func (r *SubResearcherAgent) compressResearch(ctx context.Context, topic string, } // Filter out think tool calls and their results - content := think_deep.FilterThinkToolCalls(m.Content) + content := runtime.FilterThinkToolCalls(m.Content) if strings.TrimSpace(content) == "" { continue } @@ -326,8 +327,8 @@ func truncateForLog(s string, maxLen int) string { // distinct finding with source attribution. 
// Enhanced to extract individual sources from search result blocks and capture // tool usage, query context, and full source references. -func extractInsightsFromSearchResults(topic string, rawNotes []string, researcherNum int, iteration int) []think_deep.SubInsight { - var insights []think_deep.SubInsight +func extractInsightsFromSearchResults(topic string, rawNotes []string, researcherNum int, iteration int) []runtime.SubInsight { + var insights []runtime.SubInsight insightNum := 0 for _, note := range rawNotes { @@ -447,7 +448,7 @@ func splitIntoSourceBlocks(note string) []sourceBlock { } // createInsightFromBlock creates a SubInsight from a parsed source block -func createInsightFromBlock(block sourceBlock, topic string, insightNum int, researcherNum int, iteration int, fullNote string) *think_deep.SubInsight { +func createInsightFromBlock(block sourceBlock, topic string, insightNum int, researcherNum int, iteration int, fullNote string) *runtime.SubInsight { // Determine the finding text finding := block.summary if finding == "" { @@ -463,19 +464,19 @@ func createInsightFromBlock(block sourceBlock, topic string, insightNum int, res toolUsed, queryUsed := extractToolContext(fullNote) // Determine source type based on tool used or content - sourceType := think_deep.SourceTypeWeb + sourceType := runtime.SourceTypeWeb if toolUsed == "read_document" || toolUsed == "read_xlsx" || toolUsed == "analyze_csv" { - sourceType = think_deep.SourceTypeDocument + sourceType = runtime.SourceTypeDocument } else if toolUsed == "fetch" { - sourceType = think_deep.SourceTypeWeb + sourceType = runtime.SourceTypeWeb } else if strings.Contains(fullNote, "Read document:") || strings.Contains(fullNote, "Workbook:") { - sourceType = think_deep.SourceTypeDocument + sourceType = runtime.SourceTypeDocument } // Create source reference with full content - var sources []think_deep.SourceReference + var sources []runtime.SourceReference if block.url != "" { - sources = append(sources, 
think_deep.SourceReference{ + sources = append(sources, runtime.SourceReference{ URL: block.url, Type: sourceType, Title: block.title, @@ -483,9 +484,9 @@ func createInsightFromBlock(block sourceBlock, topic string, insightNum int, res RawContent: block.content, FetchedAt: time.Now(), }) - } else if sourceType == think_deep.SourceTypeDocument && queryUsed != "" { + } else if sourceType == runtime.SourceTypeDocument && queryUsed != "" { // For document sources, create a file-based source reference - sources = append(sources, think_deep.SourceReference{ + sources = append(sources, runtime.SourceReference{ FilePath: queryUsed, Type: sourceType, Title: block.title, @@ -501,7 +502,7 @@ func createInsightFromBlock(block sourceBlock, topic string, insightNum int, res // Use domain as title title = extractDomain(block.url) } - if title == "" && sourceType == think_deep.SourceTypeDocument && queryUsed != "" { + if title == "" && sourceType == runtime.SourceTypeDocument && queryUsed != "" { // Use filename as title for documents parts := strings.Split(queryUsed, "/") if len(parts) > 0 { @@ -520,7 +521,7 @@ func createInsightFromBlock(block sourceBlock, topic string, insightNum int, res analysisChain = append(analysisChain, fmt.Sprintf("Tool used: %s", toolUsed)) } if queryUsed != "" { - if sourceType == think_deep.SourceTypeDocument { + if sourceType == runtime.SourceTypeDocument { analysisChain = append(analysisChain, fmt.Sprintf("Document analyzed: %s", queryUsed)) } else { analysisChain = append(analysisChain, fmt.Sprintf("Query: %s", queryUsed)) @@ -533,11 +534,11 @@ func createInsightFromBlock(block sourceBlock, topic string, insightNum int, res // For document sources, use file path as source URL for tracking sourceURL := block.url - if sourceURL == "" && sourceType == think_deep.SourceTypeDocument && queryUsed != "" { + if sourceURL == "" && sourceType == runtime.SourceTypeDocument && queryUsed != "" { sourceURL = "file://" + queryUsed } - return 
&think_deep.SubInsight{ + return &runtime.SubInsight{ ID: fmt.Sprintf("insight-%03d", insightNum), Topic: topic, Title: title, diff --git a/go-research/internal/agents/sub_researcher_integration_test.go b/go-research/internal/agents/sub_researcher_integration_test.go index fd714c8..f737618 100644 --- a/go-research/internal/agents/sub_researcher_integration_test.go +++ b/go-research/internal/agents/sub_researcher_integration_test.go @@ -9,10 +9,10 @@ import ( "testing" "time" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/config" "go-research/internal/events" "go-research/internal/llm" - "go-research/internal/think_deep" "go-research/internal/tools" "github.com/joho/godotenv" @@ -63,9 +63,12 @@ func TestSubResearcher_UsesDocumentReaderOnAMDataset(t *testing.T) { } dataset := datasetPath(t) + if _, err := os.Stat(dataset); err != nil { + t.Skipf("AM dataset not present locally (%s): %v", dataset, err) + } client := llm.NewClient(cfg) - registry := think_deep.SubResearcherToolRegistry(cfg.BraveAPIKey, client) + registry := runtime.SubResearcherToolRegistry(cfg.BraveAPIKey, client) loggingRegistry := &loggingToolExecutor{exec: registry} bus := events.NewBus(32) diff --git a/go-research/internal/agents/sub_researcher_test.go b/go-research/internal/agents/sub_researcher_test.go index cc75ec4..4405556 100644 --- a/go-research/internal/agents/sub_researcher_test.go +++ b/go-research/internal/agents/sub_researcher_test.go @@ -5,8 +5,8 @@ import ( "strings" "testing" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/events" - "go-research/internal/think_deep" ) func TestNewSubResearcherAgent(t *testing.T) { @@ -347,7 +347,7 @@ func TestFilterThinkToolCalls(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - result := think_deep.FilterThinkToolCalls(tt.input) + result := runtime.FilterThinkToolCalls(tt.input) for _, s := range tt.contains { if !strings.Contains(result, s) { diff --git 
a/go-research/internal/agents/supervisor.go b/go-research/internal/agents/supervisor.go index ef4c5cf..139e669 100644 --- a/go-research/internal/agents/supervisor.go +++ b/go-research/internal/agents/supervisor.go @@ -8,10 +8,10 @@ import ( "sync" "time" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/events" "go-research/internal/llm" "go-research/internal/session" - "go-research/internal/think_deep" ) // SupervisorAgent coordinates ThinkDeep research using the diffusion algorithm. @@ -80,7 +80,7 @@ type SupervisorResult struct { IterationsUsed int // SubInsights contains all structured insights captured during diffusion - SubInsights []think_deep.SubInsight + SubInsights []runtime.SubInsight // Cost tracks token usage for the supervisor Cost session.CostBreakdown @@ -99,7 +99,7 @@ func (s *SupervisorAgent) Coordinate( initialDraft string, subResearcher SubResearcherCallback, ) (*SupervisorResult, error) { - state := think_deep.NewSupervisorState(researchBrief) + state := runtime.NewSupervisorState(researchBrief) state.UpdateDraft(initialDraft) var totalCost session.CostBreakdown @@ -107,7 +107,7 @@ func (s *SupervisorAgent) Coordinate( // Build system prompt once date := time.Now().Format("2006-01-02") - systemPrompt := think_deep.LeadResearcherPrompt(date, s.maxConcurrent, s.maxIterations) + systemPrompt := runtime.LeadResearcherPrompt(date, s.maxConcurrent, s.maxIterations) for state.Iterations < s.maxIterations { state.IncrementIteration() @@ -139,7 +139,7 @@ func (s *SupervisorAgent) Coordinate( state.AddMessage(llm.Message{Role: "assistant", Content: content}) // Parse tool calls from response - toolCalls := think_deep.ParseToolCalls(content) + toolCalls := runtime.ParseToolCalls(content) // Check for research completion if s.hasResearchComplete(toolCalls) { @@ -152,8 +152,8 @@ func (s *SupervisorAgent) Coordinate( } // Separate conduct_research calls from other tools for parallel execution - var conductResearchCalls 
[]think_deep.ToolCallParsed - var otherCalls []think_deep.ToolCallParsed + var conductResearchCalls []runtime.ToolCallParsed + var otherCalls []runtime.ToolCallParsed for _, tc := range toolCalls { if tc.Tool == "conduct_research" { conductResearchCalls = append(conductResearchCalls, tc) @@ -204,7 +204,7 @@ func (s *SupervisorAgent) Coordinate( } // buildMessages constructs the message list for the LLM call. -func (s *SupervisorAgent) buildMessages(systemPrompt string, state *think_deep.SupervisorState) []llm.Message { +func (s *SupervisorAgent) buildMessages(systemPrompt string, state *runtime.SupervisorState) []llm.Message { messages := []llm.Message{ {Role: "system", Content: systemPrompt}, } @@ -239,8 +239,8 @@ Analyze the current state and decide next action. Use the diffusion algorithm: // executeToolCall executes a single tool call and returns the result. func (s *SupervisorAgent) executeToolCall( ctx context.Context, - tc think_deep.ToolCallParsed, - state *think_deep.SupervisorState, + tc runtime.ToolCallParsed, + state *runtime.SupervisorState, subResearcher SubResearcherCallback, researcherNum *int, totalCost *session.CostBreakdown, @@ -268,8 +268,8 @@ func (s *SupervisorAgent) executeToolCall( // executeConductResearch delegates to a sub-researcher and accumulates findings. func (s *SupervisorAgent) executeConductResearch( ctx context.Context, - tc think_deep.ToolCallParsed, - state *think_deep.SupervisorState, + tc runtime.ToolCallParsed, + state *runtime.SupervisorState, subResearcher SubResearcherCallback, researcherNum *int, totalCost *session.CostBreakdown, @@ -312,8 +312,8 @@ func (s *SupervisorAgent) executeConductResearch( // Limited to s.maxConcurrent parallel goroutines using a semaphore. 
func (s *SupervisorAgent) executeParallelResearch( ctx context.Context, - calls []think_deep.ToolCallParsed, - state *think_deep.SupervisorState, + calls []runtime.ToolCallParsed, + state *runtime.SupervisorState, subResearcher SubResearcherCallback, researcherNum *int, totalCost *session.CostBreakdown, @@ -341,7 +341,7 @@ func (s *SupervisorAgent) executeParallelResearch( wg.Add(1) currentNum := researcherNums[i] - go func(idx int, toolCall think_deep.ToolCallParsed, resNum int) { + go func(idx int, toolCall runtime.ToolCallParsed, resNum int) { defer wg.Done() // Acquire semaphore @@ -430,7 +430,7 @@ func (s *SupervisorAgent) executeParallelResearch( // executeRefineDraft refines the draft report with accumulated findings. func (s *SupervisorAgent) executeRefineDraft( ctx context.Context, - state *think_deep.SupervisorState, + state *runtime.SupervisorState, totalCost *session.CostBreakdown, ) (string, error) { if len(state.Notes) == 0 { @@ -443,7 +443,7 @@ func (s *SupervisorAgent) executeRefineDraft( findings := strings.Join(state.Notes, "\n\n---\n\n") // Generate refinement prompt - prompt := think_deep.RefineDraftPrompt(state.ResearchBrief, state.DraftReport, findings) + prompt := runtime.RefineDraftPrompt(state.ResearchBrief, state.DraftReport, findings) resp, err := s.client.Chat(ctx, []llm.Message{ {Role: "user", Content: prompt}, @@ -470,7 +470,7 @@ func (s *SupervisorAgent) executeRefineDraft( } // hasResearchComplete checks if the research_complete tool was called. -func (s *SupervisorAgent) hasResearchComplete(calls []think_deep.ToolCallParsed) bool { +func (s *SupervisorAgent) hasResearchComplete(calls []runtime.ToolCallParsed) bool { for _, call := range calls { if call.Tool == "research_complete" { return true @@ -480,7 +480,7 @@ func (s *SupervisorAgent) hasResearchComplete(calls []think_deep.ToolCallParsed) } // emitIterationEvent emits a diffusion iteration event. 
-func (s *SupervisorAgent) emitIterationEvent(state *think_deep.SupervisorState, phase string) { +func (s *SupervisorAgent) emitIterationEvent(state *runtime.SupervisorState, phase string) { if s.bus == nil { return } @@ -526,7 +526,7 @@ func (s *SupervisorAgent) emitDelegationEvent(topic string, researcherNum, itera } // emitDraftRefinedEvent emits an event when the draft is refined. -func (s *SupervisorAgent) emitDraftRefinedEvent(state *think_deep.SupervisorState) { +func (s *SupervisorAgent) emitDraftRefinedEvent(state *runtime.SupervisorState) { if s.bus == nil { return } diff --git a/go-research/internal/agents/supervisor_test.go b/go-research/internal/agents/supervisor_test.go index e048037..3048186 100644 --- a/go-research/internal/agents/supervisor_test.go +++ b/go-research/internal/agents/supervisor_test.go @@ -5,9 +5,9 @@ import ( "strings" "testing" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/events" "go-research/internal/session" - "go-research/internal/think_deep" ) func TestNewSupervisorAgent(t *testing.T) { @@ -421,7 +421,7 @@ func TestSupervisorHasResearchComplete(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - calls := think_deep.ParseToolCalls(tt.content) + calls := runtime.ParseToolCalls(tt.content) result := agent.hasResearchComplete(calls) if result != tt.expected { t.Errorf("expected %v, got %v", tt.expected, result) diff --git a/go-research/internal/architectures/interface.go b/go-research/internal/architectures/interface.go index 05919bd..5d73047 100644 --- a/go-research/internal/architectures/interface.go +++ b/go-research/internal/architectures/interface.go @@ -8,8 +8,8 @@ import ( "time" "go-research/internal/agents" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/session" - "go-research/internal/think_deep" ) // Architecture defines the interface that all research architectures must implement. 
@@ -44,7 +44,7 @@ type Result struct { Facts []agents.Fact Sources []string Workers []WorkerResult - SubInsights []think_deep.SubInsight // Structured insights from ThinkDeep + SubInsights []runtime.SubInsight // Structured insights from ThinkDeep // Metrics for benchmarking Metrics Metrics diff --git a/go-research/internal/architectures/storm/README.md b/go-research/internal/architectures/storm/README.md index 0527c2b..b39bae5 100644 --- a/go-research/internal/architectures/storm/README.md +++ b/go-research/internal/architectures/storm/README.md @@ -131,6 +131,7 @@ type Perspective struct { ``` **How it works:** + 1. Survey related topics via web search 2. LLM identifies expert angles that would provide comprehensive coverage 3. Each perspective gets a name, focus area, and initial questions @@ -167,6 +168,7 @@ type Perspective struct { ``` **Each turn (see conversation.go:78-142):** + 1. `wikiWriterAsk()` - Generate question based on persona + history 2. Check for exit phrase ("Thank you so much for your help!") 3. 
`expertGenerateQueries()` - Convert question to 1-3 search queries @@ -223,14 +225,14 @@ Phase 4c: WRITE REPORT ## Key Files -| Component | File | Purpose | -|-----------|------|---------| -| **Orchestrator** | `internal/orchestrator/deep_storm.go` | Main workflow coordination | -| **Conversation Simulator** | `internal/agents/conversation.go` | WikiWriter↔Expert dialogues | -| **Perspective Discovery** | `internal/planning/perspectives.go` | Find expert angles | -| **Analysis Agent** | `internal/agents/analysis.go` | Cross-validation | -| **Synthesis Agent** | `internal/agents/synthesis.go` | Report generation | -| **Architecture Adapter** | `internal/architectures/storm/storm.go` | Public interface | +| Component | File | Purpose | +| -------------------------- | --------------------------------------- | --------------------------- | +| **Orchestrator** | `internal/orchestrator/deep_storm.go` | Main workflow coordination | +| **Conversation Simulator** | `internal/agents/conversation.go` | WikiWriter↔Expert dialogues | +| **Perspective Discovery** | `internal/planning/perspectives.go` | Find expert angles | +| **Analysis Agent** | `internal/agents/analysis.go` | Cross-validation | +| **Synthesis Agent** | `internal/agents/synthesis.go` | Report generation | +| **Architecture Adapter** | `internal/architectures/storm/storm.go` | Public interface | --- @@ -245,6 +247,7 @@ What are the implications of quantum computing on cryptography? 
``` **Available Commands:** + - `/storm ` - Multi-perspective research with conversations - `/fast ` - Quick single-worker research (simpler, faster) - `/expand ` - Expand on current research diff --git a/go-research/internal/orchestrator/deep_storm.go b/go-research/internal/architectures/storm/loop.go similarity index 82% rename from go-research/internal/orchestrator/deep_storm.go rename to go-research/internal/architectures/storm/loop.go index c562fed..56060f5 100644 --- a/go-research/internal/orchestrator/deep_storm.go +++ b/go-research/internal/architectures/storm/loop.go @@ -1,4 +1,4 @@ -package orchestrator +package storm import ( "context" @@ -16,13 +16,19 @@ import ( "go-research/internal/tools" ) -// StormOrchestrator coordinates STORM-style deep research using: +// StormLoop coordinates STORM-style deep research using: // - Enhanced perspective discovery (with related topic survey) // - Simulated expert conversations (WikiWriter↔TopicExpert dialogues) // - Two-phase outline generation (draft + refinement) // - Analysis phase for cross-validation // - Structured report synthesis -type StormOrchestrator struct { +// +// High-level phases: +// 1) Perspective discovery: survey related topics and pick 3-5 expert angles. +// 2) Conversation simulation: per perspective, WikiWriter <-> TopicExpert loop with tools. +// 3) Analysis: cross-validate facts, contradictions, and gaps across conversations. +// 4) Synthesis: outline draft + refinement + full report with citations. 
+type StormLoop struct { bus *events.Bus appConfig *config.Config client llm.ChatClient @@ -34,12 +40,12 @@ type StormOrchestrator struct { tools tools.ToolExecutor } -// StormOrchestratorOption allows configuring the STORM orchestrator -type StormOrchestratorOption func(*StormOrchestrator) +// StormLoopOption allows configuring the STORM loop +type StormLoopOption func(*StormLoop) // WithStormClient injects a custom LLM client (for testing) -func WithStormClient(client llm.ChatClient) StormOrchestratorOption { - return func(o *StormOrchestrator) { +func WithStormClient(client llm.ChatClient) StormLoopOption { + return func(o *StormLoop) { o.client = client o.perspectiveDiscov = planning.NewPerspectiveDiscovererWithTools(client, o.tools) o.conversationSim = agents.NewConversationSimulatorWithBus(client, o.tools, o.bus, agents.DefaultConversationConfig()) @@ -49,18 +55,18 @@ func WithStormClient(client llm.ChatClient) StormOrchestratorOption { } // WithStormTools injects a custom tool executor (for testing) -func WithStormTools(toolExec tools.ToolExecutor) StormOrchestratorOption { - return func(o *StormOrchestrator) { +func WithStormTools(toolExec tools.ToolExecutor) StormLoopOption { + return func(o *StormLoop) { o.tools = toolExec } } -// NewStormOrchestrator creates a new STORM-style research orchestrator -func NewStormOrchestrator(bus *events.Bus, cfg *config.Config, opts ...StormOrchestratorOption) *StormOrchestrator { +// NewStormLoop creates a new STORM-style research loop +func NewStormLoop(bus *events.Bus, cfg *config.Config, opts ...StormLoopOption) *StormLoop { client := llm.NewClient(cfg) toolReg := tools.NewRegistry(cfg.BraveAPIKey) - o := &StormOrchestrator{ + o := &StormLoop{ bus: bus, appConfig: cfg, client: client, @@ -94,7 +100,7 @@ type StormResult struct { } // Research executes the STORM research workflow -func (o *StormOrchestrator) Research(ctx context.Context, query string) (*StormResult, error) { +func (o *StormLoop) Research(ctx 
context.Context, query string) (*StormResult, error) { startTime := time.Now() // Reset context manager for new session @@ -102,7 +108,7 @@ func (o *StormOrchestrator) Research(ctx context.Context, query string) (*StormR var totalCost session.CostBreakdown - // Emit start event + // Emit start event so UI/telemetry can attach. o.bus.Publish(events.Event{ Type: events.EventResearchStarted, Timestamp: time.Now(), @@ -112,7 +118,8 @@ func (o *StormOrchestrator) Research(ctx context.Context, query string) (*StormR }, }) - // 1. Enhanced Perspective Discovery (with related topic survey) + // 1. Enhanced Perspective Discovery (with related topic survey). + // Single LLM-driven discoverer that surfaces 3-5 perspectives + starter questions. o.bus.Publish(events.Event{ Type: events.EventQueryAnalyzed, Timestamp: time.Now(), @@ -155,7 +162,8 @@ func (o *StormOrchestrator) Research(ctx context.Context, query string) (*StormR default: } - // 2. Execute Conversations (parallel per perspective) + // 2. Execute Conversations (parallel per perspective). + // Each perspective runs a WikiWriter↔TopicExpert loop with search/tooling until done. conversations, err := o.executeConversationPhase(ctx, query, perspectives) if err != nil { return nil, fmt.Errorf("conversation phase: %w", err) @@ -173,7 +181,8 @@ func (o *StormOrchestrator) Research(ctx context.Context, query string) (*StormR default: } - // 3. Run analysis on facts from conversations + // 3. Run analysis on facts from conversations. + // Cross-validate facts across perspectives to surface contradictions and gaps. allFacts := agents.GetAllFacts(conversations) expectedCoverage := planning.CollectQuestions(perspectives) @@ -218,7 +227,8 @@ func (o *StormOrchestrator) Research(ctx context.Context, query string) (*StormR default: } - // 4. Synthesize report using two-phase outline and conversations + // 4. Synthesize report using two-phase outline and conversations. 
+ // Outline draft -> outline refine -> full report with citations. o.bus.Publish(events.Event{ Type: events.EventSynthesisStarted, Timestamp: time.Now(), @@ -260,11 +270,14 @@ func (o *StormOrchestrator) Research(ctx context.Context, query string) (*StormR // executeConversationPhase runs parallel conversations for all perspectives. // This is the core STORM mechanism - simulated WikiWriter↔TopicExpert dialogues. -func (o *StormOrchestrator) executeConversationPhase( +func (o *StormLoop) executeConversationPhase( ctx context.Context, topic string, perspectives []planning.Perspective, ) (map[string]*agents.ConversationResult, error) { + // For each perspective we spin a goroutine that runs the WikiWriter<->TopicExpert loop + // until the TopicExpert closes the dialog (“Thank you for your help!”). + // Results are collected via channel to avoid holding locks inside the agent loop. results := make(map[string]*agents.ConversationResult) var mu sync.Mutex @@ -348,10 +361,11 @@ func (o *StormOrchestrator) executeConversationPhase( return results, nil } -func (o *StormOrchestrator) emitCostEvent(scope string, cost session.CostBreakdown) { +func (o *StormLoop) emitCostEvent(scope string, cost session.CostBreakdown) { if o.bus == nil || cost.TotalTokens == 0 { return } + // Emit per-scope cost so telemetry/UX can display live burn-down. 
o.bus.Publish(events.Event{ Type: events.EventCostUpdated, Timestamp: time.Now(), diff --git a/go-research/internal/orchestrator/deep_storm_test.go b/go-research/internal/architectures/storm/loop_test.go similarity index 91% rename from go-research/internal/orchestrator/deep_storm_test.go rename to go-research/internal/architectures/storm/loop_test.go index 14d89cc..493e503 100644 --- a/go-research/internal/orchestrator/deep_storm_test.go +++ b/go-research/internal/architectures/storm/loop_test.go @@ -1,4 +1,4 @@ -package orchestrator +package storm import ( "context" @@ -115,21 +115,21 @@ func stormTestConfig() *config.Config { } } -// TestStormOrchestratorCreation tests that the orchestrator can be created -func TestStormOrchestratorCreation(t *testing.T) { +// TestStormLoopCreation tests that the orchestrator can be created +func TestStormLoopCreation(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(100) defer bus.Close() - orch := NewStormOrchestrator(bus, cfg) + orch := NewStormLoop(bus, cfg) if orch == nil { t.Fatal("expected orchestrator to be created") } } -// TestStormOrchestratorWithOptions tests option injection -func TestStormOrchestratorWithOptions(t *testing.T) { +// TestStormLoopWithOptions tests option injection +func TestStormLoopWithOptions(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(100) defer bus.Close() @@ -137,7 +137,7 @@ func TestStormOrchestratorWithOptions(t *testing.T) { mockClient := newStormMockLLMClient() mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockClient), ) @@ -150,8 +150,8 @@ func TestStormOrchestratorWithOptions(t *testing.T) { } } -// TestStormOrchestratorFullWorkflow tests the complete STORM workflow -func TestStormOrchestratorFullWorkflow(t *testing.T) { +// TestStormLoopFullWorkflow tests the complete STORM workflow +func TestStormLoopFullWorkflow(t *testing.T) { cfg := 
stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -220,7 +220,7 @@ func TestStormOrchestratorFullWorkflow(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -269,8 +269,8 @@ func TestStormOrchestratorFullWorkflow(t *testing.T) { } } -// TestStormOrchestratorParallelConversations tests that conversations run in parallel -func TestStormOrchestratorParallelConversations(t *testing.T) { +// TestStormLoopParallelConversations tests that conversations run in parallel +func TestStormLoopParallelConversations(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -304,7 +304,7 @@ func TestStormOrchestratorParallelConversations(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -345,8 +345,8 @@ collectLoop: } } -// TestStormOrchestratorEmitsEvents tests that all expected events are emitted -func TestStormOrchestratorEmitsEvents(t *testing.T) { +// TestStormLoopEmitsEvents tests that all expected events are emitted +func TestStormLoopEmitsEvents(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -377,7 +377,7 @@ func TestStormOrchestratorEmitsEvents(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -422,8 +422,8 @@ collectLoop: } } -// TestStormOrchestratorContextCancellation tests cancellation handling -func TestStormOrchestratorContextCancellation(t *testing.T) { +// TestStormLoopContextCancellation tests cancellation handling +func TestStormLoopContextCancellation(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(100) defer bus.Close() @@ -435,7 +435,7 @@ func 
TestStormOrchestratorContextCancellation(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -450,8 +450,8 @@ func TestStormOrchestratorContextCancellation(t *testing.T) { } } -// TestStormOrchestratorTwoPhaseOutline tests that outline is generated in two phases -func TestStormOrchestratorTwoPhaseOutline(t *testing.T) { +// TestStormLoopTwoPhaseOutline tests that outline is generated in two phases +func TestStormLoopTwoPhaseOutline(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -479,7 +479,7 @@ func TestStormOrchestratorTwoPhaseOutline(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -523,8 +523,8 @@ collectLoop: } } -// TestStormOrchestratorConversationDataInResult tests that conversation data is in result -func TestStormOrchestratorConversationDataInResult(t *testing.T) { +// TestStormLoopConversationDataInResult tests that conversation data is in result +func TestStormLoopConversationDataInResult(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -550,7 +550,7 @@ func TestStormOrchestratorConversationDataInResult(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -587,8 +587,8 @@ func TestStormOrchestratorConversationDataInResult(t *testing.T) { } } -// TestStormOrchestratorAnalysisUsesConversationFacts tests that analysis uses facts from conversations -func TestStormOrchestratorAnalysisUsesConversationFacts(t *testing.T) { +// TestStormLoopAnalysisUsesConversationFacts tests that analysis uses facts from conversations +func TestStormLoopAnalysisUsesConversationFacts(t *testing.T) 
{ cfg := stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -619,7 +619,7 @@ func TestStormOrchestratorAnalysisUsesConversationFacts(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -665,8 +665,8 @@ collectLoop: // The key test is that the analysis phase ran } -// TestStormOrchestratorPerspectiveWithBasicFactWriter tests that Basic Fact Writer is included -func TestStormOrchestratorPerspectiveWithBasicFactWriter(t *testing.T) { +// TestStormLoopPerspectiveWithBasicFactWriter tests that Basic Fact Writer is included +func TestStormLoopPerspectiveWithBasicFactWriter(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -688,7 +688,7 @@ func TestStormOrchestratorPerspectiveWithBasicFactWriter(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -716,8 +716,8 @@ func TestStormOrchestratorPerspectiveWithBasicFactWriter(t *testing.T) { } } -// TestStormOrchestratorCostAccumulation tests that costs are accumulated from all phases -func TestStormOrchestratorCostAccumulation(t *testing.T) { +// TestStormLoopCostAccumulation tests that costs are accumulated from all phases +func TestStormLoopCostAccumulation(t *testing.T) { cfg := stormTestConfig() bus := events.NewBus(200) defer bus.Close() @@ -734,7 +734,7 @@ func TestStormOrchestratorCostAccumulation(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), WithStormClient(mockLLM), ) @@ -777,7 +777,7 @@ func TestStormResultStructure(t *testing.T) { mockTools := newStormMockToolExecutor() - orch := NewStormOrchestrator(bus, cfg, + orch := NewStormLoop(bus, cfg, WithStormTools(mockTools), 
WithStormClient(mockLLM), ) diff --git a/go-research/internal/architectures/storm/storm.go b/go-research/internal/architectures/storm/storm.go index accf227..bd58bb7 100644 --- a/go-research/internal/architectures/storm/storm.go +++ b/go-research/internal/architectures/storm/storm.go @@ -49,7 +49,6 @@ import ( "go-research/internal/config" "go-research/internal/events" "go-research/internal/llm" - "go-research/internal/orchestrator" "go-research/internal/tools" ) @@ -59,6 +58,7 @@ type Architecture struct { config *config.Config client llm.ChatClient tools tools.ToolExecutor + loop *StormLoop } // Config holds configuration for the STORM architecture. @@ -71,12 +71,21 @@ type Config struct { // New creates a new STORM architecture instance. func New(cfg Config) *Architecture { - return &Architecture{ + a := &Architecture{ bus: cfg.Bus, config: cfg.AppConfig, client: cfg.Client, tools: cfg.Tools, } + opts := []StormLoopOption{} + if cfg.Client != nil { + opts = append(opts, WithStormClient(cfg.Client)) + } + if cfg.Tools != nil { + opts = append(opts, WithStormTools(cfg.Tools)) + } + a.loop = NewStormLoop(cfg.Bus, cfg.AppConfig, opts...) + return a } // Name returns the architecture identifier. @@ -102,20 +111,19 @@ func (a *Architecture) SupportsResume() bool { func (a *Architecture) Research(ctx context.Context, sessionID string, query string) (*architectures.Result, error) { startTime := time.Now() - // Build orchestrator options - opts := []orchestrator.StormOrchestratorOption{} - if a.client != nil { - opts = append(opts, orchestrator.WithStormClient(a.client)) - } - if a.tools != nil { - opts = append(opts, orchestrator.WithStormTools(a.tools)) + if a.loop == nil { + opts := []StormLoopOption{} + if a.client != nil { + opts = append(opts, WithStormClient(a.client)) + } + if a.tools != nil { + opts = append(opts, WithStormTools(a.tools)) + } + a.loop = NewStormLoop(a.bus, a.config, opts...) 
} - // Create the STORM orchestrator (conversation-based) - orch := orchestrator.NewStormOrchestrator(a.bus, a.config, opts...) - // Execute the full STORM workflow - stormResult, err := orch.Research(ctx, query) + stormResult, err := a.loop.Research(ctx, query) if err != nil { return &architectures.Result{ SessionID: sessionID, @@ -138,7 +146,7 @@ func (a *Architecture) Resume(ctx context.Context, sessionID string) (*architect } // convertResult transforms StormResult into the standard architectures.Result format. -func (a *Architecture) convertResult(sessionID string, sr *orchestrator.StormResult, startTime time.Time) *architectures.Result { +func (a *Architecture) convertResult(sessionID string, sr *StormResult, startTime time.Time) *architectures.Result { result := &architectures.Result{ SessionID: sessionID, Query: sr.Topic, diff --git a/go-research/internal/architectures/think_deep/DIFFUSION_DR.md b/go-research/internal/architectures/think_deep/DIFFUSION_DR.md new file mode 100644 index 0000000..6839eb4 --- /dev/null +++ b/go-research/internal/architectures/think_deep/DIFFUSION_DR.md @@ -0,0 +1,1028 @@ +# Diffusion Deep Research: A Test-Time Denoising Approach + +**The paradigm shift from single-pass generation to iterative refinement for AI research agents** + +--- + +## Abstract + +Diffusion Deep Research represents a fundamental architectural shift in how AI systems approach complex research tasks. Rather than generating research reports in a single pass, this approach models the entire research process as a **diffusion process**—starting with a "noisy" initial draft and iteratively "denoising" it through cycles of information retrieval, reasoning, and revision. 
+ +This document provides a comprehensive technical overview of the Diffusion Deep Research architecture, its theoretical foundations, implementation details based on the **ThinkDepth.ai reference implementation**, and the benefits that make it the current state-of-the-art approach for autonomous research agents. + +--- + +## Table of Contents + +1. [Introduction: Why Diffusion for Research?](#introduction-why-diffusion-for-research) +2. [Theoretical Foundations](#theoretical-foundations) +3. [Core Architecture](#core-architecture) +4. [The Diffusion Algorithm](#the-diffusion-algorithm) +5. [The Self-Balancing Principle](#the-self-balancing-principle) +6. [Implementation: Complete Reference](#implementation-complete-reference) +7. [Quality Rules: Insightfulness & Helpfulness](#quality-rules-insightfulness--helpfulness) +8. [Benefits and Benchmarks](#benefits-and-benchmarks) +9. [Context Engineering Considerations](#context-engineering-considerations) +10. [References and Further Reading](#references-and-further-reading) + +--- + +## Introduction: Why Diffusion for Research? + +### The Problem with Single-Pass Research + +Traditional AI research agents follow a linear paradigm: + +``` +Query → Search → Synthesize → Report +``` + +This approach suffers from several fundamental limitations: + +1. **Information Loss**: Important context discovered late in the process cannot influence earlier decisions +2. **No Self-Correction**: Errors or gaps in early research propagate to the final output +3. **Static Search Strategy**: The search strategy is fixed at the start and cannot adapt to findings +4. **Coherence Degradation**: Long reports lose coherence when sections are generated independently + +### The Diffusion Paradigm + +Diffusion models, originally developed for image generation, provide an elegant solution. Instead of generating content in one pass, they: + +1. Start with a **noisy initial state** (random noise for images, rough draft for research) +2. 
**Iteratively refine** through multiple denoising steps +3. Use **guidance signals** to steer the refinement (class labels for images, research findings for reports) + +> "The iterative nature of diffusion models naturally mirrors how humans actually conduct research—cycles of searching, reasoning, and revision." +> — _Google Research, Deep Researcher with Test-Time Diffusion, 2025_ [1] + +This insight led to the development of **Test-Time Diffusion Deep Research (TTD-DR)**, which applies diffusion principles to research report generation. + +--- + +## Theoretical Foundations + +### Classical Diffusion Models + +In classical diffusion models (e.g., DDPM, Stable Diffusion), the process consists of: + +**Forward Diffusion**: Gradually add noise to data + +``` +x₀ → x₁ → x₂ → ... → xₜ (pure noise) +``` + +**Reverse Diffusion**: Learn to denoise step by step + +``` +xₜ → xₜ₋₁ → ... → x₁ → x₀ (clean data) +``` + +### Adaptation to Research + +For research report generation, we reinterpret this process: + +| Classical Diffusion | Research Diffusion | +| ------------------- | --------------------------------------- | +| Random noise (xₜ) | Initial draft from model knowledge | +| Denoising step | Research iteration + draft refinement | +| Guidance signal | Retrieved information from web search | +| Clean output (x₀) | Comprehensive, accurate research report | + +The key insight is that the **initial draft** generated purely from the LLM's training data represents the "noisy" starting state. Each iteration of: + +1. Identifying gaps in the draft +2. Searching for information to fill those gaps +3. Incorporating findings into the draft + +...acts as a **denoising step** that brings the report closer to ground truth. 
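The identify–search–incorporate cycle described above can be sketched as a short loop. This is a minimal illustration, not the reference implementation; the gap-analysis, retrieval, and update functions are injected as parameters precisely because their real signatures are not specified here:

```python
def diffusion_research(query, generate_initial_draft, research_gaps,
                       refine_draft, max_iters=15):
    """Run the research-as-denoising loop.

    generate_initial_draft -- produces the "noisy" initial draft from model
                              knowledge only (no external retrieval)
    research_gaps          -- identifies gaps in the draft and returns newly
                              retrieved findings (the guidance signal)
    refine_draft           -- incorporates findings into the draft (one
                              denoising step)
    """
    draft = generate_initial_draft(query)      # noisy starting state
    for _ in range(max_iters):                 # hard iteration cap
        findings = research_gaps(draft)        # identify gaps, retrieve info
        if not findings:                       # information gap closed
            break
        draft = refine_draft(draft, findings)  # denoise: fold findings in
    return draft
```

With toy stand-ins for the three injected functions, the loop terminates as soon as a research pass returns nothing new, which mirrors the termination conditions listed below.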
+ +### Mathematical Formulation + +Let: + +- `D₀` = Initial draft (from LLM training data only) +- `Dₜ` = Draft at iteration t +- `R(Dₜ)` = Research function that identifies gaps and retrieves information +- `U(Dₜ, R(Dₜ))` = Update function that incorporates research into draft + +The diffusion process becomes: + +``` +D₁ = U(D₀, R(D₀)) +D₂ = U(D₁, R(D₁)) +... +Dₙ = U(Dₙ₋₁, R(Dₙ₋₁)) +``` + +The process terminates when: + +- `R(Dₜ)` returns no new information (information gap closed) +- Maximum iterations reached +- Supervisor determines research is comprehensive + +--- + +## Core Architecture + +The ThinkDepth.ai implementation consists of **five primary phases**, orchestrated through a LangGraph state machine: + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ PHASE 1: USER CLARIFICATION │ +│ (clarify_with_user) │ +│ │ +│ "Does the user's request need clarification?" │ +│ - Check for acronyms, abbreviations, unknown terms │ +│ - Assess if sufficient context exists │ +│ - Optional: ask clarifying question before proceeding │ +└─────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ PHASE 2: RESEARCH BRIEF GENERATION │ +│ (write_research_brief) │ +│ │ +│ Transform conversation into detailed research brief: │ +│ • Maximize specificity and detail │ +│ • Handle unstated dimensions as open considerations │ +│ • Avoid unwarranted assumptions │ +│ • Distinguish research scope from user preferences │ +│ • Use first person ("I want to understand...") │ +│ • Specify priority sources │ +└─────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ PHASE 3: INITIAL DRAFT GENERATION │ +│ (write_draft_report) │ +│ │ +│ Generate draft from LLM's INTERNAL KNOWLEDGE ONLY: │ +│ • NO external information retrieval yet │ +│ • Structured with 
proper headings (# ## ###) │ +│ • This is the "NOISY" initial state for diffusion │ +│ • May contain outdated or incomplete information │ +│ • Provides structure to guide subsequent research │ +└─────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ PHASE 4: DIFFUSION LOOP │ +│ (supervisor_subgraph) │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────┐ │ +│ │ THE DIFFUSION ALGORITHM (per iteration): │ │ +│ │ │ │ +│ │ 1. Generate research questions to address GAPS in draft │ │ +│ │ 2. ConductResearch: retrieve external info for "denoising" │ │ +│ │ 3. refine_draft_report: remove "noise" (imprecision, │ │ +│ │ incompleteness) from draft │ │ +│ │ 4. Assess: Are findings comprehensive? (NOT draft appearance!) │ │ +│ └───────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Supervisor coordinates parallel sub-researchers: │ +│ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Sub-Agent 1 │ │ Sub-Agent 2 │ │ Sub-Agent 3 │ (max 3) │ +│ │ Topic A │ │ Topic B │ │ Topic C │ │ +│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ │ +│ └────────────────┼────────────────┘ │ +│ ▼ │ +│ refine_draft_report │ +│ │ │ +│ ▼ │ +│ Continue or ResearchComplete? 
│ +│ │ +│ Hard Limit: 15 iterations (think_tool + ConductResearch + refine) │ +└─────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ PHASE 5: FINAL REPORT GENERATION │ +│ (final_report_generation) │ +│ │ +│ Apply quality optimization with Insightfulness + Helpfulness rules: │ +│ • Deduplicate findings by URL │ +│ • Granular breakdown with specific causes/impacts │ +│ • Detailed mapping tables for comparisons │ +│ • Nuanced discussion with explicit exploration │ +│ • Proper citation format: [1] Source Title: URL │ +│ • Language matching: output in same language as user input │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## The Diffusion Algorithm + +The core innovation is the **Self-Balancing Test-Time Diffusion** algorithm, encoded directly in the supervisor's system prompt. Here is the exact algorithm from the ThinkDepth.ai implementation: + +### The Four Steps + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ DIFFUSION ALGORITHM │ +│ │ +│ 1. GENERATE RESEARCH QUESTIONS │ +│ └─ Analyze draft report to identify information gaps │ +│ └─ Generate targeted questions to address those gaps │ +│ │ +│ 2. ConductResearch │ +│ └─ Retrieve external information to provide concrete "delta" │ +│ └─ This information is used for "denoising" the draft │ +│ └─ Execute in parallel when topics are independent │ +│ │ +│ 3. refine_draft_report │ +│ └─ Remove "noise" from draft (imprecision, incompleteness) │ +│ └─ Incorporate new findings with citations │ +│ └─ ALWAYS run after ConductResearch │ +│ │ +│ 4. 
ResearchComplete (or continue) │ +│ └─ CRITICAL: Base decision on FINDINGS completeness │ +│ └─ NOT on how polished the draft looks │ +│ └─ Run diverse research questions to verify no new findings │ +│ └─ For non-English: run additional verification round │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +### Critical Completion Criteria + +From the actual prompt (`lead_researcher_with_multiple_steps_diffusion_double_check_prompt`): + +> "**CompleteResearch**: complete research only based on ConductResearch tool's findings' completeness. **It should not be based on the draft report.** Even if the draft report looks complete, you should continue doing the research until all the research findings are collected. You know the research findings are complete by running ConductResearch tool to generate diverse research questions to see if you cannot find any new findings." + +This is the **self-balancing** aspect: the model determines when research is complete based on the information landscape, not the appearance of the output. 
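That criterion can be expressed as a simple predicate: completion depends only on whether fresh, diverse probe queries still surface unseen findings, and the draft itself never enters the decision. The helper below is an illustrative sketch under that reading, not code from the reference implementation:

```python
def research_is_complete(probe_findings, collected_findings):
    """Self-balancing stop rule.

    probe_findings     -- results of diverse verification queries from the
                          latest ConductResearch round
    collected_findings -- everything gathered in earlier iterations

    Returns True only when the probes surface nothing new. The draft report
    is deliberately NOT an argument: how polished it looks is irrelevant.
    """
    seen = set(collected_findings)
    return all(finding in seen for finding in probe_findings)
```

A supervisor loop would call this after each verification round and signal `ResearchComplete` only on `True`; per the prompt, non-English topics get one additional verification round before the result is trusted.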
+ +### Supervisor Tools + +| Tool | Purpose | Invocation | +| --------------------- | -------------------------------- | --------------------------------------------------------------------- | +| `ConductResearch` | Delegate research to sub-agent | `{"research_topic": "Detailed paragraph..."}` | +| `refine_draft_report` | Update draft with new findings | `{"research_brief": "...", "findings": "...", "draft_report": "..."}` | +| `ResearchComplete` | Signal research is comprehensive | `{}` | +| `think_tool` | Strategic reflection | `{"reflection": "..."}` | + +### Sub-Researcher Tools + +| Tool | Purpose | Limits | +| --------------- | ------------------------------- | ----------------------------------------- | +| `tavily_search` | Web search with summarization | Simple: 2-3 calls, Complex: up to 5 calls | +| `think_tool` | Analyze results, plan next step | Use after each search | + +--- + +## The Self-Balancing Principle + +> "Self-balancing Agentic AI sets up self-balancing rules to guide the self-evolution of the agentic system. We can set up a group of rules and let the model decide how to apply them dynamically." 
+> — _Paichun Lin, ThinkDepth.ai_ [2] + +### Two-Stage Gap Closing + +The algorithm explicitly separates **information gap closing** from **generation gap closing**: + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ TWO-STAGE GAP CLOSING │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ STAGE 1: INFORMATION GAP CLOSING │ │ +│ │ │ │ +│ │ Focus: Close information gap through external retrieval │ │ +│ │ Method: ConductResearch → refine_draft_report loop │ │ +│ │ Draft update: Incorporate facts, add citations │ │ +│ │ │ │ +│ │ Information Gap = What's needed - What we've retrieved │ │ +│ │ │ │ +│ │ Exit: Diverse research questions yield no new findings │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ STAGE 2: GENERATION GAP CLOSING │ │ +│ │ │ │ +│ │ Focus: Optimize output quality │ │ +│ │ Method: Apply full Insightfulness + Helpfulness rules │ │ +│ │ Draft update: Full rewrite, formatting, citation cleanup │ │ +│ │ │ │ +│ │ Generation Gap = Ideal output quality - Current output quality │ │ +│ │ │ │ +│ │ Exit: Quality rules satisfied, report finalized │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +### Why Separate the Stages? + +From the research: + +> "There is a trade-off between the two gaps. We cannot optimize the generation gap too early when the system is still optimizing the information gap because the generation gap tends to bring more verbose and stylistic content that can distract from finding missing information." 
+> — _Paichun Lin, ThinkDepth.ai_ [2] + +**Stage 1 characteristics:** + +- Focus on **what** information exists, not **how** to present it +- Draft updates are functional, not polished +- Prioritizes breadth of coverage +- Uses global-context OR section-specific queries based on gap analysis + +**Stage 2 characteristics:** + +- All information is available +- Focus on presentation, coherence, and user satisfaction +- Applies full quality optimization +- Generates final deliverable with proper citations + +--- + +## Implementation: Complete Reference + +### State Definitions + +From `state_multi_agent_supervisor.py`: + +```python +class SupervisorState(TypedDict): + """State for the multi-agent research supervisor.""" + + # Messages exchanged with supervisor for coordination + supervisor_messages: Annotated[Sequence[BaseMessage], add_messages] + + # Detailed research brief that guides the overall direction + research_brief: str + + # Processed notes ready for final report generation + notes: Annotated[list[str], operator.add] + + # Counter tracking research iterations performed + research_iterations: int # Max: 15 + + # Raw unprocessed research notes from sub-agents + raw_notes: Annotated[list[str], operator.add] + + # The evolving draft report + draft_report: str +``` + +### Tool Definitions + +```python +@tool +class ConductResearch(BaseModel): + """Tool for delegating a research task to a specialized sub-agent.""" + research_topic: str = Field( + description="The topic to research. Should be a single topic, " + "and should be described in high detail (at least a paragraph)." + ) + +@tool +class ResearchComplete(BaseModel): + """Tool for indicating that the research process is complete.""" + pass + +@tool +def think_tool(reflection: str) -> str: + """Tool for strategic reflection on research progress and decision-making. + + Use this tool after each search to analyze results and plan next steps. + + Reflection should address: + 1. Analysis of current findings + 2. 
Gap assessment - what's still missing? + 3. Quality evaluation - sufficient evidence? + 4. Strategic decision - continue or conclude? + """ + return f"Reflection recorded: {reflection}" + +@tool +def refine_draft_report( + research_brief: str, + findings: str, + draft_report: str +) -> str: + """Refine draft report with new research findings. + + Synthesizes findings into draft, maintaining structure + and adding citations. + """ + # Uses report_generation_with_draft_insight_prompt + # Returns refined draft content +``` + +### Supervisor Node Implementation + +From `multi_agent_supervisor.py`: + +```python +# Configuration +max_researcher_iterations = 15 # Hard limit on tool calls +max_concurrent_researchers = 3 # Parallel sub-agents + +async def supervisor(state: SupervisorState) -> Command: + """Coordinate research activities. + + Analyzes the research brief and current progress to decide: + - What research topics need investigation + - Whether to conduct parallel research + - When research is complete + """ + supervisor_messages = state.get("supervisor_messages", []) + + # Format system prompt with diffusion algorithm + system_message = lead_researcher_with_multiple_steps_diffusion_double_check_prompt.format( + date=get_today_str(), + max_concurrent_research_units=max_concurrent_researchers, + max_researcher_iterations=max_researcher_iterations + ) + + messages = [SystemMessage(content=system_message)] + supervisor_messages + response = await supervisor_model_with_tools.ainvoke(messages) + + return Command( + goto="supervisor_tools", + update={ + "supervisor_messages": [response], + "research_iterations": state.get("research_iterations", 0) + 1 + } + ) + +async def supervisor_tools(state: SupervisorState) -> Command: + """Execute supervisor decisions.""" + + # Check exit criteria + exceeded_iterations = research_iterations >= max_researcher_iterations + research_complete = any( + tc["name"] == "ResearchComplete" + for tc in most_recent_message.tool_calls + ) + + 
if exceeded_iterations or research_complete: + return Command(goto=END, update={"notes": get_notes_from_tool_calls(...)}) + + # Separate tool call types + think_calls = [tc for tc in tool_calls if tc["name"] == "think_tool"] + research_calls = [tc for tc in tool_calls if tc["name"] == "ConductResearch"] + refine_calls = [tc for tc in tool_calls if tc["name"] == "refine_draft_report"] + + # Execute think_tool calls (synchronous) + for tool_call in think_calls: + observation = think_tool.invoke(tool_call["args"]) + tool_messages.append(ToolMessage(content=observation, ...)) + + # Execute ConductResearch calls (PARALLEL) + if research_calls: + coros = [ + researcher_agent.ainvoke({ + "researcher_messages": [HumanMessage(content=tc["args"]["research_topic"])], + "research_topic": tc["args"]["research_topic"] + }) + for tc in research_calls + ] + results = await asyncio.gather(*coros) # Parallel execution! + + # Each sub-agent returns compressed_research + for result, tc in zip(results, research_calls): + tool_messages.append(ToolMessage( + content=result.get("compressed_research", ""), + name=tc["name"], + tool_call_id=tc["id"] + )) + + # Execute refine_draft_report + for tc in refine_calls: + notes = get_notes_from_tool_calls(supervisor_messages) + draft_report = refine_draft_report.invoke({ + "research_brief": state.get("research_brief"), + "findings": "\n".join(notes), + "draft_report": state.get("draft_report") + }) + tool_messages.append(ToolMessage(content=draft_report, ...)) + + return Command( + goto="supervisor", + update={"supervisor_messages": tool_messages, "draft_report": draft_report} + ) +``` + +### Sub-Researcher Implementation + +From `research_agent.py`: + +```python +def llm_call(state: ResearcherState): + """Analyze state and decide next action.""" + return { + "researcher_messages": [ + model_with_tools.invoke( + [SystemMessage(content=research_agent_prompt)] + state["researcher_messages"] + ) + ] + } + +def tool_node(state: ResearcherState): + 
"""Execute all tool calls from previous response.""" + tool_calls = state["researcher_messages"][-1].tool_calls + observations = [] + + for tool_call in tool_calls: + tool = tools_by_name[tool_call["name"]] + observations.append(tool.invoke(tool_call["args"])) + + return {"researcher_messages": [ + ToolMessage(content=obs, name=tc["name"], tool_call_id=tc["id"]) + for obs, tc in zip(observations, tool_calls) + ]} + +def compress_research(state: ResearcherState) -> dict: + """Compress research findings into concise summary for supervisor.""" + + # Use compression prompt to synthesize findings + response = compress_model.invoke([ + SystemMessage(content=compress_research_system_prompt.format(date=get_today_str())), + *state.get("researcher_messages", []), + HumanMessage(content=compress_research_human_message) + ]) + + return { + "compressed_research": str(response.content), + "raw_notes": ["\n".join(raw_notes)] + } + +def should_continue(state: ResearcherState) -> str: + """Route to tool execution or compression.""" + if state["researcher_messages"][-1].tool_calls: + return "tool_node" + return "compress_research" + +# Graph: START → llm_call ⇄ tool_node → compress_research → END +``` + +### Search Tool with Summarization + +From `utils.py`: + +```python +@tool +def tavily_search(query: str, max_results: int = 3, topic: str = "general") -> str: + """Fetch results from Tavily search API with content summarization.""" + + # Execute search + search_results = tavily_search_multiple([query], max_results, topic, include_raw_content=True) + + # Deduplicate by URL + unique_results = deduplicate_search_results(search_results) + + # Summarize each result (using LLM) + summarized_results = process_search_results(unique_results) + + # Format output + return format_search_output(summarized_results) + +def summarize_webpage_content(webpage_content: str) -> str: + """Summarize webpage to ~25-30% of original length.""" + + structured_model = 
summarization_model.with_structured_output(Summary)
+
+    summary = structured_model.invoke([
+        HumanMessage(content=summarize_webpage_prompt.format(
+            webpage_content=webpage_content,
+            date=get_today_str()
+        ))
+    ])
+
+    # Wrap the two fields in explicit tags so downstream consumers can
+    # separate the summary from the supporting excerpts.
+    return f"<summary>\n{summary.summary}\n</summary>\n\n" \
+           f"<key_excerpts>\n{summary.key_excerpts}\n</key_excerpts>"
+```
+
+---
+
+## Quality Rules: Insightfulness & Helpfulness
+
+The final report generation applies explicit quality rules. From `prompts.py`:
+
+### Insightfulness Rules
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                          INSIGHTFULNESS RULES                           │
+│                                                                         │
+│  1. GRANULAR BREAKDOWN                                                  │
+│     └─ Break down topics into specific causes and specific impacts      │
+│     └─ Don't generalize - be concrete                                   │
+│                                                                         │
+│  2. DETAILED MAPPING TABLE                                              │
+│     └─ Create tables mapping causes to effects                          │
+│     └─ Use for comparisons and summaries                                │
+│                                                                         │
+│  3. NUANCED DISCUSSION                                                  │
+│     └─ Detailed exploration of the topic                                │
+│     └─ Explicit discussion of edge cases and limitations                │
+│     └─ Don't oversimplify - clarify ambiguity                           │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
+### Helpfulness Rules
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                            HELPFULNESS RULES                            │
+│                                                                         │
+│  1. SATISFYING USER INTENT                                              │
+│     └─ Does the response directly address the user's request?           │
+│                                                                         │
+│  2. EASE OF UNDERSTANDING                                               │
+│     └─ Is the response fluent, coherent, and logically structured?      │
+│                                                                         │
+│  3. ACCURACY                                                            │
+│     └─ Are the facts, reasoning, and explanations correct?              │
+│                                                                         │
+│  4. APPROPRIATE LANGUAGE                                                │
+│     └─ Is the tone suitable and professional?                           │
+│     └─ Avoid unnecessary jargon or confusing phrasing                   │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
+### Section Writing Guidelines
+
+From the actual prompt:
+
+```
+For each section of the report:
+- Have an explicit discussion in simple, clear language
+- DO NOT oversimplify. Clarify when a concept is ambiguous.
+- DO NOT list facts in bullet points. 
Write in paragraph form. +- If there are theoretical frameworks, provide detailed application +- For comparison and conclusion, include a summary table +- Use ## for section titles (Markdown format) +- Do NOT refer to yourself as the writer +- Each section should be as long as necessary to deeply answer the question +``` + +### Citation Rules + +These rules directly support the **FACT evaluation metrics** (Citation Accuracy and Effective Citations) used in DeepResearch Bench [4]: + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ CITATION FORMAT │ +│ │ +│ In-text: Use [1], [2], etc. for inline citations │ +│ │ +│ Sources section: │ +│ ### Sources │ +│ [1] Source Title: URL │ +│ [2] Source Title: URL │ +│ ... │ +│ │ +│ IMPORTANT: │ +│ - Number sources sequentially without gaps (1,2,3,4...) │ +│ - Each source on a separate line │ +│ - Include URL only in Sources section │ +│ - Citations are extremely important - users rely on them │ +│ │ +│ FACT Evaluation Connection: │ +│ - FACT extracts statement-URL pairs to verify citations │ +│ - Citation Accuracy = supported citations / total citations │ +│ - Effective Citations = avg verifiably supported citations per task │ +│ - Clean format enables accurate automated extraction │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Benefits and Benchmarks + +### DeepResearch Bench: The Evaluation Standard + +**DeepResearch Bench** is the comprehensive benchmark for evaluating Deep Research Agents (DRAs). It consists of **100 PhD-level research tasks** (50 Chinese, 50 English) designed by domain experts across Science & Technology, Finance & Business, Software, and other fields [4]. 
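Both FACT headline numbers referenced in the citation rules above reduce to simple arithmetic over verified statement-URL pairs. A minimal sketch of the per-task calculation (function and field names are mine, not from the benchmark's code):

```python
def citation_metrics(pairs: list[tuple[str, str, bool]]) -> tuple[float, int]:
    """Score one task's deduplicated (statement, cited_url, verified) triples."""
    supported = sum(1 for _, _, verified in pairs if verified)
    total = len(pairs)
    citation_accuracy = supported / total if total else 0.0
    # "Effective Citations" for a single task is the count of verifiably
    # supported citations; the benchmark averages this across tasks.
    return citation_accuracy, supported
```

In the benchmark itself, statement-URL pairs are first deduplicated and each citation is verified via web scraping plus LLM judgment before these numbers are computed.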
+ +The benchmark uses two complementary evaluation frameworks: + +#### 🎯 RACE Framework (Reference-based Adaptive Criteria-driven Evaluation) + +RACE evaluates report generation quality through four key dimensions: + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ RACE EVALUATION METRICS │ +│ │ +│ 📚 COMPREHENSIVENESS │ +│ Coverage breadth and depth of the research topic │ +│ → Directly measures how well the INFORMATION GAP was closed │ +│ │ +│ 🔍 INSIGHT / DEPTH │ +│ Quality, originality, logic, and value of analysis │ +│ → Measures analytical quality beyond surface-level facts │ +│ │ +│ 📋 INSTRUCTION FOLLOWING │ +│ Adherence to task requirements and constraints │ +│ → Measures alignment with user intent │ +│ │ +│ 📖 READABILITY │ +│ Clarity of structure, fluency, and ease of understanding │ +│ → Measures GENERATION GAP closing (final polish) │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +**Scoring Formula:** + +``` +Total Score = (Comprehensiveness × W₁) + (Insight × W₂) + + (Instruction Following × W₃) + (Readability × W₄) + +Where: W₁ + W₂ + W₃ + W₄ = 1.0 (dynamic weights per task) +``` + +RACE uses **reference-based scoring**, comparing generated reports against high-quality reference reports created by PhD-level experts. This ensures discriminative and reliable evaluation validated against human expert judgments. + +#### 🔗 FACT Framework (Factual Abundance and Citation Trustworthiness) + +FACT evaluates information retrieval and grounding capabilities: + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ FACT EVALUATION PROCESS │ +│ │ +│ 1. Statement-URL Extraction │ +│ └─ Automatically extract factual claims and their cited sources │ +│ │ +│ 2. Deduplication │ +│ └─ Remove redundant statement-URL pairs │ +│ │ +│ 3. Support Verification │ +│ └─ Web scraping + LLM judgment to verify citations │ +│ │ +│ 4. 
Metrics Calculation │ +│ └─ Citation Accuracy: % of correctly supported citations │ +│ └─ Effective Citations: Avg verified citations per task │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +### How Diffusion Optimizes for These Metrics + +The Test-Time Diffusion approach directly targets the RACE metrics: + +| RACE Metric | Diffusion Mechanism | Implementation | +| ------------------------- | ---------------------------------------- | --------------------------------------- | +| **Comprehensiveness** | Iterative gap-filling loop | `ConductResearch` until no new findings | +| **Insight** | Insightfulness Rules in final generation | Granular breakdown, detailed tables | +| **Instruction Following** | Research brief generation phase | `write_research_brief` from user query | +| **Readability** | Helpfulness Rules + structured draft | `refine_draft_report` + final polish | + +For FACT metrics: + +- **Citation Accuracy**: Each sub-researcher includes inline citations +- **Effective Citations**: Compression preserves ALL sources with citations + +### Empirical Results + +ThinkDepth.ai, implementing the Self-Balancing Test-Time Diffusion algorithm, achieved **#1 ranking on DeepResearch Bench** (November 2025) [4]: + +| Rank | System | Overall Score | vs ThinkDepth | +| ---- | --------------------- | ------------- | ------------- | +| 🥇 | **ThinkDepth.ai** | **52.58** | — | +| 🥈 | Google Gemini 2.5 Pro | 51.16 | -2.78% | +| 🥉 | OpenAI Deep Research | 49.58 | -6.04% | +| 4 | Anthropic Claude | 48.83 | -7.45% | + +_Source: [DeepResearch Bench Leaderboard](https://huggingface.co/spaces/muset-ai/DeepResearch-Bench-Leaderboard), November 2025_ + +### Component-Level Analysis + +**Comprehensiveness Score** (Information Gap Closing): + +| System | Score | vs ThinkDepth | Why Diffusion Wins | +| --------------------- | ----- | ------------- | ------------------------------------------ | +| ThinkDepth.ai | 52.03 | — | Iterative research until 
findings complete | +| Google Gemini 2.5 Pro | 50.50 | -3.02% | | +| OpenAI Deep Research | 49.29 | -5.57% | | +| Anthropic Claude | 48.36 | -7.58% | | + +**Insight Score** (Quality of Analysis): + +| System | Score | vs ThinkDepth | Why Diffusion Wins | +| --------------------- | ----- | ------------- | --------------------------------------- | +| ThinkDepth.ai | 53.94 | — | Insightfulness Rules + draft refinement | +| Google Gemini 2.5 Pro | 51.62 | -4.49% | | +| OpenAI Deep Research | 48.94 | -10.21% | | +| Anthropic Claude | 48.79 | -10.54% | | + +**Instruction Following Score**: + +| System | Score | vs ThinkDepth | Why Diffusion Wins | +| --------------------- | ----- | ------------- | ----------------------------- | +| ThinkDepth.ai | 52.07 | — | Detailed research brief phase | +| Google Gemini 2.5 Pro | 51.07 | -1.95% | | +| OpenAI Deep Research | 50.67 | -2.68% | | +| Anthropic Claude | 49.67 | -4.61% | | + +**Readability Score**: + +| System | Score | vs ThinkDepth | Why Diffusion Wins | +| --------------------- | ----- | ------------- | ------------------------------------ | +| ThinkDepth.ai | 50.44 | — | Helpfulness Rules + structured draft | +| Google Gemini 2.5 Pro | 50.22 | -0.44% | | +| OpenAI Deep Research | 48.82 | -3.22% | | +| Anthropic Claude | 48.31 | -4.22% | | + +### Why Diffusion Outperforms + +1. **Iterative Refinement Catches Gaps** → Higher Comprehensiveness + - Each iteration identifies and fills missing information + - Traditional single-pass cannot self-correct + - Exit based on findings completeness, not draft appearance + +2. **Parallel Execution is Efficient** → Better Coverage + - Up to 3 sub-researchers gather diverse perspectives simultaneously + - Uses `asyncio.gather()` for true parallel execution + - Isolated contexts prevent cross-contamination + +3. 
**Explicit Completion Criteria** → Validated Comprehensiveness + - Research ends based on **findings comprehensiveness** + - "Run diverse research questions to see if you cannot find new findings" + - NOT based on draft appearance (which can be misleading) + +4. **Self-Balancing Adaptivity** → Right-Sized Research + - Simple topics: 2-3 iterations + - Complex topics: 10+ iterations as needed + - Model dynamically adjusts to task complexity + +5. **Draft as Context Anchor** → Higher Readability & Coherence + - Draft serves as persistent context across iterations + - Findings accumulate rather than being lost + - Reduces the "lost in the middle" problem [5] + +6. **Quality Rules in Final Generation** → Higher Insight Scores + - Insightfulness Rules: Granular breakdown, detailed tables, nuanced discussion + - Helpfulness Rules: User intent, clarity, accuracy, professional language + - Citation Rules: Proper attribution for verifiable facts + +--- + +## Context Engineering Considerations + +### The Context Problem + +Long-horizon research tasks face several context challenges [6]: + +| Problem | Description | Diffusion Solution | +| ----------------------- | ------------------------------------- | ----------------------------------------------- | +| **Context Poisoning** | Hallucinations enter context | Draft serves as verified state | +| **Context Distraction** | Too much context overwhelms focus | Parallel sub-agents with isolated contexts | +| **Context Confusion** | Superfluous context influences output | Structured finding format with compression | +| **Context Clash** | Parts of context disagree | Supervisor resolves conflicts during refinement | + +### Draft as Context Anchor + +The draft serves as a **persistent, verified context** that: + +1. **Evolves incrementally**: Each `refine_draft_report` call validated +2. **Structures information**: Prevents disorganized accumulation +3. **Guides research**: Makes gaps explicit +4. 
**Maintains coherence**: Narrative thread across iterations + +``` +Traditional RAG: Diffusion Approach: + +Query → Search → Response Query → Brief → Draft → [Research → Refine] × N → Report + +Context grows unboundedly Draft stays ~constant size +No structure Structured by sections +Can contradict itself Conflicts resolved each iteration +``` + +### Multi-Agent Context Isolation + +Sub-researchers operate with **isolated contexts**—they cannot see each other's work: + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ SUPERVISOR CONTEXT │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ - Full research brief │ │ +│ │ - Current draft │ │ +│ │ - Compressed findings from all sub-agents (via ToolMessages) │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────────┼──────────────────────┐ │ +│ ▼ ▼ ▼ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ Sub-1 │ │ Sub-2 │ │ Sub-3 │ │ +│ │ │ │ │ │ │ │ +│ │ Isolated │ │ Isolated │ │ Isolated │ │ +│ │ Context │ │ Context │ │ Context │ │ +│ └──────────┘ └──────────┘ └──────────┘ │ +│ │ +│ "When calling ConductResearch, provide complete standalone │ +│ instructions - sub-agents can't see other agents' work" │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +This prevents: + +- Topic A's findings from biasing Topic B's research +- Context growing unboundedly during parallel work +- Confusion from interleaved search results + +### Research Compression + +Each sub-agent compresses its findings before returning to supervisor: + +```python +def compress_research(state: ResearcherState) -> dict: + """Compress research findings into concise summary.""" + + # Compression guidelines: + # - Include ALL relevant information verbatim + # - Remove obviously irrelevant or duplicate info + # - Include inline citations for each source + # - List all sources at the end with citations + # - 
Exclude think_tool reflections (internal only) + + return { + "compressed_research": compressed_content, + "raw_notes": raw_notes + } +``` + +--- + +## Configuration Reference + +```python +# Supervisor Configuration +max_researcher_iterations = 15 # Total supervisor tool calls +max_concurrent_researchers = 3 # Parallel sub-agents + +# Sub-Researcher Configuration +# Simple queries: 2-3 search calls max +# Complex queries: up to 5 search calls max +# Always stop after 5 if can't find sources + +# Search Configuration +max_results_per_query = 3 +include_raw_content = True # For summarization +max_context_length = 250000 + +# Models (from ThinkDepth.ai implementation) +supervisor_model = "openai:gpt-5" +researcher_model = "openai:gpt-5" +summarization_model = "openai:gpt-5" +compress_model = "openai:gpt-5" # max_tokens=32000 +writer_model = "openai:gpt-5" # max_tokens=40000 +``` + +--- + +## References and Further Reading + +### Primary Sources + +[1] **Google Research** (2025). _Deep Researcher with Test-Time Diffusion_. Google Research Blog. +https://research.google/blog/deep-researcher-with-test-time-diffusion/ + +[2] **Paichun Lin** (2025). _Self-Balancing Agentic AI: Test-Time Diffusion and Context Engineering Re-imagined for Deep Research_. Paichun Publication, Substack. +https://paichunlin.substack.com/p/self-balancing-agentic-ai-test-time + +[3] **Richard Sutton** (2019). _The Bitter Lesson_. Incomplete Ideas Blog. +http://www.incompleteideas.net/IncIdeas/BitterLesson.html + +[4] **DeepResearch Bench** (2025). _A Comprehensive Benchmark for Deep Research Agents_. 
+ +- **Leaderboard**: https://huggingface.co/spaces/muset-ai/DeepResearch-Bench-Leaderboard +- **GitHub**: https://github.com/Ayanami0730/deep_research_bench +- **Paper**: https://deepresearch-bench.github.io/static/papers/deepresearch-bench.pdf +- **Website**: https://deepresearch-bench.github.io/ + +Evaluation Frameworks: + +- **RACE**: Reference-based Adaptive Criteria-driven Evaluation (Comprehensiveness, Insight, Instruction Following, Readability) +- **FACT**: Framework for Factual Abundance and Citation Trustworthiness (Citation Accuracy, Effective Citations) + +[5] **Liu et al.** (2023). _Lost in the Middle: How Language Models Use Long Contexts_. arXiv:2307.03172 + +[6] **Anthropic** (2025). _Context Engineering for AI Agents_. Anthropic Research. + +### Reference Implementation + +- **ThinkDepth.ai** - Production implementation + https://thinkdepth.ai + +- **ThinkDepth.ai GitHub** - Open source reference implementation (Python) + https://github.com/thinkdepthai/Deep_Research + +### Key Implementation Files + +| File | Purpose | +| --------------------------------- | -------------------------------------------------- | +| `multi_agent_supervisor.py` | Supervisor node, parallel research execution | +| `research_agent.py` | Sub-researcher implementation | +| `prompts.py` | All prompt templates including diffusion algorithm | +| `state_multi_agent_supervisor.py` | SupervisorState, tool definitions | +| `state_research.py` | ResearcherState definitions | +| `utils.py` | Search, summarization, tools | +| `research_agent_full.py` | Full workflow orchestration | +| `research_agent_scope.py` | Brief generation, draft creation | + +--- + +_This document is based on the ThinkDepth.ai open-source implementation. 
For Go implementation details, see the adjacent README.md and source files._ diff --git a/go-research/internal/architectures/think_deep/README.md b/go-research/internal/architectures/think_deep/README.md index 8e21dd1..154ef89 100644 --- a/go-research/internal/architectures/think_deep/README.md +++ b/go-research/internal/architectures/think_deep/README.md @@ -41,7 +41,7 @@ User Query: "Compare AI safety approaches of OpenAI, Anthropic, and DeepMind" | |Sub-Res #1 | |Sub-Res #2 | |Sub-Res #3 | <- Parallel execution | | |"OpenAI | |"Anthropic | |"DeepMind | | | | safety" | | approach" | | research" | | -| +-----------+ +-----------+ +-----------+ | +| +-----------+ +-----------+ +-----------+ | | \ | / | | \ | / | | v v v | @@ -150,6 +150,7 @@ Instead of a single pass, ThinkDeep treats the initial draft as a **"noisy" stat ### Completion Criteria The supervisor calls `research_complete` when: + - Research findings are comprehensive (NOT when the draft "looks good") - Diverse research questions no longer yield new findings - Maximum iterations reached (hard limit: 15) @@ -172,6 +173,7 @@ The supervisor calls `research_complete` when: ``` **Key Guidelines:** + - Maximize specificity without inventing preferences - Handle unstated dimensions as open considerations - Distinguish research scope from user preferences @@ -217,6 +219,7 @@ This draft serves as the starting point that gets iteratively refined. ``` **Scaling Rules:** + - Simple fact-finding: 1 sub-agent - Comparisons: 1 sub-agent per element (e.g., 3 companies = 3 sub-agents) - Maximum 3 concurrent sub-researchers @@ -255,16 +258,16 @@ This draft serves as the starting point that gets iteratively refined. 
## Key Files -| Component | File | Purpose | -|-----------|------|---------| -| **Orchestrator** | `internal/orchestrator/think_deep.go` | 4-phase workflow coordination | -| **Supervisor Agent** | `internal/agents/supervisor.go` | Diffusion loop, parallel research | -| **Sub-Researcher Agent** | `internal/agents/sub_researcher.go` | Focused search with limits | -| **State Definitions** | `internal/think_deep/state.go` | SupervisorState, ResearcherState | -| **Prompts** | `internal/think_deep/prompts.go` | All prompt templates | -| **Tools** | `internal/think_deep/tools.go` | Tool parsing and execution | -| **Content Summarizer** | `internal/tools/summarizer.go` | LLM-based page summarization | -| **Search Tool** | `internal/tools/search.go` | Brave API with summarization | +| Component | File | Purpose | +| ------------------------ | ------------------------------------- | --------------------------------- | +| **Orchestrator** | `internal/orchestrator/think_deep.go` | 4-phase workflow coordination | +| **Supervisor Agent** | `internal/agents/supervisor.go` | Diffusion loop, parallel research | +| **Sub-Researcher Agent** | `internal/agents/sub_researcher.go` | Focused search with limits | +| **State Definitions** | `internal/think_deep/state.go` | SupervisorState, ResearcherState | +| **Prompts** | `internal/think_deep/prompts.go` | All prompt templates | +| **Tools** | `internal/think_deep/tools.go` | Tool parsing and execution | +| **Content Summarizer** | `internal/tools/summarizer.go` | LLM-based page summarization | +| **Search Tool** | `internal/tools/search.go` | Brave API with summarization | --- @@ -272,19 +275,19 @@ This draft serves as the starting point that gets iteratively refined. 
### Supervisor Tools -| Tool | Purpose | Usage | -|------|---------|-------| -| `conduct_research` | Delegate to sub-researcher | `{"research_topic": "Detailed paragraph..."}` | -| `refine_draft` | Incorporate findings into draft | `{}` | -| `think` | Strategic reflection | `{"reflection": "..."}` | -| `research_complete` | Signal completion | `{}` | +| Tool | Purpose | Usage | +| ------------------- | ------------------------------- | --------------------------------------------- | +| `conduct_research` | Delegate to sub-researcher | `{"research_topic": "Detailed paragraph..."}` | +| `refine_draft` | Incorporate findings into draft | `{}` | +| `think` | Strategic reflection | `{"reflection": "..."}` | +| `research_complete` | Signal completion | `{}` | ### Sub-Researcher Tools -| Tool | Purpose | Usage | -|------|---------|-------| -| `search` | Web search via Brave API | `{"query": "search terms"}` | -| `think` | Analyze results, plan next | `{"reflection": "..."}` | +| Tool | Purpose | Usage | +| -------- | -------------------------- | --------------------------- | +| `search` | Web search via Brave API | `{"query": "search terms"}` | +| `think` | Analyze results, plan next | `{"reflection": "..."}` | --- @@ -381,6 +384,7 @@ This implementation is based on the **Self-Balancing Test-Time Diffusion Deep Re Special thanks to **Paichun Lin** ([@paichunjimlin](https://www.linkedin.com/in/paichunjimlin), Stanford CS) for developing and open-sourcing this approach. ThinkDepth.ai achieved #1 ranking on [DeepResearch Bench](https://huggingface.co/spaces/muset-ai/DeepResearch-Bench-Leaderboard) when it was released, outperforming Google Gemini 2.5 Pro, OpenAI Deep Research, and Anthropic Claude Deep Research. 
For more technical details, see: + - [Self-Balancing Agentic AI Blog Post](https://paichunlin.substack.com/p/self-balancing-agentic-ai-test-time) - [ThinkDepth.ai GitHub Repository](https://github.com/thinkdepthai/Deep_Research) @@ -392,4 +396,4 @@ For more technical details, see: - **[ThinkDepth.ai GitHub](https://github.com/thinkdepthai/Deep_Research)** - Reference Python implementation - **[DeepResearch Bench](https://github.com/Ayanami0730/deep_research_bench)** - Evaluation benchmark - **[Self-Balancing Agentic AI](https://paichunlin.substack.com/p/self-balancing-agentic-ai-test-time)** - Algorithm deep-dive -- **[Google: Deep-Researcher with Test-Time Diffusion](https://research.google/blog/deep-researcher-with-test-time-diffusion/)** - Google Research blog post \ No newline at end of file +- **[Google: Deep-Researcher with Test-Time Diffusion](https://research.google/blog/deep-researcher-with-test-time-diffusion/)** - Google Research blog post diff --git a/go-research/internal/orchestrator/think_deep.go b/go-research/internal/architectures/think_deep/loop.go similarity index 75% rename from go-research/internal/orchestrator/think_deep.go rename to go-research/internal/architectures/think_deep/loop.go index f7c486e..41d62a0 100644 --- a/go-research/internal/orchestrator/think_deep.go +++ b/go-research/internal/architectures/think_deep/loop.go @@ -1,4 +1,4 @@ -package orchestrator +package think_deep import ( "context" @@ -7,25 +7,26 @@ import ( "time" "go-research/internal/agents" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/config" "go-research/internal/events" "go-research/internal/llm" "go-research/internal/session" - "go-research/internal/think_deep" "go-research/internal/tools" ) -// ThinkDeepOrchestrator coordinates ThinkDepth-style deep research using +// AgentLoop coordinates ThinkDepth-style deep research using // the "Self-Balancing Test-Time Diffusion" approach. // // The workflow consists of four phases: -// 1. 
Brief Generation - Transform user query into detailed research brief -// 2. Initial Draft - Generate draft from model's knowledge -// 3. Diffusion Loop - Supervisor coordination with sub-researchers -// 4. Final Report - Full optimization with Insightfulness + Helpfulness rules +// 1. Brief generation: convert the user query into a structured brief (LLM call). +// 2. Draft generation: produce a first-pass report structure from model priors (LLM call). +// 3. Diffusion loop: supervisor iterates, spawning sub-researchers and refining the draft +// until completion criteria are met (bounded by iteration limits). +// 4. Final report: synthesize deduped findings + refined draft into the final deliverable. // // Original: research_agent_full.py in ThinkDepth.ai -type ThinkDeepOrchestrator struct { +type AgentLoop struct { bus *events.Bus appConfig *config.Config client llm.ChatClient @@ -34,11 +35,11 @@ type ThinkDeepOrchestrator struct { model string // Injection context for expansion workflows - injectionContext *think_deep.InjectionContext + injectionContext *runtime.InjectionContext } -// ThinkDeepConfig holds configuration for the ThinkDeep orchestrator. -type ThinkDeepConfig struct { +// LoopConfig holds configuration for the ThinkDeep loop. +type LoopConfig struct { // MaxSupervisorIterations is the max diffusion iterations MaxSupervisorIterations int @@ -49,21 +50,21 @@ type ThinkDeepConfig struct { MaxConcurrentResearch int } -// DefaultThinkDeepConfig returns sensible defaults for ThinkDeep configuration. -func DefaultThinkDeepConfig() ThinkDeepConfig { - return ThinkDeepConfig{ +// DefaultLoopConfig returns sensible defaults for ThinkDeep configuration. +func DefaultLoopConfig() LoopConfig { + return LoopConfig{ MaxSupervisorIterations: 15, MaxSubResearcherIter: 5, MaxConcurrentResearch: 3, } } -// ThinkDeepOption allows configuring the ThinkDeep orchestrator. 
-type ThinkDeepOption func(*ThinkDeepOrchestrator) +// LoopOption allows configuring the ThinkDeep loop. +type LoopOption func(*AgentLoop) -// WithThinkDeepClient injects a custom LLM client (for testing). -func WithThinkDeepClient(client llm.ChatClient) ThinkDeepOption { - return func(o *ThinkDeepOrchestrator) { +// WithLoopClient injects a custom LLM client (for testing). +func WithLoopClient(client llm.ChatClient) LoopOption { + return func(o *AgentLoop) { o.client = client o.model = client.GetModel() // Recreate supervisor with injected client @@ -78,27 +79,27 @@ func WithThinkDeepClient(client llm.ChatClient) ThinkDeepOption { } } -// WithThinkDeepTools injects a custom tool executor (for testing). -func WithThinkDeepTools(toolExec tools.ToolExecutor) ThinkDeepOption { - return func(o *ThinkDeepOrchestrator) { +// WithLoopTools injects a custom tool executor (for testing). +func WithLoopTools(toolExec tools.ToolExecutor) LoopOption { + return func(o *AgentLoop) { o.tools = toolExec } } // WithInjectionContext provides prior context for expansion workflows. // This enables the orchestrator to build upon existing research findings. -func WithInjectionContext(ctx *think_deep.InjectionContext) ThinkDeepOption { - return func(o *ThinkDeepOrchestrator) { +func WithInjectionContext(ctx *runtime.InjectionContext) LoopOption { + return func(o *AgentLoop) { o.injectionContext = ctx } } -// NewThinkDeepOrchestrator creates a new ThinkDeep-style research orchestrator. -func NewThinkDeepOrchestrator(bus *events.Bus, cfg *config.Config, opts ...ThinkDeepOption) *ThinkDeepOrchestrator { +// NewAgentLoop creates a new ThinkDeep-style research loop. 
+func NewAgentLoop(bus *events.Bus, cfg *config.Config, opts ...LoopOption) *AgentLoop { client := llm.NewClient(cfg) toolReg := tools.NewRegistry(cfg.BraveAPIKey) - o := &ThinkDeepOrchestrator{ + o := &AgentLoop{ bus: bus, appConfig: cfg, client: client, @@ -121,8 +122,8 @@ func NewThinkDeepOrchestrator(bus *events.Bus, cfg *config.Config, opts ...Think return o } -// ThinkDeepResult contains the output of ThinkDeep research. -type ThinkDeepResult struct { +// LoopResult contains the output of ThinkDeep research. +type LoopResult struct { // Query is the original user query Query string @@ -139,7 +140,7 @@ type ThinkDeepResult struct { FinalReport string // SubInsights contains all structured insights captured during diffusion - SubInsights []think_deep.SubInsight + SubInsights []runtime.SubInsight // Cost tracks total token usage Cost session.CostBreakdown @@ -149,11 +150,11 @@ type ThinkDeepResult struct { } // Research executes the ThinkDeep workflow. -func (o *ThinkDeepOrchestrator) Research(ctx context.Context, query string) (*ThinkDeepResult, error) { +func (o *AgentLoop) Research(ctx context.Context, query string) (*LoopResult, error) { startTime := time.Now() var totalCost session.CostBreakdown - // Emit start event + // Emit start event early so downstream listeners can attach. if o.bus != nil { o.bus.Publish(events.Event{ Type: events.EventResearchStarted, @@ -166,7 +167,8 @@ func (o *ThinkDeepOrchestrator) Research(ctx context.Context, query string) (*Th } o.emitDiffusionStarted(query) - // Phase 1: Generate research brief from query + // Phase 1: Generate research brief from query. + // Single LLM call that expands the raw query into objectives, questions, and constraints. 
o.emitPhaseProgress("brief", "Generating research brief from query...")
 	researchBrief, briefCost, err := o.generateResearchBrief(ctx, query)
 	if err != nil {
@@ -188,7 +190,8 @@ func (o *ThinkDeepOrchestrator) Research(ctx context.Context, query string) (*Th
 	default:
 	}
 
-	// Phase 2: Generate initial draft report (from model's knowledge)
+	// Phase 2: Generate initial draft report (from model priors only).
+	// No external research yet—this seeds structure and highlights gaps for diffusion.
 	o.emitPhaseProgress("draft", "Generating initial draft from model knowledge...")
 	initialDraft, draftCost, err := o.generateInitialDraft(ctx, researchBrief)
 	if err != nil {
@@ -205,7 +208,12 @@
 	default:
 	}
 
-	// Phase 3: Supervisor coordination with diffusion refinement
+	// Phase 3: Supervisor coordination with diffusion refinement.
+	// The supervisor loops with a max-iteration guard:
+	// - analyze gaps
+	// - delegate parallel sub-researchers
+	// - refine draft
+	// - stop based on findings, not draft appearance
 	o.emitPhaseProgress("diffuse", "Starting diffusion-based refinement...")
 
 	supervisorResult, err := o.supervisor.Coordinate(
@@ -232,7 +240,8 @@
 	default:
 	}
 
-	// Phase 4: Final report generation (full optimization)
+	// Phase 4: Final report generation (full optimization).
+	// Uses deduped findings + refined draft to apply Insightfulness/Helpfulness rules.
o.emitFinalReportStarted() finalReport, reportCost, err := o.generateFinalReport(ctx, researchBrief, supervisorResult) @@ -247,7 +256,7 @@ func (o *ThinkDeepOrchestrator) Research(ctx context.Context, query string) (*Th o.emitFinalReportComplete() o.emitCostEvent("total", totalCost) - return &ThinkDeepResult{ + return &LoopResult{ Query: query, ResearchBrief: researchBrief, Notes: supervisorResult.Notes, @@ -260,14 +269,15 @@ func (o *ThinkDeepOrchestrator) Research(ctx context.Context, query string) (*Th } // executeSubResearch runs a sub-researcher agent for a specific topic. -func (o *ThinkDeepOrchestrator) executeSubResearch( +func (o *AgentLoop) executeSubResearch( ctx context.Context, topic string, researcherNum int, diffusionIteration int, ) (*agents.SubResearcherResult, error) { - // Create sub-researcher with summarization-enabled tool registry - subTools := think_deep.SubResearcherToolRegistry(o.appConfig.BraveAPIKey, o.client) + // Each sub-researcher gets a fresh tool registry tuned for summarization. + // Supervisor controls fan-out; this function is the per-delegation callback. + subTools := runtime.SubResearcherToolRegistry(o.appConfig.BraveAPIKey, o.client) subResearcher := agents.NewSubResearcherAgent( o.client, subTools, @@ -279,9 +289,9 @@ func (o *ThinkDeepOrchestrator) executeSubResearch( } // generateResearchBrief transforms the user query into a detailed research brief. 
-func (o *ThinkDeepOrchestrator) generateResearchBrief(ctx context.Context, query string) (string, session.CostBreakdown, error) { +func (o *AgentLoop) generateResearchBrief(ctx context.Context, query string) (string, session.CostBreakdown, error) { date := time.Now().Format("2006-01-02") - prompt := think_deep.TransformToResearchBriefPrompt(query, date) + prompt := runtime.TransformToResearchBriefPrompt(query, date) resp, err := o.client.Chat(ctx, []llm.Message{ {Role: "user", Content: prompt}, @@ -299,9 +309,9 @@ func (o *ThinkDeepOrchestrator) generateResearchBrief(ctx context.Context, query } // generateInitialDraft creates the initial draft report from model's knowledge. -func (o *ThinkDeepOrchestrator) generateInitialDraft(ctx context.Context, brief string) (string, session.CostBreakdown, error) { +func (o *AgentLoop) generateInitialDraft(ctx context.Context, brief string) (string, session.CostBreakdown, error) { date := time.Now().Format("2006-01-02") - prompt := think_deep.InitialDraftPrompt(brief, date) + prompt := runtime.InitialDraftPrompt(brief, date) resp, err := o.client.Chat(ctx, []llm.Message{ {Role: "user", Content: prompt}, @@ -319,17 +329,18 @@ func (o *ThinkDeepOrchestrator) generateInitialDraft(ctx context.Context, brief } // generateFinalReport creates the final optimized report with full Insightfulness + Helpfulness rules. -func (o *ThinkDeepOrchestrator) generateFinalReport( +func (o *AgentLoop) generateFinalReport( ctx context.Context, brief string, supervisor *agents.SupervisorResult, ) (string, session.CostBreakdown, error) { date := time.Now().Format("2006-01-02") - // Deduplicate findings based on URL overlap to prevent redundant sources + // Deduplicate findings based on URL overlap to prevent redundant sources; + // keeps notes with at least one new URL or URL-less general content. 
deduplicatedNotes := o.deduplicateFindings(supervisor.Notes) findings := strings.Join(deduplicatedNotes, "\n\n---\n\n") - prompt := think_deep.FinalReportPrompt(brief, findings, supervisor.DraftReport, date) + prompt := runtime.FinalReportPrompt(brief, findings, supervisor.DraftReport, date) resp, err := o.client.Chat(ctx, []llm.Message{ {Role: "user", Content: prompt}, @@ -348,7 +359,7 @@ func (o *ThinkDeepOrchestrator) generateFinalReport( // Event emission helpers -func (o *ThinkDeepOrchestrator) emitDiffusionStarted(topic string) { +func (o *AgentLoop) emitDiffusionStarted(topic string) { if o.bus == nil { return } @@ -362,7 +373,7 @@ func (o *ThinkDeepOrchestrator) emitDiffusionStarted(topic string) { }) } -func (o *ThinkDeepOrchestrator) emitDiffusionComplete(iterations, notesCount int) { +func (o *AgentLoop) emitDiffusionComplete(iterations, notesCount int) { if o.bus == nil { return } @@ -380,7 +391,7 @@ func (o *ThinkDeepOrchestrator) emitDiffusionComplete(iterations, notesCount int }) } -func (o *ThinkDeepOrchestrator) emitFinalReportStarted() { +func (o *AgentLoop) emitFinalReportStarted() { if o.bus == nil { return } @@ -393,7 +404,7 @@ func (o *ThinkDeepOrchestrator) emitFinalReportStarted() { }) } -func (o *ThinkDeepOrchestrator) emitFinalReportComplete() { +func (o *AgentLoop) emitFinalReportComplete() { if o.bus == nil { return } @@ -403,7 +414,7 @@ func (o *ThinkDeepOrchestrator) emitFinalReportComplete() { }) } -func (o *ThinkDeepOrchestrator) emitCostEvent(scope string, cost session.CostBreakdown) { +func (o *AgentLoop) emitCostEvent(scope string, cost session.CostBreakdown) { if o.bus == nil || cost.TotalTokens == 0 { return } @@ -422,7 +433,7 @@ func (o *ThinkDeepOrchestrator) emitCostEvent(scope string, cost session.CostBre }) } -func (o *ThinkDeepOrchestrator) emitPhaseProgress(phase, message string) { +func (o *AgentLoop) emitPhaseProgress(phase, message string) { if o.bus == nil { return } @@ -438,7 +449,7 @@ func (o *ThinkDeepOrchestrator) 
emitPhaseProgress(phase, message string) { // enhanceBriefForExpansion modifies the research brief to focus on expansion. // It adds context about existing research to guide sub-researchers toward new insights. -func (o *ThinkDeepOrchestrator) enhanceBriefForExpansion(brief string) string { +func (o *AgentLoop) enhanceBriefForExpansion(brief string) string { if o.injectionContext == nil { return brief } @@ -492,12 +503,12 @@ func (o *ThinkDeepOrchestrator) enhanceBriefForExpansion(brief string) string { // deduplicateFindings removes notes with entirely redundant URLs. // A note is kept if it contains at least one URL not seen in previous notes, // or if it contains no URLs (general content that should be preserved). -func (o *ThinkDeepOrchestrator) deduplicateFindings(notes []string) []string { +func (o *AgentLoop) deduplicateFindings(notes []string) []string { seenURLs := make(map[string]bool) var result []string for _, note := range notes { - urls := think_deep.ExtractURLs(note) + urls := runtime.ExtractURLs(note) // Check how many new URLs this note contributes newURLCount := 0 diff --git a/go-research/internal/orchestrator/think_deep_test.go b/go-research/internal/architectures/think_deep/loop_test.go similarity index 91% rename from go-research/internal/orchestrator/think_deep_test.go rename to go-research/internal/architectures/think_deep/loop_test.go index 47c11ec..7459c01 100644 --- a/go-research/internal/orchestrator/think_deep_test.go +++ b/go-research/internal/architectures/think_deep/loop_test.go @@ -1,4 +1,4 @@ -package orchestrator +package think_deep import ( "context" @@ -85,7 +85,7 @@ func (m *mockThinkDeepTools) ToolNames() []string { return []string{"search", "think"} } -func TestNewThinkDeepOrchestrator(t *testing.T) { +func TestNewAgentLoop(t *testing.T) { bus := events.NewBus(100) cfg := &config.Config{ OpenRouterAPIKey: "test-key", @@ -93,7 +93,7 @@ func TestNewThinkDeepOrchestrator(t *testing.T) { Model: "test-model", } - orch := 
NewThinkDeepOrchestrator(bus, cfg) + orch := NewAgentLoop(bus, cfg) if orch == nil { t.Fatal("expected orchestrator to be created") @@ -108,7 +108,7 @@ func TestNewThinkDeepOrchestrator(t *testing.T) { } } -func TestThinkDeepOrchestratorWithOptions(t *testing.T) { +func TestAgentLoopWithOptions(t *testing.T) { bus := events.NewBus(100) cfg := &config.Config{ OpenRouterAPIKey: "test-key", @@ -119,11 +119,11 @@ func TestThinkDeepOrchestratorWithOptions(t *testing.T) { mockClient := &mockThinkDeepClient{model: "custom-model"} mockTools := &mockThinkDeepTools{} - orch := NewThinkDeepOrchestrator( + orch := NewAgentLoop( bus, cfg, - WithThinkDeepClient(mockClient), - WithThinkDeepTools(mockTools), + WithLoopClient(mockClient), + WithLoopTools(mockTools), ) if orch == nil { @@ -143,8 +143,8 @@ func TestThinkDeepOrchestratorWithOptions(t *testing.T) { } } -func TestDefaultThinkDeepConfig(t *testing.T) { - cfg := DefaultThinkDeepConfig() +func TestDefaultLoopConfig(t *testing.T) { + cfg := DefaultLoopConfig() if cfg.MaxSupervisorIterations != 15 { t.Errorf("expected MaxSupervisorIterations=15, got %d", cfg.MaxSupervisorIterations) @@ -159,7 +159,7 @@ func TestDefaultThinkDeepConfig(t *testing.T) { } } -func TestThinkDeepOrchestratorResearchFullWorkflow(t *testing.T) { +func TestAgentLoopResearchFullWorkflow(t *testing.T) { bus := events.NewBus(100) cfg := &config.Config{ OpenRouterAPIKey: "test-key", @@ -251,11 +251,11 @@ The test topic is important for understanding the broader context. }, } - orch := NewThinkDeepOrchestrator( + orch := NewAgentLoop( bus, cfg, - WithThinkDeepClient(mockClient), - WithThinkDeepTools(mockTools), + WithLoopClient(mockClient), + WithLoopTools(mockTools), ) result, err := orch.Research(context.Background(), "test topic") @@ -289,7 +289,7 @@ The test topic is important for understanding the broader context. 
} } -func TestThinkDeepOrchestratorEventEmission(t *testing.T) { +func TestAgentLoopEventEmission(t *testing.T) { bus := events.NewBus(100) cfg := &config.Config{ OpenRouterAPIKey: "test-key", @@ -316,10 +316,10 @@ func TestThinkDeepOrchestratorEventEmission(t *testing.T) { }, } - orch := NewThinkDeepOrchestrator( + orch := NewAgentLoop( bus, cfg, - WithThinkDeepClient(mockClient), + WithLoopClient(mockClient), ) _, err := orch.Research(context.Background(), "test query") @@ -374,7 +374,7 @@ func TestThinkDeepOrchestratorEventEmission(t *testing.T) { } } -func TestThinkDeepOrchestratorContextCancellation(t *testing.T) { +func TestAgentLoopContextCancellation(t *testing.T) { bus := events.NewBus(100) cfg := &config.Config{ OpenRouterAPIKey: "test-key", @@ -388,10 +388,10 @@ func TestThinkDeepOrchestratorContextCancellation(t *testing.T) { }, } - orch := NewThinkDeepOrchestrator( + orch := NewAgentLoop( bus, cfg, - WithThinkDeepClient(mockClient), + WithLoopClient(mockClient), ) // Create already-cancelled context @@ -406,8 +406,8 @@ func TestThinkDeepOrchestratorContextCancellation(t *testing.T) { } } -func TestThinkDeepResultFields(t *testing.T) { - result := &ThinkDeepResult{ +func TestLoopResultFields(t *testing.T) { + result := &LoopResult{ Query: "test query", ResearchBrief: "## Research Brief", Notes: []string{"note1", "note2"}, @@ -432,7 +432,7 @@ func TestThinkDeepResultFields(t *testing.T) { } } -func TestThinkDeepOrchestratorNilBusCreation(t *testing.T) { +func TestAgentLoopNilBusCreation(t *testing.T) { cfg := &config.Config{ OpenRouterAPIKey: "test-key", BraveAPIKey: "test-brave-key", @@ -440,7 +440,7 @@ func TestThinkDeepOrchestratorNilBusCreation(t *testing.T) { } // Test with nil bus - should create orchestrator without panic - orch := NewThinkDeepOrchestrator(nil, cfg) + orch := NewAgentLoop(nil, cfg) if orch == nil { t.Fatal("expected orchestrator to be created even with nil bus") diff --git a/go-research/internal/think_deep/injection.go 
b/go-research/internal/architectures/think_deep/runtime/injection.go similarity index 98% rename from go-research/internal/think_deep/injection.go rename to go-research/internal/architectures/think_deep/runtime/injection.go index 1db9513..c532bcd 100644 --- a/go-research/internal/think_deep/injection.go +++ b/go-research/internal/architectures/think_deep/runtime/injection.go @@ -1,4 +1,4 @@ -package think_deep +package runtime // InjectionContext provides prior knowledge for expansion workflows. // This context is injected into the supervisor state before research begins. diff --git a/go-research/internal/think_deep/prompts.go b/go-research/internal/architectures/think_deep/runtime/prompts.go similarity index 99% rename from go-research/internal/think_deep/prompts.go rename to go-research/internal/architectures/think_deep/runtime/prompts.go index c8a6162..010e8d6 100644 --- a/go-research/internal/think_deep/prompts.go +++ b/go-research/internal/architectures/think_deep/runtime/prompts.go @@ -1,10 +1,10 @@ -// Package think_deep implements the ThinkDepth.ai "Self-Balancing Test-Time +// Package runtime implements the ThinkDepth.ai "Self-Balancing Test-Time // Diffusion Deep Research" architecture. // // This file contains all prompt templates used throughout the ThinkDeep workflow. // The prompts are based on the original ThinkDepth.ai implementation which // achieved #1 ranking on DeepResearch Bench. 
-package think_deep +package runtime import "fmt" diff --git a/go-research/internal/think_deep/state.go b/go-research/internal/architectures/think_deep/runtime/state.go similarity index 99% rename from go-research/internal/think_deep/state.go rename to go-research/internal/architectures/think_deep/runtime/state.go index 1b416ec..b44c614 100644 --- a/go-research/internal/think_deep/state.go +++ b/go-research/internal/architectures/think_deep/runtime/state.go @@ -1,9 +1,9 @@ -// Package think_deep implements the ThinkDepth.ai "Self-Balancing Test-Time +// Package runtime implements the ThinkDepth.ai "Self-Balancing Test-Time // Diffusion Deep Research" architecture. // // This file contains state definitions used to track progress through the // ThinkDeep workflow, including supervisor coordination and sub-researcher tasks. -package think_deep +package runtime import ( "regexp" diff --git a/go-research/internal/think_deep/state_test.go b/go-research/internal/architectures/think_deep/runtime/state_test.go similarity index 99% rename from go-research/internal/think_deep/state_test.go rename to go-research/internal/architectures/think_deep/runtime/state_test.go index 9bceb82..5f20333 100644 --- a/go-research/internal/think_deep/state_test.go +++ b/go-research/internal/architectures/think_deep/runtime/state_test.go @@ -1,4 +1,4 @@ -package think_deep +package runtime import ( "testing" diff --git a/go-research/internal/think_deep/tools.go b/go-research/internal/architectures/think_deep/runtime/tools.go similarity index 98% rename from go-research/internal/think_deep/tools.go rename to go-research/internal/architectures/think_deep/runtime/tools.go index d5ddbff..aec0326 100644 --- a/go-research/internal/think_deep/tools.go +++ b/go-research/internal/architectures/think_deep/runtime/tools.go @@ -1,10 +1,10 @@ -// Package think_deep implements the ThinkDepth.ai "Self-Balancing Test-Time +// Package runtime implements the ThinkDepth.ai "Self-Balancing Test-Time // Diffusion 
Deep Research" architecture. // // This file contains tool implementations specific to the ThinkDeep workflow, // including the think tool, conduct_research delegation, draft refinement, and // research completion signaling. -package think_deep +package runtime import ( "context" diff --git a/go-research/internal/think_deep/tools_integration_test.go b/go-research/internal/architectures/think_deep/runtime/tools_integration_test.go similarity index 99% rename from go-research/internal/think_deep/tools_integration_test.go rename to go-research/internal/architectures/think_deep/runtime/tools_integration_test.go index 80d11f1..f0df589 100644 --- a/go-research/internal/think_deep/tools_integration_test.go +++ b/go-research/internal/architectures/think_deep/runtime/tools_integration_test.go @@ -1,4 +1,4 @@ -package think_deep +package runtime import ( "context" diff --git a/go-research/internal/architectures/think_deep/think_deep.go b/go-research/internal/architectures/think_deep/think_deep.go index 0bec984..ad59607 100644 --- a/go-research/internal/architectures/think_deep/think_deep.go +++ b/go-research/internal/architectures/think_deep/think_deep.go @@ -46,11 +46,10 @@ import ( "go-research/internal/architectures" "go-research/internal/architectures/catalog" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/config" "go-research/internal/events" "go-research/internal/llm" - "go-research/internal/orchestrator" - "go-research/internal/think_deep" "go-research/internal/tools" ) @@ -61,27 +60,42 @@ type Architecture struct { config *config.Config client llm.ChatClient tools tools.ToolExecutor - injectionContext *think_deep.InjectionContext + injectionContext *runtime.InjectionContext + loop *AgentLoop } // Config holds configuration for the ThinkDeep architecture. 
type Config struct { AppConfig *config.Config Bus *events.Bus - Client llm.ChatClient // Optional: inject for testing - Tools tools.ToolExecutor // Optional: inject for testing - InjectionContext *think_deep.InjectionContext // Optional: prior context for expansion + Client llm.ChatClient // Optional: inject for testing + Tools tools.ToolExecutor // Optional: inject for testing + InjectionContext *runtime.InjectionContext // Optional: prior context for expansion } // New creates a new ThinkDeep architecture instance. func New(cfg Config) *Architecture { - return &Architecture{ + a := &Architecture{ bus: cfg.Bus, config: cfg.AppConfig, client: cfg.Client, tools: cfg.Tools, injectionContext: cfg.InjectionContext, } + + opts := []LoopOption{} + if cfg.Client != nil { + opts = append(opts, WithLoopClient(cfg.Client)) + } + if cfg.Tools != nil { + opts = append(opts, WithLoopTools(cfg.Tools)) + } + if cfg.InjectionContext != nil { + opts = append(opts, WithInjectionContext(cfg.InjectionContext)) + } + + a.loop = NewAgentLoop(cfg.Bus, cfg.AppConfig, opts...) + return a } // Name returns the architecture identifier. @@ -101,8 +115,12 @@ func (a *Architecture) SupportsResume() bool { // SetInjectionContext sets the context for expansion workflows. // This allows building upon existing research findings. 
-func (a *Architecture) SetInjectionContext(ctx *think_deep.InjectionContext) { +func (a *Architecture) SetInjectionContext(ctx *runtime.InjectionContext) { a.injectionContext = ctx + // Update loop with new context + if a.loop != nil { + a.loop.injectionContext = ctx + } } // Research executes the ThinkDeep research workflow: @@ -113,23 +131,22 @@ func (a *Architecture) SetInjectionContext(ctx *think_deep.InjectionContext) { func (a *Architecture) Research(ctx context.Context, sessionID string, query string) (*architectures.Result, error) { startTime := time.Now() - // Build orchestrator options - opts := []orchestrator.ThinkDeepOption{} - if a.client != nil { - opts = append(opts, orchestrator.WithThinkDeepClient(a.client)) - } - if a.tools != nil { - opts = append(opts, orchestrator.WithThinkDeepTools(a.tools)) - } - if a.injectionContext != nil { - opts = append(opts, orchestrator.WithInjectionContext(a.injectionContext)) + if a.loop == nil { + opts := []LoopOption{} + if a.client != nil { + opts = append(opts, WithLoopClient(a.client)) + } + if a.tools != nil { + opts = append(opts, WithLoopTools(a.tools)) + } + if a.injectionContext != nil { + opts = append(opts, WithInjectionContext(a.injectionContext)) + } + a.loop = NewAgentLoop(a.bus, a.config, opts...) } - // Create the ThinkDeep orchestrator - orch := orchestrator.NewThinkDeepOrchestrator(a.bus, a.config, opts...) - // Execute the full ThinkDeep workflow - thinkDeepResult, err := orch.Research(ctx, query) + thinkDeepResult, err := a.loop.Research(ctx, query) if err != nil { return &architectures.Result{ SessionID: sessionID, @@ -151,8 +168,8 @@ func (a *Architecture) Resume(ctx context.Context, sessionID string) (*architect return nil, fmt.Errorf("resume not supported for ThinkDepth") } -// convertResult transforms ThinkDeepResult into the standard architectures.Result format. 
-func (a *Architecture) convertResult(sessionID string, tdr *orchestrator.ThinkDeepResult, startTime time.Time) *architectures.Result { +// convertResult transforms LoopResult into the standard architectures.Result format. +func (a *Architecture) convertResult(sessionID string, tdr *LoopResult, startTime time.Time) *architectures.Result { result := &architectures.Result{ SessionID: sessionID, Query: tdr.Query, diff --git a/go-research/internal/e2e/e2e_test.go b/go-research/internal/e2e/e2e_test.go index 06171fa..833d1ec 100644 --- a/go-research/internal/e2e/e2e_test.go +++ b/go-research/internal/e2e/e2e_test.go @@ -12,6 +12,7 @@ import ( "go-research/internal/agent" "go-research/internal/architectures/catalog" + "go-research/internal/architectures/storm" "go-research/internal/config" "go-research/internal/events" "go-research/internal/llm" @@ -2197,9 +2198,9 @@ func TestStormOrchestratorE2EFullWorkflow(t *testing.T) { mockTools := NewMockToolExecutor() // Create STORM orchestrator with mocks - orch := orchestrator.NewStormOrchestrator(bus, cfg, - orchestrator.WithStormClient(mockLLM), - orchestrator.WithStormTools(mockTools), + orch := storm.NewStormLoop(bus, cfg, + storm.WithStormClient(mockLLM), + storm.WithStormTools(mockTools), ) ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second) @@ -2359,9 +2360,9 @@ func TestStormOrchestratorE2EWithGapFilling(t *testing.T) { mockTools := NewMockToolExecutor() - orch := orchestrator.NewStormOrchestrator(bus, cfg, - orchestrator.WithStormClient(mockLLM), - orchestrator.WithStormTools(mockTools), + orch := storm.NewStormLoop(bus, cfg, + storm.WithStormClient(mockLLM), + storm.WithStormTools(mockTools), ) ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) @@ -2398,9 +2399,9 @@ func TestStormOrchestratorE2EContextCancellation(t *testing.T) { mockTools := NewMockToolExecutor() - orch := orchestrator.NewStormOrchestrator(bus, cfg, - orchestrator.WithStormClient(mockLLM), - 
orchestrator.WithStormTools(mockTools), + orch := storm.NewStormLoop(bus, cfg, + storm.WithStormClient(mockLLM), + storm.WithStormTools(mockTools), ) // Cancel immediately @@ -2446,9 +2447,9 @@ func TestStormOrchestratorE2EEventSequence(t *testing.T) { mockTools := NewMockToolExecutor() - orch := orchestrator.NewStormOrchestrator(bus, cfg, - orchestrator.WithStormClient(mockLLM), - orchestrator.WithStormTools(mockTools), + orch := storm.NewStormLoop(bus, cfg, + storm.WithStormClient(mockLLM), + storm.WithStormTools(mockTools), ) ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) diff --git a/go-research/internal/obsidian/writer.go b/go-research/internal/obsidian/writer.go index 780e8c3..c35c8fd 100644 --- a/go-research/internal/obsidian/writer.go +++ b/go-research/internal/obsidian/writer.go @@ -12,8 +12,8 @@ import ( "text/template" "time" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/session" - "go-research/internal/think_deep" "gopkg.in/yaml.v3" ) @@ -140,7 +140,7 @@ func (w *Writer) GetReportPath(sess *session.Session) string { // WriteSource writes a source file to the sources directory. // Returns the generated filename for linking from insights. -func (w *Writer) WriteSource(sessionDir string, sourceRef think_deep.SourceReference, index int) (string, error) { +func (w *Writer) WriteSource(sessionDir string, sourceRef runtime.SourceReference, index int) (string, error) { // Generate safe filename from URL or file path var safeName string if sourceRef.URL != "" { @@ -231,7 +231,7 @@ func (w *Writer) WriteSource(sessionDir string, sourceRef think_deep.SourceRefer // WriteSources writes all sources from insights to the sources directory. // Returns a map of source URL/path to filename for linking. 
-func (w *Writer) WriteSources(sessionDir string, insights []think_deep.SubInsight) (map[string]string, error) { +func (w *Writer) WriteSources(sessionDir string, insights []runtime.SubInsight) (map[string]string, error) { sourceMap := make(map[string]string) sourceIndex := 0 @@ -259,9 +259,9 @@ func (w *Writer) WriteSources(sessionDir string, insights []think_deep.SubInsigh // Also create source from legacy SourceURL/SourceContent if not already covered if insight.SourceURL != "" && sourceMap[insight.SourceURL] == "" { sourceIndex++ - src := think_deep.SourceReference{ + src := runtime.SourceReference{ URL: insight.SourceURL, - Type: think_deep.SourceTypeWeb, + Type: runtime.SourceTypeWeb, RelevantExcerpt: insight.SourceContent, FetchedAt: insight.Timestamp, } @@ -277,16 +277,16 @@ func (w *Writer) WriteSources(sessionDir string, insights []think_deep.SubInsigh // WriteInsight writes a single insight to the insights directory. // Enhanced to include data points, analysis chain, and source links. 
-func (w *Writer) WriteInsight(sessionDir string, insight think_deep.SubInsight, index int, sourceMap map[string]string) error { +func (w *Writer) WriteInsight(sessionDir string, insight runtime.SubInsight, index int, sourceMap map[string]string) error { filename := filepath.Join(sessionDir, "insights", fmt.Sprintf("insight_%03d.md", index)) frontmatter := map[string]interface{}{ - "insight_id": insight.ID, - "topic": insight.Topic, - "confidence": fmt.Sprintf("%.2f", insight.Confidence), - "iteration": insight.Iteration, - "researcher": insight.ResearcherNum, - "timestamp": insight.Timestamp.Format(time.RFC3339), + "insight_id": insight.ID, + "topic": insight.Topic, + "confidence": fmt.Sprintf("%.2f", insight.Confidence), + "iteration": insight.Iteration, + "researcher": insight.ResearcherNum, + "timestamp": insight.Timestamp.Format(time.RFC3339), } if insight.SourceURL != "" { frontmatter["source_url"] = insight.SourceURL @@ -416,7 +416,7 @@ func (w *Writer) WriteInsight(sessionDir string, insight think_deep.SubInsight, // WriteInsights writes all insights for a session. // It first writes all sources, then writes insights with source links. -func (w *Writer) WriteInsights(sessionDir string, insights []think_deep.SubInsight) error { +func (w *Writer) WriteInsights(sessionDir string, insights []runtime.SubInsight) error { // First, write all sources and get the mapping sourceMap, err := w.WriteSources(sessionDir, insights) if err != nil { @@ -470,7 +470,7 @@ func sanitizeFilename(input string) string { } // WriteWithInsights writes a session with its sub-insights to the Obsidian vault. 
-func (w *Writer) WriteWithInsights(sess *session.Session, subInsights []think_deep.SubInsight) error { +func (w *Writer) WriteWithInsights(sess *session.Session, subInsights []runtime.SubInsight) error { // Create session directory structure sessionDir := filepath.Join(w.vaultPath, sess.ID) dirs := []string{ @@ -592,7 +592,7 @@ const sessionMOCTemplate = `--- ` // writeSessionMOCWithInsights writes a session MOC that includes links to insights -func (w *Writer) writeSessionMOCWithInsights(dir string, sess *session.Session, insights []think_deep.SubInsight) error { +func (w *Writer) writeSessionMOCWithInsights(dir string, sess *session.Session, insights []runtime.SubInsight) error { filename := filepath.Join(dir, "session.md") frontmatter := map[string]interface{}{ @@ -619,7 +619,7 @@ func (w *Writer) writeSessionMOCWithInsights(dir string, sess *session.Session, data := struct { Frontmatter string *session.Session - SubInsights []think_deep.SubInsight + SubInsights []runtime.SubInsight }{string(fm), sess, insights} var content bytes.Buffer diff --git a/go-research/internal/repl/handlers/expand.go b/go-research/internal/repl/handlers/expand.go index bbe83f3..547f99d 100644 --- a/go-research/internal/repl/handlers/expand.go +++ b/go-research/internal/repl/handlers/expand.go @@ -9,11 +9,11 @@ import ( "go-research/internal/agent" think_deep_arch "go-research/internal/architectures/think_deep" + "go-research/internal/architectures/think_deep/runtime" "go-research/internal/events" "go-research/internal/orchestrator" "go-research/internal/repl" "go-research/internal/session" - "go-research/internal/think_deep" ) // ExpandHandler handles /expand and natural language follow-ups @@ -239,8 +239,8 @@ func (h *ExpandHandler) runThinkDeep(ctx *repl.Context, query string, sess *sess } // buildInjectionContext creates an injection context from the session chain. 
-func (h *ExpandHandler) buildInjectionContext(ctx *repl.Context, expansionTopic string) *think_deep.InjectionContext {
-	injection := think_deep.NewInjectionContext()
+func (h *ExpandHandler) buildInjectionContext(ctx *repl.Context, expansionTopic string) *runtime.InjectionContext {
+	injection := runtime.NewInjectionContext()
 	injection.SetExpansionTopic(expansionTopic)
 
 	// Walk session chain and accumulate context
diff --git a/go-research/internal/session/context_view.go b/go-research/internal/session/context_view.go
index c101580..2782c77 100644
--- a/go-research/internal/session/context_view.go
+++ b/go-research/internal/session/context_view.go
@@ -4,36 +4,36 @@ import (
 	"fmt"
 	"strings"
 
-	"go-research/internal/think_deep"
+	"go-research/internal/architectures/think_deep/runtime"
 )
 
 // ContextSnapshot contains statistics and raw context for a session
type ContextSnapshot struct {
 	// Session metadata
-	SessionID string
-	Mode Mode
-	Status SessionStatus
-	Query string
-	
+	SessionID string
+	Mode      Mode
+	Status    SessionStatus
+	Query     string
+
 	// Statistics
-	ReportLength int
-	SourcesCount int
-	InsightsCount int
-	WorkersCount int
+	ReportLength  int
+	SourcesCount  int
+	InsightsCount int
+	WorkersCount  int
 	IterationsCount int
-	ToolCallsCount int
-	Cost CostBreakdown
-	
+	ToolCallsCount  int
+	Cost            CostBreakdown
+
 	// Token estimates
 	EstimatedTokens int
 	MaxTokens int
-	
+
 	// Think-deep specific
-	HasThinkDeepContext bool
-	ThinkDeepFindings int
+	HasThinkDeepContext  bool
+	ThinkDeepFindings    int
 	ThinkDeepVisitedURLs int
 	ThinkDeepHasReport bool
-	
+
 	// Raw context string
 	RawContext string
 }
@@ -43,26 +43,26 @@ func BuildContextSnapshot(sess *Session, store *Store, maxTokens int) (*ContextS
 	if sess == nil {
 		return nil, fmt.Errorf("no active session")
 	}
-	
+
 	snapshot := &ContextSnapshot{
-		SessionID: sess.ID,
-		Mode: sess.Mode,
-		Status: sess.Status,
-		Query: sess.Query,
-		ReportLength: len(sess.Report),
-		SourcesCount: len(sess.Sources),
-		InsightsCount: len(sess.Insights),
-		WorkersCount: len(sess.Workers),
-		Cost: sess.Cost,
+		SessionID:     sess.ID,
+		Mode:          sess.Mode,
+		Status:        sess.Status,
+		Query:         sess.Query,
+		ReportLength:  len(sess.Report),
+		SourcesCount:  len(sess.Sources),
+		InsightsCount: len(sess.Insights),
+		WorkersCount:  len(sess.Workers),
+		Cost:          sess.Cost,
 		MaxTokens: maxTokens,
 	}
-	
+
 	// Count iterations and tool calls across all workers
 	for _, worker := range sess.Workers {
 		snapshot.IterationsCount += len(worker.Iterations)
 		snapshot.ToolCallsCount += len(worker.ToolCalls)
 	}
-	
+
 	// Build raw context based on mode
 	if sess.Mode == ModeThinkDeep {
 		// For think_deep, build injection context like expand handler does
@@ -76,18 +76,18 @@ func BuildContextSnapshot(sess *Session, store *Store, maxTokens int) (*ContextS
 		// For fast/storm, use continuation context
 		snapshot.RawContext = BuildContinuationContext(sess)
 	}
-	
+
 	// Estimate tokens (rough approximation: chars/4)
 	snapshot.EstimatedTokens = estimateTokens(snapshot.RawContext)
-	
+
 	return snapshot, nil
 }
 
 // buildInjectionContextForSnapshot builds injection context from session chain
 // Similar to ExpandHandler.buildInjectionContext but doesn't need expansion topic
-func buildInjectionContextForSnapshot(sess *Session, store *Store) *think_deep.InjectionContext {
-	injection := think_deep.NewInjectionContext()
-	
+func buildInjectionContextForSnapshot(sess *Session, store *Store) *runtime.InjectionContext {
+	injection := runtime.NewInjectionContext()
+
 	// Walk session chain and accumulate context
 	current := sess
 	for current != nil {
@@ -95,17 +95,17 @@ func buildInjectionContextForSnapshot(sess *Session, store *Store) *think_deep.I
 		for _, ins := range current.Insights {
 			injection.AddFinding(ins.Finding)
 		}
-	
+
 		// Accumulate sources as visited URLs
 		for _, src := range current.Sources {
 			injection.AddVisitedURL(src)
 		}
-	
+
 		// Keep existing report for context
 		if injection.ExistingReport == "" && current.Report != "" {
 			injection.SetExistingReport(current.Report)
 		}
-	
+
 		// Walk to parent
 		if current.ParentID == nil {
 			break
@@ -116,18 +116,18 @@ func buildInjectionContextForSnapshot(sess *Session, store *Store) *think_deep.I
 		}
 		current = parent
 	}
-	
+
 	return injection
 }
 
 // serializeInjectionContext converts injection context to readable string
-func serializeInjectionContext(injection *think_deep.InjectionContext) string {
+func serializeInjectionContext(injection *runtime.InjectionContext) string {
 	var sb strings.Builder
-	
+
 	if injection.ExpansionTopic != "" {
 		sb.WriteString(fmt.Sprintf("Expansion topic: %s\n\n", injection.ExpansionTopic))
 	}
-	
+
 	if injection.ExistingReport != "" {
 		sb.WriteString("Existing report:\n")
 		report := injection.ExistingReport
@@ -137,7 +137,7 @@ func serializeInjectionContext(injection *think_deep.InjectionContext) string {
 		sb.WriteString(report)
 		sb.WriteString("\n\n")
 	}
-	
+
 	if len(injection.PreviousFindings) > 0 {
 		sb.WriteString("Previous findings:\n")
 		for _, finding := range injection.PreviousFindings {
@@ -145,7 +145,7 @@
 		}
 		sb.WriteString("\n")
 	}
-	
+
 	if len(injection.ValidatedFacts) > 0 {
 		sb.WriteString("Validated facts:\n")
 		for _, fact := range injection.ValidatedFacts {
@@ -153,7 +153,7 @@
 		}
 		sb.WriteString("\n")
 	}
-	
+
 	if len(injection.VisitedURLs) > 0 {
 		sb.WriteString("Visited URLs:\n")
 		limit := len(injection.VisitedURLs)
@@ -168,7 +168,7 @@
 		}
 		sb.WriteString("\n")
 	}
-	
+
 	if len(injection.KnownGaps) > 0 {
 		sb.WriteString("Known gaps:\n")
 		for _, gap := range injection.KnownGaps {
@@ -176,7 +176,7 @@
 		}
 		sb.WriteString("\n")
 	}
-	
+
 	if len(injection.RelatedTopics) > 0 {
 		sb.WriteString("Related topics:\n")
 		for _, topic := range injection.RelatedTopics {
@@ -184,7 +184,7 @@ serializeInjectionContext(injection *think_deep.InjectionContext) string {
 		}
 		sb.WriteString("\n")
 	}
-	
+
 	return sb.String()
 }
diff --git a/lib/posts.ts b/lib/posts.ts
index 3b6321b..33b1529 100644
--- a/lib/posts.ts
+++ b/lib/posts.ts
@@ -45,6 +45,15 @@ const posts: Post[] = [
     readTime: '10 min read',
     audioUrl: '/posts/context-engineering-claude-code/audio.mp3',
   },
+  {
+    title: 'Diffusion Deep Research',
+    slug: 'diffusion-deep-research',
+    description: 'How diffusion can be used to create a deep research report.',
+    publishedAt: '2025-12-05',
+    tags: ['ai', 'agents', 'research', 'deep-research'],
+    cover: '/posts/diffusion-deep-research/cover-optimized.webp',
+    readTime: '15 min read',
+  },
 ];
 
 export function getAllPosts(): Post[] {
diff --git a/package.json b/package.json
index bf2b5ba..a7b648a 100644
--- a/package.json
+++ b/package.json
@@ -56,6 +56,7 @@
     "next": "16.0.7",
     "next-themes": "^0.4.6",
     "posthog-js": "^1.301.0",
+    "prism-react-renderer": "^2.4.1",
     "react": "19.2.1",
     "react-day-picker": "9.11.3",
     "react-dom": "19.2.1",
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index 245f8c1..eafbb27 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -131,6 +131,9 @@ importers:
       posthog-js:
         specifier: ^1.301.0
         version: 1.301.0
+      prism-react-renderer:
+        specifier: ^2.4.1
+        version: 2.4.1(react@19.2.1)
       react:
         specifier: 19.2.1
         version: 19.2.1
@@ -1567,6 +1570,9 @@ packages:
   '@types/node@24.10.1':
     resolution: {integrity: sha512-GNWcUTRBgIRJD5zj+Tq0fKOJ5XZajIiBroOF0yvj2bSU1WvNdYS/dn9UxwsujGW4JX06dnHyjV2y9rRaybH0iQ==}
 
+  '@types/prismjs@1.26.5':
+    resolution: {integrity: sha512-AUZTa7hQ2KY5L7AmtSiqxlhWxb4ina0yd8hNbl4TWuqnv/pFP0nDMb3YrfSBf4hJVGLh2YEIBfKaBW/9UEl6IQ==}
+
   '@types/react-dom@19.2.3':
     resolution: {integrity: sha512-jp2L/eY6fn+KgVVQAOqYItbF0VY/YApe5Mz2F0aykSO8gx31bYCZyvSeYxCHKvzHG5eZjc+zyaS5BrBWya2+kQ==}
     peerDependencies:
@@ -2889,6 +2895,11 @@
     engines: {node: '>=14'}
     hasBin: true
 
+  prism-react-renderer@2.4.1:
+    resolution: {integrity: sha512-ey8Ls/+Di31eqzUxC46h8MksNuGx/n0AAC8uKpwFau4RPDYLuE3EXTp8N8G2vX2N7UC/+IXeNUnlWBGGcAG+Ig==}
+    peerDependencies:
+      react: '>=16.0.0'
+
   prop-types@15.8.1:
     resolution: {integrity: sha512-oj87CgZICdulUohogVAR7AjlC0327U4el4L6eAvOqCeudMDVU0NThNaV+b9Df4dXgSP1gXMTnPdhfe/2qDH5cg==}
@@ -4615,6 +4626,8 @@ snapshots:
     dependencies:
       undici-types: 7.16.0
 
+  '@types/prismjs@1.26.5': {}
+
   '@types/react-dom@19.2.3(@types/react@19.2.7)':
     dependencies:
       '@types/react': 19.2.7
@@ -6060,6 +6073,12 @@ snapshots:
   prettier@3.7.4: {}
 
+  prism-react-renderer@2.4.1(react@19.2.1):
+    dependencies:
+      '@types/prismjs': 1.26.5
+      clsx: 2.1.1
+      react: 19.2.1
+
   prop-types@15.8.1:
     dependencies:
       loose-envify: 1.4.0
diff --git a/public/posts/diffusion-deep-research/cover-optimized.avif b/public/posts/diffusion-deep-research/cover-optimized.avif
new file mode 100644
index 0000000..7282ac1
Binary files /dev/null and b/public/posts/diffusion-deep-research/cover-optimized.avif differ
diff --git a/public/posts/diffusion-deep-research/cover-optimized.png b/public/posts/diffusion-deep-research/cover-optimized.png
new file mode 100644
index 0000000..5984583
Binary files /dev/null and b/public/posts/diffusion-deep-research/cover-optimized.png differ
diff --git a/public/posts/diffusion-deep-research/cover-optimized.webp b/public/posts/diffusion-deep-research/cover-optimized.webp
new file mode 100644
index 0000000..baec9e3
Binary files /dev/null and b/public/posts/diffusion-deep-research/cover-optimized.webp differ
diff --git a/public/posts/diffusion-deep-research/cover.png b/public/posts/diffusion-deep-research/cover.png
new file mode 100644
index 0000000..d8b2b26
Binary files /dev/null and b/public/posts/diffusion-deep-research/cover.png differ
diff --git a/thoughts/shared/plans/blog-typography-components.md b/thoughts/shared/plans/blog-typography-components.md
index dd6e7a1..88442ab 100644
--- a/thoughts/shared/plans/blog-typography-components.md
+++ b/thoughts/shared/plans/blog-typography-components.md
@@ -177,8 +177,7 @@ const headingVariants = cva('font-bold text-balance scroll-mt-20', {
 });
 
 export interface BlogHeadingProps
-  extends React.HTMLAttributes,
-    VariantProps {
+  extends React.HTMLAttributes, VariantProps {
   level: 1 | 2 | 3 | 4 | 5 | 6;
 }
@@ -236,7 +235,8 @@ const listVariants = cva('my-6 space-y-2', {
 });
 
 export interface BlogListProps
-  extends React.HTMLAttributes,
+  extends
+    React.HTMLAttributes,
     VariantProps {
   variant?: 'unordered' | 'ordered' | 'checklist';
 }
@@ -803,6 +803,7 @@ Not required for this implementation - these are presentational Server Component
 - Verify blog list page works
 
 8. **Verify build**:
+
    ```bash
    pnpm build
    ```
diff --git a/thoughts/shared/plans/data-analysis-tools-implementation.md b/thoughts/shared/plans/data-analysis-tools-implementation.md
index fa1d0d2..1b9dc90 100644
--- a/thoughts/shared/plans/data-analysis-tools-implementation.md
+++ b/thoughts/shared/plans/data-analysis-tools-implementation.md
@@ -11,6 +11,7 @@ This plan implements data analysis tools (CSV EDA) and document reading tools (P
 The codebase has a clean tool architecture:
 
 - **Tool Interface** (`internal/tools/registry.go:9-13`):
+
   ```go
   type Tool interface {
       Name() string
@@ -41,6 +42,7 @@ The codebase has a clean tool architecture:
 ### Current Dependencies
 
 From `go.mod`:
+
 - Go 1.24.0
 - No existing PDF/document parsing libraries
 - No CSV/data analysis libraries
@@ -77,6 +79,7 @@ After this plan is complete:
 ## Implementation Approach
 
 Follow the `SearchTool` + `ContentSummarizer` pattern:
+
 1. Create standalone tools that work without LLM
 2. Add optional LLM enhancement for deeper analysis
 3. Register tools in `SubResearcherToolRegistry()`
@@ -95,6 +98,7 @@ Implement a PDF text extraction tool using the `pdfcpu` library (pure Go, no CGO
 #### 1. Add pdfcpu dependency
 
 **Command**:
+
 ```bash
 go get github.com/pdfcpu/pdfcpu
 ```
@@ -314,11 +318,13 @@ func TestPDFReadTool_Execute_RealFile(t *testing.T) {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `cd go-research && go build ./...`
 - [x] Tests pass: `cd go-research && go test ./internal/tools/...`
 - [x] No linting errors: `cd go-research && golangci-lint run ./...` (if available)
 
 #### Manual Verification:
+
 - [ ] Create a test PDF and verify text extraction works
 - [ ] Large PDFs are truncated appropriately
 
@@ -335,6 +341,7 @@ Implement a DOCX text extraction tool. Since unioffice is commercial, use `balia
 #### 1. Add docx dependency
 
 **Command**:
+
 ```bash
 go get github.com/nguyenthenguyen/docx
 ```
@@ -491,10 +498,12 @@ func TestDOCXReadTool_Execute_RealFile(t *testing.T) {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `cd go-research && go build ./...`
 - [x] Tests pass: `cd go-research && go test ./internal/tools/...`
 
 #### Manual Verification:
+
 - [ ] Create a test DOCX and verify text extraction works
 - [ ] Complex DOCX with tables/formatting extracts readable text
 
@@ -634,10 +643,12 @@ func TestDocumentReadTool_Execute_DetectsDOCX(t *testing.T) {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `cd go-research && go build ./...`
 - [x] Tests pass: `cd go-research && go test ./internal/tools/...`
 
 #### Manual Verification:
+
 - [x] Tool correctly routes PDF files to PDF reader
 - [x] Tool correctly routes DOCX files to DOCX reader
 - [x] Unsupported formats return clear error message
 
@@ -655,6 +666,7 @@ Implement a CSV analysis tool that performs exploratory data analysis (EDA) incl
 #### 1. Add statistics dependency
 
 **Command**:
+
 ```bash
 go get gonum.org/v1/gonum/stat
 go get github.com/montanaflynn/stats
@@ -1094,10 +1106,12 @@ func TestIsNumericColumn(t *testing.T) {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `cd go-research && go build ./...`
 - [x] Tests pass: `cd go-research && go test ./internal/tools/...`
 
 #### Manual Verification:
+
 - [x] Tool correctly identifies numeric vs string columns
 - [x] Summary statistics are accurate
 - [ ] Large CSV files are handled without memory issues
@@ -1241,11 +1255,13 @@ Add to the `` section:
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `cd go-research && go build ./...`
 - [x] Tests pass: `cd go-research && go test ./...`
 - [x] Type check passes: `cd go-research && go vet ./...`
 
 #### Manual Verification:
+
 - [x] New tools appear in sub-researcher tool registry
 - [x] Updated prompts include documentation for new tools
 - [ ] Sub-researcher can call document and CSV tools during research
 
@@ -1265,6 +1281,7 @@ Create integration tests that verify the new tools work correctly within the Thi
 **Directory**: `internal/tools/testdata/`
 
 Create test files:
+
 - `testdata/sample.pdf` - Simple PDF with text content
 - `testdata/sample.docx` - Simple DOCX with text content
 - `testdata/sample.csv` - CSV with numeric and string columns
@@ -1354,11 +1371,13 @@ func TestSubResearcherToolRegistry_ExecuteCSVTool(t *testing.T) {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `cd go-research && go build ./...`
 - [x] All tests pass: `cd go-research && go test ./...`
 - [x] Tool registry contains all expected tools
 
 #### Manual Verification:
+
 - [ ] End-to-end test with real PDF/DOCX/CSV files
 - [ ] Sub-researcher can use document tools during research session
 - [ ] Performance is acceptable with moderately sized files (< 10MB)
 
@@ -1370,6 +1389,7 @@ func TestSubResearcherToolRegistry_ExecuteCSVTool(t *testing.T) {
 ### Unit Tests
 
 Each tool has dedicated unit tests covering:
+
 - Name and description methods
 - Missing/invalid arguments
 - File not found scenarios
diff --git a/thoughts/shared/plans/deep-research-agent-python-mvp.md b/thoughts/shared/plans/deep-research-agent-python-mvp.md
index 11ea63c..a51d94c 100644
--- a/thoughts/shared/plans/deep-research-agent-python-mvp.md
+++ b/thoughts/shared/plans/deep-research-agent-python-mvp.md
@@ -10,11 +10,13 @@ Build a production-ready deep research agent in **pure Python** over **3 increme
 **Repository**: `addcommitpush.io` - A Next.js 16 TypeScript blog
 
 **Existing Assets**:
+
 - `/deep-research-agent/` directory (empty skeleton, will be rebuilt in Python)
 - Research documents at `/thoughts/shared/research/` (comprehensive architecture references)
 - No Python dependencies in main blog project
 
 **Technology Constraints**:
+
 - **Language**: Pure Python 3.14+ (no TypeScript mixing)
 - **Package Manager**: uv (modern, fast Python package manager)
 - **State**: File-based initially, PostgreSQL optional for production
@@ -34,6 +36,7 @@ After completing all 3 phases, you'll have:
 ### Verification
 
 **Automated**:
+
 - `uv sync` installs all dependencies successfully
 - `ruff check .` passes (no linting errors)
 - `mypy src/` passes (strict type checking)
@@ -41,6 +44,7 @@ After completing all 3 phases, you'll have:
 - CLI runs: `uv run research "test query"`
 
 **Manual**:
+
 - Research reports are coherent and well-sourced
 - Multi-agent execution is 3-5× faster than single-agent
 - EDA notebooks execute without errors in Jupyter
@@ -64,6 +68,7 @@ To keep scope manageable for MVP:
 **Strategy**: Build vertically - each phase delivers end-to-end value.
 **Tech Stack**:
+
 - **Language**: Python 3.11+
 - **Package Manager**: uv
 - **Framework**: LangGraph (multi-agent orchestration)
@@ -94,6 +99,7 @@ Build a working single-agent system using the ReAct pattern (Reasoning + Acting)
 **Directory**: `deep-research-agent/` (complete rebuild)
 
 **Structure**:
+
 ```
 deep-research-agent/
 ├── pyproject.toml          # uv project config
@@ -132,6 +138,7 @@ deep-research-agent/
 ```
 
 **File**: `deep-research-agent/pyproject.toml`
+
 ```toml
 [project]
 name = "deep-research"
@@ -186,6 +193,7 @@ disallow_untyped_defs = true
 ```
 
 **File**: `deep-research-agent/.env.example`
+
 ```bash
 # LLM Provider (choose one)
 OPENAI_API_KEY=sk-...
@@ -210,7 +218,8 @@ TEMPERATURE=0.0
 ```
 
 **File**: `deep-research-agent/README.md`
-```markdown
+
+````markdown
 # Deep Research Agent
 
 Multi-agent deep research system built with LangGraph.
@@ -228,6 +237,7 @@ uv sync
 cp .env.example .env
 # Edit .env with your API keys
 ```
+````
 
 ## Usage
@@ -254,7 +264,8 @@ uv run ruff check .
 
 # Type check
 uv run mypy src/
 ```
-```
+
+````
 
 #### 2. Core Agent Implementation
@@ -450,7 +461,7 @@ class ReactAgent:
             sources.update(urls)
         return sorted(sources)
-```
+````
 
 **File**: `src/deep_research/agent/prompts.py`
@@ -944,6 +955,7 @@ if __name__ == "__main__":
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Dependencies install: `uv sync`
 - [x] Linting passes: `uv run ruff check .`
 - [x] Type checking passes: `uv run mypy src/`
@@ -952,6 +964,7 @@ if __name__ == "__main__":
 - [ ] Output file created: `uv run research "test" -o test.md && test -f test.md` (requires API keys)
 
 #### Manual Verification:
+
 - [ ] Research report is coherent and structured
 - [ ] Sources are cited with working URLs
 - [ ] Facts are accurate (spot-check 3-5 key points)
@@ -1007,6 +1020,7 @@ Upgrade to multi-agent architecture using LangGraph. A LeadResearcher orchestrat
 #### 1. Add LangGraph Dependencies
 
 **File**: `pyproject.toml` (update dependencies)
+
 ```toml
 dependencies = [
     # ... existing ...
@@ -1330,12 +1344,14 @@ def multi(
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Dependencies install: `uv sync`
 - [x] Type check passes: `uv run mypy src/`
 - [ ] Multi-agent runs: `uv run research multi "Complex topic"` (requires API keys)
 - [ ] Parallel execution happens (check logs for concurrent workers)
 
 #### Manual Verification:
+
 - [ ] Multi-agent completes 2-3× faster than single-agent
 - [ ] Reports are more comprehensive
 - [ ] Worker summaries are distinct (not redundant)
@@ -1367,6 +1383,7 @@ Add data analysis capabilities. Agent takes CSV/Parquet files, performs EDA, and
 #### 1. Add Data Dependencies
 
 **File**: `pyproject.toml` (update)
+
 ```toml
 dependencies = [
     # ... existing ...
@@ -1754,6 +1771,7 @@ def eda(filepath: str, goal: str | None, model: str | None) -> None:
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Dependencies install: `uv sync`
 - [x] CLI command exists: `uv run research eda --help`
 - [x] Tests pass: `uv run pytest tests/test_notebook.py tests/test_executor.py`
@@ -1762,6 +1780,7 @@ def eda(filepath: str, goal: str | None, model: str | None) -> None:
 - [ ] Notebook generated: `uv run research eda data/car_price_prediction_.csv` (requires API keys)
 
 #### Manual Verification:
+
 - [ ] Notebook opens in Jupyter
 - [ ] "Run All Cells" executes without errors
 - [ ] Narrative follows 7-act structure (all 7 acts present)
@@ -1784,6 +1803,7 @@ def eda(filepath: str, goal: str | None, model: str | None) -> None:
 ### Unit Tests
 
 **File**: `tests/test_tools.py`
+
 ```python
 import pytest
 from deep_research.tools.search import SearchTool
@@ -1813,6 +1833,7 @@ async def test_fetch_tool():
 ### Integration Tests
 
 **File**: `tests/test_agent.py`
+
 ```python
 @pytest.mark.asyncio
 async def test_react_agent():
diff --git a/thoughts/shared/plans/eda-agent-multi-source-parallel.md b/thoughts/shared/plans/eda-agent-multi-source-parallel.md
index f5ec28d..aa0337d 100644
--- a/thoughts/shared/plans/eda-agent-multi-source-parallel.md
+++ b/thoughts/shared/plans/eda-agent-multi-source-parallel.md
@@ -16,6 +16,7 @@ The `IterativeEDAAgent` (deep-research-agent/src/deep_research/agent/iterative_e
 4. **Generate Outputs** (lines 435-456): Creates markdown report + executed Jupyter notebook
 
 **Key Architecture Patterns Already in Place:**
+
 - Tool base class: `deep-research-agent/src/deep_research/tools/base.py:7-35`
 - LangChain tool wrapping: `deep-research-agent/src/deep_research/agent/react.py:60-85`
 - Parallel execution via LangGraph Send API: `deep-research-agent/src/deep_research/agent/orchestrator.py:155-184`
@@ -24,32 +25,38 @@ The `IterativeEDAAgent` (deep-research-agent/src/deep_research/agent/iterative_e
 ### Current Limitations
 
 **Single Format Support** (iterative_eda.py:98)
+
 - Hardcoded `pd.read_csv(filepath)` - only CSV supported
 - No file extension detection or validation
 - No support for Excel, Parquet, Pickle, JSON, TSV, etc.
 
 **Not a Tool**
+
 - Instantiated directly by CLI (cli.py:275), not through tool system
 - Cannot be called by ReactAgent or WorkerAgent
 - Not registered in multi-agent tool suite
 
 **Missing Dependencies** (pyproject.toml:19)
+
 - No `openpyxl` for Excel support
 - No `pyarrow` for Parquet support
 
 ### Key Discoveries
 
 **✅ Parallelization Already Supported!**
+
 - Each `IterativeEDAAgent` creates own `CodeExecutor` with isolated kernel (iterative_eda.py:25)
 - Multiple agents can run concurrently without interference
 - LangGraph Send API (orchestrator.py:155-184) generalizes to any worker type
 
 **✅ Tool Pattern is Proven**
+
 - `SearchTool` and `FetchTool` demonstrate the exact pattern we need to follow
 - ReactAgent (react.py:60-85) shows how to wrap tools with `@tool` decorator
 - All infrastructure already exists
 
 **✅ Architecture is Additive, Not Refactoring**
+
 - No breaking changes to existing EDA functionality
 - Pure additions to tool system and data loading
 - Existing CLI remains unchanged
@@ -59,6 +66,7 @@ The `IterativeEDAAgent` (deep-research-agent/src/deep_research/agent/iterative_e
 ### Success Criteria
 
 Users can:
+
 1. **Run EDA on multiple formats**: `uv run research eda data.xlsx "predict sales"`
 2. **Use EDA as tool in research**: `uv run research research "Analyze sales.csv and compare with industry benchmarks"`
 3. **Execute parallel EDA**: Multiple datasets analyzed concurrently via orchestrator
@@ -67,11 +75,13 @@ Users can:
 ### Verification Steps
 
 **Automated Verification:**
+
 - [ ] Build succeeds: `cd deep-research-agent && uv run pytest`
 - [ ] Type checking passes: `cd deep-research-agent && uv run mypy src`
 - [ ] Linting passes: `cd deep-research-agent && uv run ruff check src`
 
 **Manual Verification:**
+
 - [ ] CLI works with CSV: `uv run research eda data/car_price_prediction_.csv "predict price"`
 - [ ] CLI works with Excel: `uv run research eda customers.xlsx "segment customers"`
 - [ ] CLI works with Parquet: `uv run research eda events.parquet "analyze behavior"`
@@ -545,12 +555,14 @@ def test_supported_formats() -> None:
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Tests pass: `cd deep-research-agent && uv run pytest tests/test_data_loader.py -v`
 - [x] Type checking: `cd deep-research-agent && uv run mypy src/deep_research/tools/data_loader.py`
 - [x] Linting: `cd deep-research-agent && uv run ruff check src/deep_research/tools/data_loader.py`
 - [ ] Integration test: `cd deep-research-agent && uv run pytest tests/test_agent.py -v`
 
 #### Manual Verification:
+
 - [ ] CSV works: `uv run research eda deep-research-agent/data/car_price_prediction_.csv "predict price"`
 - [ ] Create test Excel file and verify: `uv run research eda test.xlsx "analyze data"`
 - [ ] Create test Parquet file and verify: `uv run research eda test.parquet "analyze data"`
@@ -880,12 +892,14 @@ def test_eda_tool_properties() -> None:
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Tests pass: `cd deep-research-agent && uv run pytest tests/test_eda_tool.py -v` (Note: 1 test requires API key)
 - [x] Integration tests: `cd deep-research-agent && uv run pytest tests/ -v` (Most pass, some require API key)
 - [x] Type checking: `cd deep-research-agent && uv run mypy src`
 - [x] Linting: `cd deep-research-agent && uv run ruff check src`
 
 #### Manual Verification:
+
 - [ ] ReactAgent can call eda tool: Test with query that mentions analyzing a CSV file
 - [ ] Tool returns formatted insights: Verify output contains insights and notebook path
 - [ ] Agent can reason about results: Check that agent uses insights in final answer
@@ -1068,11 +1082,13 @@ Return JSON array with this structure:
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Parallel tests pass: `cd deep-research-agent && uv run pytest tests/test_parallel_eda.py -v` (Note: Tests require API key)
 - [x] No kernel conflicts: Tests complete without errors (architecture supports isolation)
 - [ ] Performance improvement: Parallel execution is faster than sequential (< 1.5x sequential time) (Requires API key to verify)
 
 #### Manual Verification:
+
 - [ ] Two concurrent EDAs work: Run test manually, verify both complete
 - [ ] No kernel crashes: Check logs for kernel shutdown errors
 - [ ] Results are independent: Verify each analysis produces correct insights for its dataset
@@ -1269,7 +1285,7 @@ async def test_pure_data_analysis_workflow() -> None:
 **File**: `deep-research-agent/README.md`
 
 **Changes**: Add combined workflow examples
-```markdown
+````markdown
 # Deep Research Agent
 
 ... (existing content) ...
@@ -1300,6 +1316,7 @@ uv run research eda data/customers.xlsx "segment customers"
 # Analyze Parquet
 uv run research eda data/events.parquet "analyze user behavior"
 ```
+````
 
 ### Combined Workflows
@@ -1314,12 +1331,14 @@ uv run research research "Compare sales patterns in Q1.csv and Q2.csv, then rese
 ```
 
 The system automatically:
+
 - Detects data analysis needs in your query
 - Spawns parallel workers for data analysis and web research
 - Synthesizes insights from both sources
 - Generates Jupyter notebooks with executable analysis
 
 ... (rest of README) ...
+
 ```
 
 ### Success Criteria
@@ -1460,3 +1479,4 @@ After deployment, monitor:
 **Risk Level**: Low (all patterns proven, changes are additive)
 **Team Size**: 1-2 developers
 **Dependencies**: None (all patterns exist in codebase)
+```
diff --git a/thoughts/shared/plans/go-deep-research-agent-sota.md b/thoughts/shared/plans/go-deep-research-agent-sota.md
index 9d7f823..8be6fe3 100644
--- a/thoughts/shared/plans/go-deep-research-agent-sota.md
+++ b/thoughts/shared/plans/go-deep-research-agent-sota.md
@@ -3,6 +3,7 @@
 ## Overview
 
 Transform the existing Go research agent from a basic ReAct implementation into a state-of-the-art deep research system incorporating:
+
 - **Context Folding** (AgentFold-style proactive context management)
 - **Multi-Perspective Research** (STORM-style expert conversations)
 - **DAG-Based Planning** (parallel task execution with dependencies)
@@ -17,17 +18,17 @@ This plan uses the **existing LLM client** (OpenRouter via `internal/llm/client.
 ### What Exists
 
-| Component | Location | Current State |
-|-----------|----------|---------------|
-| Orchestrator | `internal/orchestrator/orchestrator.go` | Basic coordination, complexity-based worker allocation |
-| Planner | `internal/orchestrator/planner.go` | Flat task decomposition (no DAG, no dependencies) |
-| Worker Pool | `internal/orchestrator/pool.go` | Parallel execution with goroutines |
-| ReAct Agent | `internal/agent/react.go` | Single-agent ReAct loop with `//` |
-| Synthesizer | `internal/orchestrator/synthesizer.go` | Simple concatenation synthesis |
-| Tools | `internal/tools/` | `search` (Brave API), `fetch` (HTML scraping) |
-| LLM Client | `internal/llm/client.go` | OpenRouter client with streaming support |
-| Session | `internal/session/session.go` | Session state, worker context tracking |
-| Events | `internal/events/` | Event bus for UI updates |
+| Component    | Location                                | Current State                                          |
+| ------------ | --------------------------------------- | ------------------------------------------------------ |
+| Orchestrator | `internal/orchestrator/orchestrator.go` | Basic coordination, complexity-based worker allocation |
+| Planner      | `internal/orchestrator/planner.go`      | Flat task decomposition (no DAG, no dependencies)      |
+| Worker Pool  | `internal/orchestrator/pool.go`         | Parallel execution with goroutines                     |
+| ReAct Agent  | `internal/agent/react.go`               | Single-agent ReAct loop with `//`                      |
+| Synthesizer  | `internal/orchestrator/synthesizer.go`  | Simple concatenation synthesis                         |
+| Tools        | `internal/tools/`                       | `search` (Brave API), `fetch` (HTML scraping)          |
+| LLM Client   | `internal/llm/client.go`                | OpenRouter client with streaming support               |
+| Session      | `internal/session/session.go`           | Session state, worker context tracking                 |
+| Events       | `internal/events/`                      | Event bus for UI updates                               |
 
 ### Key Discoveries
@@ -117,6 +118,7 @@ We'll implement in **6 phases**, each building on the previous. Each phase produ
 ## Phase 1: Context Manager Foundation
 
 ### Overview
+
 Implement the Context Manager with multi-scale state summaries and token budget management. This is the most critical component for long-horizon research tasks.
 
 ### Changes Required
@@ -486,11 +488,13 @@ func TestBuildMessages(t *testing.T) {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `go build ./internal/context/...`
 - [x] Tests pass: `go test ./internal/context/...`
 - [x] No linting errors: `go vet ./internal/context/...`
 
 #### Manual Verification:
+
 - [x] Context manager initializes without panic
 - [x] Token counting produces reasonable estimates
 - [x] ShouldFold triggers at correct threshold
@@ -500,6 +504,7 @@ func TestBuildMessages(t *testing.T) {
 ## Phase 2: DAG-Based Planning System
 
 ### Overview
+
 Replace flat task lists with a directed acyclic graph structure supporting parallel execution with dependencies.
 
 ### Changes Required
@@ -900,11 +905,13 @@ func TestDAGTopologicalOrder(t *testing.T) {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `go build ./internal/planning/...`
 - [x] Tests pass: `go test ./internal/planning/...`
 - [x] No linting errors: `go vet ./internal/planning/...`
 
 #### Manual Verification:
+
 - [x] DAG correctly tracks dependencies
 - [x] Ready tasks are computed correctly as dependencies complete
 - [x] Perspective discovery returns reasonable expert viewpoints
@@ -914,6 +921,7 @@ func TestDAGTopologicalOrder(t *testing.T) {
 ## Phase 3: Specialized Search Agent
 
 ### Overview
+
 Create an iterative search agent that generates follow-up queries based on knowledge gaps.
 ### Changes Required
@@ -1190,10 +1198,12 @@ func dedupe(items []string) []string {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `go build ./internal/agents/...`
 - [x] Tests pass: `go test ./internal/agents/...`
 
 #### Manual Verification:
+
 - [x] Search agent generates queries from perspectives
 - [x] Facts are extracted with sources
 - [x] Gaps are identified correctly
@@ -1204,6 +1214,7 @@ func dedupe(items []string) []string {
 ## Phase 4: Analysis Agent
 
 ### Overview
+
 Create an agent that cross-validates facts and identifies contradictions.
 
 ### Changes Required
@@ -1454,11 +1465,13 @@ func parseKnowledgeGaps(content string) []KnowledgeGap {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `go build ./internal/agents/...`
 - [x] Tests pass: `go test ./internal/agents/...` (15 analysis tests)
 - [x] No linting errors: `go vet ./internal/agents/...`
 
 #### Manual Verification:
+
 - [x] Facts are cross-validated with scores
 - [x] Contradictions are detected
 - [x] Knowledge gaps are identified with suggested queries
@@ -1468,6 +1481,7 @@ func parseKnowledgeGaps(content string) []KnowledgeGap {
 ## Phase 5: Synthesis Agent
 
 ### Overview
+
 Create an agent that generates structured reports with proper citations.
 
 ### Changes Required
@@ -1699,10 +1713,12 @@ func defaultOutline() []string {
 ### Success Criteria
 
 #### Automated Verification:
+
 - [x] Build succeeds: `go build ./internal/agents/...`
 - [x] Tests pass: `go test ./internal/agents/...`
 
 #### Manual Verification:
+
 - [x] Outline is generated based on perspectives
 - [x] Sections reference sources
 - [x] Final report is well-structured with citations
@@ -1712,6 +1728,7 @@ func defaultOutline() []string {
 ## Phase 6: Deep Research Orchestrator Integration
 
 ### Overview
+
 Replace the existing orchestrator with the new deep research orchestrator that coordinates all components.
### Changes Required @@ -2003,11 +2020,13 @@ func (o *Orchestrator) DeepResearch(ctx context.Context, query string) (*DeepRes ### Success Criteria #### Automated Verification: + - [x] Build succeeds: `go build ./...` - [x] Tests pass: `go test ./...` (Phase 6 related packages pass; unrelated panel test has pre-existing failure) - [x] No linting errors: `go vet ./...` #### Manual Verification: + - [ ] Running deep research discovers perspectives - [ ] DAG tasks execute in correct order - [ ] Context folding occurs during long research diff --git a/thoughts/shared/plans/go-deep-research-agent-v1.md b/thoughts/shared/plans/go-deep-research-agent-v1.md index 3127be4..2f2eed9 100644 --- a/thoughts/shared/plans/go-deep-research-agent-v1.md +++ b/thoughts/shared/plans/go-deep-research-agent-v1.md @@ -17,6 +17,7 @@ Build a Go-based deep research agent with an interactive REPL interface, multi-w ## Desired End State A working CLI tool (`go-research`) that: + 1. Starts an interactive REPL with readline support 2. Accepts `/fast ` for single-worker quick research 3. Accepts `/deep ` for multi-worker parallel research @@ -25,6 +26,7 @@ A working CLI tool (`go-research`) that: 6. Supports session continuation with `/expand`, `/rerun`, `/recompile` ### Verification: + ```bash # Build succeeds cd go-research && go build ./... @@ -58,6 +60,7 @@ cp go-research/.env.example go-research/.env ``` **File**: `go-research/.env.example` + ``` # LLM Provider OPENROUTER_API_KEY=sk-or-... 
@@ -76,6 +79,7 @@ TEMPERATURE=0.0
```
**Usage during development and validation:**
+
```bash
# Load environment and run
cd go-research
@@ -85,6 +89,7 @@ go run ./cmd/research
**Validation with real API calls:**

When validating phases that include manual verification with real API calls (Phases 2+), use the `.env` file with valid API keys:
+
```bash
cd go-research && source .env && go run ./cmd/research
# Then run actual research queries to verify functionality
@@ -95,6 +100,7 @@ cd go-research && source .env && go run ./cmd/research
## Phase 1: Core Infrastructure

### Overview
+
Set up project structure, configuration, LLM client, and event bus.

### Changes Required:
@@ -231,6 +237,7 @@ Copy the event bus implementation from architecture document (lines 351-408).
**File**: `go-research/internal/llm/client.go`

Copy the LLM client from architecture document (lines 1241-1403), but:
+
- Use `cfg.Model` instead of hardcoded `modelID`
- Accept config as constructor parameter
@@ -302,12 +309,14 @@ func main() {
### Success Criteria:

#### Automated Verification:
+
- [x] `cd go-research && go mod tidy` succeeds
- [x] `go build ./...` succeeds
- [x] `go vet ./...` passes
- [x] `OPENROUTER_API_KEY=test BRAVE_API_KEY=test go run ./cmd/research` prints config info

#### Manual Verification:
+
- [x] Event bus can publish and receive events (write a quick test in main)
- [x] LLM client structure compiles (actual API calls tested in Phase 2)
@@ -316,6 +325,7 @@ func main() {
## Phase 2: Agent & Tools

### Overview
+
Implement the search tool (Brave API), fetch tool (web scraping), tool registry, and ReAct agent loop.

### Changes Required:
@@ -614,6 +624,7 @@ Copy session types from architecture document (lines 161-261).
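Phase 1's event bus is described elsewhere in these plans as "fire-and-forget, events dropped if buffer full". A minimal channel-backed sketch with exactly that semantics (type and method names are assumptions, not the architecture document's code):

```go
package main

import "fmt"

// Event is a minimal event payload.
type Event struct {
	Type string
	Data string
}

// Bus is a fire-and-forget pub/sub bus: Publish never blocks, and an
// event is silently dropped for any subscriber whose buffer is full --
// the exact limitation the storage plan calls out. Illustrative only.
type Bus struct {
	subs []chan Event
}

// Subscribe registers a buffered channel that receives published events.
func (b *Bus) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.subs = append(b.subs, ch)
	return ch
}

// Publish delivers the event to every subscriber without blocking.
func (b *Bus) Publish(e Event) {
	for _, ch := range b.subs {
		select {
		case ch <- e:
		default: // buffer full: event dropped
		}
	}
}

func main() {
	bus := &Bus{}
	ch := bus.Subscribe(1)
	bus.Publish(Event{Type: "worker_started", Data: "task_1"})
	bus.Publish(Event{Type: "worker_done", Data: "task_1"}) // dropped: buffer full
	fmt.Println((<-ch).Type)
}
```

The non-blocking `select` with a `default` branch is what makes publishing fire-and-forget; the later event-sourced storage plan exists precisely to remove this lossy behavior.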
**File**: `go-research/internal/agent/react.go`

Copy ReAct agent from architecture document (lines 1409-1623), with modifications:
+
- Accept config and tools registry as constructor parameters
- Use the centralized model config
@@ -678,13 +689,16 @@ func (w *Worker) Research(ctx context.Context, objective string) (session.Worker
### Success Criteria:

#### Automated Verification:
+
- [x] `go build ./...` succeeds
- [x] `go vet ./...` passes

#### Manual Verification (use `source .env` with valid API keys):
+
```bash
cd go-research && source .env && go run ./cmd/research
```
+
- [ ] Search tool returns results for a test query
- [ ] Fetch tool extracts text from a test URL
- [ ] ReAct agent can complete a simple research query (e.g., `/fast What is ReAct?`)
@@ -694,6 +708,7 @@ cd go-research && source .env && go run ./cmd/research
## Phase 3: Orchestration

### Overview
+
Implement query complexity analysis, task decomposition, worker pool with goroutines, and result synthesizer.

### Changes Required:
@@ -903,10 +918,12 @@ Copy orchestrator from architecture document (lines 1098-1235).
### Success Criteria:

#### Automated Verification:
+
- [x] `go build ./...` succeeds
- [x] `go vet ./...` passes

#### Manual Verification:
+
- [ ] Planner returns reasonable complexity scores
- [ ] Task decomposition creates appropriate sub-tasks
- [ ] Worker pool executes tasks in parallel (visible via event output)
@@ -917,6 +934,7 @@ Copy orchestrator from architecture document (lines 1098-1235).
## Phase 4: Session Management

### Overview
+
Implement session persistence, versioning, and context building for continuation.
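Phase 3's "worker pool with goroutines" reduces to a fan-out/fan-in over channels. A sketch under assumed names (`researchTask`, `runPool` are invented for illustration; a real worker would run the ReAct loop instead of formatting a string):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// researchTask stands in for one decomposed sub-task.
type researchTask struct {
	ID        int
	Objective string
}

// runPool fans tasks out to n parallel workers and collects their
// results for synthesis. Illustrative sketch, not the plan's code.
func runPool(tasks []researchTask, n int) []string {
	in := make(chan researchTask)
	out := make(chan string, len(tasks)) // buffered: sends never block
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range in {
				// A real worker would run a ReAct research loop here.
				out <- fmt.Sprintf("task %d: researched %q", t.ID, t.Objective)
			}
		}()
	}
	for _, t := range tasks {
		in <- t
	}
	close(in) // workers exit their range loops
	wg.Wait()
	close(out)
	var results []string
	for r := range out {
		results = append(results, r)
	}
	sort.Strings(results) // deterministic order for the synthesizer
	return results
}

func main() {
	res := runPool([]researchTask{{1, "ReAct"}, {2, "DAGs"}}, 2)
	fmt.Println(len(res))
}
```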
### Changes Required: @@ -1153,10 +1171,12 @@ func min(a, b int) int { ### Success Criteria: #### Automated Verification: + - [x] `go build ./...` succeeds - [x] `go vet ./...` passes #### Manual Verification: + - [ ] Sessions save to JSON files correctly - [ ] Sessions load from disk correctly - [ ] Session versioning creates linked sessions @@ -1167,6 +1187,7 @@ func min(a, b int) int { ## Phase 5: Obsidian Integration ### Overview + Implement vault directory structure, worker markdown files, report versioning, and session MOC. ### Changes Required: @@ -1338,10 +1359,12 @@ generated: {{.UpdatedAt.Format "2006-01-02T15:04:05Z07:00"}} ### Success Criteria: #### Automated Verification: + - [x] `go build ./...` succeeds - [x] `go vet ./...` passes #### Manual Verification: + - [ ] Sessions write to vault directory structure - [ ] Worker files contain proper frontmatter and content - [ ] Report files are versioned correctly @@ -1353,6 +1376,7 @@ generated: {{.UpdatedAt.Format "2006-01-02T15:04:05Z07:00"}} ## Phase 6: Interactive REPL ### Overview + Implement readline integration, command parser, router, all command handlers, and tab completion. ### Changes Required: @@ -1674,14 +1698,17 @@ func main() { ### Success Criteria: #### Automated Verification: + - [x] `go build ./...` succeeds - [x] `go vet ./...` passes - [x] Binary can be executed: `./go-research --help` or similar #### Manual Verification (use `source .env` with valid API keys): + ```bash cd go-research && source .env && go run ./cmd/research ``` + - [ ] REPL starts with welcome message - [ ] Tab completion works for commands - [ ] `/help` shows all commands @@ -1696,6 +1723,7 @@ cd go-research && source .env && go run ./cmd/research ## Phase 7: Polish ### Overview + Add streaming output, progress indicators, comprehensive error handling, and graceful shutdown. ### Changes Required: @@ -1863,6 +1891,7 @@ func (r *REPL) shutdown() { #### 4. 
Error Handling Improvements Ensure all errors are: + - Wrapped with context - Displayed clearly to user - Logged if verbose mode is on @@ -1870,11 +1899,13 @@ Ensure all errors are: ### Success Criteria: #### Automated Verification: + - [x] `go build ./...` succeeds - [x] `go vet ./...` passes - [x] `go test ./...` passes (if tests added) #### Manual Verification: + - [ ] Streaming responses appear character-by-character - [ ] Progress spinners show during long operations - [ ] Ctrl+C triggers graceful shutdown @@ -1887,12 +1918,14 @@ Ensure all errors are: ## Testing Strategy ### Unit Tests: + - Event bus publish/subscribe - Session serialization/deserialization - Parser command parsing - Tool argument validation ### Integration Tests: + - Full research flow (mock LLM responses) - Session save/load cycle - Obsidian vault generation @@ -1900,6 +1933,7 @@ Ensure all errors are: ### Manual Testing Steps: **Setup (required for real API testing):** + ```bash cd go-research # Ensure .env has valid API keys (copy from .env.example if needed) @@ -1908,6 +1942,7 @@ go run ./cmd/research ``` **Test sequence:** + 1. Start REPL, verify welcome message 2. Run `/fast What is Go?` - verify single-worker execution 3. 
Run `/deep How do modern web frameworks work?` - verify multi-worker parallel execution diff --git a/thoughts/shared/plans/go-research-event-sourced-storage.md b/thoughts/shared/plans/go-research-event-sourced-storage.md index 7cd7c73..454e8c3 100644 --- a/thoughts/shared/plans/go-research-event-sourced-storage.md +++ b/thoughts/shared/plans/go-research-event-sourced-storage.md @@ -8,12 +8,12 @@ Implement an event-sourced adapter-based storage architecture for the go-researc ### What Exists -| Component | Location | Issue | -|-----------|----------|-------| -| Event Bus | `internal/events/bus.go:8-66` | Fire-and-forget, events dropped if buffer full | -| Session Store | `internal/session/store.go:18-147` | Snapshot-only, no intermediate state | -| Orchestrator | `internal/orchestrator/deep.go:25-35` | Stateless, state in local variables | -| DAG | `internal/planning/dag.go:60-69` | Direct mutation, no history | +| Component | Location | Issue | +| ------------- | ------------------------------------- | ---------------------------------------------- | +| Event Bus | `internal/events/bus.go:8-66` | Fire-and-forget, events dropped if buffer full | +| Session Store | `internal/session/store.go:18-147` | Snapshot-only, no intermediate state | +| Orchestrator | `internal/orchestrator/deep.go:25-35` | Stateless, state in local variables | +| DAG | `internal/planning/dag.go:60-69` | Direct mutation, no history | ### Key Discoveries @@ -50,6 +50,7 @@ After implementation: ### Verification After implementation: + - `go build ./...` compiles - `go test ./...` passes - `/deep ` works with event persistence @@ -196,10 +197,12 @@ func (e BaseEvent) GetTimestamp() time.Time { return e.Timestamp } ### Success Criteria #### Automated Verification: + - [x] Build succeeds: `cd go-research && go build ./internal/core/...` - [x] No linting errors: `cd go-research && go vet ./internal/core/...` #### Manual Verification: + - [x] Port interfaces are well-documented - [x] No circular 
dependencies in core package @@ -474,10 +477,12 @@ type ReportSnapshot struct { ### Success Criteria #### Automated Verification: + - [x] Build succeeds: `cd go-research && go build ./internal/core/domain/events/...` - [x] No linting errors: `cd go-research && go vet ./internal/core/domain/events/...` #### Manual Verification: + - [x] All event types have JSON tags for serialization - [x] Events follow naming convention: `Event` - [x] All events embed BaseEvent @@ -1166,10 +1171,12 @@ func (c *events.CostBreakdown) Add(other events.CostBreakdown) { ### Success Criteria #### Automated Verification: + - [x] Build succeeds: `cd go-research && go build ./internal/core/domain/aggregate/...` - [x] Unit tests pass: `cd go-research && go test ./internal/core/domain/aggregate/...` #### Manual Verification: + - [x] Commands validate preconditions correctly - [x] Events apply to state correctly - [x] State can be reconstructed from event replay @@ -1640,10 +1647,12 @@ func TestEventStore_GetAllAggregateIDs(t *testing.T) { ### Success Criteria #### Automated Verification: + - [x] Build succeeds: `cd go-research && go build ./internal/adapters/storage/...` - [x] Tests pass: `cd go-research && go test ./internal/adapters/storage/...` #### Manual Verification: + - [x] Events persist to disk as JSON files - [x] Events load correctly with type discrimination - [x] Version conflict detection works @@ -2152,10 +2161,12 @@ func (o *DeepOrchestratorES) buildAnalysisResultFromState(state *aggregate.Resea ### Success Criteria #### Automated Verification: + - [x] Build succeeds: `cd go-research && go build ./internal/orchestrator/...` - [x] Tests pass: `cd go-research && go test ./internal/orchestrator/...` #### Manual Verification: + - [x] Events persist during research execution - [x] Research can be interrupted and state is preserved - [x] State can be reconstructed from events @@ -2620,11 +2631,13 @@ func main() { ### Success Criteria #### Automated Verification: + - [x] Build 
succeeds: `cd go-research && go build ./...` - [x] Tests pass: `cd go-research && go test ./...` - [x] Type check passes: `cd go-research && go vet ./...` #### Manual Verification: + - [x] `/resume ` continues interrupted research - [x] `/sessions-es` shows all event-sourced sessions - [x] Legacy sessions can be migrated @@ -2687,6 +2700,7 @@ None - this is additive. Legacy sessions can be migrated. ### Configuration Add to config: + ```go type Config struct { // ... existing ... @@ -2699,6 +2713,7 @@ type Config struct { ### Rollback If issues occur: + 1. Legacy sessions still exist in original location 2. Event store can be deleted to revert 3. Old orchestrator remains available diff --git a/thoughts/shared/plans/go-research-hexagonal-refactoring.md b/thoughts/shared/plans/go-research-hexagonal-refactoring.md index 1dc9f19..e1cb8e5 100644 --- a/thoughts/shared/plans/go-research-hexagonal-refactoring.md +++ b/thoughts/shared/plans/go-research-hexagonal-refactoring.md @@ -3,6 +3,7 @@ ## Overview Refactor the go-research deep research agent to implement Ports/Adapters (Hexagonal) Architecture, separating: + 1. The agent core from the CLI (REPL) 2. The agent core from storage (sessions, reports) 3. 
Enable swappable frontends (CLI first, web/API later) @@ -12,24 +13,26 @@ Refactor the go-research deep research agent to implement Ports/Adapters (Hexago ### What Exists -| Component | Location | Coupling Level | Issues | -|-----------|----------|---------------|--------| -| Event Bus | `internal/events/` | **Low** | Clean pub/sub, already abstracted | -| LLM Client | `internal/llm/client.go` | **Low** | Has `ChatClient` interface | -| Tool Executor | `internal/tools/registry.go` | **Low** | Has `ToolExecutor` interface | -| CLI/REPL | `internal/repl/` | **Medium** | Direct orchestrator instantiation in handlers | -| Storage | `internal/session/store.go` | **High** | Session struct is both domain AND storage schema | -| Obsidian | `internal/obsidian/` | **Medium** | VaultWriter interface exists but limited | +| Component | Location | Coupling Level | Issues | +| ------------- | ---------------------------- | -------------- | ------------------------------------------------ | +| Event Bus | `internal/events/` | **Low** | Clean pub/sub, already abstracted | +| LLM Client | `internal/llm/client.go` | **Low** | Has `ChatClient` interface | +| Tool Executor | `internal/tools/registry.go` | **Low** | Has `ToolExecutor` interface | +| CLI/REPL | `internal/repl/` | **Medium** | Direct orchestrator instantiation in handlers | +| Storage | `internal/session/store.go` | **High** | Session struct is both domain AND storage schema | +| Obsidian | `internal/obsidian/` | **Medium** | VaultWriter interface exists but limited | ### Key Discoveries **Strong Points:** + - `llm.ChatClient` interface already exists (`internal/llm/client.go:19-25`) - `tools.ToolExecutor` interface already exists (`internal/tools/registry.go:16-19`) - Event bus enables loose coupling for progress updates - Options pattern used for dependency injection in orchestrators **Weak Points:** + - `Session` struct conflates domain model and storage schema - Handlers directly instantiate orchestrators 
(`handlers/start.go:145`) - Result→Session transformation logic lives in handlers @@ -120,6 +123,7 @@ After refactoring: ### Verification After implementation: + - `go build ./...` compiles without errors - `go test ./...` all tests pass - Existing functionality preserved: @@ -299,10 +303,12 @@ type EventSubscriber interface { ### Success Criteria #### Automated Verification: + - [ ] Build succeeds: `go build ./internal/core/ports/...` - [ ] No linting errors: `go vet ./internal/core/ports/...` #### Manual Verification: + - [ ] All port interfaces are well-documented - [ ] Interface methods match current implementations' signatures @@ -628,11 +634,13 @@ func generateID() string { ### Success Criteria #### Automated Verification: + - [ ] Build succeeds: `go build ./internal/core/domain/...` - [ ] Tests pass: `go test ./internal/core/domain/...` - [ ] No linting errors: `go vet ./internal/core/domain/...` #### Manual Verification: + - [ ] Domain models have no `json` tags - [ ] Domain models have no infrastructure dependencies - [ ] All types are well-documented @@ -1161,11 +1169,13 @@ func (b *InMemoryBus) Close() { ### Success Criteria #### Automated Verification: + - [ ] Build succeeds: `go build ./internal/adapters/...` - [ ] Tests pass: `go test ./internal/adapters/...` - [ ] No linting errors: `go vet ./internal/adapters/...` #### Manual Verification: + - [ ] All adapters implement their port interfaces (compile-time verification) - [ ] Existing functionality preserved when adapters wrap current implementations @@ -1395,11 +1405,13 @@ func (s *SessionServiceImpl) Delete(ctx context.Context, id string) error { ### Success Criteria #### Automated Verification: + - [ ] Build succeeds: `go build ./internal/core/service/...` - [ ] Tests pass: `go test ./internal/core/service/...` - [ ] No linting errors: `go vet ./internal/core/service/...` #### Manual Verification: + - [ ] Services implement port interfaces - [ ] Services only depend on ports, not concrete 
implementations - [ ] Business logic is properly encapsulated @@ -1626,11 +1638,13 @@ func main() { ### Success Criteria #### Automated Verification: + - [ ] Build succeeds: `go build ./...` - [ ] Tests pass: `go test ./...` - [ ] No linting errors: `go vet ./...` #### Manual Verification: + - [ ] `/fast` command executes research correctly - [ ] `/deep` command executes multi-perspective research - [ ] `/sessions` lists all sessions diff --git a/thoughts/shared/plans/interactive-cli-agentic-research.md b/thoughts/shared/plans/interactive-cli-agentic-research.md index 6dff613..d50e507 100644 --- a/thoughts/shared/plans/interactive-cli-agentic-research.md +++ b/thoughts/shared/plans/interactive-cli-agentic-research.md @@ -3,6 +3,7 @@ ## Overview Transform the go-research CLI into a Claude Code-style interactive experience where users can: + 1. Ask questions about existing research (answered from chat history + reports) 2. Expand on specific topics with context injection 3. Have queries intelligently routed to the appropriate handler via LLM classification @@ -12,6 +13,7 @@ This plan implements the architecture from `thoughts/shared/research/2025-12-03_ ## Current State Analysis ### Existing Infrastructure + - **Router** (`internal/repl/router.go:46-74`): Routes natural language → expand (if session) or storm (if no session) - **ExpandHandler** (`internal/repl/handlers/expand.go:21-88`): Creates versioned sessions with continuation context - **Session Context** (`internal/session/context.go:9-45`): Builds text summary from prior session (report, insights, sources) @@ -19,6 +21,7 @@ This plan implements the architecture from `thoughts/shared/research/2025-12-03_ - **Obsidian Writer** (`internal/obsidian/writer.go:27-62`): Creates directories for insights/ but never populates them ### Key Gaps + 1. No sub-insight capture during ThinkDeep research 2. Insights directory created but empty - no insight files written 3. 
No question-answering from existing content @@ -28,12 +31,14 @@ This plan implements the architecture from `thoughts/shared/research/2025-12-03_ ## Desired End State After implementation: + 1. ThinkDeep captures insights per search result and saves them to Obsidian 2. Users can ask questions and get answers from chat history + all reports in session chain 3. Natural language queries are classified by LLM into research/question/expand intents 4. Expansion injects prior context (findings, visited URLs, existing report) into supervisor state ### Verification + - Run ThinkDeep research → insights/ directory populated with files - Ask question about report → get answer without new research - Type ambiguous query → LLM classifies intent correctly @@ -88,6 +93,7 @@ type SubInsight struct { ``` Add methods: + ```go func (s *SupervisorState) AddSubInsight(insight SubInsight) func (s *SupervisorState) GetSubInsights() []SubInsight @@ -99,6 +105,7 @@ func (s *SupervisorState) GetSubInsights() []SubInsight **Changes**: After search tool execution, extract insights from results Add function to extract insights from search result content: + ```go // extractInsightsFromSearch parses search results and extracts structured insights. // Called after each search tool execution with the raw search results. @@ -106,6 +113,7 @@ func extractInsightsFromSearch(topic string, searchResult string, researcherNum ``` This function should: + 1. Parse the search result content 2. Extract key findings (facts, data points, claims) 3. Create SubInsight for each distinct finding @@ -117,6 +125,7 @@ This function should: **Changes**: Collect insights from sub-researcher results In `executeParallelResearch()` around line 395-417, after processing each sub-researcher result: + ```go // After accumulating notes and raw notes, also accumulate insights for _, insight := range result.Insights { @@ -155,11 +164,13 @@ Pass through from supervisor result to orchestrator result. 
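The insight capture above can be sketched as follows; the `SubInsight` field names (Topic, Finding, SourceURL) and the `AddSubInsight`/`GetSubInsights` method names come from the plan, while the internal representation is a simplified assumption:

```go
package main

import "fmt"

// SubInsight is one discrete finding captured from a search result,
// with the fields the plan calls out.
type SubInsight struct {
	Topic         string
	Finding       string
	SourceURL     string
	ResearcherNum int
}

// SupervisorState accumulates insights across sub-researchers.
// Simplified sketch of the plan's state type.
type SupervisorState struct {
	insights []SubInsight
}

// AddSubInsight appends one captured insight.
func (s *SupervisorState) AddSubInsight(in SubInsight) {
	s.insights = append(s.insights, in)
}

// GetSubInsights returns a copy so callers cannot mutate internal state.
func (s *SupervisorState) GetSubInsights() []SubInsight {
	out := make([]SubInsight, len(s.insights))
	copy(out, s.insights)
	return out
}

func main() {
	st := &SupervisorState{}
	st.AddSubInsight(SubInsight{
		Topic:     "Swedish tax policy",
		Finding:   "Corporate rate proposal discussed",
		SourceURL: "https://example.com/article",
	})
	fmt.Println(len(st.GetSubInsights()))
}
```

Returning a copy from the getter is a deliberate choice here: the Obsidian writer and the MOC template both consume the slice, and neither should be able to mutate supervisor state.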
### Success Criteria

#### Automated Verification
+
- [x] Build succeeds: `go build ./...`
- [x] Tests pass: `go test ./internal/think_deep/... ./internal/agents/...`
- [x] No linting errors: `golangci-lint run ./internal/think_deep/... ./internal/agents/...`

#### Manual Verification
+
- [ ] Run ThinkDeep research and verify `ThinkDeepResult.SubInsights` is populated
- [ ] Each insight has valid Topic, Finding, SourceURL fields
- [ ] Insight count correlates with number of search results processed
@@ -205,6 +216,7 @@ func (w *Writer) WriteInsights(sessionDir string, insights []think_deep.SubInsig
**Changes**: Call WriteInsights in main Write method

In `Write()` method after writing workers (around line 49):
+
```go
// Write insight files
if len(sess.Insights) > 0 {
@@ -221,6 +233,7 @@ if len(sess.Insights) > 0 {
**Changes**: Add Insights section to sessionMOCTemplate

Add after the Sources section:
+
```go
## Insights
@@ -239,6 +252,7 @@ Add after the Sources section:
**Changes**: Pass insights to Obsidian writer

After research completion, call the extended write method:
+
```go
if err := ctx.Obsidian.WriteWithInsights(newSess, result.SubInsights); err != nil {
	ctx.Renderer.Error(fmt.Errorf("save to obsidian: %w", err))
@@ -248,11 +262,13 @@ if err := ctx.Obsidian.WriteWithInsights(newSess, result.SubInsights); err != ni
### Success Criteria

#### Automated Verification
+
- [x] Build succeeds: `go build ./...`
- [x] Tests pass: `go test ./internal/obsidian/...`
- [x] Existing Obsidian tests still pass (no regression)

#### Manual Verification
+
- [ ] Run ThinkDeep research → insights/ directory contains insight_001.md, insight_002.md, etc.
- [ ] Each insight file has valid YAML frontmatter - [ ] Session MOC links to all insight files @@ -392,11 +408,13 @@ add("question", &QuestionHandler{}, "/question ", "Ask a question about e ### Success Criteria #### Automated Verification + - [ ] Build succeeds: `go build ./...` - [ ] Tests pass: `go test ./internal/repl/handlers/...` - [ ] No linting errors: `golangci-lint run ./internal/repl/handlers/...` #### Manual Verification + - [ ] Start research on a topic - [ ] Run `/question What did the report say about X?` - [ ] Get answer synthesized from report content @@ -508,6 +526,7 @@ type Context struct { **Changes**: Replace simple session check with classifier In `Route()` method, replace lines 59-73: + ```go // Natural language with session - classify intent if r.ctx.Session != nil && r.ctx.Classifier != nil { @@ -584,11 +603,13 @@ Default to a fast, cheap model like Haiku for classification. ### Success Criteria #### Automated Verification + - [ ] Build succeeds: `go build ./...` - [ ] Tests pass: `go test ./internal/repl/...` - [ ] No linting errors: `golangci-lint run ./internal/repl/...` #### Manual Verification + - [ ] Start research on "Swedish political parties" - [ ] Type "What did you find about Moderaterna?" → routes to question handler - [ ] Type "Tell me more about tax policies" → routes to expand handler @@ -678,6 +699,7 @@ func WithExpansionFocus(topic string) ThinkDeepOption { **Changes**: Use injection context when initializing supervisor In `Research()` method, after creating supervisor state (around line 189-207): + ```go // Apply injection context if provided if td.injectionContext != nil { @@ -790,11 +812,13 @@ func (td *ThinkDeep) enhanceBriefForExpansion(brief string, injection *think_dee ### Success Criteria #### Automated Verification + - [ ] Build succeeds: `go build ./...` - [ ] Tests pass: `go test ./internal/orchestrator/... ./internal/think_deep/...` - [ ] No linting errors: `golangci-lint run ./internal/orchestrator/... 
./internal/think_deep/...` #### Manual Verification + - [ ] Start research on "Swedish political parties economic policies" - [ ] Note the number of sources and iterations - [ ] Expand on "corporate tax proposals" @@ -810,22 +834,22 @@ func (td *ThinkDeep) enhanceBriefForExpansion(brief string, injection *think_dee ### Unit Tests -| Phase | Test File | Key Tests | -|-------|-----------|-----------| -| 1 | `internal/think_deep/state_test.go` | SubInsight struct, Add/Get methods | -| 1 | `internal/agents/sub_researcher_test.go` | Insight extraction from search results | -| 2 | `internal/obsidian/writer_test.go` | WriteInsight, WriteInsights methods | -| 3 | `internal/repl/handlers/question_test.go` | QA context building, answer generation | -| 4 | `internal/repl/classifier_test.go` | Classification accuracy for each intent | -| 5 | `internal/think_deep/injection_test.go` | InjectionContext struct, builder functions | +| Phase | Test File | Key Tests | +| ----- | ----------------------------------------- | ------------------------------------------ | +| 1 | `internal/think_deep/state_test.go` | SubInsight struct, Add/Get methods | +| 1 | `internal/agents/sub_researcher_test.go` | Insight extraction from search results | +| 2 | `internal/obsidian/writer_test.go` | WriteInsight, WriteInsights methods | +| 3 | `internal/repl/handlers/question_test.go` | QA context building, answer generation | +| 4 | `internal/repl/classifier_test.go` | Classification accuracy for each intent | +| 5 | `internal/think_deep/injection_test.go` | InjectionContext struct, builder functions | ### Integration Tests -| Phase | Test File | Key Tests | -|-------|-----------|-----------| -| 1-2 | `internal/e2e/think_deep_insights_test.go` | Full ThinkDeep run captures and persists insights | -| 3-4 | `internal/e2e/interactive_flow_test.go` | Research → Question → Expand flow | -| 5 | `internal/e2e/expansion_context_test.go` | Expansion uses prior context correctly | +| Phase | Test File | Key Tests 
| +| ----- | ------------------------------------------ | ------------------------------------------------- | +| 1-2 | `internal/e2e/think_deep_insights_test.go` | Full ThinkDeep run captures and persists insights | +| 3-4 | `internal/e2e/interactive_flow_test.go` | Research → Question → Expand flow | +| 5 | `internal/e2e/expansion_context_test.go` | Expansion uses prior context correctly | ### Manual Testing Steps @@ -861,6 +885,7 @@ func (td *ThinkDeep) enhanceBriefForExpansion(brief string, injection *think_dee ## Migration Notes No migration needed - this is additive functionality: + - Existing sessions work unchanged - New fields (SubInsights) default to empty - Classification is opt-in via Classifier presence in context diff --git a/thoughts/shared/plans/interactive-research-repl-implementation.md b/thoughts/shared/plans/interactive-research-repl-implementation.md index fe3877d..d6798b0 100644 --- a/thoughts/shared/plans/interactive-research-repl-implementation.md +++ b/thoughts/shared/plans/interactive-research-repl-implementation.md @@ -60,6 +60,7 @@ The deep-research agent has a **solid foundation** for interactive mode: ### Key Discoveries **File References:** + - Session state: `deep-research-agent/src/deep_research/agent/state.py:60-94` - CLI commands: `deep-research-agent/src/deep_research/cli.py:68-651` - Obsidian writer: `deep-research-agent/src/deep_research/obsidian/writer.py:51-77` @@ -94,12 +95,14 @@ A production-ready interactive CLI that enables: ### Verification Criteria **Automated Verification:** + - [x] Build succeeds: `cd deep-research-agent && uv sync` ✅ - [x] Type checking passes: `uv run mypy src/deep_research/repl/` ✅ (9 source files) - [x] Tests pass: `uv run pytest tests/test_repl*.py -v` ✅ (31 tests passing) - [x] Linting passes: `uv run ruff check src/deep_research/repl/` ✅ **Manual Verification:** + - [ ] REPL starts successfully: `uv run research interactive` - [ ] Can start new research session and see live progress - [ ] Can 
continue from previous session (context loaded correctly) @@ -128,17 +131,20 @@ To prevent scope creep, the following are **explicitly out of scope**: ### Architecture Decision **Technology Stack:** + - **REPL Framework**: `prompt_toolkit` (not cmd2) - Native async support (critical for long-running research) - Rich customization (completion, history, styling) - Used by production tools (IPython, ptpython) **State Management:** + - **In-Memory Cache**: `SessionManager` maintains active session - **Single Source of Truth**: Obsidian vault (file-based persistence) - **Sync Strategy**: Update vault on session start/complete/switch **Command Parsing:** + - **Parser**: argparse with `exit_on_error=False` - **Tokenizer**: shlex for shell-like input handling - **Aliases**: Built into argparse subparsers @@ -161,6 +167,7 @@ To prevent scope creep, the following are **explicitly out of scope**: ## Phase 1: REPL Foundation ### Overview + Build the basic REPL loop with command parsing infrastructure. No research execution yet - focus on getting the interactive shell working with command parsing and validation. 
### Changes Required @@ -182,6 +189,7 @@ dependencies = [ **New Directory**: `deep-research-agent/src/deep_research/repl/` Create the following files: + - `__init__.py` - Module exports - `parser.py` - Command parser implementation - `shell.py` - Main REPL loop @@ -481,12 +489,14 @@ def test_parse_empty_input() -> None: ### Success Criteria #### Automated Verification: + - [x] Dependencies install: `cd deep-research-agent && uv sync` ✅ - [x] Type checking passes: `uv run mypy src/deep_research/repl/` ✅ - [x] Parser tests pass: `uv run pytest tests/test_repl_parser.py -v` ✅ (16 tests) - [x] Linting passes: `uv run ruff check src/deep_research/repl/` ✅ #### Manual Verification: + - [ ] REPL starts: `uv run research interactive` - [ ] Welcome message displays correctly - [ ] Can parse `start` command and show parsed args @@ -528,6 +538,7 @@ research> exit ## Phase 2: Session Management ### Overview + Implement in-memory session tracking and lifecycle management. Enable starting new research sessions and tracking active session state. ### Changes Required @@ -885,12 +896,14 @@ def test_set_and_clear_active_session(manager): ### Success Criteria #### Automated Verification: + - [x] Dependencies synced: `uv sync` ✅ - [x] Type checking passes: `uv run mypy src/deep_research/repl/` ✅ - [x] Session manager tests pass: `uv run pytest tests/test_repl_session_manager.py -v` ✅ (3 tests) - [x] Linting passes: `uv run ruff check src/deep_research/repl/` ✅ #### Manual Verification: + - [ ] Can start research session via REPL - [ ] Live progress displays during research - [ ] Session saved to Obsidian vault after completion @@ -929,6 +942,7 @@ research> exit ## Phase 3: Session Continuation ### Overview + Enable continuation of previous sessions with new queries, creating versioned sessions (v1 → v2 → v3). Implement context compression from previous session for coherent continuation. 
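The v1 → v2 → v3 continuation chain described above can be modeled with a version counter plus a parent pointer. Sketched in Go (the repo's primary language) with assumed names; only the `session_abc123_v1`-style ID format comes from the plan:

```go
package main

import "fmt"

// Session records one research run; Version and ParentID form the
// continuation chain (v1 -> v2 -> v3). Illustrative names only.
type Session struct {
	ID       string
	Query    string
	Version  int
	ParentID string
}

// NewVersion creates the next session in the chain, pointing back at
// its parent so prior context can be compressed and loaded.
func (s Session) NewVersion(query string) Session {
	return Session{
		ID:       fmt.Sprintf("%s_v%d", s.baseID(), s.Version+1),
		Query:    query,
		Version:  s.Version + 1,
		ParentID: s.ID,
	}
}

// baseID strips the trailing _vN suffix to recover the chain's base ID.
func (s Session) baseID() string {
	for i := len(s.ID) - 1; i > 0; i-- {
		if s.ID[i] == '_' {
			return s.ID[:i]
		}
	}
	return s.ID
}

func main() {
	v1 := Session{ID: "session_abc123_v1", Query: "quantum computing", Version: 1}
	v2 := v1.NewVersion("quantum algorithms in detail")
	fmt.Println(v2.ID, v2.ParentID)
}
```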
### Changes Required @@ -1299,11 +1313,13 @@ def test_build_worker_expansion_context_not_found(): ### Success Criteria #### Automated Verification: + - [x] Type checking passes: `uv run mypy src/deep_research/repl/` ✅ - [x] Context tests pass: `uv run pytest tests/test_repl_context.py -v` ✅ (3 tests) - [x] All REPL tests pass: `uv run pytest tests/test_repl*.py -v` ✅ (22 tests at this phase) #### Manual Verification: + - [ ] Can continue from previous session (creates v2) - [ ] Continuation context includes insights and report summary - [ ] Parent session ID correctly set (e.g., `session_abc123_v1`) @@ -1338,6 +1354,7 @@ research> expand --worker task_1 Research quantum algorithms in detail ## Phase 4: Multi-Session Management ### Overview + Enable tracking and switching between multiple concurrent sessions. Implement session listing, switching, and persistence of active session across REPL restarts. ### Changes Required @@ -1676,10 +1693,12 @@ async def interactive_repl() -> None: ### Success Criteria #### Automated Verification: + - [x] Type checking passes: `uv run mypy src/deep_research/repl/` ✅ - [x] All tests pass: `uv run pytest tests/test_repl*.py -v` ✅ (31 tests total) #### Manual Verification: + - [ ] `list sessions` shows all sessions with active session highlighted - [ ] `list workers` shows workers for active session - [ ] `list workers --session ID` shows workers for specific session @@ -1728,6 +1747,7 @@ research> status ## Phase 5: Polish & UX Enhancements ### Overview + Production-ready UX with tab completion, cost estimation, keyboard shortcuts, and export functionality. 
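Command tab completion for Phase 5 reduces to prefix matching over the known command set; when exactly one candidate survives, the REPL can fill it in directly. A minimal sketch (function and command names are illustrative, not the REPL's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// complete returns the commands that start with the typed prefix.
// A single match can be auto-filled by the shell; multiple matches
// are shown as candidates.
func complete(prefix string, commands []string) []string {
	var matches []string
	for _, cmd := range commands {
		if strings.HasPrefix(cmd, prefix) {
			matches = append(matches, cmd)
		}
	}
	return matches
}

func main() {
	commands := []string{"start", "status", "switch", "sessions", "exit"}
	fmt.Println(complete("sw", commands))  // [switch]
	fmt.Println(complete("sta", commands)) // [start status]
}
```

Completing session IDs works the same way, with the candidate list built from the session manager instead of a static command table.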
### Changes Required @@ -2128,11 +2148,13 @@ def show_help(console: Console) -> None: ### Success Criteria #### Automated Verification: + - [x] Type checking passes: `uv run mypy src/deep_research/repl/` ✅ (9 source files) - [x] All tests pass: `uv run pytest tests/test_repl*.py -v` ✅ (31 tests total) - [x] Linting passes: `uv run ruff check src/deep_research/repl/` ✅ #### Manual Verification: + - [ ] Tab completion works for commands (type `sw` → `switch`) - [ ] Tab completion shows session IDs (type `switch session_`) - [ ] Auto-suggest shows grey text from history @@ -2186,6 +2208,7 @@ research> sta ### Unit Tests **Coverage Areas:** + 1. Command parser (test_repl_parser.py) - All command variations - Aliases @@ -2242,6 +2265,7 @@ Create a comprehensive manual test plan: # REPL Manual Test Plan ## Basic Functionality + - [ ] REPL starts without errors - [ ] Welcome message displays - [ ] Can enter commands and see responses @@ -2249,12 +2273,14 @@ Create a comprehensive manual test plan: - [ ] Ctrl+D exits ## Command Parsing + - [ ] All commands parse correctly - [ ] Aliases work (new → start, quit → exit) - [ ] Invalid commands show error - [ ] Empty input ignored ## Session Management + - [ ] Can start new research session - [ ] Live progress displays during research - [ ] Session saved to vault @@ -2262,12 +2288,14 @@ Create a comprehensive manual test plan: - [ ] Can reset session ## Continuation + - [ ] Can continue from previous session - [ ] Version increments (v1 → v2) - [ ] Parent session ID set correctly - [ ] Can expand worker findings ## Multi-Session + - [ ] Can create multiple sessions - [ ] List sessions shows all - [ ] Active session highlighted @@ -2275,6 +2303,7 @@ Create a comprehensive manual test plan: - [ ] Last session loads on restart ## UX Features + - [ ] Tab completion for commands - [ ] Tab completion for session IDs - [ ] Auto-suggest from history @@ -2285,6 +2314,7 @@ Create a comprehensive manual test plan: - [ ] Help text accurate 
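The command-parsing and alias behavior exercised by the test plan above (aliases `new` → `start`, `quit` → `exit`; empty input ignored) can be sketched with the stdlib tools the plan already references, `shlex` and `argparse`. The specific commands and flags here are illustrative; the real `parser.py` may define a different command set:

```python
"""Sketch: REPL command parsing with shlex + argparse.

Command names, flags, and aliases are illustrative, not the real parser.py.
"""
import argparse
import shlex

ALIASES = {"new": "start", "quit": "exit"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="research", exit_on_error=False)
    sub = parser.add_subparsers(dest="command", required=True)

    start = sub.add_parser("start")
    start.add_argument("query", nargs="+")

    switch = sub.add_parser("switch")
    switch.add_argument("session_id")

    sub.add_parser("exit")
    return parser


def parse_line(line: str):
    """Tokenize a raw input line, resolve aliases, and parse it.

    Returns None for empty input so the REPL loop can simply ignore it.
    """
    tokens = shlex.split(line)
    if not tokens:
        return None
    tokens[0] = ALIASES.get(tokens[0], tokens[0])
    return build_parser().parse_args(tokens)
```

`shlex.split` keeps quoted queries intact as a single argument, which is why `start "multi word query"` parses cleanly without custom tokenization.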
## Error Handling + - [ ] Invalid session ID shows clear error - [ ] Worker not found shows available workers - [ ] Network errors handled gracefully @@ -2294,6 +2324,7 @@ Create a comprehensive manual test plan: ### Performance Testing **Metrics to Track:** + 1. Command execution speed - `status` < 100ms - `list sessions` < 500ms @@ -2400,6 +2431,7 @@ uv run research interactive ``` No migration needed for: + - Existing sessions in Obsidian vault - Session data structures - ObsidianWriter/Loader @@ -2408,6 +2440,7 @@ No migration needed for: ### Data Format All data structures remain unchanged: + - `ResearchSession` (state.py:60-94) - `WorkerFullContext` (state.py:30-58) - Obsidian vault structure @@ -2418,6 +2451,7 @@ All data structures remain unchanged: ## References ### Internal Documentation + - Research Document: `thoughts/shared/research/2025-11-21_interactive-research-cli-architecture.md` - Codebase Files: - `deep-research-agent/src/deep_research/cli.py` - Current CLI @@ -2429,15 +2463,18 @@ All data structures remain unchanged: ### External Resources **prompt_toolkit:** + - Official Docs: https://python-prompt-toolkit.readthedocs.io/ - Async Example: https://github.com/prompt-toolkit/python-prompt-toolkit/blob/main/examples/prompts/asyncio-prompt.py - REPL Tutorial: https://python-prompt-toolkit.readthedocs.io/en/master/pages/tutorials/repl.html **Production Examples:** + - ptpython: https://github.com/prompt-toolkit/ptpython - IPython: https://ipython.readthedocs.io/ **Command Parsing:** + - shlex: https://docs.python.org/3/library/shlex.html - argparse: https://docs.python.org/3/library/argparse.html @@ -2523,6 +2560,7 @@ All data structures remain unchanged: - **Phase 5** (Polish & UX): 1 week **Parallel Work Possible**: + - Tests can be written alongside implementation - Documentation can be updated incrementally @@ -2533,6 +2571,7 @@ All data structures remain unchanged: This implementation plan is based on comprehensive research and verified against 
the actual codebase. The deep-research agent has a **strong foundation** for interactive mode - the data model, persistence layer, and async execution are all ready. We're adding a thin REPL layer on top. **Critical Success Factors**: + 1. ✅ Use prompt_toolkit for rich async REPL 2. ✅ SessionManager for fast session switching 3. ✅ Context compression to ~50k tokens @@ -2552,19 +2591,20 @@ All 5 implementation phases have been successfully completed with comprehensive ### Implementation Metrics -| Metric | Target | Actual | Status | -|--------|--------|--------|--------| -| **Phases Complete** | 5 | 5 | ✅ | -| **Test Coverage** | >90% | 100% | ✅ | -| **Tests Passing** | All | 31/31 | ✅ | -| **Type Errors** | 0 | 0 | ✅ | -| **Linting Issues** | 0 | 0 | ✅ | -| **Files Created** | ~9 | 9 | ✅ | -| **Test Files** | ~4 | 4 | ✅ | +| Metric | Target | Actual | Status | +| ------------------- | ------ | ------ | ------ | +| **Phases Complete** | 5 | 5 | ✅ | +| **Test Coverage** | >90% | 100% | ✅ | +| **Tests Passing** | All | 31/31 | ✅ | +| **Type Errors** | 0 | 0 | ✅ | +| **Linting Issues** | 0 | 0 | ✅ | +| **Files Created** | ~9 | 9 | ✅ | +| **Test Files** | ~4 | 4 | ✅ | ### Files Created **REPL Module** (`src/deep_research/repl/`): + - ✅ `__init__.py` - Module exports - ✅ `parser.py` - Command parser with argparse (3.7 KB) - ✅ `shell.py` - Main REPL loop with prompt_toolkit (7.1 KB) @@ -2576,18 +2616,21 @@ All 5 implementation phases have been successfully completed with comprehensive - ✅ `cost_estimation.py` - Cost estimation (1.4 KB) **Test Suite** (`tests/`): + - ✅ `test_repl_parser.py` - 16 tests for command parsing - ✅ `test_repl_session_manager.py` - 3 tests for session management - ✅ `test_repl_context.py` - 3 tests for context compression - ✅ `test_repl_state.py` - 9 tests for state persistence **Modified Files**: + - ✅ `pyproject.toml` - Added `prompt-toolkit>=3.0.0` dependency - ✅ `src/deep_research/cli.py` - Added `interactive` command ### Phase Completion 
Status #### ✅ Phase 1: REPL Foundation + - **Status**: COMPLETE - **Tests**: 16/16 passing - **Features**: @@ -2598,6 +2641,7 @@ All 5 implementation phases have been successfully completed with comprehensive - ✅ Help command with examples #### ✅ Phase 2: Session Management + - **Status**: COMPLETE - **Tests**: 3/3 passing - **Features**: @@ -2608,6 +2652,7 @@ All 5 implementation phases have been successfully completed with comprehensive - ✅ LiveProgress integration for real-time updates #### ✅ Phase 3: Session Continuation + - **Status**: COMPLETE - **Tests**: 3/3 passing - **Features**: @@ -2617,6 +2662,7 @@ All 5 implementation phases have been successfully completed with comprehensive - ✅ Parent session linking with `parent_session_id` #### ✅ Phase 4: Multi-Session Management + - **Status**: COMPLETE - **Tests**: 9/9 passing - **Features**: @@ -2628,6 +2674,7 @@ All 5 implementation phases have been successfully completed with comprehensive - ✅ Graceful handling of missing/corrupted state files #### ✅ Phase 5: Polish & UX Enhancements + - **Status**: COMPLETE - **Tests**: All existing tests pass - **Features**: @@ -2660,14 +2707,14 @@ All 5 implementation phases have been successfully completed with comprehensive ### Architecture Compliance -| Design Decision | Implementation | Status | -|----------------|----------------|--------| -| Use `prompt_toolkit` for REPL | ✅ Implemented | ✅ | -| SessionManager pattern | ✅ Implemented | ✅ | -| Obsidian vault as source of truth | ✅ Integrated | ✅ | -| argparse for command parsing | ✅ Implemented | ✅ | -| Context compression to <50k tokens | ✅ Implemented | ✅ | -| LiveProgress integration | ✅ Integrated | ✅ | +| Design Decision | Implementation | Status | +| ---------------------------------- | -------------- | ------ | +| Use `prompt_toolkit` for REPL | ✅ Implemented | ✅ | +| SessionManager pattern | ✅ Implemented | ✅ | +| Obsidian vault as source of truth | ✅ Integrated | ✅ | +| argparse for command parsing | ✅ 
Implemented | ✅ | +| Context compression to <50k tokens | ✅ Implemented | ✅ | +| LiveProgress integration | ✅ Integrated | ✅ | ### Code Quality Metrics @@ -2682,18 +2729,21 @@ All 5 implementation phases have been successfully completed with comprehensive The following manual tests should be performed before production release: #### Session Management + - [ ] REPL starts successfully: `uv run research interactive` - [ ] Can start new research session and see live progress - [ ] Can continue from previous session (context loaded correctly) - [ ] Session state persists between commands #### Multi-Session Features + - [ ] Can switch between multiple sessions - [ ] Last active session loads automatically on restart - [ ] `list sessions` displays all sessions correctly - [ ] `list workers` shows worker details #### UX Features + - [ ] Tab completion works for commands (type `sw` → `switch`) - [ ] Tab completion shows session IDs - [ ] Auto-suggest shows grey text from history @@ -2701,11 +2751,13 @@ The following manual tests should be performed before production release: - [ ] User can cancel expensive operations #### Export Functionality + - [ ] Export markdown creates readable file - [ ] Export JSON creates valid JSON - [ ] Custom output paths work #### Error Handling + - [ ] Invalid commands show clear error messages - [ ] Corrupted state file handled gracefully - [ ] Missing session files show helpful error @@ -2728,12 +2780,14 @@ The following items were intentionally excluded from scope: ### Recommendations #### Before Production Deployment + 1. ✅ Complete manual testing checklist above 2. ✅ Test with real research queries to validate end-to-end flow 3. ✅ Document keyboard shortcuts in README 4. ⚠️ Consider adding integration tests for full research workflows #### Future Enhancements + 1. Add session archiving/cleanup command 2. Implement session search/filtering 3. 
Add session tagging for organization @@ -2748,6 +2802,7 @@ The following items were intentionally excluded from scope: All 5 phases have been successfully implemented with comprehensive test coverage, type safety, and adherence to the architectural design. The Interactive Research REPL is ready for production use pending completion of manual testing. **Key Achievements**: + - 📦 9 new modules implemented - ✅ 31 tests passing (100% success rate) - 🔍 Zero type errors (mypy strict mode) diff --git a/thoughts/shared/plans/obsidian-iterative-research-implementation.md b/thoughts/shared/plans/obsidian-iterative-research-implementation.md index 95b842b..50848ba 100644 --- a/thoughts/shared/plans/obsidian-iterative-research-implementation.md +++ b/thoughts/shared/plans/obsidian-iterative-research-implementation.md @@ -11,12 +11,14 @@ Transform the deep-research multi-agent system into an iterative, knowledge-grap ### What We Have **Multi-Agent Research System** (`deep-research-agent/src/deep_research/agent/orchestrator.py`): + - LeadResearcher orchestrates 1-5 parallel WorkerAgents via LangGraph - Workers execute ReAct loops (15 iterations max, 50K token budget) - Dynamic fan-out using LangGraph Send API - HTTP connection pooling for parallel execution (100 connections LLM, 50 search, 50 fetch) **Critical Compression Point** (`orchestrator.py:351-364`): + - Worker outputs compressed from full content (10-50KB) to 2000 tokens (~8KB) - **80-90% of research context is lost**: - Full ReAct thought-action-observation loops @@ -25,6 +27,7 @@ Transform the deep-research multi-agent system into an iterative, knowledge-grap - Worker's decision-making process **Current Storage** (`utils/store.py:15-22`): + - Single JSON file per session: `outputs/sessions/{session_id}.json` - Only stores compressed summaries (not full context) - Session ID based on `hash(query)` - same query overwrites previous @@ -43,6 +46,7 @@ Transform the deep-research multi-agent system into an iterative, 
knowledge-grap ### Architecture **Obsidian Vault Structure**: + ``` outputs/obsidian/ ├── sessions/ # Research sessions (MOCs) @@ -82,6 +86,7 @@ outputs/obsidian/ ### Verification **After implementation**: + - Run `research multi "What are AI trends?"` → Creates `outputs/obsidian/sessions/session_*.md` - Open Obsidian vault → Navigate session MOC → Click worker wikilink → See full ReAct trace - Run `research expand --session=session_abc123 --worker=task_1 "Focus on GPU costs"` → Creates v2 session @@ -90,6 +95,7 @@ outputs/obsidian/ ## What We're NOT Doing **Explicitly Out of Scope**: + - Cross-session insight linking (v1 keeps insights within-session only) - Custom Obsidian plugins (rely on core + Dataview only) - Real-time collaborative editing (single-user research workflows) @@ -102,6 +108,7 @@ outputs/obsidian/ **Strategy**: Phased implementation with incremental feature delivery. Each phase is independently testable and delivers value. **Key Principles**: + 1. **Never break synthesis**: Compression still used for report generation (backwards compatible) 2. **Store full + compressed**: Full context for Obsidian, compressed for synthesis 3. **Fail gracefully**: Obsidian write failures shouldn't crash research @@ -222,12 +229,14 @@ class ResearchSession: **Modify WorkerAgent Class** (line 21-62): 1. **Add instance variable** (after line 32): + ```python self.full_context: WorkerFullContext | None = None self.capture_context = True # Always capture for Obsidian ``` 2. **Create context at start** (after line 44, before executing): + ```python async def execute(self, objective: str, task_metadata: dict[str, Any]) -> WorkerResult: """Execute worker research with full context capture.""" @@ -287,11 +296,13 @@ async def execute(self, objective: str, task_metadata: dict[str, Any]) -> Worker **Capture ReAct Iterations** (modify `_step` method at line 200-310): 1. 
**Add iteration tracking variable** (after line 121): + ```python self.react_iterations: list[dict[str, Any]] = [] # Track for export ``` 2. **Capture iteration details** (in `_step` method, after line 216): + ```python async def _step(self) -> str: """Execute one ReAct step with full trace capture.""" @@ -391,6 +402,7 @@ async def _step(self) -> str: ``` 3. **Export iterations in metadata** (modify `research` method at line 163-171): + ```python return ResearchReport( query=query, @@ -412,11 +424,13 @@ return ResearchReport( **Add Session Tracking** (modify LeadResearcher class): 1. **Add instance variable** (after line 46): + ```python self.session: ResearchSession | None = None ``` 2. **Initialize session** (in `research` method, after line 67): + ```python async def research(self, query: str) -> dict[str, Any]: """Execute multi-agent research with full session tracking.""" @@ -460,6 +474,7 @@ async def research(self, query: str) -> dict[str, Any]: ``` 3. **Capture worker full context** (modify `_worker_execution` at line 351-364): + ```python async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]: """Execute worker - CAPTURE FULL CONTEXT.""" @@ -517,6 +532,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]: ``` 4. **Update complexity in session** (in `_analyze_query` at line 105-149): + ```python # After line 132 (after complexity_score extraction) if self.session: @@ -524,6 +540,7 @@ if self.session: ``` 5. 
**Update sub_tasks in session** (in `_create_plan` at line 151-311): + ```python # After line 311 (after sub_tasks extraction) if self.session: @@ -533,6 +550,7 @@ if self.session: ### Success Criteria #### Automated Verification: + - [ ] Build succeeds: `cd deep-research-agent && uv run pytest tests/ -v` - [ ] Type checking passes: `cd deep-research-agent && uv run mypy src/deep_research/` - [ ] Session object created with all fields populated @@ -541,6 +559,7 @@ if self.session: - [ ] Tool calls captured (verify `len(worker.tool_calls) > 0`) #### Manual Verification: + - [ ] Run research: `uv run research multi "What is Python?"` - [ ] Inspect `orchestrator.session` object after completion - [ ] Verify `session.workers[0].final_output` contains full uncompressed content @@ -566,6 +585,7 @@ Write complete research sessions to Obsidian vault as markdown files with YAML f #### Location: `deep-research-agent/src/deep_research/obsidian/` (new module) Create new module: + ``` deep-research-agent/src/deep_research/obsidian/ ├── __init__.py @@ -914,7 +934,7 @@ class ObsidianWriter: #### File: `deep-research-agent/src/deep_research/obsidian/templates.py` -```python +````python """Markdown templates for Obsidian notes.""" from typing import Any @@ -1117,8 +1137,10 @@ def source_note_template( ## Content -``` +```` + {content_display} + ``` """ @@ -1156,7 +1178,7 @@ def report_template( #### File: `deep-research-agent/src/deep_research/obsidian/insights.py` -```python +````python """Auto-extract insights from worker outputs using LLM.""" from datetime import datetime @@ -1193,15 +1215,14 @@ Example: "tags": ["scaling-laws", "mixture-of-experts", "model-architecture", "efficiency"] } ] -``` +```` Worker Output: {worker_output} """ - async def extract_insights(session: ResearchSession) -> list[dict[str, Any]]: - """Extract insights from all workers in session. +"""Extract insights from all workers in session. 
Returns: List of insight dictionaries with metadata @@ -1265,7 +1286,8 @@ async def extract_insights(session: ResearchSession) -> list[dict[str, Any]]: # Continue with other workers return all_insights -``` + +```` ### Orchestrator Integration @@ -1274,14 +1296,16 @@ async def extract_insights(session: ResearchSession) -> list[dict[str, Any]]: **Import ObsidianWriter** (after line 10): ```python from deep_research.obsidian.writer import ObsidianWriter -``` +```` **Add to LeadResearcher** (after line 46): + ```python self.obsidian_writer = ObsidianWriter(vault_path="outputs/obsidian") ``` **Write to Obsidian after research** (in `research` method, before returning result): + ```python # Write to Obsidian (always enabled) try: @@ -1308,6 +1332,7 @@ return result #### Location: `deep-research-agent/src/deep_research/cli.py` **Add Obsidian path to output** (after line 217): + ```python # After displaying cost if "obsidian_session_path" in result["metadata"]: @@ -1318,6 +1343,7 @@ if "obsidian_session_path" in result["metadata"]: ### Success Criteria #### Automated Verification: + - [x] Build succeeds: `cd deep-research-agent && uv run pytest tests/ -v` - [x] Type checking passes: `cd deep-research-agent && uv run mypy src/deep_research/` - [ ] Vault structure created: `outputs/obsidian/{sessions,workers,insights,sources,reports}/` @@ -1326,6 +1352,7 @@ if "obsidian_session_path" in result["metadata"]: - [ ] Insight notes created: `outputs/obsidian/insights/insight_*.md` #### Manual Verification: + - [ ] Run research: `uv run research multi "What are the latest AI trends?"` - [ ] Open `outputs/obsidian/` in Obsidian - [ ] Session MOC renders correctly with frontmatter and wikilinks @@ -1352,7 +1379,7 @@ Add `expand` and `recompile-report` CLI commands to enable iterative research wo #### File: `deep-research-agent/src/deep_research/obsidian/loader.py` (new) -```python +````python """Load research sessions from Obsidian vault.""" from pathlib import Path @@ -1641,7 
+1668,7 @@ class SessionLoader: }) return sessions -``` +```` ### Expand Command @@ -1869,6 +1896,7 @@ Create a well-structured report that addresses the original query while followin ### Success Criteria #### Automated Verification: + - [ ] Build succeeds: `cd deep-research-agent && uv run pytest tests/ -v` - [x] Type checking passes: `cd deep-research-agent && uv run mypy src/deep_research/` - [x] SessionLoader can load existing sessions @@ -1876,6 +1904,7 @@ Create a well-structured report that addresses the original query while followin - [x] Recompile command creates new report version #### Manual Verification: + - [ ] Run initial research: `uv run research multi "What is Python?"` - [ ] List sessions: Verify session created in `outputs/obsidian/sessions/` - [ ] Expand worker: `uv run research expand --session= --worker=task_1 "Focus on performance"` @@ -2147,7 +2176,7 @@ async def test_expand_workflow(tmp_path): #### File: `deep-research-agent/docs/manual-testing.md` (new) -```markdown +````markdown # Manual Testing Procedures ## Phase 1: Basic Research @@ -2157,6 +2186,7 @@ async def test_expand_workflow(tmp_path): cd deep-research-agent uv run research multi "What are the latest trends in AI?" ``` +```` 2. **Verify output**: - Report displays in console @@ -2164,6 +2194,7 @@ async def test_expand_workflow(tmp_path): - Obsidian path displayed: `✓ Obsidian session: outputs/obsidian/sessions/...` 3. **Check Obsidian files**: + ```bash ls -R outputs/obsidian/ ``` @@ -2205,6 +2236,7 @@ async def test_expand_workflow(tmp_path): ## Phase 3: Iteration Commands 1. **Expand worker research**: + ```bash # Get session ID from previous run SESSION_ID="session_YYYYMMDD_HHMMSS_XXXXXX" @@ -2218,6 +2250,7 @@ async def test_expand_workflow(tmp_path): - Graph view shows lineage 3. 
**Recompile report**: + ```bash uv run research recompile-report --session=$SESSION_ID "Focus on beginner perspective" ``` @@ -2229,6 +2262,7 @@ async def test_expand_workflow(tmp_path): ## Phase 4: Performance Testing 1. **Large query test**: + ```bash uv run research multi "Comprehensive analysis of machine learning frameworks, comparing PyTorch, TensorFlow, and JAX across performance, ease of use, and ecosystem" ``` @@ -2260,7 +2294,8 @@ async def test_expand_workflow(tmp_path): - Verify 2-4 insights per worker - Check insight quality (title, finding, evidence, implications) - Verify confidence levels assigned -``` + +```` ### User Documentation @@ -2283,14 +2318,16 @@ The deep-research agent automatically saves all research sessions to an Obsidian ### Vault Structure -``` +```` + outputs/obsidian/ -├── sessions/ # Research session MOCs (Maps of Content) -├── workers/ # Individual worker research with full traces -├── insights/ # Auto-extracted atomic insights -├── sources/ # Deduplicated web pages -└── reports/ # Compiled reports (versioned) -``` +├── sessions/ # Research session MOCs (Maps of Content) +├── workers/ # Individual worker research with full traces +├── insights/ # Auto-extracted atomic insights +├── sources/ # Deduplicated web pages +└── reports/ # Compiled reports (versioned) + +```` ### Usage @@ -2302,7 +2339,7 @@ uv run research multi "What are the latest AI trends?" 
# Output shows Obsidian session path: # ✓ Obsidian session: outputs/obsidian/sessions/session_20250120_142530_abc123_v1.md -``` +```` #### Open in Obsidian @@ -2351,11 +2388,13 @@ SORT created_at ASC ### Graph Exploration Use Obsidian's graph view to: + - Visualize research session structures - Find connections between insights across sessions - Navigate from sources back to insights - Trace research lineage (v1 → v2 → v3) -``` + +```` ### Performance Benchmarks @@ -2432,11 +2471,12 @@ Use Obsidian's graph view to: - Use complexity thresholds to control cost (default settings are good) - Parallel execution is critical (HTTP connection pooling required) - Obsidian write overhead is minimal (always enable) -``` +```` ### Success Criteria #### Automated Verification: + - [ ] All unit tests pass: `uv run pytest tests/test_obsidian_writer.py -v` - [ ] All unit tests pass: `uv run pytest tests/test_session_loader.py -v` - [ ] Integration tests pass: `uv run pytest tests/integration/ -v -m integration` @@ -2444,6 +2484,7 @@ Use Obsidian's graph view to: - [ ] Documentation builds without errors #### Manual Verification: + - [ ] Complete full manual testing procedure (see `docs/manual-testing.md`) - [ ] All edge cases tested and pass - [ ] Performance benchmarks match expected ranges diff --git a/thoughts/shared/plans/storm-alignment-plan.md b/thoughts/shared/plans/storm-alignment-plan.md index 43b4011..8dcd9bd 100644 --- a/thoughts/shared/plans/storm-alignment-plan.md +++ b/thoughts/shared/plans/storm-alignment-plan.md @@ -11,6 +11,7 @@ The current Go implementation deviates from STORM's core mechanism: **simulated expert conversations**. Instead of multi-turn WikiWriter↔TopicExpert dialogues, it uses ReAct-style iterative search loops. 
This plan aligns the implementation 1-to-1 with STORM while: + - Using **web search** instead of Wikipedia (only allowed divergence) - **Keeping** event sourcing, cost tracking, Obsidian storage - **Keeping** the analysis agent (cross-validation, contradiction detection) @@ -22,9 +23,11 @@ This plan aligns the implementation 1-to-1 with STORM while: **Goal**: Survey related topics before generating perspectives (like STORM's Wikipedia survey, but with web search) ### Files to Modify + - `internal/planning/perspectives.go` ### New Types + ```go // TopicOutline represents structure extracted from a related topic type TopicOutline struct { @@ -37,6 +40,7 @@ type TopicOutline struct { ### New Functions #### 1.1 `SurveyRelatedTopics()` + Surveys related topics via web search and extracts their structure. ```go @@ -47,12 +51,14 @@ func (p *PerspectiveDiscoverer) SurveyRelatedTopics( ``` **Implementation**: + 1. LLM call to generate 3-5 search queries for related subtopics 2. Execute web searches for each query via Brave API 3. Extract key sections/themes from top 3 results per query 4. Return structured outlines as inspiration context **Prompt** (adapted from STORM's `FindRelatedTopic`): + ``` For the topic: "{topic}" @@ -64,6 +70,7 @@ Return JSON array: ["query1", "query2", ...] ``` #### 1.2 `DiscoverWithSurvey()` + Generates perspectives informed by related topic structures. ```go @@ -74,11 +81,13 @@ func (p *PerspectiveDiscoverer) DiscoverWithSurvey( ``` **Implementation**: + 1. Call `SurveyRelatedTopics()` to get related structures 2. Format outlines as inspiration context 3. 
Generate personas using enhanced prompt **Prompt** (adapted from STORM's `GenPersona`): + ``` For the research topic: "{topic}" @@ -100,6 +109,7 @@ Return JSON array: [{"name": "...", "focus": "...", "questions": [...]}] ``` ### Acceptance Criteria + - [x] Perspectives generated with awareness of related topic structures - [x] Cost tracked for survey LLM calls and searches - [x] Backward compatible: existing `Discover()` still works @@ -113,6 +123,7 @@ Return JSON array: [{"name": "...", "focus": "...", "questions": [...]}] This is the **most critical phase** - it implements STORM's core innovation. ### New File + - `internal/agents/conversation.go` ### New Types @@ -161,6 +172,7 @@ func (s *ConversationSimulator) wikiWriterAsk( ``` **Prompt** (adapted from STORM's `AskQuestionWithPersona`): + ``` You are a {perspective.Name} researching "{topic}". Your focus: {perspective.Focus} @@ -195,6 +207,7 @@ func (s *ConversationSimulator) expertGenerateQueries( ``` **Prompt** (adapted from STORM's `QuestionToQuery`): + ``` Topic: {topic} Question: {question} @@ -219,6 +232,7 @@ func (s *ConversationSimulator) expertAnswer( ``` **Prompt** (adapted from STORM's `AnswerQuestion`): + ``` Topic: {topic} Question: {question} @@ -244,6 +258,7 @@ func (s *ConversationSimulator) SimulateConversation( ``` **Implementation**: + ```go func (s *ConversationSimulator) SimulateConversation( ctx context.Context, @@ -321,6 +336,7 @@ func (s *ConversationSimulator) SimulateConversation( ``` ### Acceptance Criteria + - [x] Multi-turn conversations (3-5 turns) per perspective - [x] WikiWriter exits naturally when satisfied ("Thank you") - [x] TopicExpert grounds answers in search results @@ -335,6 +351,7 @@ func (s *ConversationSimulator) SimulateConversation( **Goal**: Generate outline using STORM's draft→refine pattern ### Files to Modify + - `internal/agents/synthesis.go` ### New Functions @@ -351,6 +368,7 @@ func (s *SynthesisAgent) GenerateDraftOutline( ``` **Prompt** (adapted from 
STORM's `WritePageOutline`): + ``` Create an outline for a comprehensive research report on: "{topic}" @@ -374,6 +392,7 @@ func (s *SynthesisAgent) RefineOutline( ``` **Prompt** (adapted from STORM's `WritePageOutlineFromConv`): + ``` Improve an outline for a research report on: "{topic}" @@ -423,6 +442,7 @@ func (s *SynthesisAgent) GenerateOutlineWithConversations( ``` ### Acceptance Criteria + - [x] Draft outline generated from topic only - [x] Refined outline incorporates conversation findings - [x] Output structure reflects research, not just LLM's prior knowledge @@ -434,6 +454,7 @@ func (s *SynthesisAgent) GenerateOutlineWithConversations( **Goal**: Wire everything together with event sourcing ### Files to Modify + - `internal/orchestrator/deep_eventsourced.go` - `internal/events/types.go` - `internal/core/domain/aggregate/research_state.go` @@ -443,6 +464,7 @@ func (s *SynthesisAgent) GenerateOutlineWithConversations( #### 4.1 New Event Type In `internal/events/types.go`: + ```go EventConversationCompleted EventType = "conversation.completed" ``` @@ -450,6 +472,7 @@ EventConversationCompleted EventType = "conversation.completed" #### 4.2 Updated Aggregate State In `internal/core/domain/aggregate/research_state.go`: + ```go type ResearchState struct { // ... existing fields ... 
@@ -545,6 +568,7 @@ pending → planning (with survey) → conversations → analyzing → synthesiz ``` ### Acceptance Criteria + - [x] Conversations run in parallel per perspective - [x] Events emitted after each conversation completes - [x] Conversations stored in result state @@ -579,15 +603,18 @@ Phase 3 ──┘ ## Testing Strategy ### Unit Tests + - `conversation_test.go`: Test WikiWriter/TopicExpert prompts and loop termination - `perspectives_test.go`: Test survey + enhanced persona generation - `synthesis_test.go`: Test two-phase outline generation ### Integration Tests + - Full flow e2e test with mock LLM responses - Resume capability test (interrupt mid-conversation, resume from events) ### Quality Validation + - Compare output reports between old (ReAct) and new (conversation) approaches - Verify conversation transcripts show natural follow-up questioning - Check that outlines reflect conversation content @@ -608,15 +635,15 @@ Phase 3 ──┘ Prompts to port from `knowledge_storm/storm_wiki/modules/`: -| STORM Prompt | Go Location | Status | -|--------------|-------------|--------| -| `FindRelatedTopic` | `perspectives.go:SurveyRelatedTopics()` | Phase 1 | -| `GenPersona` | `perspectives.go:DiscoverWithSurvey()` | Phase 1 | -| `AskQuestionWithPersona` | `conversation.go:wikiWriterAsk()` | Phase 2 | -| `QuestionToQuery` | `conversation.go:expertGenerateQueries()` | Phase 2 | -| `AnswerQuestion` | `conversation.go:expertAnswer()` | Phase 2 | -| `WritePageOutline` | `synthesis.go:GenerateDraftOutline()` | Phase 3 | -| `WritePageOutlineFromConv` | `synthesis.go:RefineOutline()` | Phase 3 | +| STORM Prompt | Go Location | Status | +| -------------------------- | ----------------------------------------- | ------- | +| `FindRelatedTopic` | `perspectives.go:SurveyRelatedTopics()` | Phase 1 | +| `GenPersona` | `perspectives.go:DiscoverWithSurvey()` | Phase 1 | +| `AskQuestionWithPersona` | `conversation.go:wikiWriterAsk()` | Phase 2 | +| `QuestionToQuery` | 
`conversation.go:expertGenerateQueries()` | Phase 2 | +| `AnswerQuestion` | `conversation.go:expertAnswer()` | Phase 2 | +| `WritePageOutline` | `synthesis.go:GenerateDraftOutline()` | Phase 3 | +| `WritePageOutlineFromConv` | `synthesis.go:RefineOutline()` | Phase 3 | --- diff --git a/thoughts/shared/plans/thinkdeep-gap-closure.md b/thoughts/shared/plans/thinkdeep-gap-closure.md index c7b9f6a..1d40317 100644 --- a/thoughts/shared/plans/thinkdeep-gap-closure.md +++ b/thoughts/shared/plans/thinkdeep-gap-closure.md @@ -6,15 +6,16 @@ Close the critical gaps between go-research ThinkDeep implementation and the Thi ## Current State Analysis -| Gap | Current | Target | -|-----|---------|--------| -| Sub-researcher execution | Sequential for loop | Parallel goroutines | -| Search results | Brave snippets only (~150 chars) | Full page fetch + LLM summarization | -| Supervisor prompt | ~40 lines, missing scaling rules | ~65 lines, full diffusion algorithm | -| Final report prompt | ~55 lines, missing key rules | ~100 lines with insightfulness/helpfulness | -| Search deduplication | None | URL deduplication across sub-researchers | +| Gap | Current | Target | +| ------------------------ | -------------------------------- | ------------------------------------------ | +| Sub-researcher execution | Sequential for loop | Parallel goroutines | +| Search results | Brave snippets only (~150 chars) | Full page fetch + LLM summarization | +| Supervisor prompt | ~40 lines, missing scaling rules | ~65 lines, full diffusion algorithm | +| Final report prompt | ~55 lines, missing key rules | ~100 lines with insightfulness/helpfulness | +| Search deduplication | None | URL deduplication across sub-researchers | ### Key Files: + - `internal/agents/supervisor.go:150-162` - Sequential execution loop - `internal/agents/sub_researcher.go:140-176` - Search tool execution - `internal/think_deep/prompts.go` - All prompts @@ -29,6 +30,7 @@ Close the critical gaps between go-research ThinkDeep 
implementation and the Thi 4. Duplicate URLs are **deduplicated** across sub-researchers before final report ### Verification: + - `go test ./internal/agents/... -v` passes - `go test ./internal/tools/... -v` passes - `go build ./...` succeeds @@ -44,6 +46,7 @@ Close the critical gaps between go-research ThinkDeep implementation and the Thi ## Implementation Approach Four phases, each independently testable: + 1. Parallel execution (core architectural fix) 2. Webpage summarization (research quality improvement) 3. Prompt migration (behavior alignment) @@ -66,6 +69,7 @@ Convert sequential sub-researcher execution to parallel using goroutines and syn **Changes**: Replace sequential for loop with parallel goroutine execution Replace lines 150-162: + ```go // Execute tool calls var toolResults []string @@ -83,6 +87,7 @@ for _, tc := range toolCalls { ``` With parallel execution: + ```go // Separate conduct_research calls from other tools var conductResearchCalls []think_deep.ToolCallParsed @@ -119,6 +124,7 @@ if len(conductResearchCalls) > 0 { ``` Add new method for parallel execution: + ```go // executeParallelResearch executes multiple conduct_research calls in parallel. // Limited to s.maxConcurrent parallel goroutines. @@ -341,6 +347,7 @@ func (s *SupervisorAgent) executeParallelResearch( ``` Add import at top of file: + ```go import ( "sync" @@ -373,11 +380,13 @@ func (s *SupervisorAgent) executeConductResearch( ### Success Criteria: #### Automated Verification: + - [x] `go build ./internal/agents/...` succeeds - [x] `go test ./internal/agents/... 
-v` passes - [x] `go vet ./internal/agents/...` passes #### Manual Verification: + - [ ] Run research query "Compare OpenAI vs Anthropic vs DeepMind AI safety approaches" - [ ] Verify logs show 3 sub-researchers starting near-simultaneously (within 1 second) - [ ] Verify all 3 complete and findings are accumulated @@ -389,6 +398,7 @@ func (s *SupervisorAgent) executeConductResearch( ### Overview Add LLM-based summarization of full webpage content to search results. The reference implementation: + 1. Fetches full page content via Tavily's `include_raw_content=True` 2. Summarizes each page using LLM with structured output 3. Returns summary + key excerpts @@ -690,11 +700,13 @@ func (o *ThinkDeepOrchestrator) executeSubResearch( ### Success Criteria: #### Automated Verification: + - [x] `go build ./internal/tools/...` succeeds - [x] `go test ./internal/tools/... -v` passes - [x] `go build ./internal/think_deep/...` succeeds #### Manual Verification: + - [ ] Run a search query and verify output includes "SUMMARY:" sections with full content summaries - [ ] Verify summaries are 25-30% of original page length - [ ] Verify key excerpts are extracted @@ -1241,11 +1253,13 @@ Format the report in clear markdown with proper structure and include source ref ### Success Criteria: #### Automated Verification: + - [x] `go build ./internal/think_deep/...` succeeds - [x] `go test ./internal/think_deep/... -v` passes - [x] `go vet ./internal/think_deep/...` passes #### Manual Verification: + - [ ] Run research and verify supervisor uses scaling rules (comparison → multiple agents) - [ ] Verify final report follows "DO NOT list facts in bullets" rule - [ ] Verify final report includes summary tables for comparisons @@ -1414,11 +1428,13 @@ func ExtractURLs(content string) []string { ### Success Criteria: #### Automated Verification: + - [x] `go build ./internal/think_deep/...` succeeds - [x] `go test ./internal/think_deep/... 
-v` passes - [x] `go build ./internal/orchestrator/...` succeeds #### Manual Verification: + - [ ] Run research with overlapping topics - [ ] Verify final report doesn't have duplicate sources with same URL - [ ] Verify deduplication doesn't remove unique content @@ -1430,17 +1446,20 @@ func ExtractURLs(content string) []string { ### Unit Tests: **File**: `internal/agents/supervisor_test.go` + - Test parallel execution spawns multiple goroutines - Test semaphore limits concurrent workers - Test results are collected in order - Test context cancellation stops all workers **File**: `internal/tools/summarizer_test.go` + - Test URL fetch + summarization flow - Test graceful degradation on fetch failure - Test content truncation for large pages **File**: `internal/think_deep/state_test.go` + - Test URL extraction - Test deduplication logic - Test SeenURLs tracking @@ -1448,6 +1467,7 @@ func ExtractURLs(content string) []string { ### Integration Tests: **File**: `internal/architectures/think_deep/integration_test.go` + - Test full research flow with parallel sub-researchers - Test summarization appears in search results - Test final report has deduplicated sources diff --git a/thoughts/shared/plans/thinkdeep_implementation.md b/thoughts/shared/plans/thinkdeep_implementation.md index f53cc4a..d5c8100 100644 --- a/thoughts/shared/plans/thinkdeep_implementation.md +++ b/thoughts/shared/plans/thinkdeep_implementation.md @@ -17,10 +17,10 @@ Implement the ThinkDepth.ai "Self-Balancing Test-Time Diffusion Deep Research" a ### Model Mapping -| Original | Implementation | -|----------|----------------| +| Original | Implementation | +| -------------- | ------------------------------------------------------ | | `openai:gpt-5` | `alibaba/tongyi-deepresearch-30b-a3b` (via OpenRouter) | -| Tavily Search | Brave Search (existing) | +| Tavily Search | Brave Search (existing) | --- @@ -87,6 +87,7 @@ type DraftRefinedData struct { ``` ### Success Criteria + - [x] `go build ./...` passes 
- [x] Event types are exported and accessible @@ -101,6 +102,7 @@ type DraftRefinedData struct { Create the prompts package with all necessary prompt templates. Reference implementation in research document section "3. Prompts (`prompts.go`)". Key prompts: + 1. `LeadResearcherPrompt` - Supervisor with diffusion algorithm instructions 2. `ResearchAgentPrompt` - Sub-researcher with hard limits (2-5 searches) 3. `CompressResearchPrompt` - Context compression preserving all info @@ -110,6 +112,7 @@ Key prompts: 7. `TransformToResearchBriefPrompt` - Convert user query to detailed brief ### Success Criteria + - [x] All prompts defined as functions with proper formatting - [x] `go build ./...` passes @@ -147,6 +150,7 @@ type ResearcherState struct { ``` ### Success Criteria + - [x] State types are properly exported - [x] `go build ./...` passes @@ -189,6 +193,7 @@ func parseToolCalls(content string) []ToolCallParsed { ``` ### Success Criteria + - [x] All tools implement the `tools.Tool` interface - [x] Tool call parsing works with XML-style tags - [x] `go build ./...` passes @@ -202,6 +207,7 @@ func parseToolCalls(content string) []ToolCallParsed { ### Implementation The sub-researcher: + 1. Receives a research topic from supervisor 2. Executes search loop (2-5 iterations) 3. Uses think_tool for reflection after each search @@ -218,6 +224,7 @@ The sub-researcher: - **Compression**: Filter out think_tool calls, preserve all search results verbatim ### Success Criteria + - [x] SubResearcherAgent struct with Research() method - [x] Integration with existing tools.Registry for search - [x] Compression function that filters think_tool calls @@ -233,6 +240,7 @@ The sub-researcher: ### Implementation The supervisor: + 1. Receives research brief and initial draft 2. 
Executes diffusion loop: - Generate research questions for gaps @@ -249,6 +257,7 @@ The supervisor: - **think_tool**: Used before/after conduct_research for planning ### Success Criteria + - [x] SupervisorAgent struct with Coordinate() method - [x] Takes sub-researcher callback for delegation - [x] Parallel execution support for conduct_research @@ -294,6 +303,7 @@ func WithThinkDeepTools(tools tools.ToolExecutor) ThinkDeepOption { ### Event Emission Emit events at each phase transition: + - `EventDiffusionStarted` at start - `EventDiffusionIterationStart` for each iteration - `EventResearchDelegated` when delegating to sub-researcher @@ -303,6 +313,7 @@ Emit events at each phase transition: - `EventFinalReportStarted/Complete` for final phase ### Success Criteria + - [x] ThinkDeepOrchestrator with Research() method - [x] Functional options for dependency injection - [x] Full event emission for visualization @@ -359,6 +370,7 @@ import ( ``` ### Success Criteria + - [x] Architecture implements `architectures.Architecture` interface - [x] Self-registers via init() - [x] Accessible via `catalog.Get("think_deep")` @@ -381,6 +393,7 @@ Due to import cycle constraints, the shared prompts/state/tools are in `internal ### DiffusionDisplay Component Create a display component that: + 1. Renders initial diffusion plan (4-phase flow diagram) 2. Shows iteration progress with progress bars 3. Displays sub-researcher activity with icons @@ -389,11 +402,13 @@ Create a display component that: ### Visualizer Integration Add to `Visualizer`: + 1. New `diffusionDisplay *DiffusionDisplay` field 2. Subscribe to ThinkDeep event types 3. 
Route events to diffusionDisplay.HandleEvent() ### Success Criteria + - [x] DiffusionDisplay renders plan visualization - [x] Iteration progress displays correctly - [x] Sub-researcher status updates in real-time @@ -428,6 +443,7 @@ Add to `Visualizer`: - Verify graceful completion ### Success Criteria + - [x] All integration tests pass - [x] Event emission verified - [x] Cost tracking accurate diff --git a/thoughts/shared/research/2024-12-02_storm-implementation-validation.md b/thoughts/shared/research/2024-12-02_storm-implementation-validation.md index 0b4783d..f3494ac 100644 --- a/thoughts/shared/research/2024-12-02_storm-implementation-validation.md +++ b/thoughts/shared/research/2024-12-02_storm-implementation-validation.md @@ -4,7 +4,7 @@ researcher: Claude git_commit: 160381af670355a4c2899b504fdd2748b64704c5 branch: feat/custom-deep-research repository: addcommitpush.io -topic: "STORM Implementation Validation: Go vs Original Python" +topic: 'STORM Implementation Validation: Go vs Original Python' tags: [research, storm, deep-research, architecture, validation] status: complete last_updated: 2024-12-02 @@ -29,13 +29,13 @@ Validate that the Go implementation of STORM matches the original Stanford STORM The Go implementation **deviates significantly** from the original STORM algorithm. 
While it captures the spirit of multi-perspective research, it uses a fundamentally different approach: -| Aspect | Original STORM | Go Implementation | -|--------|---------------|-------------------| -| Information Gathering | **Simulated Conversations** (multi-turn Q&A) | **ReAct Loops** (Think→Act→Observe→Evaluate) | -| Perspective Discovery | **Survey-based** (scrape related Wikipedia articles) | **Single-shot LLM** (no external survey) | -| Outline Creation | **Two-phase** (draft + refinement with conversations) | **Single-shot** (no refinement) | -| Section Writing | **Parallel** with semantic retrieval | **Sequential** with validated facts | -| Analysis Phase | **None** | **Cross-validation, contradiction detection** (novel addition) | +| Aspect | Original STORM | Go Implementation | +| --------------------- | ----------------------------------------------------- | -------------------------------------------------------------- | +| Information Gathering | **Simulated Conversations** (multi-turn Q&A) | **ReAct Loops** (Think→Act→Observe→Evaluate) | +| Perspective Discovery | **Survey-based** (scrape related Wikipedia articles) | **Single-shot LLM** (no external survey) | +| Outline Creation | **Two-phase** (draft + refinement with conversations) | **Single-shot** (no refinement) | +| Section Writing | **Parallel** with semantic retrieval | **Sequential** with validated facts | +| Analysis Phase | **None** | **Cross-validation, contradiction detection** (novel addition) | **Key Finding**: The Go implementation is a valid research agent, but it's NOT a faithful STORM implementation. It's closer to a "ReAct + Multi-Perspective + Analysis" hybrid. @@ -48,6 +48,7 @@ The Go implementation **deviates significantly** from the original STORM algorit #### Original STORM (`knowledge_storm/storm_wiki/modules/persona_generator.py`) **Two-step process:** + 1. 
**Survey Related Topics** (lines 80-92): - LLM generates related Wikipedia topic URLs - HTTP fetch of each related article @@ -60,6 +61,7 @@ The Go implementation **deviates significantly** from the original STORM algorit - Always includes "Basic fact writer" as default **Prompt** (line 56-59): + ``` "You need to select a group of Wikipedia editors who will work together to create a comprehensive article on the topic. Each of them represents a different @@ -70,6 +72,7 @@ Wikipedia pages of related topics for inspiration." #### Go Implementation (`internal/planning/perspectives.go`) **Single-shot process:** + - **Lines 35-51**: Single LLM call requests 3-5 perspectives - **NO** related topic survey - **NO** Wikipedia scraping for structure inspiration @@ -84,6 +87,7 @@ Wikipedia pages of related topics for inspiration." #### Original STORM (`knowledge_storm/storm_wiki/modules/knowledge_curation.py`) **Simulated Expert Conversations:** + - For each perspective, simulate a multi-turn dialogue (default: 3-5 turns) - **WikiWriter** (with persona) asks questions - **TopicExpert** (grounded on search) answers @@ -91,6 +95,7 @@ Wikipedia pages of related topics for inspiration." 
- Conversations run **in parallel** per perspective **Loop Structure** (lines 60-80): + ```python for _ in range(self.max_turn): # default max_turn = 3 # WikiWriter asks question (grounded on persona + history) @@ -112,11 +117,13 @@ for _ in range(self.max_turn): # default max_turn = 3 #### Go Implementation (`internal/agents/search.go:85-175`) **ReAct Loop:** + - For each perspective, run a ReAct loop (max 3 iterations) - Think → Act (search) → Observe (extract facts) → Evaluate (find gaps) - No simulated conversation - just iterative search **Loop Structure** (lines 103-167): + ```go for iter := 0; iter < maxIterations && !sufficientCoverage; iter++ { // Generate queries from perspective questions @@ -159,6 +166,7 @@ for iter := 0; iter < maxIterations && !sufficientCoverage; iter++ { - Refines based on collected information **Prompt for refinement** (line 154-159): + ``` "Improve an outline for a Wikipedia page. You already have a draft outline that covers the general information. Now you want to improve it based on the @@ -169,6 +177,7 @@ informative." #### Go Implementation (`internal/agents/synthesis.go:85-127`) **Single-shot generation:** + - Input: Perspectives + fact counts only - No draft phase - No incorporation of collected facts into outline @@ -180,11 +189,13 @@ informative." ### 4. Analysis Phase (Go-Only) #### Original STORM + **No analysis phase** - facts are used directly for section writing. #### Go Implementation (`internal/agents/analysis.go`) **Three-step analysis:** + 1. **Cross-validation**: Score facts 0-1, identify corroborating sources 2. **Contradiction detection**: Find conflicts between facts 3. **Knowledge gap identification**: Find missing coverage areas @@ -198,6 +209,7 @@ informative." 
#### Original STORM (`knowledge_storm/storm_wiki/modules/article_generation.py`) **Parallel section writing:** + - Sections generated independently via ThreadPoolExecutor - Each section uses **semantic retrieval** from conversation corpus - SentenceTransformer embeddings for similarity search @@ -206,6 +218,7 @@ informative." #### Go Implementation (`internal/agents/synthesis.go:130-211`) **Sequential section writing:** + - Sections generated one-by-one - Uses validated facts with confidence scores - No semantic retrieval - all facts provided to each section @@ -222,6 +235,7 @@ To align the Go implementation with the original STORM algorithm, the following **File**: `internal/planning/perspectives.go` **Changes:** + 1. Add `SurveyRelatedTopics()` function: - Generate related topic queries via LLM - Fetch Wikipedia articles (or use web search) @@ -234,6 +248,7 @@ To align the Go implementation with the original STORM algorithm, the following - Update prompt to match STORM's `GenPersona` signature **New Functions:** + ```go // perspectives.go @@ -261,6 +276,7 @@ func (d *PerspectiveDiscoverer) Discover(ctx context.Context, topic string) ([]P **New File**: `internal/agents/conversation.go` **New Types:** + ```go type DialogueTurn struct { Question string @@ -277,6 +293,7 @@ type ConversationSimulator struct { ``` **Key Functions:** + ```go func (s *ConversationSimulator) SimulateConversation( ctx context.Context, @@ -317,6 +334,7 @@ func (s *ConversationSimulator) SimulateConversation( **File**: `internal/agents/synthesis.go` **Changes:** + 1. Add `generateDraftOutline()` - topic only 2. Add `refineOutline()` - topic + draft + conversations 3. Modify `generateOutline()` to use two-phase process @@ -343,11 +361,13 @@ func (a *SynthesisAgent) GenerateOutline( **File**: `internal/orchestrator/deep_eventsourced.go` **Changes:** + 1. Replace search workers with conversation simulation 2. Store conversations in state (for outline refinement) 3. 
Update synthesis to use two-phase outline **New Flow:** + ``` 1. Perspective Discovery (with related topic survey) └─> Emit PlanCreatedEvent with perspectives @@ -374,6 +394,7 @@ func (a *SynthesisAgent) GenerateOutline( **New File**: `internal/retrieval/semantic.go` **Functions:** + ```go type ConversationCorpus struct { conversations map[string][]DialogueTurn @@ -394,13 +415,13 @@ func (c *ConversationCorpus) RetrieveRelevant( ## Implementation Priority -| Priority | Change | Effort | Impact | -|----------|--------|--------|--------| -| **P0** | Conversation Simulation | High | Critical - core STORM mechanism | -| **P1** | Related Topic Survey | Medium | Important - perspective quality | -| **P1** | Two-Phase Outline | Medium | Important - outline quality | -| **P2** | Semantic Retrieval | Medium | Nice-to-have - section quality | -| **P3** | Keep Analysis Phase | None | Already done - adds value | +| Priority | Change | Effort | Impact | +| -------- | ----------------------- | ------ | ------------------------------- | +| **P0** | Conversation Simulation | High | Critical - core STORM mechanism | +| **P1** | Related Topic Survey | Medium | Important - perspective quality | +| **P1** | Two-Phase Outline | Medium | Important - outline quality | +| **P2** | Semantic Retrieval | Medium | Nice-to-have - section quality | +| **P3** | Keep Analysis Phase | None | Already done - adds value | --- @@ -428,12 +449,14 @@ The following prompts from the Python implementation should be ported to Go: ## Code References ### Original STORM (Python) + - Entry point: `knowledge_storm/storm_wiki/engine.py:341-442` - Persona generation: `knowledge_storm/storm_wiki/modules/persona_generator.py:114-154` - Conversation simulation: `knowledge_storm/storm_wiki/modules/knowledge_curation.py:47-81` - Outline generation: `knowledge_storm/storm_wiki/modules/outline_generation.py:22-72` ### Go Implementation + - Entry point: `internal/orchestrator/deep_eventsourced.go:88-260` - Perspective 
discovery: `internal/planning/perspectives.go:34-77` - Search agent: `internal/agents/search.go:85-175` @@ -458,6 +481,7 @@ The following prompts from the Python implementation should be ported to Go: The Go implementation is a functional research agent but **not a faithful STORM implementation**. The key missing piece is the **simulated conversation** mechanism, which is STORM's core innovation. The ReAct loop approach, while valid, produces different results. To create a true STORM implementation in Go: + 1. Replace ReAct loops with conversation simulation 2. Add related topic survey for perspective discovery 3. Implement two-phase outline generation diff --git a/thoughts/shared/research/2025-01-25_ports-adapters-architecture-go-research.md b/thoughts/shared/research/2025-01-25_ports-adapters-architecture-go-research.md index b85a34d..64a052f 100644 --- a/thoughts/shared/research/2025-01-25_ports-adapters-architecture-go-research.md +++ b/thoughts/shared/research/2025-01-25_ports-adapters-architecture-go-research.md @@ -4,7 +4,7 @@ researcher: Claude git_commit: 0e487a7ff2dd4eb9aa690ac395e828f8324014aa branch: feat/custom-deep-research repository: addcommitpush.io -topic: "Ports/Adapters Architecture for go-research Agent" +topic: 'Ports/Adapters Architecture for go-research Agent' tags: [research, architecture, hexagonal, ports-adapters, go-research, refactoring] status: complete last_updated: 2025-01-25 @@ -22,6 +22,7 @@ last_updated_by: Claude ## Research Question How to refactor the go-research deep research agent to separate: + 1. The agent core from the CLI (REPL) 2. The agent core from storage (sessions, reports, insights) 3. 
Enable swappable frontends (CLI first, web/API later) @@ -66,24 +67,26 @@ go-research/ ### 1.2 Current Coupling Points -| Component | Coupling Level | Issues | -|-----------|---------------|--------| -| Event Bus | **Low** | Clean pub/sub, already abstracted | -| LLM Client | **Low** | Has `ChatClient` interface | -| Tool Executor | **Low** | Has `ToolExecutor` interface | -| CLI/REPL | **Medium** | Direct orchestrator instantiation in handlers | -| Storage | **High** | Session struct is both domain AND storage schema | -| Obsidian | **Medium** | VaultWriter interface exists but limited | +| Component | Coupling Level | Issues | +| ------------- | -------------- | ------------------------------------------------ | +| Event Bus | **Low** | Clean pub/sub, already abstracted | +| LLM Client | **Low** | Has `ChatClient` interface | +| Tool Executor | **Low** | Has `ToolExecutor` interface | +| CLI/REPL | **Medium** | Direct orchestrator instantiation in handlers | +| Storage | **High** | Session struct is both domain AND storage schema | +| Obsidian | **Medium** | VaultWriter interface exists but limited | ### 1.3 Key Discoveries **Strong Points:** + - `llm.ChatClient` interface already exists (`internal/llm/client.go:19-25`) - `tools.ToolExecutor` interface already exists (`internal/tools/registry.go:16-19`) - Event bus enables loose coupling for progress updates - Options pattern used for dependency injection in orchestrators **Weak Points:** + - `Session` struct conflates domain model and storage schema - Handlers directly instantiate orchestrators (`handlers/start.go:57`, `145`) - Result→Session transformation logic lives in handlers @@ -1217,6 +1220,7 @@ func main() { 3. 
No functional changes - existing code continues to work **Files to Create**: + - `internal/core/ports/research.go` - `internal/core/ports/storage.go` - `internal/core/ports/llm.go` @@ -1224,6 +1228,7 @@ func main() { - `internal/core/ports/events.go` **Verification**: + ```go // Compile-time interface compliance var _ ports.LLMProvider = (*llm.Client)(nil) @@ -1239,6 +1244,7 @@ var _ ports.ToolExecutor = (*tools.Registry)(nil) 3. Update storage adapters to use entity types internally **Key Changes**: + - `Session` (domain) has no `json` tags - `SessionEntity` (storage) has `json` tags - Mapping functions `toEntity()` / `toDomain()` in storage adapters @@ -1252,6 +1258,7 @@ var _ ports.ToolExecutor = (*tools.Registry)(nil) 3. Move orchestration logic from handlers to services **Key Changes**: + - Handlers become thin adapters calling services - Services own all business logic - Services depend only on ports diff --git a/thoughts/shared/research/2025-11-07_blog-post-rendering-improvement-plan.md b/thoughts/shared/research/2025-11-07_blog-post-rendering-improvement-plan.md index e225c01..a028f80 100644 --- a/thoughts/shared/research/2025-11-07_blog-post-rendering-improvement-plan.md +++ b/thoughts/shared/research/2025-11-07_blog-post-rendering-improvement-plan.md @@ -429,8 +429,7 @@ const headingVariants = cva('font-bold text-balance scroll-mt-20', { }); export interface BlogHeadingProps - extends React.HTMLAttributes<HTMLHeadingElement>, - VariantProps<typeof headingVariants> { + extends React.HTMLAttributes<HTMLHeadingElement>, VariantProps<typeof headingVariants> { level: 1 | 2 | 3 | 4 | 5 | 6; } @@ -481,7 +480,8 @@ const listVariants = cva('my-6 space-y-2', { }); export interface BlogListProps - extends React.HTMLAttributes<HTMLElement>, + extends + React.HTMLAttributes<HTMLElement>, VariantProps<typeof listVariants> { variant?: 'unordered' | 'ordered' | 'checklist'; } diff --git a/thoughts/shared/research/2025-11-15_deep-research-agent-architecture.md b/thoughts/shared/research/2025-11-15_deep-research-agent-architecture.md index b711f78..fd2f80b 100644 --- 
a/thoughts/shared/research/2025-11-15_deep-research-agent-architecture.md +++ b/thoughts/shared/research/2025-11-15_deep-research-agent-architecture.md @@ -4,7 +4,7 @@ researcher: Claude (Sonnet 4.5) git_commit: 22dbf8d52dc8c995afcf147c11fad7f347571464 branch: feat/custom-deep-research repository: addcommitpush.io -topic: "Custom Deep Research Agent - Architecture & Implementation Plan" +topic: 'Custom Deep Research Agent - Architecture & Implementation Plan' tags: [research, deep-research, agent, langgraph, react, eda, python, uv] status: complete last_updated: 2025-11-15 @@ -116,18 +116,21 @@ The agent will operate as a CLI tool that takes natural language research questi #### 1. Agent Core (`agent/`) **ReAct Loop Implementation** + - Iterative thought-action-observation cycle - XML-tagged structured outputs (`<thought>`, `<action>`, `<observation>`) - Context window management (110K token threshold) - Graceful degradation on token limits **LangGraph State Machine** + - `StateGraph` with typed state (Pydantic models) - Conditional routing based on tool calls - Checkpointing with PostgreSQL/SQLite - Human-in-the-loop interrupt points **State Definition**: + ```python class AgentState(TypedDict): messages: Annotated[list[BaseMessage], add_messages] @@ -146,12 +149,14 @@ class AgentState(TypedDict): #### 2. 
Tool Suite (`tools/`) **Search Tool** (`search.py`) + - Web search via configurable provider (Serper, Brave, DuckDuckGo) - Batch query support - Markdown-formatted results with citations - Language detection for locale customization **Code Executor** (`code_executor.py`) + - Jupyter kernel management via `jupyter-client` - Sandboxed Python execution - Rich output capture (stdout, stderr, display data) @@ -159,12 +164,14 @@ class AgentState(TypedDict): - Timeout and resource limits **Web Fetch Tool** (`web_fetch.py`) + - URL content retrieval and summarization - Jina AI Reader integration - Progressive content truncation (95K tokens) - Goal-based extraction with LLM **File Operations** (`file_ops.py`) + - Read/write local files - CSV/Excel/Parquet loading - Schema extraction @@ -173,12 +180,14 @@ class AgentState(TypedDict): #### 3. Notebook Generation (`notebook/`) **Builder** (`builder.py`) + - Programmatic notebook creation with `nbformat` - Cell management (markdown, code) - Execution result capture - Output serialization **Templates** (`templates.py`) + - EDA narrative structure (7-act storyline) - Act I: Set the Scene - Act II: Meet the Data @@ -189,6 +198,7 @@ class AgentState(TypedDict): - Act VII: Insights & Recommendations **Validator** (`validator.py`) + - AST-based code validation - Dangerous operation detection - Pandas pattern checking @@ -197,12 +207,14 @@ class AgentState(TypedDict): #### 4. CLI Interface (`cli.py`) **Commands**: + - `deep-research research <query>` - General research - `deep-research eda <dataset>` - Exploratory data analysis - `deep-research resume <session-id>` - Continue session - `deep-research export <session-id>` - Export results **Features**: + - Rich terminal output with progress bars - Streaming token display - Interactive mode for refinement @@ -248,30 +260,35 @@ dev = [ ### Why These Choices? 
**uv Package Manager**: + - 10-100x faster than pip/poetry - Automatic Python version management - Universal lock file (cross-platform) - Single binary for entire toolchain **LangGraph over LangChain LCEL**: + - Explicit state management vs. implicit - Better debugging with state inspection - Checkpointing and time-travel - Clearer control flow with conditional edges **Click over Typer**: + - More mature and stable - Better documentation - Wider community adoption - Simpler for this use case **jupyter-client over E2B**: + - No external dependencies - Local execution for privacy - Free (no API costs) - Full control over environment **Pydantic v2**: + - Native validation performance - JSON schema generation - Type safety for LLM outputs @@ -346,6 +363,7 @@ deep-research-agent/ **Goal**: Establish project structure and basic CLI **Tasks**: + 1. Initialize uv project with dependencies 2. Set up development tooling (ruff, mypy, pytest) 3. Implement basic CLI with Click @@ -353,6 +371,7 @@ deep-research-agent/ 5. Set up logging with rich **Deliverables**: + - Working CLI that accepts commands - Configuration file loading (.env) - Basic logging to console @@ -363,6 +382,7 @@ deep-research-agent/ **Goal**: Implement core tools for research **Tasks**: + 1. Implement base tool interface 2. Build search tool with multiple providers 3. Create web fetch tool with summarization @@ -371,6 +391,7 @@ deep-research-agent/ 6. Create tool registry **Deliverables**: + - All tools independently testable - Tool registry with dynamic loading - Comprehensive test coverage (>80%) @@ -381,6 +402,7 @@ deep-research-agent/ **Goal**: Build core ReAct loop with LangGraph **Tasks**: + 1. Define LangGraph state schema 2. Implement ReAct loop logic 3. Build conditional routing @@ -389,6 +411,7 @@ deep-research-agent/ 6. 
Create system prompts **Deliverables**: + - Working ReAct agent - State persistence to SQLite - Token counting and limits @@ -399,6 +422,7 @@ deep-research-agent/ **Goal**: Build EDA and notebook capabilities **Tasks**: + 1. Implement notebook builder 2. Create EDA templates (7-act structure) 3. Build code validator @@ -406,6 +430,7 @@ deep-research-agent/ 5. Implement execution and capture **Deliverables**: + - Programmatic notebook generation - EDA template with narrative - Safe code execution @@ -416,6 +441,7 @@ deep-research-agent/ **Goal**: End-to-end workflows and UX **Tasks**: + 1. Integrate all components 2. Implement CLI workflows 3. Add streaming output @@ -423,6 +449,7 @@ deep-research-agent/ 5. Performance optimization **Deliverables**: + - Complete research workflow - Complete EDA workflow - User documentation @@ -436,6 +463,7 @@ deep-research-agent/ **File**: `src/deep_research/agent/react_agent.py` **Responsibilities**: + - Orchestrate thought-action-observation loop - Manage LLM API calls with retry logic - Parse structured outputs (XML tags) @@ -443,6 +471,7 @@ deep-research-agent/ - Accumulate context and results **Key Methods**: + ```python class ReActAgent: def __init__( @@ -465,12 +494,14 @@ class ReActAgent: ``` **Termination Conditions**: + 1. Answer tag found (`...`) 2. Max iterations reached (100) 3. Token limit approached (110K) 4. 
Explicit user termination **Error Handling**: + - Exponential backoff for API errors (1-30s) - Max 10 retries per API call - Graceful degradation on tool failures @@ -481,6 +512,7 @@ class ReActAgent: **File**: `src/deep_research/agent/graph.py` **Workflow**: + ```python def build_research_graph() -> CompiledGraph: workflow = StateGraph(AgentState) @@ -526,6 +558,7 @@ def build_research_graph() -> CompiledGraph: ``` **State Updates**: + - Use reducers for accumulation (`add_messages`, custom reducers) - Atomic updates per node - Immutable history via checkpointing @@ -535,6 +568,7 @@ def build_research_graph() -> CompiledGraph: **File**: `src/deep_research/tools/base.py` **Base Interface**: + ```python from abc import ABC, abstractmethod from typing import Any @@ -579,6 +613,7 @@ class BaseTool(ABC): ``` **Example: Search Tool**: + ```python class SearchInput(ToolInput): query: list[str] @@ -611,6 +646,7 @@ class SearchTool(BaseTool): **File**: `src/deep_research/tools/code_executor.py` **Implementation**: + ```python from jupyter_client import KernelManager from queue import Empty @@ -674,6 +710,7 @@ class CodeExecutor: **File**: `src/deep_research/notebook/builder.py` **Implementation**: + ```python import nbformat as nbf from nbformat.v4 import new_notebook, new_code_cell, new_markdown_cell @@ -702,6 +739,7 @@ class NotebookBuilder: ``` **EDA Template**: + ```python class EDATemplate: @staticmethod diff --git a/thoughts/shared/research/2025-11-15_multi-agent-deep-research-architecture-v2.md b/thoughts/shared/research/2025-11-15_multi-agent-deep-research-architecture-v2.md index 9c93411..2673dd1 100644 --- a/thoughts/shared/research/2025-11-15_multi-agent-deep-research-architecture-v2.md +++ b/thoughts/shared/research/2025-11-15_multi-agent-deep-research-architecture-v2.md @@ -4,8 +4,9 @@ researcher: GPT-5.1 Codex (Tongyi DeepResearch 30B A3B) git_commit: 22dbf8d52dc8c995afcf147c11fad7f347571464 branch: feat/custom-deep-research repository: addcommitpush.io 
-topic: "Multi-Agent Deep Research Architecture - Production-Ready Design (v2)" -tags: [research, deep-research, multi-agent, langgraph, context-engineering, production, architecture] +topic: 'Multi-Agent Deep Research Architecture - Production-Ready Design (v2)' +tags: + [research, deep-research, multi-agent, langgraph, context-engineering, production, architecture] status: complete last_updated: 2025-11-15 last_updated_by: GPT-5.1 Codex @@ -199,6 +200,7 @@ Final Output > **Model Policy**: The MVP exclusively uses Tongyi DeepResearch 30B A3B for every agent type to avoid fragmented reasoning behaviors; Anthropic models are intentionally excluded in this phase. [\[link\]](https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b) **Responsibilities**: + 1. **Query Analysis**: Assess complexity, identify sub-tasks, estimate required agents 2. **Scaling Decision**: Simple (1-2 agents, 3-10 calls) vs. Complex (3-10 agents, 20+ calls) 3. **Sub-Agent Spawning**: Generate focused prompts with clear objectives and output schemas @@ -207,12 +209,14 @@ Final Output 6. **Memory Management**: Persist strategies, retrieve past learnings, manage evolving report **Context Engineering**: + - **Input**: User query (200-1K tokens) + Summary from previous round (if iterative) - **Receives from sub-agents**: Compressed 1-2K summaries (not full 50K exploration) - **Maintains**: Evolving research report using Markovian workspace pattern - **Output**: Final report or decision to spawn next research round **Scaling Rules** (industry benchmarks from Alibaba IterResearch + Anthropic studies): + ```python def determine_scaling(query_complexity: float, query_specificity: float) -> ScalingStrategy: if complexity < 0.3 and specificity > 0.7: @@ -224,6 +228,7 @@ def determine_scaling(query_complexity: float, query_specificity: float) -> Scal ``` **Prompt Template**: + ``` You are a LeadResearcher coordinating specialized research agents. 
@@ -290,6 +295,7 @@ Return JSON: - Output: Validation report with flagged issues **Worker Lifecycle**: + 1. **Spawn**: LeadResearcher creates with focused prompt 2. **Execute**: Independent exploration with full tool access 3. **Compress**: Condense findings using LLM (50K → 2K) @@ -298,6 +304,7 @@ Return JSON: 6. **Terminate**: Worker agent lifecycle ends **Context Isolation Pattern**: + ```python class WorkerAgent: def execute(self, objective: str, tools: list[str]) -> WorkerResult: @@ -369,6 +376,7 @@ class SearchTool(BaseTool): ``` **Features**: + - Batch processing (3-5 queries in single call) - Provider fallback (Serper → Brave → DuckDuckGo) - Result caching with semantic similarity check @@ -411,6 +419,7 @@ class WebFetchTool(BaseTool): ``` **Context Optimization**: + - Pre-truncation before LLM processing (95K token limit) - Goal-directed extraction (only relevant sections) - Progressive truncation on failure (70% reduction per retry) @@ -505,6 +514,7 @@ class CodeExecutorTool(BaseTool): ``` **Safety Features**: + - AST-based code validation - Resource limits (CPU, memory, time) - Network isolation (optional) @@ -543,6 +553,7 @@ class ShortTermMemory: ``` **Stores**: + - Current conversation state - Active research round progress - In-flight tool calls @@ -602,6 +613,7 @@ class LongTermMemory: ``` **Stores**: + - Past research results - Sub-agent exploration outputs (full context) - Learned patterns and strategies @@ -679,6 +691,7 @@ class MarkovianWorkspace: ``` **Key Properties**: + - **Markovian**: Only depends on current state, not full history - **Bounded**: Evolving report has maximum size (e.g., 8K tokens) - **Compressed**: Lossy compression filters noise while preserving signal @@ -859,6 +872,7 @@ Based on [LangChain Context Engineering Guide](https://blog.langchain.com/contex #### 1. 
Write Context (Persist Outside Window) **Scratchpad Pattern**: + ```python class AgentState(TypedDict): messages: Annotated[list, add_messages] @@ -879,6 +893,7 @@ def research_node(state: AgentState) -> dict: ``` **Memory Blocks Pattern**: + ```python # Letta/MemGPT-inspired memory blocks memory_blocks = { @@ -901,6 +916,7 @@ memory_blocks = { ``` **External Memory Store**: + ```python # Store full sub-agent outputs long_term_memory.store({ @@ -919,6 +935,7 @@ if need_details: #### 2. Select Context (Pull Relevant Information) **Semantic Memory Retrieval**: + ```python def select_relevant_memories(query: str, k: int = 5) -> list[Memory]: # Embedding-based retrieval @@ -935,6 +952,7 @@ def select_relevant_memories(query: str, k: int = 5) -> list[Memory]: ``` **Tool Selection**: + ```python def select_tools_for_task(objective: str, available_tools: list[Tool]) -> list[Tool]: # RAG on tool descriptions @@ -949,6 +967,7 @@ def select_tools_for_task(objective: str, available_tools: list[Tool]) -> list[T #### 3. Compress Context (Retain Essential Tokens) **Summarization Pattern**: + ```python def compress_findings(full_exploration: str, max_tokens: int = 2000) -> str: compression_prompt = f""" @@ -966,6 +985,7 @@ def compress_findings(full_exploration: str, max_tokens: int = 2000) -> str: ``` **LazyLLM-Inspired Token Pruning** (if implementing): + ```python def prune_low_relevance_tokens(context: str, attention_scores: np.ndarray, k: float = 0.3): # Calculate k-th percentile threshold @@ -980,6 +1000,7 @@ def prune_low_relevance_tokens(context: str, attention_scores: np.ndarray, k: fl ``` **Auto-Compact at Context Limits**: + ```python def check_context_limit(state: AgentState, threshold: float = 0.95): total_tokens = sum(count_tokens(m.content) for m in state["messages"]) @@ -999,6 +1020,7 @@ def check_context_limit(state: AgentState, threshold: float = 0.95): #### 4. 
Isolate Context (Split Across Agents) **Multi-Agent Isolation**: + ```python # Each worker operates in isolated context # Does NOT see: @@ -1020,6 +1042,7 @@ def create_isolated_worker(objective: str, tools: list[str]) -> WorkerAgent: ``` **Environment Isolation**: + ```python # Token-heavy objects stored as environment variables class JupyterExecutionEnvironment: @@ -1048,6 +1071,7 @@ class JupyterExecutionEnvironment: **Goal**: Establish orchestrator-worker pattern with basic tools **Tasks**: + 1. **Project Setup** - Initialize uv project with dependencies - Configure LangGraph with PostgreSQL checkpointing @@ -1077,12 +1101,14 @@ class JupyterExecutionEnvironment: - Basic error handling **Deliverables**: + - Working multi-agent system with 1 orchestrator + N workers - Basic research workflow end-to-end - Unit tests for core components - CLI accepting queries and returning results **Success Criteria**: + - Spawn 3 parallel workers successfully - Workers operate in isolated contexts - Results aggregate correctly @@ -1093,6 +1119,7 @@ class JupyterExecutionEnvironment: **Goal**: Implement Markovian workspace and memory architecture **Tasks**: + 1. **Markovian Workspace** - Implement IterResearch-inspired state management - Bounded-size evolving report @@ -1117,12 +1144,14 @@ class JupyterExecutionEnvironment: - Auto-compact triggers **Deliverables**: + - Markovian workspace with O(1) context complexity - Multi-tier memory system - Compression achieving 60%+ token reduction - Monitoring dashboard for context usage **Success Criteria**: + - Orchestrator maintains <50% context usage across 10+ rounds - Successfully complete 50+ interaction workflow without overflow - Workers compress 50K exploration to 2K summary @@ -1133,6 +1162,7 @@ class JupyterExecutionEnvironment: **Goal**: Build comprehensive tool suite and error recovery **Tasks**: + 1. 
**Enhanced Tools** - Parallel search with batch queries - Goal-directed web fetch with retry @@ -1159,12 +1189,14 @@ class JupyterExecutionEnvironment: - Source attribution **Deliverables**: + - Complete tool suite (8+ tools) - Robust error handling - Validation agents - Tool usage analytics **Success Criteria**: + - <5% tool failure rate with retries - Successful recovery from sub-agent failures - Citation accuracy >90% @@ -1175,6 +1207,7 @@ class JupyterExecutionEnvironment: **Goal**: Specialized data analysis capabilities **Tasks**: + 1. **Data Agent Type** - Specialized worker for data analysis - Pandas/Polars integration @@ -1200,12 +1233,14 @@ class JupyterExecutionEnvironment: - Report synthesis **Deliverables**: + - Data analysis agent type - Notebook generation pipeline - EDA template library - Multi-agent EDA orchestration **Success Criteria**: + - Generate executable notebooks from data files - 7-act narrative structure - Visualizations embedded correctly @@ -1216,6 +1251,7 @@ class JupyterExecutionEnvironment: **Goal**: Observability, optimization, and deployment **Tasks**: + 1. **Observability** - Structured logging - LangSmith integration @@ -1241,12 +1277,14 @@ class JupyterExecutionEnvironment: - Deployment guide **Deliverables**: + - Production-ready system - Comprehensive observability - Optimized for cost and performance - Complete documentation **Success Criteria**: + - Average cost <$0.10 per research task - 90% reduction in research time vs. 
single-agent - Observability covering all critical paths @@ -1581,6 +1619,7 @@ class WorkerAgent: ### Cost Optimization **Single-Model Baseline (Tongyi DeepResearch 30B A3B)**: + ```python DEFAULT_MODEL = "alibaba/tongyi-deepresearch-30b-a3b" @@ -1604,6 +1643,7 @@ def build_llm(profile: ResearchProfile, temperature: float = 0): Using a single reasoning-optimized model keeps behavior consistent, simplifies evaluation, and honors the "no Anthropic models" requirement while still allowing future swaps by editing `profile.model_name` rather than touching code. [\[link\]](https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b) **Token Budget Enforcement**: + ```python class TokenBudgetManager: def __init__(self, monthly_budget: float): @@ -1630,6 +1670,7 @@ class TokenBudgetManager: ``` **Reasoning Traces & Prompt Reuse (OpenRouter)**: + ```python def build_openrouter_payload(system_prompt: str, user_query: str, profile: ResearchProfile): messages = [ @@ -1653,6 +1694,7 @@ OpenRouter exposes a native `reasoning` parameter and returns `reasoning_details ### Observability **Structured Logging**: + ```python import structlog @@ -1686,6 +1728,7 @@ class ObservableAgent: ``` **LangSmith Integration**: + ```python from langsmith import trace @@ -1698,6 +1741,7 @@ def execute_research(query: str): ``` **Metrics Dashboard**: + ```python from prometheus_client import Counter, Histogram @@ -1738,6 +1782,7 @@ def execute_with_metrics(query: str): ### Error Recovery **Retry with Exponential Backoff**: + ```python from tenacity import retry, stop_after_attempt, wait_exponential @@ -1752,6 +1797,7 @@ class ResilientAgent: ``` **Graceful Degradation**: + ```python def execute_research_with_degradation(query: str): try: @@ -1768,6 +1814,7 @@ def execute_research_with_degradation(query: str): ``` **Sub-Agent Failure Isolation**: + ```python def synthesize_with_partial_results(sub_agent_results: list[dict]) -> str: # Filter successful results @@ -2083,21 +2130,22 @@ By implementing 
this architecture in **6 phased weeks**, you'll have a robust mu The key differentiators from the initial design: -| Aspect | Initial Design | Improved Design | -|--------|---------------|-----------------| -| **Context Management** | Accumulate all outputs | Compress to 2K summaries | -| **Orchestration** | Single ReAct loop | Hierarchical orchestrator + workers | -| **Execution** | Sequential | Parallel map-reduce | -| **Memory** | Simple checkpointing | Three-tier with Markovian workspace | -| **Scaling** | Fixed architecture | Dynamic 1-10 agents | -| **Cost** | Unoptimized | Tongyi baseline efficiency, budgets, caching | -| **Observability** | Basic logging | LangSmith, metrics, dashboards | +| Aspect | Initial Design | Improved Design | +| ---------------------- | ---------------------- | -------------------------------------------- | +| **Context Management** | Accumulate all outputs | Compress to 2K summaries | +| **Orchestration** | Single ReAct loop | Hierarchical orchestrator + workers | +| **Execution** | Sequential | Parallel map-reduce | +| **Memory** | Simple checkpointing | Three-tier with Markovian workspace | +| **Scaling** | Fixed architecture | Dynamic 1-10 agents | +| **Cost** | Unoptimized | Tongyi baseline efficiency, budgets, caching | +| **Observability** | Basic logging | LangSmith, metrics, dashboards | **Next Steps**: Begin Phase 1 implementation, focusing on the orchestrator-worker pattern and basic parallel execution. Iterate based on real usage, and progressively add advanced features. 
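The orchestrator-worker pattern and parallel map-reduce named in the table above can be sketched as follows. This is an illustrative skeleton only, not the planned implementation: `run_worker` is a hypothetical stand-in for an LLM-backed worker with isolated context and tools, and the 2K-summary compression is approximated by a character cap.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(objective: str) -> dict:
    # Hypothetical stand-in: a real worker would explore with its own
    # isolated context and tool access, then compress its findings.
    findings = f"findings for: {objective}"
    # Return only a compressed summary, never the raw exploration.
    return {"objective": objective, "summary": findings[:2000], "ok": True}


def orchestrate(objectives: list[str], max_workers: int = 5) -> list[dict]:
    # Map phase: workers run in parallel, each in an isolated context.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_worker, objectives))
    # Reduce phase: the orchestrator aggregates compressed summaries only,
    # dropping failed workers instead of aborting the whole round.
    return [r for r in results if r["ok"]]
```

Because `pool.map` preserves input order, the orchestrator can attribute each summary back to the sub-task that produced it without extra bookkeeping.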
--- **Related Research**: + - [2025-11-15_deep-research-agent-architecture.md](./2025-11-15_deep-research-agent-architecture.md) - Initial single-agent design (superseded) - Future: Deep Research Agent v3 - Test-time scaling with Heavy Mode - Future: Deep Research Agent - Production Deployment Guide diff --git a/thoughts/shared/research/2025-11-16_alibaba-deepresearch-gap-analysis.md b/thoughts/shared/research/2025-11-16_alibaba-deepresearch-gap-analysis.md index e50fe93..16712c3 100644 --- a/thoughts/shared/research/2025-11-16_alibaba-deepresearch-gap-analysis.md +++ b/thoughts/shared/research/2025-11-16_alibaba-deepresearch-gap-analysis.md @@ -4,7 +4,7 @@ researcher: Claude (Sonnet 4.5) git_commit: 22dbf8d52dc8c995afcf147c11fad7f347571464 branch: feat/custom-deep-research repository: addcommitpush.io -topic: "Alibaba-NLP/DeepResearch Architecture & Gap Analysis vs MVP Plan" +topic: 'Alibaba-NLP/DeepResearch Architecture & Gap Analysis vs MVP Plan' tags: [research, deep-research, multi-agent, alibaba, tongyi, gap-analysis, architecture] status: complete last_updated: 2025-11-16 @@ -42,6 +42,7 @@ The architecture supports two inference paradigms: **ReAct mode** (classic thoug #### Overall System Architecture **Multi-Agent vs Single-Agent**: + - Supports both paradigms depending on inference mode - **ReAct Mode**: Single agent with classic reasoning loop - **Heavy Mode**: Multi-agent orchestration with parallel exploration and synthesis @@ -50,6 +51,7 @@ The architecture supports two inference paradigms: **ReAct mode** (classic thoug - Enables "wider range of research paths within limited context window" **Core Model Architecture**: + - **Base Model**: Qwen3-30B-A3B-Base with MoE structure - **Total Parameters**: 30.5 billion - **Activated Parameters**: 3.3 billion per token (sparse activation) @@ -97,21 +99,25 @@ DeepResearch/ #### Agent Types **Primary Agent**: **MultiTurnReactAgent** + - Implements core execution engine at `inference/react_agent.py:180-227` - 
Follows classic Think→Action→Observation loop - Token limit: 110K (enforced by `_num_tokens()` at lines 166-179) **Heavy Mode Agents**: + 1. **Research Agents**: Multiple parallel agents using IterResearch paradigm 2. **Synthesis Agent**: Aggregates findings from research agents **Supporting Components** (from WebAgent family): + - **Lead Researcher**: High-level planning and decomposition - **Tool-specific Agents**: Specialized for search, visit, scholar operations #### Agent Loop/Reasoning Pattern **ReAct Mode Implementation**: + ``` Loop: 1. Thought Generation: Model reasons about current state and next action @@ -122,6 +128,7 @@ Loop: ``` **IterResearch Paradigm (Heavy Mode)**: + - **Workspace Reconstruction**: "Each round, agent reconstructs streamlined workspace using only essential outputs from previous round" - **Context Management**: Replaces full history with strategic reconstruction: - Question q @@ -131,6 +138,7 @@ Loop: - **Enables**: "Consistent reasoning capacity across arbitrary exploration depths" **Retry Logic**: + - Exponential backoff with 10 attempts - Formula: `base_sleep_time * (2 ** attempt) + random.uniform(0, 1)` - Max sleep: 30 seconds @@ -139,11 +147,13 @@ Loop: #### Agent Communication **Intra-Agent Communication**: + - Shared state through compressed reports (Sₜ) - Message passing via JSONL format - Thread-safe output with `threading.Lock` **Multi-Agent Coordination (Heavy Mode)**: + - Parallel execution with independent context management trajectories - No direct communication between research agents (embarrassingly parallel) - Synthesis phase aggregates compressed reports @@ -152,11 +162,13 @@ Loop: #### Task Decomposition & Distribution **Decomposition Strategy**: + - Planning action synthesis (FAS) strengthens initial planning - "Strong correlation between initial planning and trajectory's accuracy" - Multi-step decision-making processes explored at each step **Distribution**: + - Heavy Mode: Tasks distributed across multiple 
research agents - Each agent explores different research paths - Sticky port assignment for load balancing @@ -171,6 +183,7 @@ Loop: #### Available Tools (5 Core Tools) **1. Search Tool** + - **Provider**: Serper.dev API (Google Search) - **Capability**: Web search returning top-10 results - **Parameters**: @@ -179,6 +192,7 @@ Loop: - **Implementation**: `inference/tool_search.py` **2. Visit Tool** + - **Provider**: Jina.ai for page content extraction - **Capability**: Targeted webpage extraction with goal-specific summarization - **Parameters**: @@ -188,6 +202,7 @@ Loop: - **Implementation**: `inference/tool_visit.py` **3. Python Interpreter** + - **Provider**: SandboxFusion endpoints - **Capability**: Sandboxed code execution - **Format Requirements**: @@ -197,12 +212,14 @@ Loop: - **Implementation**: `inference/tool_python.py` **4. Google Scholar** + - **Capability**: Academic publication retrieval - **Parameters**: Multiple queries supported - **Output**: Scholarly search results - **Implementation**: `inference/tool_scholar.py` **5. 
File Parser** + - **Provider**: Dashscope (Alibaba) - **Supported Formats**: PDF, DOCX, PPTX, TXT, CSV, XLSX, MP4, MP3 - **Capability**: Multi-format document analysis @@ -212,6 +229,7 @@ Loop: #### Tool Implementation & Integration **Unified Sandbox Architecture**: + - **Concurrency**: Graceful handling with ThreadPoolExecutor - **Failure Handling**: - Result caching to avoid re-execution @@ -221,12 +239,14 @@ Loop: - **Purpose**: "Preventing tool errors from corrupting learning trajectory" **Tool Call Format**: + - JSON format within structured tags - Custom mapping via `TOOL_MAP` - Special handling in `custom_call_tool` method - Current date provided at runtime for temporal grounding **API Configuration** (via `.env` file): + - `SERPER_KEY_ID`: Web search - `JINA_API_KEYS`: Page reading - `API_KEY/API_BASE`: LLM summarization (OpenAI-compatible) @@ -244,18 +264,21 @@ Loop: #### State Persistence **File-Based Storage**: + - **Input Format**: JSONL (line-delimited JSON, recommended) or JSON (array format) - **Location**: `eval_data/` directory - **Schema**: Question-answer pairs with optional file references - **File Corpus**: `eval_data/file_corpus/` for document inputs **Trajectory Storage**: + - **Format**: JSONL for agent trajectories - **Content**: Sequences of thoughts, actions, observations - **Usage**: Training data generation, evaluation results - **Output**: Results saved to designated output directory in JSONL **Caching Strategy**: + - **File Parser**: `diskcache.Cache` with SHA256 keys to avoid re-parsing identical files - **Tool Results**: Cached to prevent redundant API calls - **Determinism**: Ensures consistent experiences during training @@ -263,6 +286,7 @@ Loop: #### Data Structures **Trajectory Structure** (ℋₜ): + ```python Trajectory = { question: str, @@ -281,6 +305,7 @@ Message = { ``` **Compressed State (Sₜ)**: + ```python CompressedState = { question: str, @@ -291,6 +316,7 @@ CompressedState = { ``` **Environment Abstraction** (Three 
Forms): + 1. **Prior World Environment**: Task elements + tools without responses (infinite scalability, zero cost) 2. **Simulated Environment**: Offline Wikipedia-based RAG with local tools (rapid iteration, sim-to-real gap) 3. **Real-world Environment**: Authentic distributions with deterministic sandbox wrapper @@ -304,16 +330,19 @@ CompressedState = { #### Context Window Management **Progressive Expansion**: + - Training: 32K → 128K token context - Two-stage Agentic CPT with increasing context length - SFT: Two-stage approach (40K context → 128K context) **Context Management Paradigm**: + - **Challenge**: 128K context insufficient for extreme long-horizon tasks - **Solution**: Markovian state reconstruction via IterResearch - **Mechanism**: Strategic compression replacing full history **IterResearch Context Strategy**: + - **Workspace Reconstruction**: Each round creates streamlined workspace - **Retention Policy**: Only essential outputs from previous round - **Benefits**: @@ -325,6 +354,7 @@ CompressedState = { #### Memory Compression Strategies **ReSum Paradigm** (from WebAgent family): + - **Purpose**: "Overcome context window limitations on complex, long-horizon search tasks" - **Approach**: Periodically compress growing interaction history into compact, structured summary - **Components**: @@ -333,6 +363,7 @@ CompressedState = { - **WebResummer-30B**: Agent trained with ReSum-GRPO achieving SOTA on BrowseComp benchmarks **Compression Techniques**: + 1. **Evidence Distillation**: Extract key findings from noisy, extensive histories 2. **Gap Identification**: Recognize information deficiencies 3. 
**Action Suggestion**: Recommend next research steps @@ -341,12 +372,14 @@ CompressedState = { #### Aggregation Across Agents **Heavy Mode Synthesis**: + - Multiple research agents operate independently - Each produces compressed report (Sₜᵘ) - Synthesis Agent aggregates {Sₜ¹, Sₜ², ..., Sₜⁿ} - Final answer integrates diverse research paths **Aggregation Strategy**: + - No shared memory during research phase (embarrassingly parallel) - Post-processing synthesis combines findings - Enables exploration of wider solution space @@ -361,12 +394,14 @@ CompressedState = { #### Supported LLM Providers/Models **Primary Model**: + - **Tongyi-DeepResearch-30B-A3B**: Custom-trained MoE model - **Architecture**: Qwen3-30B-A3B-Base lineage - **Format**: Hugging Face Transformers compatible - **License**: Apache 2.0 **Deployment Options**: + 1. **Local Deployment**: Full model inference with GPU - Download weights from HuggingFace or ModelScope - Requires Python 3.10.0 (strict requirement) @@ -386,6 +421,7 @@ CompressedState = { - Hugging Face Spaces demo **Integration for Summarization**: + - OpenAI-compatible APIs for Visit tool content synthesis - Configurable via `API_KEY/API_BASE` environment variables - Goal-specific summarization aligned with information needs @@ -393,6 +429,7 @@ CompressedState = { #### Prompt Structure **System Prompt**: + ``` Your core function is to conduct thorough, multi-source investigations into any topic. You must handle both broad, @@ -401,17 +438,20 @@ academic fields. 
``` **Tool Descriptions**: Comprehensive JSON schemas for each tool with: + - Tool name and purpose - Parameter specifications (types, arrays, requirements) - Output format descriptions - Usage constraints and formatting rules **Context Injection**: + - Current date provided at runtime for temporal grounding - Question inserted at conversation start - File references prepended for document-based queries **Message Format**: + ``` [ {role: 'user', content: ''}, @@ -424,18 +464,21 @@ academic fields. #### Special Prompting Techniques **ReAct Mode**: + - "Vanilla setup, no prompt hacks" - "Strictly adheres to Thought-Action-Observation cycle" - Model natively generates structured outputs - No heavy prompt engineering required **Heavy Mode (IterResearch)**: + - Workspace reconstruction prompts - Report synthesis instructions - Context compression directives - Multi-agent coordination prompts for synthesis phase **Custom Tokenizer**: + - Optimized for agentic tokens - Action prefixes and observation delimiters - Embedded tool vocabulary @@ -449,15 +492,18 @@ academic fields. #### Programming Language & Frameworks **Core Language**: Python 3.10.0 (strict requirement) + - Other versions may cause dependency issues - Isolated environment strongly recommended (conda/virtualenv) **ML Frameworks**: + - **Transformers**: Hugging Face ecosystem for model loading - **vLLM**: Likely used for efficient inference (not explicitly confirmed in sources) - **rLLM Framework**: Custom step-level asynchronous RL training loop **Concurrency & Parallelism**: + - **ThreadPoolExecutor**: For parallel tool execution (default 20 workers) - **threading.Lock**: Thread-safe file writes - **Async Patterns**: Step-level asynchronous RL rollouts @@ -465,6 +511,7 @@ academic fields. 
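The ThreadPoolExecutor fan-out with `threading.Lock`-guarded writes described above corresponds to a pattern like this minimal sketch (function names are ours, not taken from the repository):

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor

_write_lock = threading.Lock()


def append_jsonl(path: str, record: dict) -> None:
    # Serialize outside the lock; hold it only for the file append so
    # concurrent workers never interleave partial lines.
    line = json.dumps(record)
    with _write_lock:
        with open(path, "a", encoding="utf-8") as f:
            f.write(line + "\n")


def run_parallel(tasks, worker_fn, max_workers: int = 20):
    # Fan tasks out across a bounded pool, mirroring the 20-worker default.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker_fn, tasks))
```

The lock matters because JSONL output is only usable if every record occupies exactly one line; unsynchronized appends from 20 threads can interleave and corrupt the trajectory file.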
#### Key Dependencies **Based on repository structure and documentation**: + - `requirements.txt`: All dependencies listed (specific packages not detailed in search results) - **diskcache**: For file parser caching - **requests** (implied): For API calls to external services @@ -472,6 +519,7 @@ academic fields. - **JSON/JSONL processing**: Standard library **External Services** (Required for full functionality): + - **Serper.dev**: Web search API - **Jina.ai**: Page content extraction - **Dashscope**: Alibaba's file parsing service @@ -481,6 +529,7 @@ academic fields. #### Specialized Libraries **Training Infrastructure**: + - **rLLM Framework**: Custom async RL training - **Group Relative Policy Optimization (GRPO)**: Custom implementation - Token-level policy gradients @@ -488,11 +537,13 @@ academic fields. - Conservative negative sampling strategy **Data Synthesis**: + - **AgentFounder Pipeline**: Entity-anchored knowledge graph processing - **AgentScaler Pipeline**: Environment scaling for RL - **Knowledge Graph Libraries**: For entity extraction and relationship mapping **Evaluation**: + - Custom benchmark frameworks in `evaluation/` directory - Support for 7+ benchmark datasets - Model judges: Qwen2.5-72B, Gemini-2.0-Flash, GPT-4o variants, o3-mini @@ -504,6 +555,7 @@ academic fields. ### 1. Architecture & Scale **Alibaba-NLP/DeepResearch:** + - Production-ready 30.5B parameter MoE model (3.3B activated) - Custom-trained foundation model optimized for agentic tasks - 128K context window with progressive expansion training @@ -511,6 +563,7 @@ academic fields. - End-to-end training pipeline (CPT + SFT + RL) **MVP Plan** (`thoughts/shared/plans/deep-research-agent-python-mvp.md`): + - Uses off-the-shelf models (GPT-4o-mini, GPT-4, OpenAI/Anthropic/OpenRouter) - No custom model training - Relies on provider context limits (~50K-200K depending on model) @@ -518,6 +571,7 @@ academic fields. 
- No RL training infrastructure **Gap Summary:** + - ❌ **Model Training**: MVP has no model training; relies entirely on API providers - ❌ **Scale**: No MoE architecture or parameter optimization - ✅ **Multi-Model Support**: MVP more flexible with provider selection @@ -528,6 +582,7 @@ academic fields. ### 2. Agent Loop Strategy **Alibaba-NLP/DeepResearch:** + - **ReAct Mode**: Classic Think→Action→Observation with native model generation - **Heavy Mode (IterResearch)**: - Markovian state reconstruction @@ -539,6 +594,7 @@ academic fields. - **Retry Logic**: Exponential backoff (10 attempts, max 30s sleep) **MVP Plan:** + - **Phase 1**: Basic ReAct loop with manual tool call parsing - Custom XML tag parsing (``, ``) - Max 20 iterations, 50K token budget @@ -550,6 +606,7 @@ academic fields. - **Phase 3**: Code execution in Jupyter kernel for EDA **Gap Summary:** + - ❌ **Context Management**: No IterResearch-style workspace reconstruction - ❌ **Native ReAct**: MVP uses XML tag parsing; Alibaba has native model generation - ✅ **LangGraph**: MVP uses production framework; Alibaba uses custom orchestration @@ -561,6 +618,7 @@ academic fields. ### 3. Tools & Capabilities **Alibaba-NLP/DeepResearch (5 Tools):** + 1. **Search**: Serper.dev (Google Search), top-10 results, concurrent queries 2. **Visit**: Jina.ai with goal-specific summarization 3. **Python**: SandboxFusion with strict format (empty args JSON, `` tags) @@ -570,13 +628,14 @@ academic fields. **Unified Sandbox**: Thread-safe, cached results, exponential backoff, redundant providers **MVP Plan (Phase 1: 2 Tools, Phase 3: +1 Tool):** + - **Phase 1**: 1. **Search**: Brave or Serper API, max 10 results 2. **Fetch**: Jina Reader or simple httpx (basic HTML stripping) -- **Phase 3**: - 3. **Python Executor**: Jupyter kernel with AST-based safety checks +- **Phase 3**: 3. 
**Python Executor**: Jupyter kernel with AST-based safety checks **Gap Summary:** + - ❌ **Scholar Tool**: MVP has no academic search capability - ❌ **File Parser**: MVP lacks multi-format document parsing - ❌ **Goal-Specific Summarization**: MVP's fetch is basic; no goal-alignment @@ -590,6 +649,7 @@ academic fields. ### 4. Storage & State Management **Alibaba-NLP/DeepResearch:** + - **Input**: JSONL (recommended) or JSON array format - **Trajectory Storage**: JSONL for training data and evaluation - **Caching**: `diskcache.Cache` with SHA256 keys for file parser @@ -597,6 +657,7 @@ academic fields. - **Environment Abstraction**: Three-tier (Prior World, Simulated, Real-world) **MVP Plan:** + - **Phase 1**: - Simple JSON file store (`FileStore` class) - Session logs saved to `outputs/sessions/.json` @@ -605,6 +666,7 @@ academic fields. - **Phase 3**: Notebook outputs saved as `.ipynb` files **Gap Summary:** + - ❌ **Trajectory Format**: MVP uses simple JSON; no JSONL training format - ❌ **Result Caching**: MVP has no caching layer (could add easily) - ❌ **Environment Abstraction**: MVP only interacts with real-world; no simulated environments @@ -616,6 +678,7 @@ academic fields. ### 5. Memory & Context Management **Alibaba-NLP/DeepResearch:** + - **128K Context Window**: Trained with progressive expansion (32K → 128K) - **IterResearch Paradigm**: - Workspace reconstruction each round @@ -629,6 +692,7 @@ academic fields. - ReSumTool-30B specialized compression model **MVP Plan:** + - **Token Counting**: Basic `tiktoken` usage - **Budget Management**: - Phase 1: 50K token limit with 90% warning @@ -639,6 +703,7 @@ academic fields. - **No Structured Memory**: Linear message history until budget exhausted **Gap Summary:** + - ❌ **Sophisticated Compression**: No IterResearch or ReSum paradigm - ❌ **Workspace Reconstruction**: MVP maintains full message history - ❌ **Specialized Compression Model**: MVP uses same LLM for compression @@ -652,6 +717,7 @@ academic fields. 
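The compression gap called out above (condensing a ~50K-token exploration into a ~2K-token summary) reduces to a budget check wrapped around a summarization call. A minimal sketch, assuming a rough 4-characters-per-token heuristic in place of `tiktoken` and an injected `summarize` callable standing in for the LLM:

```python
def count_tokens(text: str) -> int:
    # Crude heuristic (~4 chars/token for English text); swap in an
    # actual tokenizer such as tiktoken for accurate counts.
    return max(1, len(text) // 4)


def compress_findings(exploration: str, summarize, max_tokens: int = 2000) -> str:
    # Skip the summarization call when the text already fits the budget.
    if count_tokens(exploration) <= max_tokens:
        return exploration
    summary = summarize(exploration)
    # Hard truncation as a last resort if the summarizer overshoots.
    if count_tokens(summary) > max_tokens:
        summary = summary[: max_tokens * 4]
    return summary
```

Injecting `summarize` keeps the budget logic testable without an LLM, and leaves room to later swap in a specialized compression model (as Alibaba does with ReSumTool-30B) without touching the caller.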
### 6. LLM Integration & Prompting **Alibaba-NLP/DeepResearch:** + - **Custom Model**: Tongyi-DeepResearch-30B-A3B (Apache 2.0 license) - **Training**: Agentic CPT + SFT + GRPO reinforcement learning - **Prompting**: "Vanilla setup, no prompt hacks" - native ReAct generation @@ -660,6 +726,7 @@ academic fields. - **Summarization**: OpenAI-compatible APIs for Visit tool **MVP Plan:** + - **Phase 1**: OpenRouter/OpenAI/Anthropic API clients - **Models**: GPT-4o-mini (default), GPT-4, Claude variants - **Prompting**: @@ -670,6 +737,7 @@ academic fields. - **No Custom Training**: Zero model optimization **Gap Summary:** + - ❌ **Custom Model**: MVP has no proprietary model; fully API-dependent - ❌ **Agentic Training**: No CPT, SFT, or RL training pipeline - ❌ **Native ReAct**: MVP requires prompt engineering and parsing @@ -683,6 +751,7 @@ academic fields. ### 7. Tech Stack & Infrastructure **Alibaba-NLP/DeepResearch:** + - **Language**: Python 3.10.0 (strict) - **ML Framework**: Transformers, likely vLLM, rLLM (custom RL framework) - **Concurrency**: ThreadPoolExecutor (20 workers), threading.Lock, async RL rollouts @@ -695,6 +764,7 @@ academic fields. - **Data**: JSONL processing, knowledge graph libraries **MVP Plan:** + - **Language**: Python 3.11+ - **Package Manager**: `uv` (modern, fast) - **Framework**: LangGraph (multi-agent orchestration) @@ -710,6 +780,7 @@ academic fields. - **No Training Infrastructure** **Gap Summary:** + - ❌ **Training Pipeline**: MVP has zero training capabilities - ❌ **Custom RL Framework**: No rLLM, GRPO, or AgentFounder/AgentScaler - ✅ **Modern Package Manager**: `uv` faster than pip/conda @@ -724,6 +795,7 @@ academic fields. ### 8. Evaluation & Quality **Alibaba-NLP/DeepResearch:** + - **Benchmarks**: 7+ datasets (BrowseComp, Wiki1B, etc.) - **Model Judges**: Qwen2.5-72B, Gemini-2.0-Flash, GPT-4o, o3-mini - **Performance**: Comparable or exceeding o3 @@ -731,6 +803,7 @@ academic fields. 
 - **Metrics**: Task success rate, reasoning quality, tool usage efficiency

 **MVP Plan:**
+
 - **Automated Verification**:
   - `uv sync`, `ruff check`, `mypy src/`, `pytest tests/`
   - CLI execution tests
@@ -742,6 +815,7 @@ academic fields.
 - **No Formal Benchmarks**: Testing on ad-hoc queries

 **Gap Summary:**
+
 - ❌ **Benchmark Suite**: MVP has no formal evaluation datasets
 - ❌ **Model Judges**: No automated quality assessment
 - ❌ **Performance Metrics**: No systematic measurement
@@ -910,10 +984,12 @@ academic fields.
 ### Main Repository & Papers

 **GitHub**:
+
 - [Alibaba-NLP/DeepResearch](https://github.com/Alibaba-NLP/DeepResearch) - Main repository
 - [Alibaba-NLP/WebAgent](https://github.com/Alibaba-NLP/WebAgent) - WebAgent family

 **Technical Papers**:
+
 1. [Tongyi DeepResearch Technical Report (arXiv:2510.24701)](https://arxiv.org/abs/2510.24701) - Main paper
 2. [AgentFounder: Scaling Agents via Continual Pre-training (arXiv:2509.13310)](https://arxiv.org/abs/2509.13310)
 3. [AgentScaler: Environment Scaling for Agentic Intelligence (arXiv:2509.13311)](https://arxiv.org/abs/2509.13311)
diff --git a/thoughts/shared/research/2025-11-17_improving-eda-agent-multi-source-parallel.md b/thoughts/shared/research/2025-11-17_improving-eda-agent-multi-source-parallel.md
index 166473f..3083afb 100644
--- a/thoughts/shared/research/2025-11-17_improving-eda-agent-multi-source-parallel.md
+++ b/thoughts/shared/research/2025-11-17_improving-eda-agent-multi-source-parallel.md
@@ -4,7 +4,7 @@ researcher: Claude (Sonnet 4.5)
 git_commit: f02b5c6740b7d3c156f172c0e49106b37563d25a
 branch: feat/custom-deep-research
 repository: addcommitpush.io
-topic: "Improving EDA Agent: Multi-Data Source Support & Parallel Execution as Sub-Agent"
+topic: 'Improving EDA Agent: Multi-Data Source Support & Parallel Execution as Sub-Agent'
 tags: [research, deep-research, eda, multi-agent, data-analysis, parallel-execution, tool-system]
 status: complete
 last_updated: 2025-11-17
@@ -22,6 +22,7 @@ last_updated_by: Claude
 ## Research Question

 How can the IterativeEDAAgent be transformed into a sub-agent tool that:
+
 1. Can be used by the multi-agent system as a tool
 2. Supports running multiple EDA agents concurrently (parallel execution)
 3. Handles multiple data source types (pickle, csv, excel, parquet, etc.)
@@ -43,22 +44,26 @@ This research provides a comprehensive implementation plan with code references,
 ### 1. Current EDA Agent Architecture

 #### Implementation Location
+
 - **Primary**: `deep-research-agent/src/deep_research/agent/iterative_eda.py:19-655`
 - **Alternative**: `deep-research-agent/src/deep_research/agent/data_agent.py:16-419` (simpler, older version)

 #### Current Workflow (4 Phases)

 **Phase 1: Load & Understand** (`iterative_eda.py:49-50`, `iterative_eda.py:95-172`)
+
 - Loads CSV into pandas: `df = pd.read_csv(filepath)` (line 98)
 - Executes setup code in Jupyter kernel via CodeExecutor
 - Extracts schema: shape, columns, dtypes, missing values, head samples

 **Phase 2: Identify Target** (`iterative_eda.py:53-55`, `iterative_eda.py:174-212`)
+
 - LLM analyzes query + schema to identify target variable
 - Uses heuristic fallbacks: "price", "cost", "value" patterns
 - Returns target column for analysis focus

 **Phase 3: Iterative Exploration** (`iterative_eda.py:58-86`)
+
 - Loop up to `max_iterations` (default: 7):
   1. **Goal Check** (after 3+ iterations): LLM evaluates if sufficient insights gathered → early stop
   2. **Plan**: LLM generates Python exploration code based on insights so far
@@ -67,6 +72,7 @@ This research provides a comprehensive implementation plan with code references,
   5. **Store**: Append insight to list, continue loop

 **Phase 4: Generate Outputs** (`iterative_eda.py:89-90`, `iterative_eda.py:435-456`)
+
 - Generate markdown report summarizing insights
 - Build executed Jupyter notebook with:
   - Executive summary (LLM-generated from all insights)
@@ -77,16 +83,19 @@ This research provides a comprehensive implementation plan with code references,
 #### Current Limitations

 **Single File Format** (`iterative_eda.py:98`, `data_agent.py:34`)
+
 - Hardcoded `pd.read_csv(filepath)` calls
 - No file extension detection or format validation
 - Only CSV support (no Excel, Parquet, Pickle, etc.)

 **Not a Tool**
+
 - EDA agent is directly instantiated by CLI (`cli.py:275`)
 - Not registered in tool system
 - Cannot be called by other agents (ReactAgent, WorkerAgent)

 **No Parallel Support**
+
 - Single synchronous execution
 - No mechanism to run multiple EDA analyses concurrently
 - CodeExecutor creates single kernel instance (blocks concurrent use)
@@ -127,6 +136,7 @@ class Tool(ABC):
 ```

 **Key Pattern**: Tools return `ToolResult` with:
+
 - `success`: Boolean execution status
 - `content`: String output (formatted for LLM consumption)
 - `metadata`: Dict with diagnostic info (query, URL, tokens, etc.)
@@ -174,6 +184,7 @@ class SearchTool(Tool):
 ```

 **Key Aspects**:
+
 - Environment-based configuration (API keys)
 - Error handling with try/except → returns failure ToolResult
 - Helper methods for API calls and formatting
@@ -212,6 +223,7 @@ class ReactAgent:
 ```

 **Key Pattern**:
+
 1. Custom tool implements `Tool` base class (domain logic)
 2. `@tool` decorated function wraps custom tool (LangChain integration)
 3. Decorated function returns string (LangChain requirement)
@@ -257,12 +269,14 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 ```

 **Key Mechanism**:
+
 1. **Dynamic Fan-Out**: `Send` objects created for each task → LangGraph parallelizes execution
 2. **State Reducer**: `worker_results: Annotated[list[dict], add]` automatically merges parallel outputs
 3. **Isolated Instances**: Each worker gets own agent instance (no shared state)
 4. **Result Compression**: Summaries compressed to prevent state bloat

 **Graph Structure** (`orchestrator.py:80-101`):
+
 ```
 START → analyze → plan → [workers in parallel] → synthesize → END
           ↑
@@ -369,11 +383,13 @@ class CodeExecutor:
 ```

 **Limitation for Parallel Execution**:
+
 - Single kernel instance per CodeExecutor
 - `execute()` is synchronous (blocks)
 - No kernel pooling or management

 **Needed for Parallel EDA**:
+
 - Kernel pool or factory pattern
 - Async execution support
 - Kernel lifecycle management (auto-cleanup)
@@ -385,22 +401,27 @@ class CodeExecutor:
 #### File Format Support Analysis

 **Current Implementation** (`iterative_eda.py:98`, `data_agent.py:34`):
+
 ```python
 # Only CSV loading implemented
 df = pd.read_csv(filepath)
 ```

 **CLI Validation** (`cli.py:231`):
+
 ```python
 @click.argument("filepath", type=click.Path(exists=True))
 ```
+
 - Only validates file existence, not format
 - No extension checking

 **Available Dependencies** (`pyproject.toml:19`):
+
 - `pandas>=2.2.0` - supports all formats via different readers

 **Missing Formats**:
+
 - Excel (.xlsx, .xls) - no `read_excel` calls
 - Parquet (.parquet) - no `read_parquet` calls
 - Pickle (.pkl) - no `read_pickle` calls
@@ -408,6 +429,7 @@ df = pd.read_csv(filepath)
 - Other: TSV, HDF5, Feather, etc.

 **Required Dependencies** (need to add to pyproject.toml):
+
 ```toml
 dependencies = [
     # ... existing ...
@@ -496,6 +518,7 @@ class DataLoader:
 ```

 **Usage in IterativeEDAAgent**:
+
 ```python
 # Replace line 98 in iterative_eda.py
 from ..tools.data_loader import DataLoader
@@ -659,6 +682,7 @@ self.tools = [search, fetch, eda]  # Add eda to list
 **Result**: ReactAgent can now perform EDA during research workflows!
 Example usage:
+
 ```bash
 uv run research research "Analyze customer_churn.csv and identify key churn drivers"
@@ -677,6 +701,7 @@ uv run research research "Analyze customer_churn.csv and identify key churn driv
 #### Problem: Single Kernel Limitation

 Current CodeExecutor creates one kernel per agent instance:
+
 - `IterativeEDAAgent.__init__()` → `self.executor = CodeExecutor()` → starts single kernel
 - Multiple agents → multiple kernels (OK for parallelization!)

@@ -685,6 +710,7 @@ Current CodeExecutor creates one kernel per agent instance:
 #### Step 1: Verify Kernel Isolation

 Each EDA agent gets own executor:
+
 ```python
 # In EDATool.execute()
 agent = IterativeEDAAgent(model=self.model)  # Line 46
@@ -692,6 +718,7 @@ agent = IterativeEDAAgent(model=self.model)  # Line 46
 ```

 **Test**: Spawn 2 EDA agents with same dataset:
+
 ```python
 # Both run concurrently without interference
 task1 = eda_tool.execute(filepath="data.csv", query="analyze sales trends")
@@ -743,6 +770,7 @@ Return JSON array of tasks.
 ```

 **Result**: Orchestrator can now spawn parallel mix of:
+
 - Web research workers (using search/fetch tools)
 - Data analysis workers (using eda tool)

@@ -753,6 +781,7 @@ academic fields.
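The kernel-isolation test described above (two EDA analyses on the same dataset running concurrently, each agent owning its executor) can be sketched with `asyncio.gather`. The `run_eda` stub here is an assumption — a real run would start its own Jupyter kernel via `CodeExecutor`, which is precisely what makes concurrent execution safe:

```python
import asyncio


async def run_eda(task_id: str, filepath: str, query: str) -> dict:
    """Stub for one EDA agent run with its own isolated state."""
    kernel_state: dict = {"task": task_id}  # per-agent state, never shared
    await asyncio.sleep(0)  # stand-in for async kernel round-trips
    kernel_state["insight"] = f"{query} on {filepath}"
    return kernel_state


async def main() -> list[dict]:
    # Mirrors the task1/task2 example: both analyses run concurrently
    # without interference because no state is shared between them.
    return await asyncio.gather(
        run_eda("task_1", "data.csv", "analyze sales trends"),
        run_eda("task_2", "data.csv", "analyze regional differences"),
    )


results = asyncio.run(main())
```

`asyncio.gather` preserves argument order in its result list, which is what lets the orchestrator's state reducer merge worker outputs deterministically.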
 #### Implementation Plan

 **Step 1**: Add DataLoader to project
+
 - Create `src/deep_research/tools/data_loader.py` (code shown above in section 5)
 - Add dependencies to `pyproject.toml`:
   ```toml
@@ -904,6 +933,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 - [ ] Update README.md with examples for each format

 **Success Criteria**:
+
 - [ ] `uv run research eda data.csv "query"` works
 - [ ] `uv run research eda data.xlsx "query"` works
 - [ ] `uv run research eda data.parquet "query"` works
@@ -924,6 +954,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 - [ ] Test integration with ReactAgent

 **Success Criteria**:
+
 - [ ] ReactAgent can call eda tool during research
 - [ ] Tool returns formatted insights to agent
 - [ ] Agent can reason about data analysis results
@@ -942,6 +973,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 - Test result merging

 **Success Criteria**:
+
 - [ ] Multiple EDA analyses can run concurrently without conflicts
 - [ ] No kernel crashes or interference
 - [ ] Results from parallel executions are correctly merged
@@ -961,6 +993,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 - [ ] Performance optimization: cache data loading, kernel reuse

 **Success Criteria**:
+
 - [ ] Query like "Analyze sales.csv and compare with industry" completes
 - [ ] Final report includes both data insights and web research
 - [ ] Notebook + markdown report both generated
@@ -979,6 +1012,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 - [ ] Security review: ensure AST checks cover all formats

 **Success Criteria**:
+
 - [ ] Documentation has clear examples for all use cases
 - [ ] New users can run examples without errors
 - [ ] Performance meets targets (<3 min for combined queries)
@@ -1016,12 +1050,14 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 ### Relevant Planning Documents

 **Deep Research Agent MVP Plan** (`thoughts/shared/plans/deep-research-agent-python-mvp.md`)
+
 - Phase 3 (Week 3) details EDA implementation with notebook generation
 - 7-act narrative structure for notebooks
 - Jupyter kernel integration patterns
 - Code executor safety checks

 **Architecture Research** (`thoughts/shared/research/2025-11-15_deep-research-agent-architecture.md`)
+
 - ReAct framework implementation
 - LangGraph state management patterns
 - Tool suite design
@@ -1031,18 +1067,21 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 ### Design Decisions

 **Why Not External Sandboxing** (from architecture doc):
+
 - Local Jupyter kernel for privacy
 - No API costs (E2B/Modal avoided)
 - Full control over execution environment
 - Sufficient for MVP with AST safety checks

 **Why LangGraph Over Custom Orchestration**:
+
 - Built-in state management and checkpointing
 - Send API for dynamic parallelization
 - Conditional routing without complex control flow
 - State reducers handle result merging automatically

 **Why Tool Base Pattern**:
+
 - Consistent interface across all tools
 - Easy to test in isolation
 - Metadata tracking for observability
@@ -1059,6 +1098,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 **Current**: Each EDA agent creates new kernel (isolated but resource-heavy)

 **Options**:
+
 - **A. Keep current approach**: Simple, isolated, but uses more memory
 - **B. Kernel pool**: Reuse kernels across analyses, add cleanup between uses
 - **C. Hybrid**: Pool for sequential, isolated for parallel
@@ -1072,6 +1112,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 **Current**: Uses default pandas options

 **Options**:
+
 - **A. Hardcoded defaults**: Simple, works for 90% of cases
 - **B. Kwargs passthrough**: `DataLoader.load(path, **pandas_kwargs)`
 - **C. Config file**: `.edaconfig.json` with per-file settings
@@ -1085,6 +1126,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 **Current**: Executed at end of IterativeEDAAgent.analyze()

 **Options**:
+
 - **A. Always execute**: Validates code works, adds outputs
 - **B. Optional flag**: `--execute/--no-execute`
 - **C. Lazy execution**: Generate notebook, let user execute
@@ -1098,6 +1140,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 **Current**: WorkerAgent compresses to 2000 tokens

 **Options**:
+
 - **A. Compress EDA insights**: Summary only (loses detail)
 - **B. Full insights**: Better context, uses more tokens
 - **C. Hybrid**: Full insights + notebook path reference
@@ -1162,12 +1205,14 @@ The IterativeEDAAgent can be successfully transformed into a multi-agent tool wi
 4. **Incremental delivery** - each phase adds standalone value

 **Key Insights**:
+
 - **Parallelization already works** - each agent gets isolated kernel
 - **Tool pattern is proven** - SearchTool and FetchTool demonstrate the pattern
 - **Orchestrator is ready** - Send API generalizes to any worker type (web or data)
 - **Multi-format is straightforward** - DataLoader factory pattern handles all pandas formats

 **Biggest Wins**:
+
 - **Combined insights**: Web research + data analysis in single query
 - **Parallel execution**: 3-5x faster with multiple concurrent analyses
 - **Format flexibility**: CSV, Excel, Parquet, Pickle all supported
diff --git a/thoughts/shared/research/2025-11-20_obsidian-iterative-research-architecture.md b/thoughts/shared/research/2025-11-20_obsidian-iterative-research-architecture.md
index e186a35..ba25184 100644
--- a/thoughts/shared/research/2025-11-20_obsidian-iterative-research-architecture.md
+++ b/thoughts/shared/research/2025-11-20_obsidian-iterative-research-architecture.md
@@ -4,7 +4,7 @@ researcher: Emil Wareus
 git_commit: c032b77aec5b215d766dad45f6511c629f728e73
 branch: feat/custom-deep-research
 repository: addcommitpush.io
-topic: "Obsidian-Based Iterative Research System Architecture"
+topic: 'Obsidian-Based Iterative Research System Architecture'
 tags: [research, deep-research-agent, obsidian, knowledge-graph, multi-agent, langgraph]
 status: complete
 last_updated: 2025-11-20
@@ -22,6 +22,7 @@ last_updated_by: Emil Wareus
 ## Research Question

 How can we transform the deep-research multi-agent system into an iterative, knowledge-graph-driven research platform using Obsidian as the persistence layer, enabling:
+
 1. Full traceable session storage with complete worker context
 2. Worker-specific research expansion via CLI
 3. Report recompilation with custom synthesis instructions
@@ -33,6 +34,7 @@ How can we transform the deep-research multi-agent system into an iterative, kno
 Through comprehensive codebase analysis and external research into Obsidian's knowledge management capabilities, we discovered a viable architectural pattern for transforming the existing multi-agent research system into an iterative research platform. The current system loses critical context during worker result compression (orchestrator.py:352), limiting the ability to expand or reanalyze research. By implementing full context capture and Obsidian-based storage, we can enable iterative research workflows while maintaining backwards compatibility.
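The "full context capture" fix described in the summary above can be sketched as a data structure that keeps the complete worker output alongside the compressed copy sent to synthesis. `WorkerFullContext` and `final_output` are names used in this research document; the remaining fields and the naive truncation-based `compress` (standing in for the LLM compression step) are illustrative assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class WorkerFullContext:
    """Keeps the full worker output and its compressed summary side by side,
    so synthesis uses the summary while expansion retains all detail."""

    task_id: str
    final_output: str              # full, uncompressed worker output
    compressed_summary: str = ""   # what the synthesis step receives
    sources: list[str] = field(default_factory=list)

    def compress(self, max_chars: int = 8000) -> str:
        # Stand-in for LLM compression to ~2000 tokens (~8KB); the key point
        # is that final_output is stored first and never overwritten.
        self.compressed_summary = self.final_output[:max_chars]
        return self.compressed_summary
```

This mirrors the document's principle of "compress for synthesis, never for storage": compression produces a copy, while the stored context stays complete.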
 **Key Findings**:
+
 - Current architecture compresses worker outputs from full context (10-50KB) to 2000 tokens (~8KB), losing 80-90% of research detail
 - Obsidian's markdown + YAML frontmatter + wikilinks provide natural structure for research sessions as knowledge graphs
 - LangGraph's state machine supports session tracking with minimal modifications
@@ -87,6 +89,7 @@ return {
 ```

 **What's Lost**:
+
 - Full worker reasoning steps (ReAct thought-action-observation loops)
 - All individual tool calls and their complete results
 - Complete search queries and fetched page contents
@@ -94,6 +97,7 @@ return {
 - Intermediate insights before compression

 **Impact**: 80-90% of worker research context is discarded, making it impossible to:
+
 - Accurately expand on specific worker findings
 - Debug why certain conclusions were reached
 - Recompile reports with different analytical angles
@@ -115,6 +119,7 @@ def save_session(self, session_id: str, data: dict[str, Any]) -> None:
 ```

 **Limitations Discovered**:
+
 1. Only saves compressed summaries (not full worker context)
 2. Single JSON file per session (no versioning)
 3. No graph structure (can't link insights across sessions)
@@ -132,7 +137,7 @@ Research into Obsidian documentation revealed that YAML frontmatter (also called
 type: research_session
 session_id: session_abc123
 version: 1
-query: "What are the latest trends in AI research?"
+query: 'What are the latest trends in AI research?'
 status: completed
 created_at: 2025-01-20T14:25:00Z
 updated_at: 2025-01-20T14:45:00Z
@@ -148,6 +153,7 @@ tags:
 ```

 **Benefits**:
+
 - Queryable via Dataview plugin
 - Native Obsidian UI display
 - Git-friendly version control
@@ -160,14 +166,17 @@ Obsidian's `[[note name]]` syntax creates bidirectional links automatically:
 ```markdown
 **Session**: [[session_abc123_v1]]
 **Workers**:
+
 - [[task_1_worker|Worker 1]]: Foundation models research
 - [[task_2_worker|Worker 2]]: Multimodal AI systems

 **Key Insights**:
+
 - [[insight_20250120_143052|Foundation models reaching 10T parameters]]
 ```

 **Graph View Benefits**:
+
 - Visual exploration of research connections
 - Discover non-obvious relationships
 - Navigate between session → worker → insight → source
@@ -185,6 +194,7 @@ SORT created_at ASC
 ```

 **Use Cases**:
+
 - Aggregate worker costs across sessions
 - Find all insights related to a topic
 - Generate session summaries dynamically
@@ -221,6 +231,7 @@ outputs/obsidian/
 ```

 **Rationale**:
+
 - **sessions/**: Central MOCs that link to all related entities
 - **workers/**: Grouped by session for organizational clarity
 - **insights/**: Flat structure enables cross-session linking
@@ -306,6 +317,7 @@ class ResearchSession:
 ```

 **Versioning Pattern**: `session_{hash}_v{N}`
+
 - Same base query → same session ID
 - Expansions/recompilations increment version
 - Never delete previous versions (audit trail)
@@ -316,12 +328,12 @@ class ResearchSession:

 **File**: `outputs/obsidian/sessions/session_abc123_v1.md`

-```markdown
+````markdown
 ---
 type: research_session
 session_id: session_abc123
 version: 1
-query: "What are the latest trends in AI research?"
+query: 'What are the latest trends in AI research?'
 status: completed
 created_at: 2025-01-20T14:25:00Z
 updated_at: 2025-01-20T14:45:00Z
@@ -339,6 +351,7 @@ tags:
 # Research Session: Latest AI Research Trends

 ## Query
+
 > What are the latest trends in AI research?
 ## Research Plan
@@ -346,6 +359,7 @@ tags:
 Complexity: 0.75 (5 workers)

 ### Workers
+
 1. [[task_1_worker|Worker 1]]: Foundation models and scaling laws
 2. [[task_2_worker|Worker 2]]: Multimodal AI systems
 3. [[task_3_worker|Worker 3]]: AI safety and alignment research
@@ -367,6 +381,7 @@ Complexity: 0.75 (5 workers)
 Total: 45 sources across all workers

 ### By Worker
+
 - Worker 1: 12 sources
 - Worker 2: 9 sources
 - Worker 3: 8 sources
@@ -380,7 +395,9 @@ LIST FROM [[session_abc123_v1]]
 WHERE type = "worker" OR type = "insight"
 ```
-```
+````
+
+````

 **Purpose**: Central navigation hub linking to all session components

@@ -441,7 +458,7 @@ tags:
 *This is what gets passed to the synthesis step:*

 Foundation models in 2024-2025 show continued scaling trends with models reaching 10T parameters through mixture-of-experts architectures...
-```
+````

 **Critical Feature**: Full ReAct trace preserved for expansion and debugging

@@ -454,8 +471,8 @@ Foundation models in 2024-2025 show continued scaling trends with models reachin
 type: insight
 insight_id: insight_20250120_143052
 created_at: 2025-01-20T14:30:52Z
-source_session: "[[session_abc123_v1]]"
-source_worker: "[[task_1_worker]]"
+source_session: '[[session_abc123_v1]]'
+source_worker: '[[task_1_worker]]'
 tags:
   - scaling-laws
   - foundation-models
@@ -508,6 +525,7 @@ def expand(session: str, worker: str, prompt: str, model: str | None, verbose: b
 ```

 **Workflow**:
+
 1. Load session from Obsidian vault
 2. Extract target worker's full context
 3. Build expansion query incorporating previous findings
@@ -515,6 +533,7 @@ def expand(session: str, worker: str, prompt: str, model: str | None, verbose: b
 5. Save as new version (v2, v3, etc.)

 **Example Usage**:
+
 ```bash
 research expand --session=session_abc123 --worker=task_1 "Research GPU costs in detail"
 ```
@@ -531,6 +550,7 @@ def recompile_report(session: str, instructions: str | None, model: str | None)

 **Workflow**:
+
 1. Load session from Obsidian
 2. Extract ALL worker full contexts (not compressed summaries)
 3. Generate new synthesis with custom instructions
@@ -538,6 +558,7 @@ def recompile_report(session: str, instructions: str | None, model: str | None)
 5. Update session MOC to reference new report

 **Example Usage**:
+
 ```bash
 research recompile-report --session=session_abc123 "Focus on cost-benefit analysis"
 ```
@@ -549,6 +570,7 @@ research recompile-report --session=session_abc123 "Focus on cost-benefit analys
 **Location**: `src/deep_research/obsidian/writer.py` (new module)

 **Core Responsibilities**:
+
 1. Create vault directory structure
 2. Write session MOCs with frontmatter and wikilinks
 3. Write worker notes with full ReAct traces
@@ -558,6 +580,7 @@ research recompile-report --session=session_abc123 "Focus on cost-benefit analys
 7. Maintain bidirectional link integrity

 **Key Methods**:
+
 ```python
 class ObsidianWriter:
     def write_session(self, session: ResearchSession) -> Path:
@@ -584,6 +607,7 @@ class ObsidianWriter:
 **Location**: `src/deep_research/agent/orchestrator.py` (modifications)

 **Required Changes**:
+
 1. Add `save_to_obsidian` flag to `__init__()`
 2. Initialize `ObsidianWriter` instance
 3. Create `ResearchSession` tracking at start of `research()`
@@ -592,6 +616,7 @@ class ObsidianWriter:
 6. Add session ID and version to result metadata

 **Critical Modification** (orchestrator.py:320-393):
+
 ```python
 async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
     """Execute worker - CAPTURE FULL CONTEXT."""
@@ -627,6 +652,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 #### Why Obsidian Over Database?
 **Advantages Discovered**:
+
 - **Human-readable**: Direct markdown editing and reading
 - **Built-in graph view**: Visual knowledge exploration (no custom visualization needed)
 - **Dataview plugin**: SQL-like queries without database setup
@@ -636,6 +662,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 - **Offline-first**: Works without network connection

 **Trade-offs**:
+
 - File I/O slower than database for large-scale queries (10,000+ notes)
 - No ACID guarantees (but single-user research doesn't need them)
 - Manual index management (Dataview provides virtual indexes)
@@ -645,6 +672,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 #### Why Store Full Context (Not Compressed)?

 **Rationale**:
+
 1. **Expansion accuracy**: LLM needs full context to continue research coherently
 2. **Recompilation quality**: Different synthesis angles require access to all evidence
 3. **Debugging capability**: Understand exactly what the worker saw and did
@@ -652,6 +680,7 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 5. **Research evolution**: Build knowledge over time, not just final outputs

 **Cost Analysis**:
+
 - Compressed: ~50KB per worker
 - Full context: ~500KB - 2MB per worker (10-40x larger)
 - Storage cost: ~$0.023/GB/month (S3 Standard)
@@ -664,12 +693,14 @@ async def _worker_execution(self, state: dict[str, Any]) -> dict[str, Any]:
 **Pattern**: `session_{hash}_v{N}`

 **How It Works**:
+
 - Initial query "What are AI trends?" → `session_abc123_v1`
 - Expansion of worker 1 → `session_abc123_v2` (same base ID)
 - Recompilation → `session_abc123_v3`
 - Different query "What is Python?" → `session_def456_v1` (new base ID)

 **Alternative Considered**: UUID per execution
+
 - **Rejected**: Loses connection between related research sessions
 - No way to know v2 expanded v1
 - Graph becomes disconnected forest instead of connected graph
@@ -727,9 +758,11 @@ class ReActIteration:
 The key architectural insight: **Compress for synthesis, never for storage.**

 Current flow:
+
 1. Worker generates full output → compress → store compressed → synthesize

 Proposed flow:
+
 1. Worker generates full output → store full → compress copy → synthesize

 This adds minimal overhead (one extra data structure) but preserves all context.
@@ -744,11 +777,13 @@ The proposed architecture maintains full backwards compatibility:
 - `research recompile-report` - New command, requires Obsidian

 Users opt-in via:
+
 ```bash
 research multi "query" --save-obsidian
 ```

 Or environment variable:
+
 ```bash
 export DEEP_RESEARCH_OBSIDIAN=true
 ```
@@ -793,6 +828,7 @@ This research builds on previous work documented in:
 ### Similar Systems

 **Comparison with Obsidian Research Plugins**:
+
 - **Research Assistant** plugin - Manual note-taking focused
 - **Zotero Integration** plugin - Academic paper management
 - **Smart Connections** plugin - Embedding-based search
@@ -825,6 +861,7 @@ Based on the research, we identified measurable success criteria:
 **Question**: Should insights be auto-extracted from worker outputs using LLM, or manually curated?

 **Options**:
+
 - **Auto-extraction**: Use LLM to identify key insights from worker output
   - Pro: Automated, consistent, scales
   - Con: May miss nuance, requires prompt engineering, adds cost
@@ -840,6 +877,7 @@ Based on the research, we identified measurable success criteria:
 **Question**: Should we link insights between sessions (cross-session graph)?
 **Options**:
+
 - **Within-session only**: Each session is isolated graph
   - Pro: Simpler, clearer boundaries
   - Con: Misses connections across research topics
@@ -855,6 +893,7 @@ Based on the research, we identified measurable success criteria:
 **Question**: How to handle same URL fetched by multiple workers?

 **Options**:
+
 - **No deduplication**: Store separately per worker
   - Pro: Simple, preserves exact context
   - Con: Wastes storage, fragments references
@@ -870,6 +909,7 @@ Based on the research, we identified measurable success criteria:
 **Question**: Should we require Dataview plugin for full functionality?

 **Options**:
+
 - **Required**: Vault assumes Dataview installed
   - Pro: Rich queries, better UX
   - Con: Adds setup friction
@@ -885,10 +925,12 @@ Based on the research, we identified measurable success criteria:
 Based on the research findings, a phased implementation approach is recommended:

 ### Phase 1: Data Capture (Foundation)
+
 **Duration**: 1 week
 **Focus**: Preserve full context without changing synthesis

 **Tasks**:
+
 1. Create enhanced data structures (`WorkerFullContext`, `ResearchSession`, `ToolCall`, `ReActIteration`)
 2. Modify `WorkerAgent` to track ReAct iterations
 3. Modify `LeadResearcher._worker_execution()` to build full context
@@ -897,10 +939,12 @@ Based on the research findings, a phased implementation approach is recommended:
 **Validation**: Workers complete successfully, full context captured in memory

 ### Phase 2: Obsidian Writer (Storage)
+
 **Duration**: 1 week
 **Focus**: Write sessions to Obsidian vault

 **Tasks**:
+
 1. Create `obsidian/` module
 2. Implement `ObsidianWriter` class
 3. Implement note generation (session MOC, workers, sources, reports)
@@ -910,10 +954,12 @@ Based on the research findings, a phased implementation approach is recommended:
 **Validation**: Session writes to Obsidian, graph view shows connections

 ### Phase 3: CLI Commands (Iteration)
+
 **Duration**: 1 week
 **Focus**: Enable expansion and recompilation

 **Tasks**:
+
 1. Implement `research expand` command
 2. Implement `research recompile-report` command
 3. Add session loading from Obsidian
@@ -923,10 +969,12 @@ Based on the research findings, a phased implementation approach is recommended:
 **Validation**: Can expand worker research, recompile with new instructions

 ### Phase 4: Testing & Documentation (Polish)
+
 **Duration**: 1 week
 **Focus**: Ensure reliability and usability

 **Tasks**:
+
 1. Test full workflow with multiple sessions
 2. Performance testing with 10+ workers
 3. Write user documentation
@@ -950,6 +998,7 @@ The research demonstrates that transforming the deep-research multi-agent system
 5. **Implementation Feasibility**: 4-week phased approach leverages existing patterns (CLI, state management, HTTP pooling)

 **Critical Success Factors**:
+
 - Full context capture must not break existing synthesis pipeline
 - Obsidian vault structure must be intuitive for human exploration
 - Version management must clearly show research lineage
@@ -960,6 +1009,7 @@ The research demonstrates that transforming the deep-research multi-agent system
 ## References

 ### Internal Codebase
+
 - `deep-research-agent/src/deep_research/agent/orchestrator.py` - Multi-agent orchestration
 - `deep-research-agent/src/deep_research/agent/worker.py` - Worker agent implementation
 - `deep-research-agent/src/deep_research/agent/state.py` - LangGraph state definitions
@@ -967,12 +1017,14 @@ The research demonstrates that transforming the deep-research multi-agent system
 - `deep-research-agent/src/deep_research/cli.py` - CLI command patterns

 ### External Documentation
+
 - **LangGraph**: https://langchain-ai.github.io/langgraph/ - State machine and parallel execution patterns
 - **Obsidian Format**: https://help.obsidian.md/ - Markdown, frontmatter, wikilinks specification
 - **Dataview Plugin**: https://blacksmithgu.github.io/obsidian-dataview/ - Query language documentation
 - **Zettelkasten Method**: https://zettelkasten.de/posts/overview/ - Knowledge management principles

 ### Research Papers
+
 - Alibaba DeepResearch (2024) - Multi-agent research system architecture
 - LangGraph Documentation - Dynamic parallel execution with Send API
 - Obsidian Community Best Practices - MOC patterns and vault organization
diff --git a/thoughts/shared/research/2025-11-21_interactive-research-cli-architecture.md b/thoughts/shared/research/2025-11-21_interactive-research-cli-architecture.md
index a7d32c5..076e069 100644
--- a/thoughts/shared/research/2025-11-21_interactive-research-cli-architecture.md
+++ b/thoughts/shared/research/2025-11-21_interactive-research-cli-architecture.md
@@ -4,8 +4,18 @@ researcher: Emil Wareus
 git_commit: 6a27b87a0b5c98277f9e2c7f1fb3348e5edadc17
 branch: feat/custom-deep-research
 repository: addcommitpush.io
-topic: "Interactive Research CLI Mode Architecture - REPL for Deep Research Agent"
-tags: [research, deep-research-agent, cli, repl, interactive-mode, session-management, prompt-toolkit, user-experience]
+topic: 'Interactive Research CLI Mode Architecture - REPL for Deep Research Agent'
+tags:
+  [
+    research,
+    deep-research-agent,
+    cli,
+    repl,
+    interactive-mode,
+    session-management,
+    prompt-toolkit,
+    user-experience,
+  ]
 status: complete
 last_updated: 2025-11-21
 last_updated_by: Emil Wareus
@@ -34,6 +44,7 @@ Through comprehensive parallel research across the codebase and external resourc
 - ✅ Rich terminal output with progress indicators

 **What's Missing**:
+
 - ❌ REPL/interactive prompt loop
 - ❌ In-memory active session state manager
 - ❌ Natural language command parser
@@ -82,6 +93,7 @@ def cli() -> None:
 ```

 **Existing Commands**:
+
 1. **`research`** (`cli.py:74-159`) - Single-agent research
 2. **`multi`** (`cli.py:178-239`) - Multi-agent parallel research
 3. **`expand`** (`cli.py:242-355`) - Expand specific worker from session
@@ -90,6 +102,7 @@ def cli() -> None:
 6. **`list-workers`** (`cli.py:552-650`) - Show workers for specific session

 **Async Execution Pattern** (used by all commands):
+
 ```python
 def command_name(...):
     async def _async_implementation():
@@ -135,6 +148,7 @@ class WorkerFullContext:
 ```

 **Critical Feature**: `WorkerFullContext` stores **both** full output and compressed summary.
This dual storage enables: + - Full context for human review and continuation - Compressed summaries for LLM synthesis - Complete audit trail of tool usage @@ -177,6 +191,7 @@ class ResearchSession: #### Orchestrator Execution Flow (`orchestrator.py:69-136`) **Session Initialization** (`orchestrator.py:72-85`): + ```python timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") session_hash = abs(hash(query)) % 1000000 @@ -195,11 +210,13 @@ self.session = ResearchSession( ``` **LangGraph Workflow**: + ``` START → analyze → plan → [worker₁, worker₂, ...] → synthesize → END ``` **Worker Spawning** (`orchestrator.py:378-383`): + ```python def _spawn_workers(self, state: ResearchState) -> list[Send]: sends = [Send("worker", {"task": task}) for task in state["sub_tasks"]] @@ -209,6 +226,7 @@ def _spawn_workers(self, state: ResearchState) -> list[Send]: LangGraph's `Send` API enables **dynamic parallel worker execution** - workers run concurrently and populate results as they complete. **Context Capture** (`orchestrator.py:419-445`): + ```python # After worker completes if self.session and hasattr(worker, "full_context") and worker.full_context: @@ -241,12 +259,14 @@ if self.session and hasattr(worker, "full_context") and worker.full_context: #### Progress Display (`utils/live_progress.py:24-201`) **LiveProgress System**: + - Uses Rich library's `Live` display with 4Hz refresh - Thread-safe with `threading.RLock()` - Protocol-based design (`ProgressCallback` protocol) - Displays worker table with real-time status updates **Example Output**: + ``` 🔍 Multi-agent research: What are Swedish housing prices? @@ -273,6 +293,7 @@ Recent Activity #### Obsidian Vault Structure **Directory Layout**: + ``` outputs/obsidian/ ├── session_20250120_142530_abc123/ @@ -291,12 +312,13 @@ outputs/obsidian/ ``` **Session MOC Frontmatter** (`writer.py:92-106`): + ```yaml --- type: research_session session_id: session_20250120_142530_abc123 version: 1 -query: "What are Swedish housing prices?" 
+query: 'What are Swedish housing prices?'
status: completed
created_at: 2025-01-20T14:25:00Z
updated_at: 2025-01-20T14:45:00Z
@@ -304,7 +326,7 @@ model: anthropic/claude-3.5-sonnet
complexity: 0.75
num_workers: 3
total_cost: 0.0234
-parent_session: session_20250120_142530_abc123_v1 # For v2+
+parent_session: session_20250120_142530_abc123_v1 # For v2+
tags: [sweden, housing, economics]
---
```
@@ -314,6 +336,7 @@ tags: [sweden, housing, economics]
#### Session Loading (`loader.py:19-65`)
**Load Flow**:
+
1. Locate session file at `{vault_path}/{session_id}/session.md`
2. Parse YAML frontmatter and markdown body
3. Reconstruct `ResearchSession` object from frontmatter
@@ -323,6 +346,7 @@ tags: [sweden, housing, economics]
7. Aggregate unique sources across all workers
**Example Usage**:
+
```python
loader = SessionLoader(vault_path="outputs/obsidian")
session = loader.load_session("session_20250120_142530_abc123", version=1)
@@ -337,6 +361,7 @@ print(session.workers[0].final_output)  # Full uncompressed output
#### Worker Full Context Preservation (`writer.py:141-183`)
**What Gets Stored**:
+
- **ReAct iterations**: Full thought-action-observation loops
- **Tool calls**: All invocations with arguments, results, duration
- **Final output**: Complete uncompressed worker output
@@ -344,12 +369,13 @@ print(session.workers[0].final_output)  # Full uncompressed output
- **Sources**: All accessed URLs
**Worker Note Template** (`templates.py:85-144`):
+
```markdown
---
type: worker
session_id: session_abc123
task_id: task_1
-objective: "Research Swedish housing prices"
+objective: 'Research Swedish housing prices'
status: completed
tool_calls: 15
cost: 0.0245
@@ -360,19 +386,23 @@ cost: 0.0245
## Research Process (ReAct Loop)
### Iteration 1
+
**Thought**: I need to find recent data on Swedish housing prices
**Actions**:
+
- `search(query="swedish housing prices 2024")` - Success: true - Duration: 2.3s
-**Observation**: Found 10 results including SCB statistics
+
**Observation**: Found 10 results including SCB statistics
[... all iterations preserved ...]
## Final Output
+
[Full uncompressed output - 5000+ words]
## Compressed Summary
+
[2000-token summary for synthesis]
```
@@ -381,6 +411,7 @@ cost: 0.0245
#### Session Versioning Pattern (`cli.py:332-338`)
**Expand Command Creates v2**:
+
```python
# Load parent session
parent_session = loader.load_session(session_id, version)
@@ -399,6 +430,7 @@ result = await orchestrator.research(expansion_query)
```
**Versioning Strategy**:
+
- **Same session_id** across versions (e.g., `session_20250120_142530_abc123`)
- **Incremented version** (1, 2, 3...)
- **Parent reference** format: `{session_id}_v{version}`
@@ -416,6 +448,7 @@ result = await orchestrator.research(expansion_query)
**Official Documentation**: https://python-prompt-toolkit.readthedocs.io/en/master/
**Core Features**:
+
- Native asyncio support (version 3.0+)
- Built-in history management via `PromptSession`
- Syntax highlighting using Pygments lexers
- Multi-line editing support
**Async REPL Pattern**:
+
```python
from prompt_toolkit.patch_stdout import patch_stdout
from prompt_toolkit.shortcuts import PromptSession
async def interactive_shell():
```
**Auto-completion Example**:
+
```python
from prompt_toolkit.completion import NestedCompleter
session = PromptSession(completer=completer)
```
**Key Features for Interactive Research CLI**:
+
- `patch_stdout()` context manager prevents async output from corrupting prompt
- `prompt_async()` method integrates with asyncio event loop
- `ThreadedCompleter` wrapper for expensive completion operations
- Command history persisted to `~/.research_history`
**Production Examples**:
+
- **IPython** - Uses prompt_toolkit for terminal REPL
- **ptpython** - Enhanced Python REPL built on prompt_toolkit
- **AWS CLI tools** - Interactive mode implementations
@@ -477,6 +514,7 @@ session = PromptSession(completer=completer)
**Repository**: https://github.com/python-cmd2/cmd2
**Comparison**:
+
- ✅ Out-of-the-box tab completion, history, scripting
- ✅ Minimal development effort
- ❌ Limited async support (GitHub Issue #764)
@@ -487,6 +525,7 @@ session = PromptSession(completer=completer)
#### Command Parsing Patterns
**shlex + argparse Integration**:
+
```python
import shlex
import argparse
@@ -516,12 +555,14 @@ except argparse.ArgumentError as e:
```
**Natural Language Patterns**:
+
- Tokenize → intent classification → entity extraction → argparse
- Support aliases: `start`/`new`/`begin`, `continue`/`resume`, `exit`/`quit`
#### Async State Management
**aiomonitor Pattern** (https://aiomonitor.aio-libs.org/):
+
```python
import aiomonitor
import asyncio
@@ -535,6 +576,7 @@ with aiomonitor.start_monitor(loop, locals=locals):
```
**State Injection Strategy**:
+
- Maintain `active_session: ResearchSession | None` in REPL context
- Inject into REPL namespace for debugging
- Update on session start/switch/complete
@@ -542,6 +584,7 @@ with aiomonitor.start_monitor(loop, locals=locals):
#### Progress Display During Long Operations
**Rich Progress Integration**:
+
```python
from rich.progress import Progress
from rich.console import Console
async def start_research(query: str):
@@ -559,6 +602,7 @@ async def start_research(query: str):
```
**tqdm with asyncio**:
+
```python
import tqdm.asyncio
@@ -573,12 +617,14 @@ await tqdm.asyncio.gather(*worker_tasks, desc="Workers")
#### Component Overview
**New Components** (to be implemented):
+
1. **REPL Shell** - `prompt_toolkit` based interactive loop
2. **SessionManager** - In-memory active session tracking
3. **CommandParser** - Natural language → action mapping
4. **ContextManager** - Prepare continuation context from previous sessions
**Existing Components** (reuse):
+
- `LeadResearcher` (orchestrator) - Research execution
- `ObsidianWriter` - Session persistence
- `SessionLoader` - Session loading
@@ -587,12 +633,14 @@ await tqdm.asyncio.gather(*worker_tasks, desc="Workers")
#### SessionManager Class (New)
**Responsibilities**:
+
- Track active session in memory
- Session lifecycle (start/pause/resume/stop)
- Sync with ObsidianWriter/Loader
- Context preparation for continuations
**Interface**:
+
```python
class SessionManager:
    def __init__(self, vault_path: str = "outputs/obsidian"):
@@ -642,11 +690,13 @@ class SessionManager:
#### CommandParser Class (New)
**Responsibilities**:
+
- Parse user input into commands
- Support aliases and natural language
- Extract arguments and validate
**Interface**:
+
```python
class CommandParser:
    def __init__(self):
@@ -712,6 +762,7 @@ class CommandParser:
#### Interactive REPL Loop (New)
**Main Loop**:
+
```python
async def interactive_repl():
    """Main interactive REPL for research CLI."""
@@ -796,6 +847,7 @@ async def interactive_repl():
#### Command Implementations
**Start Command**:
+
```python
async def cmd_start(
    manager: SessionManager,
@@ -838,6 +890,7 @@ async def cmd_start(
```
**Continue Command**:
+
```python
async def cmd_continue(
    manager: SessionManager,
@@ -880,6 +933,7 @@ async def cmd_continue(
```
**Status Command**:
+
```python
async def cmd_status(manager: SessionManager, console: Console) -> None:
    """Show active session status."""
@@ -915,6 +969,7 @@ async def cmd_status(manager: SessionManager, console: Console) -> None:
```
**List Sessions Command**:
+
```python
async def cmd_list(
    manager: SessionManager,
@@ -979,6 +1034,7 @@ async def cmd_list(
#### Context Preparation for Continuation
**Compression Strategy**:
+
```python
def _build_continuation_context(session: ResearchSession) -> str:
    """Build compressed context from previous session for continuation.
@@ -1009,6 +1065,7 @@ def _build_continuation_context(session: ResearchSession) -> str:
```
**Alternative: Include Full Worker Context**:
+
```python
def _build_full_continuation_context(session: ResearchSession, worker_id: str) -> str:
    """Include full worker output for deep continuation."""
@@ -1041,6 +1098,7 @@ SOURCES:
**Goal**: Basic REPL loop with command parsing
**Tasks**:
+
1. Add `prompt_toolkit` dependency to `pyproject.toml`
2. Create `src/deep_research/repl/` module
3. Implement `CommandParser` class
@@ -1051,6 +1109,7 @@ SOURCES:
**Deliverable**: Interactive shell that accepts commands but doesn't execute research yet
**Validation**:
+
```bash
$ deep-research interactive
research> start What is quantum computing?
@@ -1063,6 +1122,7 @@ research> exit
**Goal**: In-memory session tracking and lifecycle
**Tasks**:
+
1. Implement `SessionManager` class
2. Integrate with `ObsidianWriter` and `SessionLoader`
3. Implement `start` command with full research execution
@@ -1073,6 +1133,7 @@ research> exit
**Deliverable**: Can start research sessions and track active session
**Validation**:
+
```bash
research> start What is quantum computing?
[... research executes with live progress ...]
@@ -1091,6 +1152,7 @@ Cost: $0.45
**Goal**: Enable continuation and expansion
**Tasks**:
+
1. Implement `continue` command
2. Implement context compression from previous session
3. Implement `expand` command for worker-specific expansion
@@ -1100,6 +1162,7 @@ Cost: $0.45
**Deliverable**: Can continue previous sessions with new queries
**Validation**:
+
```bash
research> continue How does quantum computing relate to AI?
[Loads v1 context, creates v2, executes research]
@@ -1116,6 +1179,7 @@ research> expand --worker task_1 "Research specific quantum algorithms"
**Goal**: Track and switch between multiple sessions
**Tasks**:
+
1. Implement session stack in `SessionManager`
2. Implement `switch` command
3. Implement `reset` command
@@ -1126,6 +1190,7 @@ research> expand --worker task_1 "Research specific quantum algorithms"
**Deliverable**: Can manage multiple concurrent sessions
**Validation**:
+
```bash
research> list sessions
Session ID | Version | Query
@@ -1144,6 +1209,7 @@ research> continue Analyze price trends in Malmö
**Goal**: Production-ready UX
**Tasks**:
+
1. Implement rich auto-completion with context-aware suggestions
2. Add cost estimation before starting research
3. Implement `export` command (to notebook, PDF)
@@ -1155,6 +1221,7 @@ research> continue Analyze price trends in Malmö
**Deliverable**: Polished, production-ready interactive CLI
**Validation**:
+
- Auto-completion suggests session IDs when typing `switch`
- Help text shows examples for each command
- Keyboard shortcuts work as expected
@@ -1167,6 +1234,7 @@ research> continue Analyze price trends in Malmö
#### Async REPL Architecture
**Key Pattern**: Nested event loops
+
```python
async def interactive_repl():
    # Outer loop: REPL prompt
@@ -1185,6 +1253,7 @@ async def interactive_repl():
#### Command Aliases
**Natural Language Support**:
+
- `start` / `new` / `begin`
- `continue` / `resume`
- `exit` / `quit` / `q`
@@ -1192,6 +1261,7 @@ async def interactive_repl():
- `reset` / `clear`
**Implementation**:
+
```python
start = subparsers.add_parser('start', aliases=['new', 'begin'])
```
@@ -1199,6 +1269,7 @@ start = subparsers.add_parser('start', aliases=['new', 'begin'])
#### Session State Persistence
**On REPL Start**:
+
```python
state_file = Path.home() / ".deep_research_state"
if state_file.exists():
@@ -1210,6 +1281,7 @@ if state_file.exists():
```
**On Session Change**:
+
```python
state_file.write_text(json.dumps({
    "last_session_id": session.session_id,
@@ -1220,6 +1292,7 @@ state_file.write_text(json.dumps({
#### Auto-completion for Session IDs
**Dynamic Completer**:
+
```python
def build_completer(manager: SessionManager) -> Completer:
    """Build dynamic completer with session IDs."""
@@ -1256,6 +1329,7 @@ def build_completer(manager: SessionManager) -> Completer:
**Decision**: Use `prompt_toolkit`
**Rationale**:
+
- ✅ Native async support (critical for long-running research)
- ✅ Used by production tools (IPython, ptpython, AWS CLI)
- ✅ Rich customization (styling, completion, history)
@@ -1269,12 +1343,14 @@ def build_completer(manager: SessionManager) -> Completer:
**Decision**: SessionManager maintains active session in memory, syncs with vault
**Rationale**:
+
- Vault (Obsidian) is **slow** for querying (file I/O)
- In-memory tracking enables fast `status` and `switch` commands
- Vault remains single source of truth (SSOT) - memory is cache
- Sync on session start/complete/switch
**Alternative Considered**: Load from vault every time
+
- **Rejected**: Too slow, would block REPL responsiveness
#### Continuation Context Size
@@ -1282,16 +1358,19 @@ def build_completer(manager: SessionManager) -> Completer:
**Decision**: Compress context to ~50k tokens (insights + report summary)
**Rationale**:
+
- Full session context can be 500k-5M tokens (all workers, tool calls)
- LLM context limits (200k for Claude 3.5 Sonnet)
- Balance: Enough context for coherent continuation, not overwhelming
- Full context available in vault for human review
**Strategy**:
+
- Include: Original query, insights (10 max), report summary (2000 chars), worker count, source count
- Exclude: Full worker outputs, all tool calls, full sources
**Alternative**: Include full worker outputs
+
- **Rejected**: Would hit token limits with 5+ workers
#### Session ID Reuse for Versions
@@ -1299,12 +1378,14 @@ def build_completer(manager: SessionManager) -> Completer:
**Decision**: Same session_id across versions (e.g., `session_20250121_120000_abc123`)
**Rationale**:
+
- ✅ Clear lineage (all versions in same directory)
- ✅ Easy to find related research
- ✅ Graph view shows version chain
- ✅ Simplifies session switching (don't need to remember v2 ID)
**Alternative**: UUID per version
+
- **Rejected**: Loses relationship between versions, harder to navigate
#### Command vs Natural Language Parsing
@@ -1312,12 +1393,14 @@ def build_completer(manager: SessionManager) -> Completer:
**Decision**: Structured commands with aliases (not full natural language)
**Rationale**:
+
- ✅ Predictable, unambiguous
- ✅ Faster to implement
- ✅ Tab completion works
- ✅ Easier to document
**Natural Language**: Could add later with LLM-based intent parsing
+
- Example: "Continue my last session" → parse → `continue`
---
@@ -1367,21 +1450,25 @@ def build_completer(manager: SessionManager) -> Completer:
#### Design Patterns
**Repository Pattern**:
+
- `SessionLoader` - Read operations
- `ObsidianWriter` - Write operations
- Separation of concerns
**Protocol Pattern** (`ProgressCallback`):
+
- `NoOpProgress` - Quiet mode
- `LiveProgress` - Rich display
- Pluggable progress implementations
**State Machine** (LangGraph):
+
- analyze → plan → workers → synthesize
- Already supports parallel execution
- Ready for REPL integration
**Context Manager** (`LiveProgress`):
+
- Clean setup/teardown
- Exception-safe resource management
- Integrates with Rich `Live`
@@ -1393,7 +1480,9 @@ def build_completer(manager: SessionManager) -> Completer:
#### Previous Research
**2025-11-15_deep-research-agent-architecture.md** (lines 850-860):
+
> **Interactive Mode** section describes step-by-step execution:
+>
> ```
> CLI: deep-research research "query" --interactive
> User prompt: Continue? [Y/n/stop] after each agent step
@@ -1403,6 +1492,7 @@ def build_completer(manager: SessionManager) -> Completer:
This was the **original vision** for interactive mode - step-by-step control. Our REPL design extends this to full session management and continuation.
**2025-11-20_obsidian-iterative-research-architecture.md**:
+
> "Previous research identified the need for 'session expansion' and 'iterative refinement' but didn't specify the storage mechanism. This research provides the missing piece: Obsidian as the persistence and exploration layer."
The **storage problem is solved**. Now we're adding the **interaction layer**.
@@ -1419,6 +1509,7 @@ Detailed plan for session versioning, expand command, recompile command. Our REP
#### Key Historical Decision
From 2025-11-16_alibaba-deepresearch-gap-analysis.md:
+
> "Human-in-the-loop decision points enhance research quality"
This reinforces the value of interactive mode - not just for UX, but for **research quality** through human guidance.
@@ -1430,11 +1521,13 @@ This reinforces the value of interactive mode - not just for UX, but for **resea
#### High Risk
**1. Async REPL Complexity**
+
- **Risk**: Nested event loops (prompt + research execution) can deadlock
- **Mitigation**: Use `patch_stdout()`, test extensively with concurrent output
- **Fallback**: Implement synchronous mode that blocks during research
**2. State Synchronization**
+
- **Risk**: In-memory session state gets out of sync with vault
- **Mitigation**: Always sync on session start/complete, add sync command
- **Fallback**: Reload from vault on every command (slower but safe)
@@ -1442,16 +1535,19 @@ This reinforces the value of interactive mode - not just for UX, but for **resea
#### Medium Risk
**3. Token Limits for Continuation**
+
- **Risk**: Compressed context still too large for some queries
- **Mitigation**: Aggressive summarization, configurable context size
- **Fallback**: Allow user to select which workers to include in context
**4. User Confusion**
+
- **Risk**: Command syntax unclear, users struggle
- **Mitigation**: Rich help text, examples, tab completion
- **Fallback**: Add wizard mode for guided workflows
**5. REPL Performance**
+
- **Risk**: Auto-completion sluggish with many sessions
- **Mitigation**: Cache session list, lazy load details
- **Fallback**: Disable auto-completion, use manual entry
@@ -1459,11 +1555,13 @@ This reinforces the value of interactive mode - not just for UX, but for **resea
#### Low Risk
**6. Dependency Stability**
+
- **Risk**: prompt_toolkit breaking changes
- **Mitigation**: Pin version, test upgrades
- **Fallback**: Minimal dependencies, can fork if needed
**7. History File Corruption**
+
- **Risk**: `.deep_research_history` gets corrupted
- **Mitigation**: Graceful fallback to no history
- **Fallback**: Clear history file on corruption
@@ -1473,23 +1571,28 @@ This reinforces the value of interactive mode - not just for UX, but for **resea
### 11. Success Metrics
**1. Command Execution Speed**:
+
- Metric: `status` command < 100ms
- Metric: `list sessions` < 500ms
- Metric: Session switch < 200ms
**2. User Productivity**:
+
- Metric: Time to continue session < 30s (vs 2 min with CLI commands)
- Metric: 80% of users prefer interactive mode in survey
**3. Context Effectiveness**:
+
- Metric: Continuation context < 50k tokens
- Metric: Continuation coherence score > 8/10 (human eval)
**4. Reliability**:
+
- Metric: Zero crashes in 100 REPL sessions
- Metric: State sync accuracy 100% (vault matches memory)
**5. UX Quality**:
+
- Metric: First-time users complete tutorial in < 5 min
- Metric: Tab completion used in 60%+ of commands
@@ -1502,6 +1605,7 @@ This reinforces the value of interactive mode - not just for UX, but for **resea
**Question**: Should we allow users to customize continuation context size?
**Options**:
+
- **Fixed 50k tokens**: Simple, consistent
- **Auto-adjust by complexity**: More context for complex queries
- **User-configurable**: `--context-size` flag
@@ -1513,6 +1617,7 @@ This reinforces the value of interactive mode - not just for UX, but for **resea
**Question**: Should we support collaborative research sessions?
**Options**: + - **Single-user only**: Simpler, current architecture - **Shared vault**: Multiple users, same Obsidian vault - **Cloud sync**: Real-time collaboration @@ -1524,6 +1629,7 @@ This reinforces the value of interactive mode - not just for UX, but for **resea **Question**: Should REPL warn/block expensive operations? **Options**: + - **No limits**: Trust users - **Soft warnings**: Show estimate, ask confirmation - **Hard limits**: Configurable max cost per session @@ -1535,6 +1641,7 @@ This reinforces the value of interactive mode - not just for UX, but for **resea **Question**: What export formats should we support? **Options**: + - **Markdown**: Already in vault - **Jupyter Notebook**: For analysis workflows - **PDF**: For sharing/archiving @@ -1547,6 +1654,7 @@ This reinforces the value of interactive mode - not just for UX, but for **resea **Question**: Should we auto-delete old sessions? **Options**: + - **Never delete**: Manual cleanup only - **Auto-archive**: Move old sessions to `.archive/` - **TTL-based**: Delete after N days @@ -1558,6 +1666,7 @@ This reinforces the value of interactive mode - not just for UX, but for **resea ## Code References ### Current CLI Implementation + - `deep-research-agent/src/deep_research/cli.py:68-71` - Click group entry point - `deep-research-agent/src/deep_research/cli.py:74-159` - `research` command (single-agent) - `deep-research-agent/src/deep_research/cli.py:178-239` - `multi` command (multi-agent) @@ -1567,10 +1676,12 @@ This reinforces the value of interactive mode - not just for UX, but for **resea - `deep-research-agent/src/deep_research/cli.py:552-650` - `list-workers` command ### State Management + - `deep-research-agent/src/deep_research/agent/state.py:8-58` - Core data structures (`ToolCall`, `ReActIteration`, `WorkerFullContext`, `ResearchSession`) - `deep-research-agent/src/deep_research/agent/state.py:60-93` - `ResearchSession` with versioning fields ### Orchestrator + - 
`deep-research-agent/src/deep_research/agent/orchestrator.py:69-136` - Research execution entry point - `deep-research-agent/src/deep_research/agent/orchestrator.py:72-85` - Session initialization - `deep-research-agent/src/deep_research/agent/orchestrator.py:138-160` - LangGraph workflow construction @@ -1578,6 +1689,7 @@ This reinforces the value of interactive mode - not just for UX, but for **resea - `deep-research-agent/src/deep_research/agent/orchestrator.py:419-445` - Full context capture from workers ### Obsidian Integration + - `deep-research-agent/src/deep_research/obsidian/writer.py:22-346` - Complete ObsidianWriter implementation - `deep-research-agent/src/deep_research/obsidian/writer.py:27-49` - Directory structure creation - `deep-research-agent/src/deep_research/obsidian/writer.py:51-77` - Session write flow @@ -1589,12 +1701,14 @@ This reinforces the value of interactive mode - not just for UX, but for **resea - `deep-research-agent/src/deep_research/obsidian/loader.py:143-203` - ReAct trace parsing ### Progress Display + - `deep-research-agent/src/deep_research/utils/live_progress.py:24-201` - LiveProgress implementation - `deep-research-agent/src/deep_research/utils/live_progress.py:14-24` - Initialization with thread-safe lock - `deep-research-agent/src/deep_research/utils/live_progress.py:26-41` - Context manager pattern - `deep-research-agent/src/deep_research/utils/live_progress.py:94-185` - Rich rendering logic ### Testing + - `deep-research-agent/tests/test_session_loader.py:82-105` - Round-trip persistence test - `deep-research-agent/tests/test_session_loader.py:266-310` - Parent session linking test @@ -1603,11 +1717,13 @@ This reinforces the value of interactive mode - not just for UX, but for **resea ## Architecture Comparison ### Current Architecture (Fire-and-Forget) + ``` User types command → Click parses → asyncio.run() → Execute research → Save to vault → Exit ``` ### Proposed Architecture (Interactive REPL) + ``` Launch REPL → 
prompt_toolkit loop ↓ @@ -1637,6 +1753,7 @@ The deep-research agent has a **strong foundation** for interactive CLI mode: 5. ✅ **Rich progress display** - Production-grade terminal UI **What's needed**: + 1. ❌ REPL loop with `prompt_toolkit` 2. ❌ SessionManager for in-memory state 3. ❌ CommandParser for natural language @@ -1645,6 +1762,7 @@ The deep-research agent has a **strong foundation** for interactive CLI mode: **Implementation is straightforward** because the hard parts (state management, persistence, execution) are **already implemented**. The REPL is a thin layer on top. **Critical Success Factors**: + - ✅ prompt_toolkit for rich async REPL experience - ✅ SessionManager for fast session switching - ✅ Context compression for continuation (50k tokens) @@ -1658,6 +1776,7 @@ The deep-research agent has a **strong foundation** for interactive CLI mode: ## References ### Internal Codebase + - `deep-research-agent/src/deep_research/cli.py` - Current CLI implementation - `deep-research-agent/src/deep_research/agent/orchestrator.py` - Research execution engine - `deep-research-agent/src/deep_research/agent/state.py` - Data structures @@ -1667,6 +1786,7 @@ The deep-research agent has a **strong foundation** for interactive CLI mode: ### External Documentation **prompt_toolkit**: + - Official Documentation: https://python-prompt-toolkit.readthedocs.io/en/master/ - SQLite REPL Tutorial: https://python-prompt-toolkit.readthedocs.io/en/master/pages/tutorials/repl.html - Asyncio Support: https://python-prompt-toolkit.readthedocs.io/en/master/pages/advanced_topics/asyncio.html @@ -1674,17 +1794,20 @@ The deep-research agent has a **strong foundation** for interactive CLI mode: - Async Example: https://github.com/prompt-toolkit/python-prompt-toolkit/blob/main/examples/prompts/asyncio-prompt.py **REPL Frameworks**: + - cmd2 Documentation: https://cmd2.readthedocs.io/ - cmd2 GitHub: https://github.com/python-cmd2/cmd2 - cmd2 Alternatives Comparison: 
https://cmd2.readthedocs.io/en/0.9.0/alternatives.html **Async REPL Patterns**: + - aiomonitor Documentation: https://aiomonitor.aio-libs.org/en/latest/tutorial/ - aiomonitor GitHub: https://github.com/aio-libs/aiomonitor - IPython Autoawait: https://ipython.readthedocs.io/en/stable/interactive/autoawait.html - Jupyter Architecture: https://docs.jupyter.org/en/latest/projects/architecture/content-architecture.html **Command Parsing**: + - shlex Documentation: https://docs.python.org/3/library/shlex.html - argparse Documentation: https://docs.python.org/3/library/argparse.html - REPL + argparse Pattern: https://gist.github.com/benkehoe/2e6a08b385e3385f8a54805c99914c75 @@ -1692,10 +1815,12 @@ The deep-research agent has a **strong foundation** for interactive CLI mode: - Stack Overflow Discussion: https://stackoverflow.com/questions/69062838/python-library-for-repl-and-cli-argument-parsing **Progress Display**: + - Rich Progress: https://rich.readthedocs.io/en/stable/progress.html - tqdm with asyncio: https://towardsdatascience.com/using-tqdm-with-asyncio-in-python-5c0f6e747d55 **Production Examples**: + - ptpython GitHub: https://github.com/prompt-toolkit/ptpython - ptpython Basic Embed: https://github.com/prompt-toolkit/ptpython/blob/main/examples/python-embed.py - ptpython Asyncio Embed: https://github.com/prompt-toolkit/ptpython/blob/main/examples/asyncio-python-embed.py @@ -1703,12 +1828,14 @@ The deep-research agent has a **strong foundation** for interactive CLI mode: - aws-cli-repl (Performance-Optimized): https://github.com/janakaud/aws-cli-repl **Additional Resources**: + - 4 Python Libraries for CLIs: https://opensource.com/article/17/5/4-practical-python-libraries - Python asyncio Documentation: https://docs.python.org/3/library/asyncio.html - Writing an async REPL (blog): https://carreau.github.io/posts/27-Writing-an-async-REPL---Part-1.ipynb/ - Python readline module: https://docs.python.org/3/library/readline.html ### Historical Research Documents + - 
`thoughts/shared/research/2025-11-20_obsidian-iterative-research-architecture.md` - Obsidian-based iterative research architecture - `thoughts/shared/research/2025-11-15_deep-research-agent-architecture.md` - Original deep research agent architecture with interactive mode vision (lines 850-860) - `thoughts/shared/research/2025-11-15_multi-agent-deep-research-architecture-v2.md` - Multi-agent architecture v2 diff --git a/thoughts/shared/research/2025-11-22_go-deep-research-agent-architecture.md b/thoughts/shared/research/2025-11-22_go-deep-research-agent-architecture.md index 0c919df..309301f 100644 --- a/thoughts/shared/research/2025-11-22_go-deep-research-agent-architecture.md +++ b/thoughts/shared/research/2025-11-22_go-deep-research-agent-architecture.md @@ -4,7 +4,7 @@ researcher: Claude git_commit: 86dca03ec2e572219e7ffd1612e60a4aae8331ef branch: feat/custom-deep-research repository: addcommitpush.io -topic: "Go Deep Research Agent Architecture" +topic: 'Go Deep Research Agent Architecture' tags: [architecture, golang, deep-research, interactive-cli, multi-agent] status: complete last_updated: 2025-11-22 @@ -34,14 +34,14 @@ This document defines the architecture for a Go-based deep research agent that r ## Core Requirements -| Requirement | Description | -|-------------|-------------| -| Fast Research | Single worker, quick turnaround for simple queries | -| Deep Research | Multi-worker parallel execution for complex queries | -| Obsidian Integration | Markdown vault with YAML frontmatter, linked notes | -| Session Management | Persist, load, continue, expand sessions | -| Interactive Mode | REPL with `/commands`, natural follow-ups route to expand | -| Streaming Output | Real-time feedback as workers progress | +| Requirement | Description | +| -------------------- | --------------------------------------------------------- | +| Fast Research | Single worker, quick turnaround for simple queries | +| Deep Research | Multi-worker parallel execution for complex 
queries | +| Obsidian Integration | Markdown vault with YAML frontmatter, linked notes | +| Session Management | Persist, load, continue, expand sessions | +| Interactive Mode | REPL with `/commands`, natural follow-ups route to expand | +| Streaming Output | Real-time feedback as workers progress | --- @@ -2010,21 +2010,21 @@ func getEnvOrDefault(key, def string) string { ## Command Reference -| Command | Alias | Description | -|---------|-------|-------------| -| `/fast ` | `/f` | Single-worker quick research | -| `/deep ` | `/d` | Multi-worker deep research | -| `/expand ` | `/e` | Expand on current session | -| `/sessions` | `/s` | List all saved sessions | -| `/load ` | `/l` | Load a specific session | -| `/workers` | `/w` | Show workers in current session | -| `/rerun ` | `/r` | Re-run worker n | -| `/recompile [text]` | `/rc` | Recompile report with optional instructions | -| `/model` | | Show/change current model | -| `/verbose` | `/v` | Toggle verbose output | -| `/help` | `/h`, `/?` | Show help | -| `/quit` | `/q` | Exit REPL | -| `` | | Follow-up question (routes to expand) | +| Command | Alias | Description | +| ------------------- | ---------- | ------------------------------------------- | +| `/fast ` | `/f` | Single-worker quick research | +| `/deep ` | `/d` | Multi-worker deep research | +| `/expand ` | `/e` | Expand on current session | +| `/sessions` | `/s` | List all saved sessions | +| `/load ` | `/l` | Load a specific session | +| `/workers` | `/w` | Show workers in current session | +| `/rerun ` | `/r` | Re-run worker n | +| `/recompile [text]` | `/rc` | Recompile report with optional instructions | +| `/model` | | Show/change current model | +| `/verbose` | `/v` | Toggle verbose output | +| `/help` | `/h`, `/?` | Show help | +| `/quit` | `/q` | Exit REPL | +| `` | | Follow-up question (routes to expand) | --- @@ -2109,6 +2109,7 @@ func getEnvOrDefault(key, def string) string { ## Implementation Phases ### Phase 1: Core Infrastructure + - [ ] 
Project structure and go.mod - [ ] Configuration loading - [ ] LLM client (OpenRouter + Alibaba model) @@ -2116,6 +2117,7 @@ func getEnvOrDefault(key, def string) string { - [ ] Basic terminal renderer ### Phase 2: Agent & Tools + - [ ] Search tool (Brave API) - [ ] Fetch tool (web scraping) - [ ] Tool registry @@ -2123,6 +2125,7 @@ func getEnvOrDefault(key, def string) string { - [ ] Answer detection and extraction ### Phase 3: Orchestration + - [ ] Query complexity analysis - [ ] Task decomposition planner - [ ] Worker pool with goroutines @@ -2130,6 +2133,7 @@ func getEnvOrDefault(key, def string) string { - [ ] Cost tracking ### Phase 4: Session Management + - [ ] Session data structures - [ ] In-memory store - [ ] JSON persistence @@ -2137,6 +2141,7 @@ func getEnvOrDefault(key, def string) string { - [ ] Context building for continuation ### Phase 5: Obsidian Integration + - [ ] Vault directory structure - [ ] Worker markdown files - [ ] Report files with versions @@ -2144,6 +2149,7 @@ func getEnvOrDefault(key, def string) string { - [ ] YAML frontmatter ### Phase 6: Interactive REPL + - [ ] Readline integration - [ ] Command parser - [ ] Router (command vs text) @@ -2152,6 +2158,7 @@ func getEnvOrDefault(key, def string) string { - [ ] Session restore on startup ### Phase 7: Polish + - [ ] Streaming output - [ ] Progress indicators - [ ] Error handling @@ -2183,17 +2190,22 @@ Minimal dependencies - no heavy frameworks. The LLM client uses stdlib `net/http ## Open Questions 1. **Streaming vs Batched**: Should final synthesis stream to terminal or batch? - - Streaming. Try to use streaming APIs when interacting with LLMs, I want it to feel responsive to users. + +- Streaming. Try to use streaming APIs when interacting with LLMs, I want it to feel responsive to users. + 2. **Insight Extraction**: Keep separate LLM call (Python uses Claude) or same model? - - No, same model for everything. but centralize the configuration of what model I use for what parts. 
so it is easy to change. but same model (alibaba/tongyi-deepresearch-30b-a3b) for everything now for simplicity. + +- No, same model for everything. but centralize the configuration of what model I use for what parts. so it is easy to change. but same model (alibaba/tongyi-deepresearch-30b-a3b) for everything now for simplicity. + 3. **Search Provider**: Brave only, or add fallback to DuckDuckGo? - - Brave only + +- Brave only + 4. **State File Format**: JSON sufficient, or use SQLite for complex queries? - - JSON to begin with. +- JSON to begin with. -ALSO, no fallback logic whatsoever. fallback logic is strictly forbidding. IMPORTANT! NO FALLBACK LOGIC EVER! ---- +## ALSO, no fallback logic whatsoever. Fallback logic is strictly forbidden. IMPORTANT! NO FALLBACK LOGIC EVER! ## References diff --git a/thoughts/shared/research/2025-12-01_event-sourced-storage-architecture.md b/thoughts/shared/research/2025-12-01_event-sourced-storage-architecture.md index d676471..6a9d307 100644 --- a/thoughts/shared/research/2025-12-01_event-sourced-storage-architecture.md +++ b/thoughts/shared/research/2025-12-01_event-sourced-storage-architecture.md @@ -4,7 +4,7 @@ researcher: Claude git_commit: 389794cff579752d9f38f5df80b0da22ab1c6e24 branch: feat/custom-deep-research repository: addcommitpush.io -topic: "Event-Sourced Adapter-Based Storage Architecture for Interruptible Agents" +topic: 'Event-Sourced Adapter-Based Storage Architecture for Interruptible Agents' tags: [research, architecture, event-sourcing, storage, adapters, interruptible-agents, go-research] status: complete last_updated: 2025-12-01 last_updated_by: Claude ## Research Question How to move storage to an adapter-based system that: + 1. Mirrors state with extra information (metadata, timestamps, audit trail) 2. Keeps "memory" that can be paused and restored from any point (interruptible agent) 3.
Allows the agent to pick up work where it left off @@ -46,23 +47,23 @@ The architecture uses a **port/adapter pattern** for pluggable storage backends, ### 1.1 Current Architecture Problems -| Problem | Current State | Impact | -|---------|--------------|--------| -| **No State Persistence** | State is local variables in `Research()` | Cannot resume interrupted research | -| **No Event Log** | Events are fire-and-forget | Cannot replay or audit state changes | -| **Direct State Mutation** | `plan.DAG.SetStatus()` mutates directly | No history, no undo capability | -| **Session-Centric Storage** | Only saves complete sessions | Partial progress lost on failure | -| **Tight Storage Coupling** | JSON filesystem hardcoded | Cannot swap backends easily | +| Problem | Current State | Impact | +| --------------------------- | ---------------------------------------- | ------------------------------------ | +| **No State Persistence** | State is local variables in `Research()` | Cannot resume interrupted research | +| **No Event Log** | Events are fire-and-forget | Cannot replay or audit state changes | +| **Direct State Mutation** | `plan.DAG.SetStatus()` mutates directly | No history, no undo capability | +| **Session-Centric Storage** | Only saves complete sessions | Partial progress lost on failure | +| **Tight Storage Coupling** | JSON filesystem hardcoded | Cannot swap backends easily | ### 1.2 Key Files to Transform -| File | Current Role | New Role | -|------|--------------|----------| -| `internal/events/bus.go` | Fire-and-forget pub/sub | Event bus + persistence trigger | -| `internal/events/types.go` | UI progress events | Domain events (state changes) | -| `internal/session/session.go` | Domain + storage conflated | Pure domain aggregate | -| `internal/session/store.go` | Direct JSON persistence | Event store + projection | -| `internal/orchestrator/deep.go` | Stateless coordinator | State machine with event sourcing | +| File | Current Role | New Role | +| 
------------------------------- | -------------------------- | --------------------------------- | +| `internal/events/bus.go` | Fire-and-forget pub/sub | Event bus + persistence trigger | +| `internal/events/types.go` | UI progress events | Domain events (state changes) | +| `internal/session/session.go` | Domain + storage conflated | Pure domain aggregate | +| `internal/session/store.go` | Direct JSON persistence | Event store + projection | +| `internal/orchestrator/deep.go` | Stateless coordinator | State machine with event sourcing | --- @@ -180,10 +181,10 @@ The architecture uses a **port/adapter pattern** for pluggable storage backends, The system needs two categories of events: -| Category | Purpose | Persistence | Examples | -|----------|---------|-------------|----------| -| **Domain Events** | State changes (facts) | YES - Event Store | `ResearchStarted`, `WorkerCompleted`, `ReportGenerated` | -| **Progress Events** | UI updates (ephemeral) | NO - Fire-and-forget | `LLMChunk`, `ToolCall`, `Progress` | +| Category | Purpose | Persistence | Examples | +| ------------------- | ---------------------- | -------------------- | ------------------------------------------------------- | +| **Domain Events** | State changes (facts) | YES - Event Store | `ResearchStarted`, `WorkerCompleted`, `ReportGenerated` | +| **Progress Events** | UI updates (ephemeral) | NO - Fire-and-forget | `LLMChunk`, `ToolCall`, `Progress` | ### 3.2 Domain Event Definitions @@ -2294,23 +2295,23 @@ func main() { ### Benefits -| Benefit | How It's Achieved | -|---------|-------------------| -| **Interruptibility** | State persisted after every event | -| **Resumability** | Replay events to reconstruct state | -| **Audit Trail** | Every change is an immutable event | -| **Pluggable Storage** | Port/adapter pattern for backends | -| **Time Travel** | Replay to any point in history | -| **Multiple Projections** | Same events → Obsidian, DB, API | +| Benefit | How It's Achieved | +| 
------------------------ | ---------------------------------- | +| **Interruptibility** | State persisted after every event | +| **Resumability** | Replay events to reconstruct state | +| **Audit Trail** | Every change is an immutable event | +| **Pluggable Storage** | Port/adapter pattern for backends | +| **Time Travel** | Replay to any point in history | +| **Multiple Projections** | Same events → Obsidian, DB, API | ### Trade-offs -| Trade-off | Mitigation | -|-----------|------------| -| Storage overhead | Snapshots reduce replay cost | -| Complexity | Clear command/event separation | +| Trade-off | Mitigation | +| -------------------- | ------------------------------------- | +| Storage overhead | Snapshots reduce replay cost | +| Complexity | Clear command/event separation | | Eventual consistency | Inline projections for critical paths | -| Event versioning | Schema evolution strategy needed | +| Event versioning | Schema evolution strategy needed | --- @@ -2333,6 +2334,7 @@ func main() { ## Sources ### Event Sourcing References + - [Event Sourcing pattern - Azure Architecture Center](https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing) - [Event Sourcing pattern - AWS Prescriptive Guidance](https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/event-sourcing.html) - [Event Sourcing - Martin Fowler](https://martinfowler.com/eaaDev/EventSourcing.html) @@ -2340,10 +2342,12 @@ func main() { - [Snapshots in Event Sourcing - Kurrent](https://www.kurrent.io/blog/snapshots-in-event-sourcing) ### Domain Events References + - [Domain Events vs. Event Sourcing - INNOQ](https://www.innoq.com/en/blog/2019/01/domain-events-versus-event-sourcing/) - [Domain Events vs. 
Integration Events - Cesar de la Torre](https://devblogs.microsoft.com/cesardelatorre/domain-events-vs-integration-events-in-domain-driven-design-and-microservices-architectures/) ### Go Implementation References + - [hallgren/eventsourcing - GitHub](https://github.com/hallgren/eventsourcing) - [Event Sourcing in Go: From Zero to Production - Serge Skoredin](https://skoredin.pro/blog/golang/event-sourcing-go) - [Simplifying Event Sourcing in Golang - TheFabric.IO](https://www.thefabric.io/blog/simplifying-event-sourcing-in-golang) @@ -2351,6 +2355,7 @@ func main() { - [Implementing pluggable backends in Go - Justin Azoff](https://justin.azoff.dev/blog/implementing-pluggable-backends-in-go/) ### Patterns and Best Practices + - [CQRS Best Practices - GitHub](https://github.com/slashdotdash/cqrs-best-practices) - [Guide to Projections and Read Models - Event-Driven.io](https://event-driven.io/en/projections_and_read_models_in_event_driven_architecture/) - [Saga Pattern in Distributed Transactions - Rost Glukhov](https://www.glukhov.org/post/2025/11/saga-transactions-in-microservices/) diff --git a/thoughts/shared/research/2025-12-03_09-45-00_thinkdepth-architecture.md b/thoughts/shared/research/2025-12-03_09-45-00_thinkdepth-architecture.md index 1bcb353..4ecb696 100644 --- a/thoughts/shared/research/2025-12-03_09-45-00_thinkdepth-architecture.md +++ b/thoughts/shared/research/2025-12-03_09-45-00_thinkdepth-architecture.md @@ -4,12 +4,12 @@ researcher: Claude git_commit: 7a0d7034c05fc3e2dd0010ea7c396615afe9d632 branch: main repository: go-research -topic: "ThinkDepth.ai Deep Research Architecture Analysis and Implementation Plan" +topic: 'ThinkDepth.ai Deep Research Architecture Analysis and Implementation Plan' tags: [research, thinkdepth, deep-research, agentic-ai, architecture, cli-visualization] status: complete last_updated: 2025-12-03 last_updated_by: Claude -last_updated_note: "Added comprehensive CLI visualization design section with event types, display 
components, and example output" +last_updated_note: 'Added comprehensive CLI visualization design section with event types, display components, and example output' --- # Research: ThinkDepth.ai Deep Research Architecture Analysis and Implementation Plan @@ -72,6 +72,7 @@ ThinkDepth uses a multi-agent supervisor pattern with LangGraph: ### 2. Key Components #### 2.1 Supervisor Agent (Lead Researcher) + - **File**: `multi_agent_supervisor.py` - **Model Used**: `openai:gpt-5` (in original) - **Our Model**: `alibaba/tongyi-deepresearch-30b-a3b` @@ -82,6 +83,7 @@ ThinkDepth uses a multi-agent supervisor pattern with LangGraph: - `think_tool` - Strategic reflection #### 2.2 Research Sub-Agent + - **File**: `research_agent.py` - **Model Used**: `openai:gpt-5` (in original) - **Tools Available**: @@ -89,6 +91,7 @@ ThinkDepth uses a multi-agent supervisor pattern with LangGraph: - `think_tool` - Reflection after each search #### 2.3 State Management + - `SupervisorState`: supervisor_messages, research_brief, notes, research_iterations, raw_notes, draft_report - `ResearcherState`: researcher_messages, tool_call_iterations, research_topic, compressed_research, raw_notes - `AgentState`: messages, research_brief, supervisor_messages, raw_notes, notes, draft_report, final_report @@ -110,11 +113,13 @@ Key insight: **Never complete based on draft report looking good** - always veri ### 4. Self-Balancing Rules #### 4.1 Insightfulness Rules (Applied in Final Report) + - Granular breakdown of topics with specific causes/impacts - Detailed mapping tables connecting relationships - Nuanced exploration with explicit discussion #### 4.2 Helpfulness Rules (Applied in Final Report) + - Direct user intent satisfaction - Fluent, coherent logical structure - Factual accuracy @@ -125,6 +130,7 @@ Key insight: **Never complete based on draft report looking good** - always veri ### 5. 
Context Compression Strategy #### 5.1 Research Compression (`compress_research_system_prompt`) + - Preserve ALL information from tool calls verbatim - Clean up but don't summarize - Include inline citations for each source @@ -135,6 +141,7 @@ Key insight: **Never complete based on draft report looking good** - always veri - List of All Relevant Sources #### 5.2 Draft Report as Context + - Acts as "dynamic context" that guides subsequent research - Similar to "gradually adding features to an initial prototype" - Mitigates context pollution, distraction, confusion, and clash @@ -142,6 +149,7 @@ Key insight: **Never complete based on draft report looking good** - always veri ### 6. Tool Implementations #### 6.1 tavily_search Tool + ```python # Executes search → deduplicates by URL → summarizes raw content → formats output @tool @@ -153,6 +161,7 @@ def tavily_search(query: str, max_results: int = 3, topic: str = "general") -> s ``` #### 6.2 think_tool + ```python @tool def think_tool(reflection: str) -> str: @@ -161,6 +170,7 @@ def think_tool(reflection: str) -> str: ``` #### 6.3 refine_draft_report Tool + ```python @tool def refine_draft_report(research_brief: str, findings: str, draft_report: str) -> str: @@ -172,6 +182,7 @@ def refine_draft_report(research_brief: str, findings: str, draft_report: str) - ### 7. Configuration Constants From the original implementation: + - `max_researcher_iterations = 15` (tool calls per sub-agent) - `max_concurrent_researchers = 3` (parallel sub-agents) - `MAX_CONTEXT_LENGTH = 250000` (for webpage summarization) @@ -179,12 +190,14 @@ From the original implementation: ### 8. 
Key Prompts (Detailed Analysis) #### 8.1 Lead Researcher Prompt (`lead_researcher_with_multiple_steps_diffusion_double_check_prompt`) + - Emphasizes diffusion algorithm - Critical: "CompleteResearch only based on findings' completeness, not draft report" - Always run diverse research questions to verify comprehensiveness - Use parallel ConductResearch for multi-faceted questions #### 8.2 Research Agent Prompt (`research_agent_prompt`) + - Hard limits: 2-3 searches for simple, up to 5 for complex - Stop immediately when: - Can answer comprehensively @@ -192,6 +205,7 @@ From the original implementation: - Last 2 searches returned similar information #### 8.3 Final Report Prompt (`final_report_generation_with_helpfulness_insightfulness_hit_citation_prompt`) + - Applies both Insightfulness and Helpfulness rules - Flexible section structure (comparison, list, overview, etc.) - Strict citation rules with sequential numbering @@ -199,6 +213,7 @@ From the original implementation: ## Implementation Plan for Go ### File Structure + ``` internal/architectures/think_deep/ ├── README.md # Architecture documentation @@ -1107,17 +1122,19 @@ func init() { 5. 
**Parallel Sub-Agents**: Up to 3 concurrent sub-researchers for independent topics ### Model Mapping -| Original Model | Our Model | -|----------------|-----------| + +| Original Model | Our Model | +| -------------- | ------------------------------------- | | `openai:gpt-5` | `alibaba/tongyi-deepresearch-30b-a3b` | -| Tavily Search | Brave Search (existing) | +| Tavily Search | Brave Search (existing) | ### Configuration Mapping -| Original | Our Go Implementation | -|----------|----------------------| + +| Original | Our Go Implementation | +| -------------------------------- | ------------------------------ | | `max_researcher_iterations = 15` | `MaxSupervisorIterations = 15` | -| `max_concurrent_researchers = 3` | `MaxConcurrentResearch = 3` | -| `MAX_CONTEXT_LENGTH = 250000` | (handled by model) | +| `max_concurrent_researchers = 3` | `MaxConcurrentResearch = 3` | +| `MAX_CONTEXT_LENGTH = 250000` | (handled by model) | ## Open Questions @@ -1533,15 +1550,15 @@ When running ThinkDeep research, the user sees: ### Color Scheme -| Phase | Color | Icon | -|-------|-------|------| -| Brief | Blue (HiBlue) | 📋 | -| Draft | Yellow (HiYellow) | 📝 | -| Diffuse | Magenta (HiMagenta) | 🔄 | -| Sub-research | Yellow/Cyan | 🔍💭📦 | -| Refinement | Cyan | ✏️ | -| Final | Green (HiGreen) | ✓ | -| Thinking | Dim | 💭 | +| Phase | Color | Icon | +| ------------ | ------------------- | ------ | +| Brief | Blue (HiBlue) | 📋 | +| Draft | Yellow (HiYellow) | 📝 | +| Diffuse | Magenta (HiMagenta) | 🔄 | +| Sub-research | Yellow/Cyan | 🔍💭📦 | +| Refinement | Cyan | ✏️ | +| Final | Green (HiGreen) | ✓ | +| Thinking | Dim | 💭 | ## References diff --git a/thoughts/shared/research/2025-12-03_interactive-cli-agentic-research.md b/thoughts/shared/research/2025-12-03_interactive-cli-agentic-research.md index 86ce925..fb446e5 100644 --- a/thoughts/shared/research/2025-12-03_interactive-cli-agentic-research.md +++ b/thoughts/shared/research/2025-12-03_interactive-cli-agentic-research.md @@ -4,7 +4,7 
@@ researcher: Claude git_commit: 6a32cb5cc41e10a32999f565d10ca639bbecc06c branch: main repository: addcommitpush.io/go-research -topic: "Interactive CLI Agentic Research Experience" +topic: 'Interactive CLI Agentic Research Experience' tags: [research, cli, interactive, obsidian, think_deep, storm, agents] status: complete last_updated: 2025-12-03 @@ -22,6 +22,7 @@ last_updated_by: Claude ## Research Question Design an interactive CLI experience for deep research where: + 1. Users can invoke different research modes (fast, storm, think_deep) 2. Sessions maintain context about written reports (outputs only, not full agent context) 3. Smart mode selection based on query complexity @@ -32,12 +33,14 @@ Design an interactive CLI experience for deep research where: ## Summary The current go-research CLI has strong foundations for an interactive agentic experience. The architecture supports: + - Session management with versioning and parent tracking - Event-driven visualization of research progress - Existing `/expand` handler for follow-up queries - Obsidian integration for persistence (though sub-insights not yet saved) The proposed "Claude Code-style" interactive experience requires: + 1. A **Chat Router** that intelligently routes queries to appropriate agents 2. **Session Context Manager** that maintains report summaries without full agent state 3. **Expand Knowledge Pipeline** with injection points into each agent architecture @@ -48,18 +51,21 @@ The proposed "Claude Code-style" interactive experience requires: ### 1. 
Current Architecture Analysis #### Entry Points (`cmd/research/main.go:51-52`) + ```go vaultWriter := obsidian.NewWriter(cfg.VaultPath) store.SetVaultWriter(vaultWriter) ``` The CLI initializes with: + - Session store (filesystem-based JSON) - Event bus for real-time visualization - Obsidian vault writer for human-readable output - REPL with command router #### Router Intelligence (`internal/repl/router.go:59-74`) + ```go // Natural language: if session exists, expand; otherwise start storm research if r.ctx.Session != nil { @@ -72,6 +78,7 @@ return handler, []string{parsed.RawText}, nil ``` **Current behavior**: + - Commands (`/fast`, `/deep`, `/expand`) → explicit routing - Natural language WITH session → `/expand` handler - Natural language WITHOUT session → `/storm` handler @@ -79,6 +86,7 @@ return handler, []string{parsed.RawText}, nil **Gap**: No smart mode selection or chat/QA detection. #### Expand Handler (`internal/repl/handlers/expand.go:32-55`) + ```go // Build continuation context from previous session continuationCtx := session.BuildContinuationContext(ctx.Session) @@ -93,6 +101,7 @@ newSess := ctx.Session.NewVersion() ``` **Current behavior**: + - Builds context from previous session (report + sources) - Creates versioned session with parent link - Runs research with injected context @@ -104,15 +113,16 @@ newSess := ctx.Session.NewVersion() #### ThinkDeep Injection Points (`internal/orchestrator/think_deep.go`) -| Stage | Line | Injection Opportunity | -|-------|------|----------------------| -| Research Brief | 264 | Inject domain knowledge, previous findings | -| Initial Draft | 284 | Inject existing report as baseline | -| Supervisor Context | `supervisor.go:209` | Add `` section | -| Sub-Researcher | `sub_researcher.go:102` | Inject visited URLs, known facts | -| Final Report | 312 | Add style guidelines, structure template | +| Stage | Line | Injection Opportunity | +| ------------------ | ----------------------- | 
------------------------------------------ | +| Research Brief | 264 | Inject domain knowledge, previous findings | +| Initial Draft | 284 | Inject existing report as baseline | +| Supervisor Context | `supervisor.go:209` | Add `` section | +| Sub-Researcher | `sub_researcher.go:102` | Inject visited URLs, known facts | +| Final Report | 312 | Add style guidelines, structure template | **Key insight**: ThinkDeep's `SupervisorState` already tracks: + - `Notes []string` - compressed findings - `RawNotes []string` - raw search results - `VisitedURLs map[string]bool` - deduplication @@ -121,14 +131,15 @@ These can be pre-populated for "expand" workflows. #### STORM Injection Points (`internal/orchestrator/deep_storm.go`) -| Stage | Line | Injection Opportunity | -|-------|------|----------------------| -| Perspective Discovery | 124 | Inject known perspectives, skip survey | -| Conversation Simulation | 159 | Inject previous conversations as context | -| Cross-Validation | 192 | Inject validated facts from previous run | -| Synthesis | 230 | Inject previous outline, sections | +| Stage | Line | Injection Opportunity | +| ----------------------- | ---- | ---------------------------------------- | +| Perspective Discovery | 124 | Inject known perspectives, skip survey | +| Conversation Simulation | 159 | Inject previous conversations as context | +| Cross-Validation | 192 | Inject validated facts from previous run | +| Synthesis | 230 | Inject previous outline, sections | **Key insight**: STORM produces rich intermediate artifacts: + - `[]Perspective` - expert viewpoints - `map[string]*ConversationResult` - full dialogue transcripts - `*AnalysisResult` - validated facts, contradictions, gaps @@ -169,6 +180,7 @@ const ( ``` **Classification Logic**: + 1. No session → `IntentResearch` 2. Question about report content → `IntentQuestion` 3. 
"Expand on X", "Tell me more about Y" → `IntentExpand` @@ -302,6 +314,7 @@ type InjectionContext struct { ``` **ThinkDeep Expansion Flow**: + ```go func (h *ExpandKnowledgeHandler) expandWithThinkDeep( ctx *repl.Context, @@ -338,6 +351,7 @@ func (h *ExpandKnowledgeHandler) expandWithThinkDeep( Current gap: `internal/obsidian/writer.go` creates `insights/` directory but never populates it. **Proposed structure**: + ``` / └── / @@ -505,33 +519,39 @@ research> /fast Who invented transformers? ### 6. Implementation Roadmap #### Phase 1: Sub-Insight Capture (think_deep) + - [ ] Add `SubInsights []SubInsight` to `SupervisorState` - [ ] Capture insights in `executeParallelResearch` - [ ] Return insights in `ThinkDeepResult` - [ ] Update Obsidian writer to save insights #### Phase 2: Session Context Manager + - [ ] Create `SessionContext` struct - [ ] Implement `ExtractContext(session) SessionContext` - [ ] Add context to session store #### Phase 3: Question Handler + - [ ] Create `QuestionHandler` - [ ] Implement `buildQAContext` - [ ] Add expansion suggestion logic #### Phase 4: Smart Router + - [ ] Create `QueryClassifier` interface - [ ] Implement LLM-based classifier - [ ] Update `Router` to use classifier #### Phase 5: Expand Knowledge Pipeline + - [ ] Create `InjectionContext` struct - [ ] Implement ThinkDeep injection options - [ ] Implement STORM injection options - [ ] Create merge logic for expanded sessions #### Phase 6: Enhanced Obsidian + - [ ] Implement research-notes structure - [ ] Add source index with quality scores - [ ] Create bi-directional links between insights diff --git a/thoughts/shared/research/2025-12-03_think-deep-data-tools.md b/thoughts/shared/research/2025-12-03_think-deep-data-tools.md index 421a865..d3beb49 100644 --- a/thoughts/shared/research/2025-12-03_think-deep-data-tools.md +++ b/thoughts/shared/research/2025-12-03_think-deep-data-tools.md @@ -4,7 +4,7 @@ researcher: Claude git_commit: 6a32cb5cc41e10a32999f565d10ca639bbecc06c 
branch: main repository: addcommitpush.io/go-research -topic: "Data Analysis and File Reading Tools for ThinkDeep Agent" +topic: 'Data Analysis and File Reading Tools for ThinkDeep Agent' tags: [research, think_deep, tools, data_analysis, eda, pdf, csv, pickle] status: complete last_updated: 2025-12-03 @@ -39,6 +39,7 @@ The design should mirror the existing `search` → `ContentSummarizer` pattern w ### 1. Current Tool Architecture **Tool Interface** (`internal/tools/registry.go:9-13`): + ```go type Tool interface { Name() string @@ -48,6 +49,7 @@ type Tool interface { ``` **ToolExecutor Interface** (`internal/tools/registry.go:15-19`): + ```go type ToolExecutor interface { Execute(ctx context.Context, name string, args map[string]interface{}) (string, error) @@ -56,6 +58,7 @@ type ToolExecutor interface { ``` **Key Pattern - Search with Optional Summarization** (`internal/tools/search.go`): + - `SearchTool` performs basic web search - Optional `ContentSummarizer` enhances results with LLM-generated summaries - This pattern is directly applicable to data analysis tools @@ -63,6 +66,7 @@ type ToolExecutor interface { ### 2. Sub-Researcher Agent Pattern The `SubResearcherAgent` (`internal/agents/sub_researcher.go:23-29`) demonstrates how to create a focused agent that: + - Has access to specific tools (search, think) - Executes an iterative loop with hard limits - Compresses findings for the supervisor @@ -75,6 +79,7 @@ This pattern can be adapted for a `DataAnalysisAgent` sub-researcher. 
#### 3.1 Data Analysis Tools **CSVAnalysisTool** - For CSV/tabular data: + ```go // internal/tools/csv_analysis.go @@ -101,6 +106,7 @@ func (t *CSVAnalysisTool) Execute(ctx context.Context, args map[string]interface ``` **PickleAnalysisTool** - For Python pickle files: + ```go // internal/tools/pickle_analysis.go @@ -123,6 +129,7 @@ func (t *PickleAnalysisTool) Execute(ctx context.Context, args map[string]interf ``` **GoalDirectedEDATool** - High-level EDA orchestrator: + ```go // internal/tools/eda.go @@ -142,6 +149,7 @@ Args: {"path": "/path/to/data", "goal": "research question or hypothesis to expl #### 3.2 Document Reading Tools **PDFReadTool** - For PDF documents: + ```go // internal/tools/pdf.go @@ -165,6 +173,7 @@ func (t *PDFReadTool) Execute(ctx context.Context, args map[string]interface{}) ``` **DOCXReadTool** - For Word documents: + ```go // internal/tools/docx.go @@ -180,6 +189,7 @@ func (t *DOCXReadTool) Description() string { ``` **PPTXReadTool** - For PowerPoint: + ```go // internal/tools/pptx.go @@ -195,6 +205,7 @@ func (t *PPTXReadTool) Description() string { ``` **GenericDocumentReadTool** - Auto-detect format: + ```go // internal/tools/document.go @@ -259,6 +270,7 @@ func (a *DataAnalystAgent) Analyze(ctx context.Context, dataPath string, goal st ``` **Data Analyst Prompt** (new file: `internal/think_deep/data_prompts.go`): + ```go func DataAnalystPrompt(date, dataDescription string) string { return fmt.Sprintf(`You are a data analyst conducting exploratory data analysis. Today is %s. 
@@ -344,6 +356,7 @@ Args: {"data_path": "/path/to/data", "goal": "analysis objective"}` ``` Supervisor prompt update (`internal/think_deep/prompts.go`): + ```go // Add to LeadResearcherPrompt: @@ -359,28 +372,34 @@ Supervisor prompt update (`internal/think_deep/prompts.go`): For implementing these tools in Go: **CSV Processing:** + - `encoding/csv` (stdlib) - Basic CSV reading - `github.com/go-gota/gota` - DataFrame operations, statistics **Pickle Files:** + - Python subprocess approach (safest) - `github.com/nlpodyssey/spago` has some pickle support **PDF Extraction:** + - `github.com/pdfcpu/pdfcpu` - Pure Go, good extraction - `github.com/unidoc/unipdf` - Commercial, more features **Office Documents:** + - `github.com/unidoc/unioffice` - DOCX, XLSX, PPTX - `github.com/nguyenthenguyen/docx` - Simpler DOCX-only **Statistics:** + - `gonum.org/v1/gonum/stat` - Statistical functions - `github.com/montanaflynn/stats` - Descriptive statistics ### 7. Implementation Roadmap **Phase 1: Document Reading Tools** + - [ ] Implement `PDFReadTool` with pdfcpu - [ ] Implement `DOCXReadTool` with unioffice - [ ] Implement `PPTXReadTool` with unioffice @@ -388,22 +407,26 @@ For implementing these tools in Go: - [ ] Add to SubResearcherToolRegistry **Phase 2: Basic Data Analysis Tools** + - [ ] Implement `CSVAnalysisTool` with gota - [ ] Add basic statistics: shape, dtypes, missing values, summary stats - [ ] Add correlation analysis - [ ] Create LLM-enhanced interpretation mode **Phase 3: Goal-Directed EDA** + - [ ] Create `GoalDirectedEDATool` with LLM planning - [ ] Implement iterative analysis loop - [ ] Add hypothesis testing support **Phase 4: Data Analyst Sub-Agent** (optional) + - [ ] Create `DataAnalystAgent` following SubResearcherAgent pattern - [ ] Implement `conduct_data_analysis` supervisor tool - [ ] Update supervisor prompt with new capability **Phase 5: Pickle Support** (optional) + - [ ] Implement Python subprocess bridge for pickle inspection - [ ] Add 
sandbox/security measures - [ ] Create `PickleAnalysisTool` @@ -427,6 +450,7 @@ For implementing these tools in Go: 2. **Optional LLM Enhancement**: Like `SearchTool.SetSummarizer()`, data tools should work standalone but optionally leverage LLM for deeper analysis 3. **Event Bus Integration**: For long-running analysis, emit progress events: + ```go bus.Publish(events.Event{ Type: events.EventDataAnalysisProgress, diff --git a/thoughts/shared/research/2025-12-03_thinkdeep-gap-analysis.md b/thoughts/shared/research/2025-12-03_thinkdeep-gap-analysis.md index 05cb06f..0444bfa 100644 --- a/thoughts/shared/research/2025-12-03_thinkdeep-gap-analysis.md +++ b/thoughts/shared/research/2025-12-03_thinkdeep-gap-analysis.md @@ -10,18 +10,19 @@ The go-research ThinkDeep implementation successfully captures the **core architecture** of the reference (diffusion-based multi-agent research with supervisor coordination), but has **significant gaps** in several areas: -| Category | Alignment | Criticality | -|----------|-----------|-------------| -| Core Architecture | ✅ 90% | - | -| Workflow Phases | ⚠️ 75% | Medium | -| Prompts | ⚠️ 70% | High | -| State Management | ✅ 85% | Low | -| Tool Handling | ⚠️ 60% | High | -| Search Strategy | ⚠️ 65% | Medium | -| Synthesis Process | ⚠️ 70% | High | -| Configuration | ✅ 90% | Low | +| Category | Alignment | Criticality | +| ----------------- | --------- | ----------- | +| Core Architecture | ✅ 90% | - | +| Workflow Phases | ⚠️ 75% | Medium | +| Prompts | ⚠️ 70% | High | +| State Management | ✅ 85% | Low | +| Tool Handling | ⚠️ 60% | High | +| Search Strategy | ⚠️ 65% | Medium | +| Synthesis Process | ⚠️ 70% | High | +| Configuration | ✅ 90% | Low | **Critical Gaps Requiring Immediate Attention:** + 1. Missing async parallel execution of sub-researchers 2. Missing webpage content summarization in search results 3. 
Prompt differences affecting research behavior @@ -33,18 +34,19 @@ The go-research ThinkDeep implementation successfully captures the **core archit ### What Matches ✅ -| Feature | Reference | Go Implementation | Status | -|---------|-----------|-------------------|--------| -| 4-phase workflow | Brief → Draft → Diffusion → Final | Brief → Draft → Diffusion → Final | ✅ Match | -| Supervisor-Worker pattern | Supervisor delegates to sub-agents | Supervisor delegates to sub-researchers | ✅ Match | -| Diffusion concept | Draft as noisy signal, refine via research | Same conceptual approach | ✅ Match | -| Max iterations | 15 supervisor iterations | 15 supervisor iterations | ✅ Match | -| Max concurrent | 3 parallel sub-agents | 3 parallel sub-researchers | ✅ Match | -| Max search/agent | 5 searches per agent | 5 searches per sub-researcher | ✅ Match | +| Feature | Reference | Go Implementation | Status | +| ------------------------- | ------------------------------------------ | --------------------------------------- | -------- | +| 4-phase workflow | Brief → Draft → Diffusion → Final | Brief → Draft → Diffusion → Final | ✅ Match | +| Supervisor-Worker pattern | Supervisor delegates to sub-agents | Supervisor delegates to sub-researchers | ✅ Match | +| Diffusion concept | Draft as noisy signal, refine via research | Same conceptual approach | ✅ Match | +| Max iterations | 15 supervisor iterations | 15 supervisor iterations | ✅ Match | +| Max concurrent | 3 parallel sub-agents | 3 parallel sub-researchers | ✅ Match | +| Max search/agent | 5 searches per agent | 5 searches per sub-researcher | ✅ Match | ### Critical Gap: User Clarification Phase 🔴 **Reference Implementation** (`research_agent_scope.py:37-68`): + - Has a `clarify_with_user` stage (currently disabled but implemented) - Uses structured output `ClarifyWithUser` schema with fields: - `need_clarification: bool` @@ -53,6 +55,7 @@ The go-research ThinkDeep implementation successfully captures the **core archit - 
Prompt asks LLM to determine if user input needs clarification **Go Implementation**: + - ❌ No clarification phase implemented - Jumps directly from user query to research brief generation @@ -61,11 +64,13 @@ The go-research ThinkDeep implementation successfully captures the **core archit ### Gap: Entry Point Separation 🟡 **Reference Implementation**: + - `research_agent_full.py` defines the complete workflow - `research_agent_scope.py` handles scoping (clarification + brief + draft) - Clear separation between scoping and execution **Go Implementation**: + - All phases combined in `orchestrator/think_deep.go` - Less modular - harder to customize individual phases @@ -80,7 +85,9 @@ The go-research ThinkDeep implementation successfully captures the **core archit **Reference Prompt** (`prompts.py:196-261`) - Key sections missing from Go: #### Missing: Explicit Diffusion Algorithm Statement + Reference has detailed algorithm explanation: + ``` 1. generate the next research questions to address gaps in the draft report 2. **ConductResearch**: retrieve external information to provide concrete delta for denoising @@ -92,7 +99,9 @@ Reference has detailed algorithm explanation: Go version has a simplified version but **lacks the critical instruction**: "even if the draft report looks complete, you should continue doing the research until all the research findings are collected." #### Missing: Scaling Rules + Reference (`prompts.py:248-261`): + ``` Simple fact-finding, lists, rankings → 1 sub-agent Example: "List top 10 coffee shops in San Francisco" → 1 agent @@ -104,7 +113,9 @@ Comparisons → 1 sub-agent per element **Go Implementation**: No explicit scaling rules in supervisor prompt. #### Missing: "Show Your Thinking" Integration + Reference instructs supervisor to use think_tool after each ConductResearch with specific questions: + - What key information did I find? - What's missing? - Do I have enough? 
@@ -129,6 +140,7 @@ Reference instructs supervisor to use think_tool after each ConductResearch with ### Gap: Compress Research Prompt 🟡 **Reference** (`prompts.py:263-308`) includes: + - Explicit tool call filtering instructions (include search, exclude think) - "Report can be as long as necessary" - "Don't lose any sources - downstream LLM will merge reports" @@ -139,6 +151,7 @@ Reference instructs supervisor to use think_tool after each ConductResearch with ### Gap: Final Report Prompt 🟡 **Reference** (`prompts.py:326-426`) includes detailed section guidelines: + ``` - Explicit discussion in simple, clear language - DO NOT oversimplify - clarify ambiguity @@ -149,20 +162,21 @@ Reference instructs supervisor to use think_tool after each ConductResearch with ``` **Go Implementation**: Has insightfulness/helpfulness rules but missing: + - "DO NOT list facts in bullets" rule - "Long, verbose sections expected" instruction - Detailed structure examples (comparison, lists, overview patterns) ### Prompt Alignment Summary -| Prompt | Reference Lines | Go Approx Lines | Content Match | -|--------|-----------------|-----------------|---------------| -| Supervisor | ~65 lines | ~40 lines | 70% | -| Research Agent | ~45 lines | ~30 lines | 75% | -| Compress | ~45 lines | ~35 lines | 80% | -| Final Report | ~100 lines | ~55 lines | 65% | -| Refine Draft | ~80 lines | ~35 lines | 60% | -| Research Brief | ~50 lines | ~45 lines | 85% | +| Prompt | Reference Lines | Go Approx Lines | Content Match | +| -------------- | --------------- | --------------- | ------------- | +| Supervisor | ~65 lines | ~40 lines | 70% | +| Research Agent | ~45 lines | ~30 lines | 75% | +| Compress | ~45 lines | ~35 lines | 80% | +| Final Report | ~100 lines | ~55 lines | 65% | +| Refine Draft | ~80 lines | ~35 lines | 60% | +| Research Brief | ~50 lines | ~45 lines | 85% | --- @@ -170,14 +184,14 @@ Reference instructs supervisor to use think_tool after each ConductResearch with ### What Matches ✅ -| 
State Field | Reference | Go | Status | -|-------------|-----------|-----|--------| -| supervisor_messages | ✅ | Messages | ✅ Match | -| research_brief | ✅ | ResearchBrief | ✅ Match | -| notes | ✅ (with operator.add) | Notes []string | ✅ Match | -| raw_notes | ✅ (with operator.add) | RawNotes []string | ✅ Match | -| draft_report | ✅ | DraftReport | ✅ Match | -| research_iterations | ✅ | Iterations | ✅ Match | +| State Field | Reference | Go | Status | +| ------------------- | ---------------------- | ----------------- | -------- | +| supervisor_messages | ✅ | Messages | ✅ Match | +| research_brief | ✅ | ResearchBrief | ✅ Match | +| notes | ✅ (with operator.add) | Notes []string | ✅ Match | +| raw_notes | ✅ (with operator.add) | RawNotes []string | ✅ Match | +| draft_report | ✅ | DraftReport | ✅ Match | +| research_iterations | ✅ | Iterations | ✅ Match | ### Minor Gap: Message Accumulation Pattern 🟢 @@ -194,6 +208,7 @@ Reference instructs supervisor to use think_tool after each ConductResearch with ### Critical Gap: Parallel Execution of Sub-Researchers 🔴 **Reference Implementation** (`multi_agent_supervisor.py:189-223`): + ```python coros = [ researcher_agent.ainvoke({ @@ -206,6 +221,7 @@ tool_results = await asyncio.gather(*coros) # TRUE PARALLELISM ``` **Go Implementation** (`supervisor.go:150-162`): + ```go for _, tc := range toolCalls { result, err := s.executeToolCall(...) 
// SEQUENTIAL EXECUTION @@ -214,6 +230,7 @@ for _, tc := range toolCalls { ``` **Impact**: HIGH + - Reference executes multiple sub-researchers truly in parallel - Go version executes them sequentially - Significantly impacts research speed for comparison queries @@ -222,6 +239,7 @@ for _, tc := range toolCalls { ### Critical Gap: refine_draft_report Tool Implementation 🔴 **Reference Implementation** (`multi_agent_supervisor.py:225-241`): + ```python def refine_draft_report(research_brief, findings, draft_report): """Refine draft report - Synthesizes research findings into comprehensive draft""" @@ -231,6 +249,7 @@ def refine_draft_report(research_brief, findings, draft_report): ``` **Go Implementation** (`think_deep/tools.go:124-183`): + - Takes args from tool call (none expected) - Joins state.Notes correctly - BUT: Missing the `InjectedToolArg` pattern - reference auto-injects state values @@ -240,11 +259,13 @@ def refine_draft_report(research_brief, findings, draft_report): ### Gap: Tool Call Format 🟡 **Reference**: Uses LangChain's native tool calling with `bind_tools()`: + ```python supervisor_model_with_tools = supervisor_model.bind_tools(supervisor_tools) ``` **Go Implementation**: Uses XML-style tool call parsing: + ```xml {"research_topic": "..."} ``` @@ -254,11 +275,13 @@ supervisor_model_with_tools = supervisor_model.bind_tools(supervisor_tools) ### Gap: Think Tool Semantic Handling 🟡 **Reference** (`research_agent.py:178-187`): + - Think tool calls are processed synchronously - Recorded as part of conversation - Explicitly filtered out during compression **Go Implementation**: + - Think tool acknowledged but treated as no-op - `FilterThinkToolCalls()` removes them from compression @@ -271,6 +294,7 @@ supervisor_model_with_tools = supervisor_model.bind_tools(supervisor_tools) ### Critical Gap: Webpage Content Summarization 🔴 **Reference Implementation** (`utils.py:80-111`, `132-156`): + ```python def summarize_webpage_content(raw_content: str) -> str: 
"""Summarizes web page content using structured output""" @@ -280,17 +304,20 @@ def summarize_webpage_content(raw_content: str) -> str: ``` The reference: + 1. Fetches raw page content via Tavily (`include_raw_content=True`) 2. Summarizes each page using LLM with structured output 3. Returns formatted summary with key excerpts **Go Implementation**: + - Uses Brave Search API - Returns search snippets directly - ❌ NO webpage content fetching - ❌ NO content summarization **Impact**: HIGH + - Reference gets much richer content from web pages - Go version only gets search snippets (typically 150-200 chars) - Significantly impacts research quality and depth @@ -298,6 +325,7 @@ The reference: ### Gap: Search Deduplication 🟡 **Reference** (`utils.py:113-130`): + ```python def deduplicate_search_results(search_results: List[dict]) -> dict: unique_results = {} @@ -329,16 +357,17 @@ def deduplicate_search_results(search_results: List[dict]) -> dict: **Reference Implementation** uses structured output (Pydantic models) at key decision points: -| Stage | Reference Schema | Go Equivalent | -|-------|------------------|---------------| -| Clarification | `ClarifyWithUser` | ❌ Missing | +| Stage | Reference Schema | Go Equivalent | +| -------------- | ------------------ | ------------- | +| Clarification | `ClarifyWithUser` | ❌ Missing | | Research Brief | `ResearchQuestion` | ❌ Plain text | -| Draft Report | `DraftReport` | ❌ Plain text | -| Compression | `Summary` | ❌ Plain text | +| Draft Report | `DraftReport` | ❌ Plain text | +| Compression | `Summary` | ❌ Plain text | **Go Implementation**: All stages use plain text LLM responses. 
**Impact**: HIGH + - Structured output prevents hallucination in decisions - Ensures consistent parsing - Reference can deterministically route based on schema fields @@ -347,11 +376,13 @@ def deduplicate_search_results(search_results: List[dict]) -> dict: ### Gap: Draft Refinement Accumulation Strategy 🟡 **Reference** (`multi_agent_supervisor.py:225-241`): + - Calls `get_notes_from_tool_calls()` which extracts ALL tool message content - Joins with newlines - Every refinement uses ALL accumulated notes **Go Implementation** (`tools.go:124-183`): + - Uses `state.Notes` which already contains compressed findings - Joins with `\n---\n` - Similar behavior but slightly different separator @@ -361,6 +392,7 @@ def deduplicate_search_results(search_results: List[dict]) -> dict: ### Gap: Final Report Input Structure 🟡 **Reference** (`research_agent_full.py:42`): + ```python final_report_prompt = prompt.format( research_brief=state.get("research_brief", ""), @@ -381,13 +413,13 @@ final_report_prompt = prompt.format( ### What Matches ✅ -| Parameter | Reference | Go | Status | -|-----------|-----------|-----|--------| -| max_researcher_iterations | 15 | MaxSupervisorIterations: 15 | ✅ | -| max_concurrent_researchers | 3 | MaxConcurrentResearch: 3 | ✅ | -| max searches per agent | 5 (via prompt) | MaxIterations: 5 | ✅ | -| compress model max_tokens | 32000 | Uses default | ⚠️ Check | -| final report max_tokens | 40000 | Uses default | ⚠️ Check | +| Parameter | Reference | Go | Status | +| -------------------------- | -------------- | --------------------------- | -------- | +| max_researcher_iterations | 15 | MaxSupervisorIterations: 15 | ✅ | +| max_concurrent_researchers | 3 | MaxConcurrentResearch: 3 | ✅ | +| max searches per agent | 5 (via prompt) | MaxIterations: 5 | ✅ | +| compress model max_tokens | 32000 | Uses default | ⚠️ Check | +| final report max_tokens | 40000 | Uses default | ⚠️ Check | ### Gap: Model Selection 🟡 @@ -400,6 +432,7 @@ final_report_prompt = 
prompt.format( ### Gap: Model-Specific Token Limits 🟡 **Reference** explicitly sets: + - `compress_model = init_chat_model(model="openai:gpt-5", max_tokens=32000)` - `writer_model = init_chat_model(model="openai:gpt-5", max_tokens=40000)` @@ -414,6 +447,7 @@ final_report_prompt = prompt.format( ### Feature: Jupyter Notebook Compatibility 🟢 **Reference** (`multi_agent_supervisor.py:54-65`): + ```python try: import nest_asyncio @@ -450,6 +484,7 @@ try: - Use goroutines + WaitGroup for true parallelism - Location: `supervisor.go:150-162` - Pattern: + ```go var wg sync.WaitGroup results := make(chan SubResearcherResult, len(conductResearchCalls)) @@ -511,18 +546,18 @@ try: ## 10. Alignment Score by Component -| Component | Score | Notes | -|-----------|-------|-------| -| Core Architecture | 90% | Fundamentally correct | -| Workflow Phases | 75% | Missing clarification phase | -| Supervisor Agent | 70% | Missing parallel execution, prompt gaps | -| Sub-Researcher Agent | 75% | Prompt differences, no page summarization | -| State Management | 85% | Minor differences in patterns | -| Tool Handling | 60% | Sequential vs parallel, no structured output | -| Search Strategy | 65% | No page fetch, no deduplication | -| Synthesis | 70% | Prompt gaps, no structured output | -| Configuration | 90% | Model differences | -| **Overall** | **73%** | Functional but needs optimization | +| Component | Score | Notes | +| -------------------- | ------- | -------------------------------------------- | +| Core Architecture | 90% | Fundamentally correct | +| Workflow Phases | 75% | Missing clarification phase | +| Supervisor Agent | 70% | Missing parallel execution, prompt gaps | +| Sub-Researcher Agent | 75% | Prompt differences, no page summarization | +| State Management | 85% | Minor differences in patterns | +| Tool Handling | 60% | Sequential vs parallel, no structured output | +| Search Strategy | 65% | No page fetch, no deduplication | +| Synthesis | 70% | Prompt gaps, no 
structured output | +| Configuration | 90% | Model differences | +| **Overall** | **73%** | Functional but needs optimization | --- @@ -530,16 +565,16 @@ try: ### Reference Files → Go Equivalents -| Reference File | Go Equivalent | Alignment | -|----------------|---------------|-----------| -| `research_agent_full.py` | `architectures/think_deep/think_deep.go` | 80% | -| `multi_agent_supervisor.py` | `agents/supervisor.go` | 65% | -| `research_agent.py` | `agents/sub_researcher.go` | 75% | -| `research_agent_scope.py` | `orchestrator/think_deep.go` (partial) | 70% | -| `state_multi_agent_supervisor.py` | `think_deep/state.go` | 85% | -| `state_research.py` | `think_deep/state.go` (partial) | 80% | -| `prompts.py` | `think_deep/prompts.go` | 70% | -| `utils.py` | `think_deep/tools.go` + `tools/registry.go` | 55% | +| Reference File | Go Equivalent | Alignment | +| --------------------------------- | ------------------------------------------- | --------- | +| `research_agent_full.py` | `architectures/think_deep/think_deep.go` | 80% | +| `multi_agent_supervisor.py` | `agents/supervisor.go` | 65% | +| `research_agent.py` | `agents/sub_researcher.go` | 75% | +| `research_agent_scope.py` | `orchestrator/think_deep.go` (partial) | 70% | +| `state_multi_agent_supervisor.py` | `think_deep/state.go` | 85% | +| `state_research.py` | `think_deep/state.go` (partial) | 80% | +| `prompts.py` | `think_deep/prompts.go` | 70% | +| `utils.py` | `think_deep/tools.go` + `tools/registry.go` | 55% | ---
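
The recommendation above (goroutines + WaitGroup for true parallelism at `supervisor.go:150-162`) can be sketched as below. This is a minimal, self-contained illustration of the fan-out pattern only: `ToolCall`, `SubResearcherResult`, and `runSubResearcher` are hypothetical stand-ins, not the actual go-research types, and a real implementation would also thread a `context.Context` through and cap concurrency at `MaxConcurrentResearch`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ToolCall and SubResearcherResult are illustrative stand-ins for the
// real go-research types; only the fan-out/collect pattern is the point.
type ToolCall struct {
	Index int
	Topic string
}

type SubResearcherResult struct {
	Index    int
	Findings string
	Err      error
}

// runSubResearcher is a placeholder for the real sub-researcher invocation
// (LLM calls, search, compression).
func runSubResearcher(tc ToolCall) SubResearcherResult {
	return SubResearcherResult{Index: tc.Index, Findings: "findings for " + tc.Topic}
}

// executeParallel replaces the sequential loop: each ConductResearch tool
// call runs in its own goroutine, results are collected via a buffered
// channel sized to the number of calls (so sends never block), then
// re-ordered by tool-call index so the supervisor transcript stays
// deterministic regardless of goroutine completion order.
func executeParallel(calls []ToolCall) []SubResearcherResult {
	var wg sync.WaitGroup
	results := make(chan SubResearcherResult, len(calls))

	for _, tc := range calls {
		wg.Add(1)
		go func(tc ToolCall) {
			defer wg.Done()
			results <- runSubResearcher(tc)
		}(tc)
	}
	wg.Wait()
	close(results)

	collected := make([]SubResearcherResult, 0, len(calls))
	for r := range results {
		collected = append(collected, r)
	}
	sort.Slice(collected, func(i, j int) bool {
		return collected[i].Index < collected[j].Index
	})
	return collected
}

func main() {
	calls := []ToolCall{
		{Index: 0, Topic: "topic A"},
		{Index: 1, Topic: "topic B"},
		{Index: 2, Topic: "topic C"},
	}
	for _, r := range executeParallel(calls) {
		fmt.Println(r.Findings)
	}
}
```

This mirrors the reference's `asyncio.gather` semantics: all sub-researchers run concurrently and the supervisor blocks until every one returns. Since `MaxConcurrentResearch` is 3, a semaphore is unnecessary in practice, but a `golang.org/x/sync/errgroup` with `SetLimit` would be the idiomatic hardening if that cap ever grows.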