Copyright 2025 Seppo Pakonen
Maestro is a command-line tool for composing complex projects with multiple AI models. Rather than pushing tasks through a linear pipeline, Maestro behaves like a conductor guiding an orchestra of models — planners, workers, refiners — each an instrument with its own voice.
A project planned with Maestro grows organically, like a living composition: themes develop, branches emerge, variations unfold, and motifs resolve across multiple voices.
Maestro helps humans and AIs work together to shape, refine, and restructure ambitious ideas — one movement at a time.
Plans don't appear fully formed; they are composed. Maestro lets you talk with its planning AIs:
- Discuss, revise, and sculpt the plan
- Rewrite or "clean" the root task
- Break the project into subtasks
- Control how much of the root task is given to each worker
You can conduct this process in two modes:
- Interactive planning (--discuss-plan): Talk with the AI until the structure feels right.
- One-shot planning (--one-shot-plan): Produce a clean JSON plan in a single gesture.
If neither mode is specified, Maestro politely asks which style you prefer.
Real projects don't move in straight lines — they branch. Maestro treats planning as a tree of musical ideas:
- Every plan is a motif: a node in a branching structure
- You can create new branches mid-project
- Old branches remain as "dead ends" or alternative interpretations
- You can freely switch focus between branches
- A plan tree can be printed as a colored, score-like ASCII structure:
[*] P1 Main Theme (active)
├─ P2 Variant: Split Backend
│ └─ P3 Variant: Minimal API (dead)
└─ P4 Variant: Test-Driven Rewrite
Each branch represents a different interpretation of your original composition.
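As a rough picture of this branching model, the sketch below shows how plan nodes, variants, and dead ends might be represented and rendered. It is illustrative only: the field names (node_id, title, status, children) are assumptions, not Maestro's internal types.

```python
# Illustrative sketch of a plan tree and its ASCII rendering.
# Field names are assumptions, not Maestro's internal data model.
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    node_id: str
    title: str
    status: str = "active"                      # e.g. "active" or "dead"
    children: list = field(default_factory=list)

def render(node: PlanNode, prefix: str = "", is_last: bool = True, is_root: bool = True) -> None:
    if is_root:
        print(f"[*] {node.node_id} {node.title} ({node.status})")
        child_prefix = ""
    else:
        connector = "└─ " if is_last else "├─ "
        label = f"{node.node_id} {node.title}"
        if node.status == "dead":
            label += " (dead)"
        print(prefix + connector + label)
        child_prefix = prefix + ("   " if is_last else "│  ")
    for i, child in enumerate(node.children):
        render(child, child_prefix, i == len(node.children) - 1, False)

# Reproduces the example tree above.
tree = PlanNode("P1", "Main Theme", children=[
    PlanNode("P2", "Variant: Split Backend", children=[
        PlanNode("P3", "Variant: Minimal API", status="dead")]),
    PlanNode("P4", "Variant: Test-Driven Rewrite"),
])
render(tree)
```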
Maestro provides an iterative planning mechanism to convert high-level plans into concrete project structures:
- Iterative Planning: maestro plan explore runs an iterative loop that converts plan bullet points into canonical project_ops JSON
- Preview & Apply: Each iteration shows a preview of proposed changes before applying them
- Minimal Steps: AI proposes the smallest useful step to advance the project structure
- Minimal Steps: AI proposes the smallest useful step to advance the project structure
- Safety First: Changes are validated and previewed before application
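For intuition, a single "minimal step" proposal could look something like the dictionary below. This is a hypothetical illustration only; the canonical project_ops schema is defined by Maestro's contracts and may differ in structure and field names.

```python
# Hypothetical illustration of a minimal-step proposal; the real canonical
# project_ops schema is defined by Maestro's contracts and may differ.
proposed_step = {
    "ops": [
        {"op": "add_task", "title": "Define MIDI event model", "phase": "P1"},
    ],
    "rationale": "Smallest useful step: the event model unblocks later tasks.",
}
```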
Example usage:
# Explore all plans in docs/plans.md
maestro plan explore
# Explore a specific plan
maestro plan explore "My Feature Plan"
# Explore with custom parameters
maestro plan explore --max-iterations 5 --apply
The explore command follows these steps:
- Loads plans and current project state
- Asks AI to propose a minimal ProjectOpsResult JSON
- Validates the JSON against a strict schema
- Previews changes via executor logic
- Applies changes with user approval (or automatically with --apply)
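The overall shape of that loop can be sketched as follows. This is a simplified illustration of the propose/validate/preview/apply cycle described above, not Maestro's actual implementation; the callable parameters stand in for internal components.

```python
# Simplified sketch of the explore loop: propose -> validate -> preview -> apply.
# The callables are placeholders for Maestro's internal components.
from typing import Callable

def explore(state: dict,
            propose: Callable[[dict], dict],
            validate: Callable[[dict], list],
            preview: Callable[[dict, dict], str],
            apply_ops: Callable[[dict, dict], dict],
            max_iterations: int = 5,
            auto_apply: bool = False) -> dict:
    """One minimal step per iteration, each previewed before application."""
    for _ in range(max_iterations):
        ops = propose(state)                  # AI proposes a minimal ops document
        errors = validate(ops)                # strict schema validation
        if errors:
            print("Proposal rejected:", errors)
            continue
        print(preview(state, ops))            # dry-run preview via executor logic
        if auto_apply or input("Apply? [y/N] ").strip().lower() == "y":
            state = apply_ops(state, ops)     # mutate project state through the executor
    return state
```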
The explore command supports persistent work sessions for long-running autonomous planning/execution:
- Session Persistence: Use --save-session to enable detailed logging of each iteration
- Resume Capability: Use --session <id> to resume an interrupted exploration
- Safe Autonomy: Use --auto-apply for autonomous execution (with preview logging)
- Controlled Execution: Use --stop-after-apply to apply one iteration then pause
Session artifacts are stored in docs/sessions/explore/<session-id>/ and include:
- Per-iteration data: prompt hash, AI response, validation result, preview summary
- Applied status and timestamps
- Error tracking for debugging
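A per-iteration artifact might be persisted roughly as below. The base directory matches the docs/sessions/explore/<session-id>/ path above, but the file name and exact field names are assumptions for illustration.

```python
# Illustrative persistence of one explore iteration; only the base directory
# comes from the docs. File and field names here are assumptions.
import hashlib
import json
import time
from pathlib import Path

def save_iteration(session_id: str, index: int, prompt: str,
                   ai_response: str, validation_ok: bool,
                   preview_summary: str, applied: bool) -> Path:
    record = {
        "iteration": index,
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "ai_response": ai_response,
        "validation_ok": validation_ok,
        "preview_summary": preview_summary,
        "applied": applied,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    out_dir = Path("docs/sessions/explore") / session_id
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"iteration_{index:03d}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```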
Example work session usage:
# Start a new explore session with persistent logging
maestro plan explore --save-session
# Resume an existing session (e.g., after interruption)
maestro plan explore --session abc123-def456
# Run in autonomous mode (apply without prompts)
maestro plan explore --auto-apply --max-iterations 10
# Apply one iteration, then pause for review
maestro plan explore --stop-after-apply
Sessions provide auditability, safe interruption (Ctrl+C), and deterministic resume without "lost context".
Maestro assigns roles according to each model's strengths:
Planners
- Codex – architectural motifs, structural design, advanced debugging
- Claude – deep reasoning, complex refinement, expressive thought
Workers
- Qwen – implementation, coding, straightforward bugfixing
- Gemini – natural-language heavy tasks, research, summarization
Each subtask receives a carefully crafted prompt with:
- cleaned root task excerpt
- relevant categories (like instrumental sections)
- context from previous movements
- optional partial outputs from interrupted runs
Instead of shoving the raw root task into every prompt, Maestro:
- rewrites your root task into a clean, musical score
- extracts conceptual categories (sections, voices, instruments)
- assigns only the relevant sections to each subtask
- optionally produces excerpts for workers
This keeps prompts focused, reducing noise and creative drift.
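As a rough picture of that focusing step, assembling a worker prompt from the cleaned excerpt, the relevant categories, and prior context might look like this. The function name and layout are illustrative, not Maestro's actual prompt builder.

```python
# Illustrative assembly of a focused worker prompt. Only the relevant excerpt
# and categories are included, keeping noise and creative drift low.
def build_subtask_prompt(cleaned_excerpt: str,
                         categories: list[str],
                         previous_context: str = "",
                         partial_output: str = "") -> str:
    parts = [
        "Root task (cleaned excerpt):\n" + cleaned_excerpt,
        "Relevant categories:\n" + "\n".join(f"- {c}" for c in categories),
    ]
    if previous_context:
        parts.append("Context from previous movements:\n" + previous_context)
    if partial_output:
        parts.append("Partial output from interrupted run:\n" + partial_output)
    return "\n\n".join(parts)
```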
If you press Ctrl+C, Maestro behaves like a seasoned conductor:
- Stops the current AI subprocess gently
- Captures all partial output
- Saves it as part of the project history
- Marks the subtask as interrupted
- Allows a later continuation, including partial context in the next prompt
No stack traces. No lost work. No broken flow.
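In spirit, graceful interruption of a worker subprocess follows the pattern sketched below: catch KeyboardInterrupt, terminate the child gently, and keep whatever partial output was produced. This is a generic pattern, not Maestro's exact code.

```python
# Generic sketch of graceful interruption: stop the child process politely,
# keep its partial output, and record the subtask as interrupted.
import subprocess

def run_subtask(cmd: list[str]) -> dict:
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    output, status = "", "completed"
    try:
        output, _ = proc.communicate()
    except KeyboardInterrupt:
        proc.terminate()                      # SIGTERM, not a hard kill
        try:
            output, _ = proc.communicate(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            output, _ = proc.communicate()
        status = "interrupted"
    return {"status": status, "partial_output": output}
```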
Maestro stores:
- root task raw + cleaned version
- categories + excerpts
- plan tree nodes
- subtask statuses
- all input prompts
- all raw AI outputs
- partial results
- user-facing summaries
Maestro ensures that every run is reproducible — every rehearsal preserved.
git clone https://github.com/OuluBSD/maestro.git
cd maestro
pip install -r requirements.txt

If you want the forked AI agent CLIs used by Maestro, clone with submodules:

git clone --recurse-submodules git@github.com:OuluBSD/maestro.git

Or initialize them later:

git submodule update --init --recursive

Submodules live under external/ai-agents/:

external/ai-agents/qwen-code
external/ai-agents/gemini-cli

Or editable install:

pip install -e .

For running the full test/TU suite and GUI completion helper, install dev extras in your virtualenv:

python -m venv ~/venv
~/venv/bin/pip install -r requirements-dev.txt

If libclang is not discovered automatically, point to it (e.g. on clang 21):

export LIBCLANG_PATH=/usr/lib/llvm/21/lib64/libclang.so

Legacy root smoke/semantic integrity harnesses are kept as _legacy.py to avoid pytest duplicate-module clashes; run the maintained suites under tests/.
Maestro tests are organized into two main categories:
- Main-line tests: These tests run in parallel and do not require special permissions.
- Serial-line tests: These tests must run sequentially and often require git operations.
Serial-line tests are marked with the serial marker and often also have the git marker. To run tests that include serial-line tests, you need to set the MAESTRO_TEST_ALLOW_GIT environment variable:
# Run all tests including serial tests
MAESTRO_TEST_ALLOW_GIT=1 bash tools/test/run.sh --all
# Run only serial tests
MAESTRO_TEST_ALLOW_GIT=1 bash tools/test/run.sh -m "serial" -v
# Run tests with specific markers
MAESTRO_TEST_ALLOW_GIT=1 bash tools/test/run.sh -m "not legacy and serial" -v

The test runner supports various options:
- --fast: Run only fast tests (not slow, not legacy, not tui, not integration)
- --medium: Run medium-speed tests (not legacy, not tui) - default
- --slow: Run slow tests (slow, not legacy, not tui)
- --all: Run all tests (not legacy)
- -m MARKEXPR: Run tests matching the marker expression
- --workers N: Specify number of parallel workers
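A hedged sketch of how such a gate can be expressed in pytest is shown below; the marker names (serial, git) and the environment variable follow the docs, but the hook body is illustrative rather than Maestro's actual conftest.

```python
# Illustrative pytest gating: skip git-marked tests unless the environment
# variable is set. Marker names follow the docs; the hook body is a sketch.
import os

import pytest

def pytest_collection_modifyitems(config, items):
    if os.environ.get("MAESTRO_TEST_ALLOW_GIT") == "1":
        return
    skip_git = pytest.mark.skip(reason="set MAESTRO_TEST_ALLOW_GIT=1 to run git tests")
    for item in items:
        if "git" in item.keywords:
            item.add_marker(skip_git)

@pytest.mark.serial
@pytest.mark.git
def test_example_requires_git_permissions():
    assert True  # placeholder body for a serial, git-dependent test
```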
echo "Build a MIDI-driven sandbox game engine" \
  | maestro --session sessions/game.json --new

maestro --session sessions/game.json --plan   (defaults to interactive discussion)
maestro -s sessions/game.json --one-shot-plan
maestro -s sessions/game.json --resume
maestro -s sessions/game.json --show-plan-tree
maestro -s sessions/game.json --focus-plan P4

To compose is to discover. And discovery is rarely linear.
Maestro assumes:
- Creativity grows through branching ideas
- Plans evolve like musical themes: explored, revisited, refined
- AI is best used as a collaborative instrument, not a silent worker
- Human intuition guides the music; AI provides the ensemble
- Every project is a symphony in progress
Maestro exists to ensure your composition — technical or artistic — can unfold freely, gracefully, and intelligently.
Maestro is designed to facilitate powerful AI-driven workflows, including those that are fully automated and operate autonomously. The system’s safety and reliability stem from a robust rule-based assertive validation layer, rather than a blanket restriction on AI action.
- Configurable Autonomy: AI can operate autonomously when configured to do so, for example, in scenarios like stress-testing, automated refactoring, or continuous integration pipelines. This is not the default mode but is a fully supported and intentional capability.
- Safety Through Rules: All AI-initiated actions, whether proposed for human review or executed autonomously, are funneled through Maestro’s validation mechanisms. These mechanisms enforce structural, syntactic, and semantic correctness, ensuring that only valid and coherent changes are applied to the project state.
- Controlled Mutation: AI may mutate project state when these actions are mediated and validated by Maestro. The system resists uncontrolled automation, guaranteeing that every change is auditable and adheres to predefined rules. This ensures predictability and maintains the integrity of the project.
Maestro enforces a strict, auditable prompt contract for all AI invocations to ensure:
- Every AI invocation is structurally predictable
- Missing context is explicit, not accidental
- Prompt drift is prevented across refactors
- Debugging becomes mechanical instead of interpretive
Every AI prompt must include 5 required sections in exact order:
[GOAL]
[CONTEXT]
[REQUIREMENTS]
[ACCEPTANCE CRITERIA]
[DELIVERABLES]
All prompts undergo validation before AI invocation:
- All required sections must exist
- Sections must be in correct order
- No section may be empty
- If validation fails, operation is aborted
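A minimal sketch of that validation, assuming the prompt is plain text with the bracketed section headers listed above (the real validator may differ in detail):

```python
# Minimal sketch of prompt-contract validation: all five sections present,
# in order, and non-empty. The real validator may differ in detail.
REQUIRED_SECTIONS = ["[GOAL]", "[CONTEXT]", "[REQUIREMENTS]",
                     "[ACCEPTANCE CRITERIA]", "[DELIVERABLES]"]

def validate_prompt(prompt: str) -> list[str]:
    errors = []
    positions = [prompt.find(s) for s in REQUIRED_SECTIONS]
    for section, pos in zip(REQUIRED_SECTIONS, positions):
        if pos == -1:
            errors.append(f"missing section {section}")
    if -1 not in positions and positions != sorted(positions):
        errors.append("sections are out of order")
    if not errors:
        # Each section body is the text between its header and the next header.
        for i, section in enumerate(REQUIRED_SECTIONS):
            start = positions[i] + len(section)
            end = positions[i + 1] if i + 1 < len(REQUIRED_SECTIONS) else len(prompt)
            if not prompt[start:end].strip():
                errors.append(f"empty section {section}")
    return errors  # an empty list means the prompt passes; otherwise the operation aborts
```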
All AI interactions are logged:
- Input prompts saved to: sessions/<session>/inputs/
- AI outputs saved to: sessions/<session>/outputs/
- Includes complete structured prompts and raw responses
For full technical details, see docs/prompt_contract.md.
New BSD
Maestro includes a comprehensive confidence scoring system that provides numeric confidence scores for conversion runs, enabling data-driven decisions about conversion quality and readiness for deployment.
The confidence scoring system assigns a numeric score (0-100) and letter grade (A-F) to each conversion run based on multiple quality indicators:
- Semantic integrity results
- Cross-repo semantic diff analysis
- Idempotency and drift detection
- Checkpoint activity
- Arbitration outcomes
- Arbitration Arena TUI for comparing competing AI outputs
- Open issues and warnings
- Validation results
The scoring model is configured via .maestro/convert/scoring/model.json:
{
"version": "1.0",
"scale": [0, 100],
"weights": {
"semantic_integrity": 0.35,
"semantic_diff": 0.20,
"drift_idempotency": 0.20,
"checkpoints": 0.10,
"open_issues": 0.10,
"validation": 0.05
},
"penalties": {
"semantic_low": 40,
"semantic_medium": 15,
"semantic_unknown": 8,
"lost_concept": 3,
"checkpoint_blocked": 10,
"checkpoint_overridden": 6,
"idempotency_failure": 20,
"drift_detected": 15,
"non_convergent": 25,
"open_issue": 2,
"validation_fail": 25
},
"floors": {
"any_semantic_low": 30
}
}

maestro convert confidence show                           # Show most recent run
maestro convert confidence show --run-id run_1234567890   # Show specific run
maestro convert confidence history                        # Show last 10 runs
maestro convert confidence history --limit 20             # Show last 20 runs
maestro convert confidence gate --min-score 75            # Gate with minimum score
maestro convert batch report --spec batch.json            # Include confidence in report
maestro convert batch gate --spec batch.json --min-score 75 --aggregate min   # Batch gate
maestro convert promote --min-score 80                    # Promote with confidence check
maestro convert promote --force-promote                   # Force promotion regardless of score

Confidence scores are automatically computed after each successful conversion run and stored in .maestro/convert/runs/<run_id>/confidence.json and .maestro/convert/runs/<run_id>/confidence.md.
Batch jobs also compute confidence scores, and batch-level confidence can be aggregated using mean, median, or min methods.
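Given the model.json above, the score computation can be pictured roughly as follows. The weight, penalty, and floor keys come from the config shown; the combination formula, the A-F grade bands, and the interpretation of floors (capping the score when any semantic-low finding exists) are assumptions for illustration.

```python
# Rough illustration of confidence scoring: weighted category scores minus
# penalties, clamped to 0..100, with a floor rule applied. The formula, the
# grade bands, and the floor interpretation are assumptions, not the published model.
import statistics

def confidence_score(model: dict, category_scores: dict, events: dict,
                     any_semantic_low: bool = False) -> tuple[float, str]:
    # Weighted sum of per-category scores, each expected in 0..100.
    score = sum(model["weights"][k] * category_scores.get(k, 0.0)
                for k in model["weights"])
    # Subtract a penalty for each occurrence of a penalized event.
    score -= sum(model["penalties"][k] * events.get(k, 0)
                 for k in model["penalties"])
    score = max(0.0, min(100.0, score))
    if any_semantic_low:
        score = min(score, float(model["floors"]["any_semantic_low"]))
    grade = ("A" if score >= 90 else "B" if score >= 80 else
             "C" if score >= 70 else "D" if score >= 60 else "F")
    return score, grade

def aggregate_batch(scores: list[float], method: str = "min") -> float:
    # Batch-level aggregation as described: mean, median, or min.
    return {"mean": statistics.mean, "median": statistics.median, "min": min}[method](scores)
```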
The Semantic Integrity Panel is a dedicated TUI screen that makes semantic risks visible, understandable, and actionable by humans during code conversion. It addresses the critical question:
"Yes, the code was converted — but did the intent survive?"
The Semantic Integrity Panel can be accessed through:
- TUI Navigation: Press i or click "Integrity" in the navigation menu
- Command Palette: Ctrl+P, then select "Go to semantic integrity panel"
- Keyboard Shortcut: Direct access via Ctrl+I (in TUI)
The panel features a three-panel layout for efficient workflow:
- Overall semantic health score (0-100%)
- Risk distribution: High, Medium, Low
- Status counts: Accepted, Rejected, Blocking
- Active gates/checkpoints (if any)
- Task ID, affected files, equivalence level
- Risk flags and current status
- Visual indicators for risk level
- Selectable with keyboard navigation
- Detailed explanation from semantic analysis
- Before/after conversion evidence
- Current disposition and decision reason
- Impact assessment: blocks pipeline? checkpoint ID
Each finding supports four human actions with safety measures:
- Accept (A): Mark as reviewed and accepted (with confirmation)
- Reject (R): Mark as rejected with required reason (blocks pipeline)
- Defer (D): Leave unresolved, keeps gate (with confirmation)
- Explain (E): Show detailed rationale history
Semantic operations are also available via command palette (Ctrl+P):
- semantic list - Show summary of all findings
- semantic show <id> - Show detailed finding information
- semantic accept <id> - Accept a specific finding
- semantic reject <id> - Reject a specific finding with reason
- semantic defer <id> - Defer a specific finding
The panel integrates directly with the conversion pipeline:
- Shows which findings are currently blocking
- Displays associated checkpoint IDs
- Updates pipeline status immediately when findings are resolved
- Maintains audit trail of all human decisions
Maestro operates under a critical architectural principle: AI actions on project state are always mediated and validated by Maestro's assertive control layer. This means AI does not bypass Maestro to make changes. Instead, it interacts through structured mechanisms that enforce deterministic, reviewable changes and maintain project integrity.
- AI Initiates Actions: AI models generate structured JSON operations (e.g., via the DiscussionRouter) that represent desired changes or proposals.
- Maestro's Validation Layer: All AI-initiated operations are subjected to Maestro's rule-based validation. This layer rigorously checks for structural, syntactic, and semantic correctness against predefined contracts and project rules. Invalid operations are rejected.
- Controlled Application: Depending on configuration (e.g., manual approval, ai_dangerously_skip_permissions setting), valid operations are either:
  - Proposed for Human Review: Displayed as a diff-like preview requiring explicit user approval before application. This is the default for interactive workflows.
  - Applied Autonomously: Executed directly by Maestro if configured for autonomous mode (e.g., in CI/CD, stress testing). This leverages Maestro's validation to ensure safety even without immediate human oversight.
- Audit Trail: All discussions, proposed operations, and applied changes are logged with metadata, ensuring full auditability.
Each discussion scope has specific allowed operations that define what the AI can initiate and what Maestro will validate:
- Track Contract: add_track, add_phase, add_task, mark_done, mark_todo
- Phase Contract: add_phase, add_task, move_task, edit_task_fields, mark_done, mark_todo
- Task Contract: add_task, move_task, edit_task_fields, mark_done, mark_todo
This structured approach ensures AI proposals are constrained to appropriate scopes and operations, and are always processed through Maestro's assertive control.
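A sketch of how the scope constraint could be checked before deeper validation runs: the scope-to-operation mapping mirrors the contracts listed above, while the function shape and example are illustrative.

```python
# Illustrative scope check: an AI-proposed operation must be in the allowed
# set for its discussion scope before deeper validation runs. The mapping
# mirrors the contracts above; the function itself is a sketch.
ALLOWED_OPS = {
    "track": {"add_track", "add_phase", "add_task", "mark_done", "mark_todo"},
    "phase": {"add_phase", "add_task", "move_task", "edit_task_fields",
              "mark_done", "mark_todo"},
    "task":  {"add_task", "move_task", "edit_task_fields", "mark_done", "mark_todo"},
}

def check_scope(scope: str, operations: list[dict]) -> list[str]:
    """Return the names of operations not permitted for this scope."""
    allowed = ALLOWED_OPS.get(scope, set())
    return [op.get("op", "") for op in operations if op.get("op") not in allowed]

# Example: a move_task operation is rejected at track scope.
rejected = check_scope("track", [{"op": "move_task", "task": "T1", "to": "P2"}])
assert rejected == ["move_task"]
```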
Maestro provides a command to generate a canonical "project understanding snapshot" that captures the current state of project knowledge:
maestro understand dump

This command generates a Markdown file at docs/UNDERSTANDING_SNAPSHOT.md (or a custom path with --output) that contains:
- Identity: What Maestro is and is not
- Authority Model: What humans control vs. what AI can do
- Assertive Rule Gates: Hard-stop conditions and where they're enforced
- Mutation Boundaries: What may change and through which channel
- Automation & Long-Run Mode: Explore sessions, resume capability, and auto-apply meaning
- Directory Semantics: Distinction between .maestro/ and $HOME/.maestro/
- Contracts: Summary of plan_ops and project_ops canonical expectations
- Evidence Index: Claims mapped to file path references
Use maestro understand dump --check in CI to ensure the snapshot is up-to-date, or maestro understand dump --output <path> to specify a custom output location.
The snapshot is derived from actual code/config/docs, not narrative assumptions, making it a reliable source of truth about the project's capabilities and constraints.
Maestro enforces a "docs are truth + rule-assertive contracts" model through a CI truth gate that runs on all pull requests and main branch pushes.
The truth gate validates:
- Snapshot freshness: UNDERSTANDING_SNAPSHOT.md is up to date with the current codebase
- Contract validation: Canonical JSON contracts (plan_ops and project_ops) validate correctly
- Smoke tests: Core CLI entrypoints for ops/explore/discuss work at a basic level
# Using make (preferred method):
make truth-gate
# Or run the script directly:
bash scripts/truth_gate.sh
# For verbose output:
TRUTH_GATE_VERBOSE=1 make truth-gate

If the truth gate fails:
- Snapshot drift: Run maestro understand dump to update the snapshot
- Contract drift: Ensure fixtures match the expected schema or update the code accordingly
- Bootstrap issues: The truth gate handles clean checkout initialization automatically
The truth gate runs automatically in GitHub Actions on pull requests and main branch pushes. It must pass for any code to be merged.
