feat(02-05): email writer agent — MCQ, A/B subjects, tone routing, CAN-SPAM by coder-ishan · Pull Request #6 · coder-ishan/Ingot

coder-ishan · 2026-02-26T10:32:21Z

Summary

Adds mcq_agent (Haiku) that generates 2–3 context-specific questions from IntelBriefFull — questions reference actual company/person context, not hardcoded templates; run is optional with a questionary.confirm() skip gate
Adds writer_agent (Sonnet, output_type=EmailDraft) with role-based tone routing: HR emails are longer with credentials foregrounded; CTO/CEO emails are short and direct; unknown defaults to short/direct
Implements build_can_spam_footer() with all 3 FTC-mandated elements (sender identity, physical address, unsubscribe mechanism); run_writer() persists Email + two FollowUp records (day=3, day=7) and advances Lead.status → drafted

Includes 02-01 commits (schemas + profile agent) as base.

Key Decisions

Tone detection — parsed from person_role string; _TONE_PROMPTS dict drives structurally different system prompt injections per recipient type
retrigger_mcq=True path (WRITER-13) — reject/regenerate with "different angle" re-runs MCQ before re-calling writer, same DB session
MCQ is truly optional — writer falls back to IntelBrief + match data alone if user skips; no forced interaction

Test plan

MCQ questions reference company_name or person_role from the IntelBriefFull, not generic placeholders
HR email body is visibly longer than CTO email for same lead
EmailDraft.can_spam_footer contains physical address and unsubscribe text
Two FollowUp rows (day=3, day=7) created in DB after run_writer()
Lead.status == "drafted" after successful writer run
Skipping MCQ (questionary.confirm() → No) still produces a complete EmailDraft

🤖 Generated with Claude Code

…ile) - Create src/ingot/models/schemas.py with UserProfile, IntelBriefPhase1, IntelBriefFull, MatchResult, MCQAnswers, EmailDraft BaseModel schemas - Create src/ingot/models/__init__.py exporting all 6 schema classes - IntelBriefFull validates talking_points non-empty via field_validator - EmailDraft validates body length >= 100 chars via field_validator - MatchResult enforces match_score in 0-100 range via Field constraints

- Add extract_pdf_text() with multi-column layout support via column_boxes() - Add extract_docx_text() preserving paragraph/table interleave order - Add parse_resume() dispatcher (PDF | DOCX | plain-text fallback) - Add ResumeParseError typed exception - Add ProfileDeps dataclass and profile_agent (PydanticAI, output_type=UserProfile) - Add validate_profile() rejecting < 10% populated fields (PROFILE-09) - Add extract_profile() async orchestration: run agent, validate, persist to SQLite - Add defer_model_check=True to allow import without ANTHROPIC_API_KEY [Rule 1 fix] - Add PyMuPDF, python-docx, beautifulsoup4, scikit-learn, lxml to pyproject.toml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ent Task 1 - WriterDeps dataclass with sender/address fields for CAN-SPAM - mcq_agent (Haiku) generates 2-3 context-specific questions from IntelBriefFull - run_mcq() optional flow with questionary.confirm() skip gate - build_can_spam_footer() with all 3 FTC-mandated elements; warns on empty address - _TONE_PROMPTS dict for hr/cto/ceo/default with meaningfully different length+structure rules - Both agents use defer_model_check=True (codebase convention) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ence — Task 2 - writer_agent (Sonnet) with output_type=EmailDraft and inject_writer_context system prompt - Tone detection from person_role: hr/cto/ceo/default routing - run_writer() persists Email + FollowUp(day=3) + FollowUp(day=7); Lead.status -> drafted - retrigger_mcq=True path for WRITER-13 reject/regenerate flow - CAN-SPAM footer built inline and appended to email body before persistence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR implements the email writer agent with MCQ (Multiple Choice Questions) personalization flow, tone-based email generation, CAN-SPAM compliance, and database persistence. It includes both the writer pipeline (02-05) and the profile agent from plan 02-01 as a foundation.

Changes:

Adds complete profile agent with PDF/DOCX parsing and PydanticAI-based resume extraction
Implements two-agent writer pipeline: mcq_agent (Haiku) for question generation and writer_agent (Sonnet) for email drafting
Implements tone-aware email generation with role-based routing (HR/CTO/CEO/default)
Adds CAN-SPAM compliant footer with sender identity, physical address, and unsubscribe mechanism
Creates Pydantic schemas for all Phase 2 agent outputs (UserProfile, IntelBriefFull, EmailDraft, etc.)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 23 comments.

Show a summary per file

File	Description
src/ingot/models/schemas.py	Defines Pydantic output schemas for all Phase 2 agents (UserProfile, IntelBriefFull, MatchResult, MCQAnswers, EmailDraft)
src/ingot/models/init.py	Exports all schema classes from the models package
src/ingot/agents/writer.py	Complete rewrite implementing MCQ agent, writer agent, tone routing, CAN-SPAM footer, and database persistence
src/ingot/agents/profile.py	New module with PDF/DOCX parsing, PydanticAI profile extraction, validation, and persistence
pyproject.toml	Adds dependencies for document parsing (PyMuPDF, python-docx) and ML/scraping (beautifulsoup4, scikit-learn, lxml)
.planning/phases/02-core-pipeline-scout-through-writer/02-05-SUMMARY.md	Documents implementation details and requirements coverage
.planning/STATE.md	Updates project state to reflect 02-05 completion

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-26T10:43:59Z

src/ingot/agents/writer.py

+        recipient_type = "default"
+
+    tone_guidance = _TONE_PROMPTS[recipient_type]
+    deps.lead.__dict__["_resolved_recipient_type"] = recipient_type  # stored for persistence


The variable lead.__dict__["_resolved_recipient_type"] uses private dict manipulation to store state on the Lead SQLModel instance. This is fragile and bypasses the ORM layer. The underscore prefix suggests this is meant to be temporary/internal, but the value is later read from the same dict in run_writer() (line 328).

Consider either:

Adding a proper field to the Lead model if this needs to be persisted

Storing this in a local variable or the WriterDeps if it's only needed during the current operation

Returning it as part of the EmailDraft if it needs to be passed between functions

Suggested change

deps.lead.__dict__["_resolved_recipient_type"] = recipient_type # stored for persistence

setattr(deps.lead, "_resolved_recipient_type", recipient_type) # stored for later use

Agreed. lead.__dict__[...] is fragile and bypasses the ORM. Fixing by adding _resolved_recipient_type: str = "default" as a field on WriterDeps directly — that's the right place for transient pipeline state.

Copilot · 2026-02-26T10:44:00Z

pyproject.toml

+    "PyMuPDF>=1.24",
+    "python-docx>=1.1",
+    "beautifulsoup4>=4.12",
+    "scikit-learn>=1.5",


The dependency scikit-learn>=1.5 is added but not used in any of the changed files in this PR. Consider either:

Removing it if it's not needed yet

Documenting in the PR description or comments why it's being added preemptively

Or verifying that it's used in other parts of the codebase not visible in this diff

Adding unused dependencies increases installation size and potential security surface area.

Push back — sklearn is used in src/ingot/scoring/scorer.py (TfidfVectorizer + cosine_similarity for the Phase 2 scoring formula). Not visible in this PR's diff since scorer.py was added in 02-02, but it's an active dependency.

Copilot · 2026-02-26T10:44:00Z

src/ingot/models/schemas.py

+    @field_validator("talking_points")
+    @classmethod
+    def at_least_one_talking_point(cls, v: list[str]) -> list[str]:
+        if not v:
+            raise ValueError("IntelBriefFull must have at least one talking point")
+        return v
+
+


The talking_points field has redundant validation - both a Field constraint min_length=1, max_length=3 (line 46) and a custom validator that checks for empty lists (lines 49-54). The Field constraints already enforce that there must be at least 1 item, making the custom validator's empty check redundant.

Consider removing the custom validator since the Field constraints are sufficient, or clarify in a comment why both are needed (e.g., if Field constraints produce different error messages than the custom validator).

Suggested change

@field_validator("talking_points")

@classmethod

def at_least_one_talking_point(cls, v: list[str]) -> list[str]:

if not v:

raise ValueError("IntelBriefFull must have at least one talking point")

return v

Valid. Field(min_length=1) already enforces non-empty. Removing the custom validator.

Copilot · 2026-02-26T10:44:00Z

src/ingot/agents/writer.py

+async def run_mcq(deps: WriterDeps) -> MCQAnswers:
+    """
+    Run the optional MCQ flow for personalization.
+
+    LOCKED DECISION (02-CONTEXT.md):
+    - MCQ is optional — confirm with user before running
+    - If skipped, returns MCQAnswers(answers={}, skipped=True)
+    - When run, questions are LLM-generated from IntelBriefFull (not hardcoded)
+    - Question types: personalization hook + tone/intent + optional experience connection
+
+    Returns MCQAnswers with answers dict (question -> answer) and skipped flag.
+    """
+    run_mcq_flag = questionary.confirm(
+        f"Run personalization questions for {deps.intel_brief.company_name}? "
+        f"(recommended, or press Enter to skip)",
+        default=True,
+    ).ask()
+
+    if not run_mcq_flag:
+        return MCQAnswers(answers={}, skipped=True)
+
+    # LLM generates questions from IntelBriefFull
+    result = await mcq_agent.run(
+        "Generate personalized MCQ questions for this lead's outreach email.",
+        deps=deps,
+    )
+    questions: list[str] = result.output.questions
+
+    # Collect answers interactively
+    answers: dict[str, str] = {}
+    for q in questions:
+        answer = questionary.text(q, default="").ask()
+        if answer and answer.strip():
+            answers[q] = answer.strip()
+
+    return MCQAnswers(answers=answers, skipped=False)


The run_mcq() function uses questionary.confirm() and questionary.text() which block for user input. This function is declared as async (line 98) but contains blocking synchronous I/O operations, which defeats the purpose of async and could cause performance issues in an async application.

Consider either:

Making the function synchronous (remove async) since it performs blocking I/O

Or using async-compatible CLI input methods if available

Or documenting that this function is intentionally blocking despite being async

The issue is that while the function itself is async (it awaits the LLM call), the questionary calls will block the entire event loop, preventing other async operations from running.

The function must stay async — it awaits mcq_agent.run() which is a PydanticAI async call. There's no async-compatible questionary alternative with the same API. The blocking questionary calls are expected in a CLI tool context (not a web server). Adding a docstring note to make this explicit.

Copilot · 2026-02-26T10:44:01Z

pyproject.toml

    "aioimaplib>=2",
+    "PyMuPDF>=1.24",
+    "python-docx>=1.1",
+    "beautifulsoup4>=4.12",


The dependency beautifulsoup4>=4.12 is added but not imported or used in any of the changed files in this PR. Consider either:

Removing it if it's not needed yet

Documenting why it's being added preemptively

Or verifying that it's used in other parts of the codebase

BeautifulSoup is typically used for HTML parsing, which might be for web scraping in the Scout or Research agents.

Suggested change

"beautifulsoup4>=4.12",

bs4 is a phase-locked dependency documented in 02-RESEARCH.md (SCOUT-03: 'YC scraping strategy: httpx + BS4 fallback'). It's listed in 02-01-PLAN.md as a phase-wide dependency, added at the phase boundary in 02-01 for use in the Scout agent (02-02). Dependencies are added by phase, not per-agent file.

Copilot · 2026-02-26T10:44:04Z

src/ingot/models/schemas.py

+    @field_validator("body")
+    @classmethod
+    def body_must_mention_company(cls, v: str, info: object) -> str:
+        # Validated post-generation: body must reference something specific.
+        # This is a soft check — passes unless body is suspiciously short.
+        if len(v) < 100:
+            raise ValueError("Email body is too short to be personalized (< 100 chars)")
+        return v


The field validator body_must_mention_company has a misleading name and inadequate implementation. It only checks if the body is longer than 100 characters, but doesn't actually verify that the company name is mentioned. Consider either:

Renaming to body_must_have_minimum_length to match what it actually does

Or implementing the actual check: verify that company_name from the IntelBriefFull appears in the body text

Note: The validator receives an info parameter but doesn't use it to access other fields for cross-field validation.

Valid. Renaming to body_must_have_minimum_length to match what it actually validates.

Copilot · 2026-02-26T10:44:04Z

src/ingot/models/schemas.py

+
+    @field_validator("body")
+    @classmethod
+    def body_must_mention_company(cls, v: str, info: object) -> str:


The info parameter in the validator is typed as object which provides no type safety or IDE support. In Pydantic v2, field validators receive a ValidationInfo parameter. Consider updating the type hint to ValidationInfo from pydantic or pydantic.types for better type safety and access to validation context.

Valid. Updating to pydantic.ValidationInfo.

Copilot · 2026-02-26T10:44:05Z

src/ingot/agents/writer.py

+    await deps.session.refresh(email_record)
+
+    # Persist Day 3 follow-up (WRITER-09, DB-06)
+    followup_day3 = FollowUp(
+        parent_email_id=email_record.id,
+        scheduled_for_day=3,
+        body=draft.followup_day3,
+        status=FollowUpStatus.queued,
+        created_at=datetime.utcnow(),
+    )
+    # Persist Day 7 follow-up (WRITER-09, DB-06)
+    followup_day7 = FollowUp(
+        parent_email_id=email_record.id,


After the first commit at line 342, email_record.id is used at line 347 to set parent_email_id for follow-ups. However, between line 342 and 343, there's an await session.refresh(email_record) which ensures the ID is populated.

The issue is that if the refresh fails or if the ID assignment fails in the database, lines 347 and 355 would use None for parent_email_id, creating follow-ups that aren't linked to any email. Consider adding an assertion or validation after line 343 to ensure email_record.id is not None before proceeding.

Valid. Adding assert email_record.id is not None, "DB failed to assign email.id after flush" after the flush before using the ID.

Copilot · 2026-02-26T10:44:05Z

src/ingot/agents/profile.py

+    # on the installed version.  Try both; fall back to None (single-column).
+    try:
+        from pymupdf import column_boxes
+    except ImportError:
+        try:
+            from pymupdf.utils import column_boxes  # type: ignore[no-redef]
+        except ImportError:
+            column_boxes = None  # type: ignore[assignment]


The fallback handling for column_boxes import uses type: ignore[no-redef] (line 57) and type: ignore[assignment] (line 59) to suppress type checker warnings. While the logic is sound (trying multiple import paths), the type annotations could be improved.

Consider defining column_boxes with a proper type hint at the module level (e.g., Optional[Callable]) and removing the type ignores, or use a more explicit protocol/type stub for the expected function signature. This would make the code more maintainable and provide better IDE support.

The try/except structure is self-documenting and works correctly. Not adding a protocol stub for an internal fallback import path.

Copilot · 2026-02-26T10:44:05Z

src/ingot/agents/writer.py

+    match_result: MatchResult
+    lead: Lead
+    session: AsyncSession
+    mcq_answers: MCQAnswers = field(default_factory=lambda: MCQAnswers(answers={}, skipped=True))


The default value for mcq_answers in WriterDeps uses a lambda (line 45), but this creates a new MCQAnswers instance at field definition time, not at instance creation time. In dataclasses with mutable defaults, this is typically a pattern to avoid shared mutable state, but field(default_factory=...) is the more idiomatic approach.

Consider changing to:

mcq_answers: MCQAnswers = field(default_factory=MCQAnswers)

This assumes MCQAnswers(answers={}, skipped=True) is the default constructor behavior, which it is based on the schema definition (line 65-69).

Suggested change

mcq_answers: MCQAnswers = field(default_factory=lambda: MCQAnswers(answers={}, skipped=True))

mcq_answers: MCQAnswers = field(default_factory=MCQAnswers)

Push back. The code already uses field(default_factory=lambda: MCQAnswers(answers={}, skipped=True)) — this IS the field(default_factory=...) pattern. The lambda is required because MCQAnswers has no defaults for answers and skipped. The suggested replacement field(default_factory=MCQAnswers) would raise TypeError: MCQAnswers.__init__() missing 2 required positional arguments. The reviewer's description of the current code as creating an instance 'at field definition time' is also incorrect — default_factory always creates a new instance per object construction.

yc.py: - Remove unused asyncio import - Replace assert isinstance() with explicit ValueError (includes URL in message) - Drop assert len > 100 — breaks batch/industry endpoints with < 100 results scorer.py: - Replace assert in ScoringWeights.__post_init__ with ValueError - Fix score_lead() mutable default: weights=None sentinel, resolved inside body - Add v2 TODO comment on _semantic_score for pre-fit vectorizer optimization scout.py: - Fix dedup bug: _is_duplicate queried Lead.person_email but received company_website; now correctly queries Lead.company_website - Fix ScoutDeps.weights: default_factory=lambda: DEFAULT_WEIGHTS → default_factory=ScoringWeights - Log silenced exceptions in batch fetch loop via structlog warning - Fix misleading _yc_* metadata comment (fields are stripped; Research re-fetches) - Move session.commit() outside per-row loop; single atomic commit with rollback schemas.py: - Rename body_must_mention_company → body_must_be_sufficiently_long (name now matches impl) profile.py: - Remove .doc from allowed suffixes (python-docx cannot parse binary .doc files) - Update error message to direct users to convert .doc → .docx writer.py (PR #6 carried fixes): - Replace lead.__dict__["_resolved_recipient_type"] with WriterDeps.resolved_recipient_type field - Replace datetime.utcnow() with datetime.now(timezone.utc) (Python 3.12+ deprecation) - Replace double session.commit() with flush() + single atomic commit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- profile.py: remove stale inline `from docx import Document` inside extract_docx_text (already top-level) - profile.py: datetime.utcnow() → datetime.now(timezone.utc) in extract_profile - profile.py: add TODO(02-06) comment to model_override param docstring - tests: replace WriterAgent-based TestWriterPipeline with factory-API imports test (run_writer, WriterDeps, create_writer_agent) and inspect.iscoroutinefunction check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coder-ishan and others added 5 commits February 26, 2026 15:43

docs(02-05): SUMMARY.md + STATE.md — plan complete

4777e67

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings February 26, 2026 10:32

Copilot started reviewing on behalf of coder-ishan February 26, 2026 10:33 View session

Copilot AI reviewed Feb 26, 2026

View reviewed changes

	deps.lead.__dict__["_resolved_recipient_type"] = recipient_type # stored for persistence
	setattr(deps.lead, "_resolved_recipient_type", recipient_type) # stored for later use

	mcq_answers: MCQAnswers = field(default_factory=lambda: MCQAnswers(answers={}, skipped=True))
	mcq_answers: MCQAnswers = field(default_factory=MCQAnswers)

Conversation

coder-ishan commented Feb 26, 2026

Summary

Key Decisions

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants