Skip to content

feat(02-05): email writer agent — MCQ, A/B subjects, tone routing, CAN-SPAM#6

Open
coder-ishan wants to merge 6 commits intomainfrom
feature/phase-02-plan-05-writer
Open

feat(02-05): email writer agent — MCQ, A/B subjects, tone routing, CAN-SPAM#6
coder-ishan wants to merge 6 commits intomainfrom
feature/phase-02-plan-05-writer

Conversation

@coder-ishan
Copy link
Owner

Summary

  • Adds mcq_agent (Haiku) that generates 2–3 context-specific questions from IntelBriefFull — questions reference actual company/person context, not hardcoded templates; run is optional with a questionary.confirm() skip gate
  • Adds writer_agent (Sonnet, output_type=EmailDraft) with role-based tone routing: HR emails are longer with credentials foregrounded; CTO/CEO emails are short and direct; unknown defaults to short/direct
  • Implements build_can_spam_footer() with all 3 FTC-mandated elements (sender identity, physical address, unsubscribe mechanism); run_writer() persists Email + two FollowUp records (day=3, day=7) and advances Lead.status → drafted

Includes 02-01 commits (schemas + profile agent) as base.

Key Decisions

  • Tone detection — parsed from person_role string; _TONE_PROMPTS dict drives structurally different system prompt injections per recipient type
  • retrigger_mcq=True path (WRITER-13) — reject/regenerate with "different angle" re-runs MCQ before re-calling writer, same DB session
  • MCQ is truly optional — writer falls back to IntelBrief + match data alone if user skips; no forced interaction

Test plan

  • MCQ questions reference company_name or person_role from the IntelBriefFull, not generic placeholders
  • HR email body is visibly longer than CTO email for same lead
  • EmailDraft.can_spam_footer contains physical address and unsubscribe text
  • Two FollowUp rows (day=3, day=7) created in DB after run_writer()
  • Lead.status == "drafted" after successful writer run
  • Skipping MCQ (questionary.confirm() → No) still produces a complete EmailDraft

🤖 Generated with Claude Code

coder-ishan and others added 5 commits February 26, 2026 15:43
…ile)

- Create src/ingot/models/schemas.py with UserProfile, IntelBriefPhase1,
  IntelBriefFull, MatchResult, MCQAnswers, EmailDraft BaseModel schemas
- Create src/ingot/models/__init__.py exporting all 6 schema classes
- IntelBriefFull validates talking_points non-empty via field_validator
- EmailDraft validates body length >= 100 chars via field_validator
- MatchResult enforces match_score in 0-100 range via Field constraints
- Add extract_pdf_text() with multi-column layout support via column_boxes()
- Add extract_docx_text() preserving paragraph/table interleave order
- Add parse_resume() dispatcher (PDF | DOCX | plain-text fallback)
- Add ResumeParseError typed exception
- Add ProfileDeps dataclass and profile_agent (PydanticAI, output_type=UserProfile)
- Add validate_profile() rejecting < 10% populated fields (PROFILE-09)
- Add extract_profile() async orchestration: run agent, validate, persist to SQLite
- Add defer_model_check=True to allow import without ANTHROPIC_API_KEY [Rule 1 fix]
- Add PyMuPDF, python-docx, beautifulsoup4, scikit-learn, lxml to pyproject.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ent Task 1

- WriterDeps dataclass with sender/address fields for CAN-SPAM
- mcq_agent (Haiku) generates 2-3 context-specific questions from IntelBriefFull
- run_mcq() optional flow with questionary.confirm() skip gate
- build_can_spam_footer() with all 3 FTC-mandated elements; warns on empty address
- _TONE_PROMPTS dict for hr/cto/ceo/default with meaningfully different length+structure rules
- Both agents use defer_model_check=True (codebase convention)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ence — Task 2

- writer_agent (Sonnet) with output_type=EmailDraft and inject_writer_context system prompt
- Tone detection from person_role: hr/cto/ceo/default routing
- run_writer() persists Email + FollowUp(day=3) + FollowUp(day=7); Lead.status -> drafted
- retrigger_mcq=True path for WRITER-13 reject/regenerate flow
- CAN-SPAM footer built inline and appended to email body before persistence

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 26, 2026 10:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the email writer agent with MCQ (Multiple Choice Questions) personalization flow, tone-based email generation, CAN-SPAM compliance, and database persistence. It includes both the writer pipeline (02-05) and the profile agent from plan 02-01 as a foundation.

Changes:

  • Adds complete profile agent with PDF/DOCX parsing and PydanticAI-based resume extraction
  • Implements two-agent writer pipeline: mcq_agent (Haiku) for question generation and writer_agent (Sonnet) for email drafting
  • Implements tone-aware email generation with role-based routing (HR/CTO/CEO/default)
  • Adds CAN-SPAM compliant footer with sender identity, physical address, and unsubscribe mechanism
  • Creates Pydantic schemas for all Phase 2 agent outputs (UserProfile, IntelBriefFull, EmailDraft, etc.)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 23 comments.

Show a summary per file
File Description
src/ingot/models/schemas.py Defines Pydantic output schemas for all Phase 2 agents (UserProfile, IntelBriefFull, MatchResult, MCQAnswers, EmailDraft)
src/ingot/models/init.py Exports all schema classes from the models package
src/ingot/agents/writer.py Complete rewrite implementing MCQ agent, writer agent, tone routing, CAN-SPAM footer, and database persistence
src/ingot/agents/profile.py New module with PDF/DOCX parsing, PydanticAI profile extraction, validation, and persistence
pyproject.toml Adds dependencies for document parsing (PyMuPDF, python-docx) and ML/scraping (beautifulsoup4, scikit-learn, lxml)
.planning/phases/02-core-pipeline-scout-through-writer/02-05-SUMMARY.md Documents implementation details and requirements coverage
.planning/STATE.md Updates project state to reflect 02-05 completion

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

recipient_type = "default"

tone_guidance = _TONE_PROMPTS[recipient_type]
deps.lead.__dict__["_resolved_recipient_type"] = recipient_type # stored for persistence
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable lead.__dict__["_resolved_recipient_type"] uses private dict manipulation to store state on the Lead SQLModel instance. This is fragile and bypasses the ORM layer. The underscore prefix suggests this is meant to be temporary/internal, but the value is later read from the same dict in run_writer() (line 328).

Consider either:

  1. Adding a proper field to the Lead model if this needs to be persisted
  2. Storing this in a local variable or the WriterDeps if it's only needed during the current operation
  3. Returning it as part of the EmailDraft if it needs to be passed between functions
Suggested change
deps.lead.__dict__["_resolved_recipient_type"] = recipient_type # stored for persistence
setattr(deps.lead, "_resolved_recipient_type", recipient_type) # stored for later use

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. lead.__dict__[...] is fragile and bypasses the ORM. Fixing by adding _resolved_recipient_type: str = "default" as a field on WriterDeps directly — that's the right place for transient pipeline state.

"PyMuPDF>=1.24",
"python-docx>=1.1",
"beautifulsoup4>=4.12",
"scikit-learn>=1.5",
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dependency scikit-learn>=1.5 is added but not used in any of the changed files in this PR. Consider either:

  1. Removing it if it's not needed yet
  2. Documenting in the PR description or comments why it's being added preemptively
  3. Or verifying that it's used in other parts of the codebase not visible in this diff

Adding unused dependencies increases installation size and potential security surface area.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Push back — sklearn is used in src/ingot/scoring/scorer.py (TfidfVectorizer + cosine_similarity for the Phase 2 scoring formula). Not visible in this PR's diff since scorer.py was added in 02-02, but it's an active dependency.

Comment on lines +49 to +56
@field_validator("talking_points")
@classmethod
def at_least_one_talking_point(cls, v: list[str]) -> list[str]:
if not v:
raise ValueError("IntelBriefFull must have at least one talking point")
return v


Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The talking_points field has redundant validation - both a Field constraint min_length=1, max_length=3 (line 46) and a custom validator that checks for empty lists (lines 49-54). The Field constraints already enforce that there must be at least 1 item, making the custom validator's empty check redundant.

Consider removing the custom validator since the Field constraints are sufficient, or clarify in a comment why both are needed (e.g., if Field constraints produce different error messages than the custom validator).

Suggested change
@field_validator("talking_points")
@classmethod
def at_least_one_talking_point(cls, v: list[str]) -> list[str]:
if not v:
raise ValueError("IntelBriefFull must have at least one talking point")
return v

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid. Field(min_length=1) already enforces non-empty. Removing the custom validator.

Comment on lines +98 to +133
async def run_mcq(deps: WriterDeps) -> MCQAnswers:
"""
Run the optional MCQ flow for personalization.

LOCKED DECISION (02-CONTEXT.md):
- MCQ is optional — confirm with user before running
- If skipped, returns MCQAnswers(answers={}, skipped=True)
- When run, questions are LLM-generated from IntelBriefFull (not hardcoded)
- Question types: personalization hook + tone/intent + optional experience connection

Returns MCQAnswers with answers dict (question -> answer) and skipped flag.
"""
run_mcq_flag = questionary.confirm(
f"Run personalization questions for {deps.intel_brief.company_name}? "
f"(recommended, or press Enter to skip)",
default=True,
).ask()

if not run_mcq_flag:
return MCQAnswers(answers={}, skipped=True)

# LLM generates questions from IntelBriefFull
result = await mcq_agent.run(
"Generate personalized MCQ questions for this lead's outreach email.",
deps=deps,
)
questions: list[str] = result.output.questions

# Collect answers interactively
answers: dict[str, str] = {}
for q in questions:
answer = questionary.text(q, default="").ask()
if answer and answer.strip():
answers[q] = answer.strip()

return MCQAnswers(answers=answers, skipped=False)
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run_mcq() function uses questionary.confirm() and questionary.text() which block for user input. This function is declared as async (line 98) but contains blocking synchronous I/O operations, which defeats the purpose of async and could cause performance issues in an async application.

Consider either:

  1. Making the function synchronous (remove async) since it performs blocking I/O
  2. Or using async-compatible CLI input methods if available
  3. Or documenting that this function is intentionally blocking despite being async

The issue is that while the function itself is async (it awaits the LLM call), the questionary calls will block the entire event loop, preventing other async operations from running.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function must stay async — it awaits mcq_agent.run() which is a PydanticAI async call. There's no async-compatible questionary alternative with the same API. The blocking questionary calls are expected in a CLI tool context (not a web server). Adding a docstring note to make this explicit.

"aioimaplib>=2",
"PyMuPDF>=1.24",
"python-docx>=1.1",
"beautifulsoup4>=4.12",
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dependency beautifulsoup4>=4.12 is added but not imported or used in any of the changed files in this PR. Consider either:

  1. Removing it if it's not needed yet
  2. Documenting why it's being added preemptively
  3. Or verifying that it's used in other parts of the codebase

BeautifulSoup is typically used for HTML parsing, which might be for web scraping in the Scout or Research agents.

Suggested change
"beautifulsoup4>=4.12",

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bs4 is a phase-locked dependency documented in 02-RESEARCH.md (SCOUT-03: 'YC scraping strategy: httpx + BS4 fallback'). It's listed in 02-01-PLAN.md as a phase-wide dependency, added at the phase boundary in 02-01 for use in the Scout agent (02-02). Dependencies are added by phase, not per-agent file.

Comment on lines +83 to +90
@field_validator("body")
@classmethod
def body_must_mention_company(cls, v: str, info: object) -> str:
# Validated post-generation: body must reference something specific.
# This is a soft check — passes unless body is suspiciously short.
if len(v) < 100:
raise ValueError("Email body is too short to be personalized (< 100 chars)")
return v
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field validator body_must_mention_company has a misleading name and inadequate implementation. It only checks if the body is longer than 100 characters, but doesn't actually verify that the company name is mentioned. Consider either:

  1. Renaming to body_must_have_minimum_length to match what it actually does
  2. Or implementing the actual check: verify that company_name from the IntelBriefFull appears in the body text

Note: The validator receives an info parameter but doesn't use it to access other fields for cross-field validation.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid. Renaming to body_must_have_minimum_length to match what it actually validates.


@field_validator("body")
@classmethod
def body_must_mention_company(cls, v: str, info: object) -> str:
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The info parameter in the validator is typed as object which provides no type safety or IDE support. In Pydantic v2, field validators receive a ValidationInfo parameter. Consider updating the type hint to ValidationInfo from pydantic or pydantic.types for better type safety and access to validation context.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid. Updating to pydantic.ValidationInfo.

Comment on lines +343 to +355
await deps.session.refresh(email_record)

# Persist Day 3 follow-up (WRITER-09, DB-06)
followup_day3 = FollowUp(
parent_email_id=email_record.id,
scheduled_for_day=3,
body=draft.followup_day3,
status=FollowUpStatus.queued,
created_at=datetime.utcnow(),
)
# Persist Day 7 follow-up (WRITER-09, DB-06)
followup_day7 = FollowUp(
parent_email_id=email_record.id,
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the first commit at line 342, email_record.id is used at line 347 to set parent_email_id for follow-ups. However, between line 342 and 343, there's an await session.refresh(email_record) which ensures the ID is populated.

The issue is that if the refresh fails or if the ID assignment fails in the database, lines 347 and 355 would use None for parent_email_id, creating follow-ups that aren't linked to any email. Consider adding an assertion or validation after line 343 to ensure email_record.id is not None before proceeding.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid. Adding assert email_record.id is not None, "DB failed to assign email.id after flush" after the flush before using the ID.

Comment on lines +52 to +59
# on the installed version. Try both; fall back to None (single-column).
try:
from pymupdf import column_boxes
except ImportError:
try:
from pymupdf.utils import column_boxes # type: ignore[no-redef]
except ImportError:
column_boxes = None # type: ignore[assignment]
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback handling for column_boxes import uses type: ignore[no-redef] (line 57) and type: ignore[assignment] (line 59) to suppress type checker warnings. While the logic is sound (trying multiple import paths), the type annotations could be improved.

Consider defining column_boxes with a proper type hint at the module level (e.g., Optional[Callable]) and removing the type ignores, or use a more explicit protocol/type stub for the expected function signature. This would make the code more maintainable and provide better IDE support.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/except structure is self-documenting and works correctly. Not adding a protocol stub for an internal fallback import path.

match_result: MatchResult
lead: Lead
session: AsyncSession
mcq_answers: MCQAnswers = field(default_factory=lambda: MCQAnswers(answers={}, skipped=True))
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for mcq_answers in WriterDeps uses a lambda (line 45), but this creates a new MCQAnswers instance at field definition time, not at instance creation time. In dataclasses with mutable defaults, this is typically a pattern to avoid shared mutable state, but field(default_factory=...) is the more idiomatic approach.

Consider changing to:

mcq_answers: MCQAnswers = field(default_factory=MCQAnswers)

This assumes MCQAnswers(answers={}, skipped=True) is the default constructor behavior, which it is based on the schema definition (line 65-69).

Suggested change
mcq_answers: MCQAnswers = field(default_factory=lambda: MCQAnswers(answers={}, skipped=True))
mcq_answers: MCQAnswers = field(default_factory=MCQAnswers)

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Push back. The code already uses field(default_factory=lambda: MCQAnswers(answers={}, skipped=True)) — this IS the field(default_factory=...) pattern. The lambda is required because MCQAnswers has no defaults for answers and skipped. The suggested replacement field(default_factory=MCQAnswers) would raise TypeError: MCQAnswers.__init__() missing 2 required positional arguments. The reviewer's description of the current code as creating an instance 'at field definition time' is also incorrect — default_factory always creates a new instance per object construction.

coder-ishan added a commit that referenced this pull request Feb 27, 2026
yc.py:
- Remove unused asyncio import
- Replace assert isinstance() with explicit ValueError (includes URL in message)
- Drop assert len > 100 — breaks batch/industry endpoints with < 100 results

scorer.py:
- Replace assert in ScoringWeights.__post_init__ with ValueError
- Fix score_lead() mutable default: weights=None sentinel, resolved inside body
- Add v2 TODO comment on _semantic_score for pre-fit vectorizer optimization

scout.py:
- Fix dedup bug: _is_duplicate queried Lead.person_email but received company_website;
  now correctly queries Lead.company_website
- Fix ScoutDeps.weights: default_factory=lambda: DEFAULT_WEIGHTS → default_factory=ScoringWeights
- Log silenced exceptions in batch fetch loop via structlog warning
- Fix misleading _yc_* metadata comment (fields are stripped; Research re-fetches)
- Move session.commit() outside per-row loop; single atomic commit with rollback

schemas.py:
- Rename body_must_mention_company → body_must_be_sufficiently_long (name now matches impl)

profile.py:
- Remove .doc from allowed suffixes (python-docx cannot parse binary .doc files)
- Update error message to direct users to convert .doc → .docx

writer.py (PR #6 carried fixes):
- Replace lead.__dict__["_resolved_recipient_type"] with WriterDeps.resolved_recipient_type field
- Replace datetime.utcnow() with datetime.now(timezone.utc) (Python 3.12+ deprecation)
- Replace double session.commit() with flush() + single atomic commit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- profile.py: remove stale inline `from docx import Document` inside extract_docx_text (already top-level)
- profile.py: datetime.utcnow() → datetime.now(timezone.utc) in extract_profile
- profile.py: add TODO(02-06) comment to model_override param docstring
- tests: replace WriterAgent-based TestWriterPipeline with factory-API imports test (run_writer, WriterDeps, create_writer_agent) and inspect.iscoroutinefunction check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants