Preserve actionable URLs through Gmail preprocessing and condensation #131
Open
constkolesnyak wants to merge 1 commit into
Gmail preprocessing was stripping all standalone URLs indiscriminately. Now only removes boilerplate (unsubscribe, social media, tracking pixels) and keeps actionable links (booking, messaging, payment, reply threads). Condense prompts updated to explicitly preserve actionable URLs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem
The Gmail source's HTML→text preprocessing in nerve/sources/gmail.py ran a blanket URL-strip pass on every body: every standalone https://… was removed before the email reached the agent or the condenser. That kills the links you actually want to act on (booking, messaging, payment, reply threads).
What it should have stripped is the boilerplate (mailto:, tracking pixels, footer social links, "view this in your browser"). None of that is usually a problem for the agent, but it is what the original blanket pattern was clearly aiming at.
Symptom from production: inbox notifications routinely arrived with the URL removed, forcing manual search of the Gmail web UI for any link the message referenced.
Fix
Replace the blanket URL strip with _is_boilerplate_url(url) heuristics. Boilerplate URLs get removed; everything else passes through unchanged. Treated as boilerplate: unsubscribe, manage-?(subscriptions|preferences), tracking pixel domains (Mailchimp click-tracking, etc.), and known social-footer hosts.
Also tighten the condense prompt in nerve/sources/runner.py to explicitly tell the condenser model to preserve actionable URLs through summarisation (otherwise a 4k-char condensation pass would strip them again on the LLM side).
Tests
The existing source-runner tests stay green (pytest -k 'gmail or sources or runner': 14 passed, 2 skipped). No new tests added: the heuristic is small and verified end-to-end by routine inbox processing.
Files
nerve/sources/gmail.py: _is_boilerplate_url + selective strip (+19 lines)
nerve/sources/runner.py: condense prompt now keeps actionable URLs (+3 lines)