feat(webhook): add persistent retry with exponential backoff#19

Merged: jonathanpeterwu merged 61 commits into main from feature/webhook-retry-backoff on Jun 9, 2026

@jonathanpeterwu

Collaborator

Summary

  • Add persistent webhook delivery queue with SQLite-backed retry and exponential backoff
  • Failed webhook events are retried up to 5 times with exponential backoff (1s, 2s, 4s, 8s, 16s), with jitter and a 300s cap on any single delay
  • Deliveries persist across process restarts — no more lost events on crash
  • Health endpoint now returns delivery stats by status (pending/processing/completed/failed/dead)
  • Also fixes 46 pre-existing lint errors, OpenRouter test skip on invalid API key, and lazy redis import
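
The delivery statuses reported by the health endpoint can be pictured with a small in-memory sketch. This is not the code in `webhook-retry.ts`: the `Delivery` shape, `onFailure`, and `statsByStatus` are hypothetical names, and the rule that a delivery becomes `dead` once its 5 retries are used up is an assumption based on the summary above.

```typescript
type DeliveryStatus = "pending" | "processing" | "completed" | "failed" | "dead";

// Hypothetical record shape; the real SQLite schema is not shown in this PR text.
interface Delivery {
  id: number;
  status: DeliveryStatus;
  retries: number; // retries already scheduled for this delivery
}

const MAX_RETRIES = 5;

// Assumed rule: a failure schedules another retry until the budget is spent,
// after which the delivery is dead-lettered.
function onFailure(d: Delivery): Delivery {
  return d.retries >= MAX_RETRIES
    ? { ...d, status: "dead" }
    : { ...d, status: "failed", retries: d.retries + 1 };
}

// Count deliveries per status, the kind of summary a health endpoint could return.
function statsByStatus(deliveries: Delivery[]): Record<DeliveryStatus, number> {
  const stats: Record<DeliveryStatus, number> = {
    pending: 0,
    processing: 0,
    completed: 0,
    failed: 0,
    dead: 0,
  };
  for (const d of deliveries) stats[d.status] += 1;
  return stats;
}

// Usage: one delivery that keeps failing, one that succeeded.
let flaky: Delivery = { id: 1, status: "processing", retries: 0 };
for (let i = 0; i < 6; i++) flaky = onFailure(flaky);
const stats = statsByStatus([flaky, { id: 2, status: "completed", retries: 0 }]);
```

In a SQLite-backed queue the same counts would more likely come from a `GROUP BY status` query; the loop here only illustrates the shape of the result.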

Changes

| File | Change |
| --- | --- |
| `src/integrations/linear/webhook-retry.ts` | New: WebhookDeliveryQueue with SQLite persistence, atomic claim, background worker |
| `src/integrations/linear/webhook-server.ts` | Replace in-memory queue with WebhookDeliveryQueue |
| `src/integrations/linear/__tests__/webhook-retry.test.ts` | 11 tests: enqueue, retry, dead letter, concurrency, stats |
| `src/core/storage/two-tier-storage.ts` | Make redis import lazy (optional dep) |
| `src/core/extensions/__tests__/openrouter-integration.test.ts` | Skip on 401/403 instead of failing |
| 8 other files | Fix pre-existing eslint errors |

Test plan

  • 11 unit tests for WebhookDeliveryQueue (all passing)
  • Existing test suite: 144/145 pass, 1 skipped (OpenRouter)
  • Build succeeds
  • Lint: 0 errors

StackMemory Bot (CLI) added 30 commits May 5, 2026 19:11
stackmemory scaffold creates company/, wiki/, skills/, clients/, raw/,
and .stackmemory/config.yml. Enables local context management with
file-based skill rot detection and tenant isolation.
New DaemonTelemetryService collects anonymous usage snapshots:
- Daemon health (uptime, context saves, memory triggers, errors)
- Session counts (total heartbeats, active now)
- Skill audit entries, handoff counts
- No PII — instance ID is random hex

Runs daily (default 24h interval, first at boot+30s).
Stores rolling 90-snapshot history in ~/.stackmemory/telemetry.json.
Opt out: STACKMEMORY_TELEMETRY=0 or telemetry.enabled: false in config.
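
A rough sketch of the rolling history and the opt-out check described in this commit. The `Snapshot` fields and both function names are hypothetical; only the 90-entry limit, the `STACKMEMORY_TELEMETRY=0` variable, and the `telemetry.enabled` config key come from the message above.

```typescript
// Hypothetical snapshot shape; the real one carries more daemon and session fields.
interface Snapshot {
  takenAt: string;
  uptimeSec: number;
}

const MAX_SNAPSHOTS = 90;

// Append a snapshot and keep only the newest `max` entries.
function appendSnapshot(history: Snapshot[], next: Snapshot, max = MAX_SNAPSHOTS): Snapshot[] {
  return [...history, next].slice(-max);
}

// Telemetry is on unless either the env var or the config turns it off.
function telemetryEnabled(
  env: Record<string, string | undefined>,
  config: { telemetry?: { enabled?: boolean } },
): boolean {
  if (env.STACKMEMORY_TELEMETRY === "0") return false;
  return config.telemetry?.enabled !== false;
}
```
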
…gest skills

Three-component system in DaemonDesirePathService:
1. ActionStreamLogger — PostToolUse hook captures tool:target pairs
   to ~/.stackmemory/desire-paths/action-stream.jsonl (no data/content)
2. PatternDetector — sliding window extracts repeated sequences,
   filters by min 3 occurrences across 2+ sessions, scores by freq×sessions
3. SkillSuggester — generates skill.md files from top patterns with
   inputs/outputs inferred from sequence endpoints
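
The PatternDetector step can be sketched as below, assuming a fixed window size and per-session grouping. `Action`, `Pattern`, and `detectPatterns` are illustrative names rather than the service's real API; the thresholds (3 occurrences, 2 sessions) and the freq×sessions score follow the description above.

```typescript
interface Action {
  session: string;
  step: string; // a "tool:target" pair, e.g. "Read:src"
}

interface Pattern {
  sequence: string[];
  occurrences: number;
  sessions: number;
  score: number;
}

function detectPatterns(
  actions: Action[],
  windowSize = 3, // assumption: the real detector may try several window sizes
  minOccurrences = 3,
  minSessions = 2,
): Pattern[] {
  // Group steps per session so a window never spans two sessions.
  const bySession = new Map<string, string[]>();
  for (const a of actions) {
    const steps = bySession.get(a.session) ?? [];
    steps.push(a.step);
    bySession.set(a.session, steps);
  }

  const seen = new Map<string, { sequence: string[]; occurrences: number; sessions: Set<string> }>();
  bySession.forEach((steps, session) => {
    for (let i = 0; i + windowSize <= steps.length; i++) {
      const sequence = steps.slice(i, i + windowSize);
      // A JSON key stays unambiguous even if a step contains separator characters.
      const key = JSON.stringify(sequence);
      const entry = seen.get(key) ?? { sequence, occurrences: 0, sessions: new Set<string>() };
      entry.occurrences += 1;
      entry.sessions.add(session);
      seen.set(key, entry);
    }
  });

  return Array.from(seen.values())
    .filter((e) => e.occurrences >= minOccurrences && e.sessions.size >= minSessions)
    .map((e) => ({
      sequence: e.sequence,
      occurrences: e.occurrences,
      sessions: e.sessions.size,
      score: e.occurrences * e.sessions.size,
    }))
    .sort((a, b) => b.score - a.score);
}
```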

- 10MB JSONL rotation, 10K entry scan cap for performance
- Opt out: STACKMEMORY_DESIRE_PATHS=0 or desirePaths.enabled: false
- Scans every 6h, first at boot+2m
- Suggestions written to ~/.stackmemory/desire-paths/suggestions/
- 3 adversarial review rounds: fixed separator injection, added scan cap,
  improved skill naming with target directory context
- Auto-starts daemon on session boot
- Writes session heartbeats for telemetry tracking
- Restores handoff context from previous sessions
- Sets STACKMEMORY_SESSION env for desire-path hook
- Determinism watcher + tracing
- bin/hermes-sm and bin/hermes-smd registered in package.json
Centralizes token estimation across 14 files through
src/core/cache/token-estimator.ts and packages/sdk/src/token-estimator.ts.
Lazy-loads cl100k_base encoder with char/4 fallback if WASM fails.
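
The lazy load with a char/4 fallback could look roughly like this. The `Encoder` interface and `createTokenEstimator` are stand-ins invented for the sketch; the actual module wraps a cl100k_base encoder whose loading code is not shown here.

```typescript
// Minimal stand-in for a tokenizer; assumed shape, not the real library type.
interface Encoder {
  encode(text: string): number[];
}

function createTokenEstimator(loadEncoder: () => Encoder): (text: string) => number {
  // undefined = not tried yet, null = load failed (use the heuristic from now on)
  let encoder: Encoder | null | undefined;
  return (text: string): number => {
    if (encoder === undefined) {
      try {
        encoder = loadEncoder();
      } catch {
        encoder = null;
      }
    }
    return encoder ? encoder.encode(text).length : Math.ceil(text.length / 4);
  };
}
```

Caching the failed state means a broken WASM load is attempted once rather than on every estimate.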

Also ports context-budget hook to codex-sm exit handler for
compact/restart nudges matching Claude Code behavior.
Skills are often prompt text (content) not code — content licenses like
CC-BY-4.0 fit better than MIT for these. Adds KnownLicenseSchema enum
with both code (MIT, Apache-2.0, ISC, BSD) and content (CC-BY-4.0,
CC-BY-SA-4.0, CC0-1.0) licenses while keeping the field open for custom
SPDX identifiers.
Markdown table parser + CLI commands + MCP tools for local-first task
steering. Tasks live in master-tasks.md, optionally sync to Linear/GH.

- Parser: parse/serialize/update/add/getNext for pipe-delimited md tables
- CLI: stackmemory tasks init/md list/md next/md add/md update
- MCP: get_next_master_task, update_master_task, create_master_task
- 19 tests covering parse, round-trip, priority sorting, file ops
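
For orientation, a minimal parse/serialize pair for pipe-delimited Markdown tables might look like the following. The function names and the `ID`/`Task`/`Status` columns in the usage are made up for illustration; the real column layout of master-tasks.md is not given in this commit message, and the sketch does not handle escaped pipes inside cells.

```typescript
type Row = Record<string, string>;

// Split one table line into trimmed cell values.
function splitCells(line: string): string[] {
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((c) => c.trim());
}

// Assumes line 1 is the header row and line 2 is the `---` separator.
function parseMarkdownTable(markdown: string): Row[] {
  const lines = markdown
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.startsWith("|"));
  if (lines.length < 2) return [];
  const headers = splitCells(lines[0]);
  return lines.slice(2).map((line) => {
    const values = splitCells(line);
    const row: Row = {};
    headers.forEach((h, i) => {
      row[h] = values[i] ?? "";
    });
    return row;
  });
}

function serializeMarkdownTable(headers: string[], rows: Row[]): string {
  const toLine = (cells: string[]) => `| ${cells.join(" | ")} |`;
  return [
    toLine(headers),
    toLine(headers.map(() => "---")),
    ...rows.map((r) => toLine(headers.map((h) => r[h] ?? ""))),
  ].join("\n");
}

// Usage with hypothetical columns.
const tableText = [
  "| ID | Task | Status |",
  "| --- | --- | --- |",
  "| 1 | Write parser | todo |",
  "| 2 | Add CLI | done |",
].join("\n");
const tasks = parseMarkdownTable(tableText);
```
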
…rm, script-suggest

- dedup-reads: escalate to [STOP] at 5+ reads (was soft-only at 3+)
- desire-path-hook: auto-route Bash→Glob/Read/Grep with inline suggestions
- prewarm-tools: SessionStart hook emits top deferred tool pre-fetch hint
- script-suggest: detects multi-tool patterns matching existing scripts
Replays 7,589 action-stream entries through hook logic.
Result: 324K token savings projected (22% waste reduction).
Emits reminder when >7 days since last mine and new suggested
skills exist. Points to /workflow-skill-miner.
…lete

- Skip patterns with <2 unique tools or <3 steps in generateSuggestions()
- Add cleanTrivialSuggestion() to auto-delete stale suggestion files
- Remove duplicate quality gate from autoPromote (filtering now upstream)
- Prevents bloat: 19/20 current patterns were trivial (git×2, Edit×3, etc.)
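
The quality gate in the first bullet amounts to a small predicate. A sketch, assuming each step is a `tool:target` string as in the action stream; `isTrivialPattern` is a hypothetical name, not a function from the codebase.

```typescript
// A pattern is trivial if it uses fewer than 2 distinct tools or has fewer than 3 steps.
function isTrivialPattern(sequence: string[]): boolean {
  const tools = new Set(sequence.map((step) => step.split(":")[0]));
  return tools.size < 2 || sequence.length < 3;
}
```

Under this rule both `git×2` and `Edit×3` are rejected: the first has too few steps and only one tool, the second has enough steps but a single tool.
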
…/UI tasks

Design tasks bypass subscription-first cascade (Codex→Grok→API) and route
directly to Claude CLI, which excels at creative UI/UX decisions.

- Add 'design' task type to SubagentRequest, TaskType, ModelRouterConfig
- Add forceProvider field for explicit provider override
- Add design prompt (opinionated, production-ready, no-ask-just-decide)
- Add delegateDesign() convenience method for wrapper CLIs
Consolidate from StackMemory-specific config to a generic agent
reference covering stack, structure, commands, and key patterns.
Filter action-stream by current project directory so prewarm
suggestions are scoped to the repo you're working in. Falls back
to global stats when no project-specific data exists.
Reads MEMORY.md index at session start, scores entries by relevance
to current project context, and surfaces the most useful memories.
image-preprocess: PreToolUse hook that intercepts Read calls on image
files and routes them through vision-capable models.

image-extract-mcp: stdio MCP server providing a describe_image tool
for text extraction from images via vision model APIs.
daemon.js: persistent file watcher for all GEPA targets — triggers
optimization on CLAUDE.md changes.

Session hook and .before-optimize baseline updated for current
optimization state.
# Conflicts:
#	src/daemon/services/desire-path-service.ts
#	src/hooks/prewarm-tools.cjs
…harness

End-to-end integration bridging Stagehand browser automation with
StackMemory's desire-path system for workflow discovery and replay.

- StagehandWorkflowCapture: wraps act/extract/observe, emits to action-stream
- WorkflowCache: persists workflows, bridges to desire-path patterns.json
- WorkflowReplayer: cached (0 tokens), AI (self-healing), hybrid modes
- WorkflowBenchmark: compare stagehand-ai vs cached vs playwright-code
- 4 MCP tools: workflow_list, workflow_get, workflow_replay, workflow_benchmarks
- Benchmark script with 3 test workflows (GitHub, HN, NPM)

Stagehand is a peer dependency — not required for core functionality.
CliBrowserAgent: Playwright + claude/codex CLI hybrid that routes
AI understanding through subscription CLIs instead of direct API.
Falls back to Anthropic API when CLI hooks interfere.

- Playwright handles browser control (fast, deterministic)
- claude --print / codex -q handles extraction/action interpretation
- Results cached locally for zero-LLM replay on subsequent runs
- Benchmark script supports --cli mode vs --api mode

Known issue: claude --print triggers SessionEnd hooks that time out.
TODO: fix hook interference or add CLAUDE_CODE_SKIP_HOOKS support.
When DISABLE_HOOKS=1 env var is set, all session lifecycle hooks
exit immediately. Prevents timeout/cancellation when claude --print
is invoked as a subprocess (e.g., from CliBrowserAgent).

Hooks patched: chime-on-stop.sh, stop-checkpoint.js, session-rescue.sh,
wiki-update.js, token-meter-finalize.js, gepa-session-hook.js
claude --print produces valid output before SessionEnd hooks fire.
Exit 143 from hook cancellation shouldn't reject — check stdout
content instead of exit code.
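
A sketch of the "trust stdout, not the exit code" rule from this commit. `CliResult` and `resolveCliOutput` are hypothetical names. Exit code 143 is 128 + 15, i.e. the process was ended by SIGTERM, which is what a cancelled hook produces after the real output has already been written.

```typescript
interface CliResult {
  exitCode: number | null;
  stdout: string;
}

// Accept the run if it produced output, even when a late hook killed the process.
function resolveCliOutput(result: CliResult): string {
  const output = result.stdout.trim();
  if (output.length > 0) return output;
  throw new Error(`CLI produced no output (exit code ${result.exitCode})`);
}
```
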
StackMemory Bot (CLI) added 28 commits May 27, 2026 17:29
- cd-thrash-guard: warns on 3+ cd commands in 10 tool calls
- linear-dedup: detects duplicate Linear API calls within 60s
- bash-dominance-guard: suggests Read/Grep/Glob/Edit over Bash equivalents
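
Two of these guards reduce to simple window checks. The sketch below is an illustration under assumptions: the function names, the way commands are passed in, and the call key used for Linear requests are all hypothetical, while the thresholds (3+ `cd` in 10 calls, 60s) come from the bullets above.

```typescript
// cd-thrash-guard: true when the last `windowSize` commands contain `threshold` or more `cd` calls.
function cdThrashWarning(recentCommands: string[], windowSize = 10, threshold = 3): boolean {
  const recent = recentCommands.slice(-windowSize);
  const cdCount = recent.filter((cmd) => /^\s*cd(\s|$)/.test(cmd)).length;
  return cdCount >= threshold;
}

// linear-dedup: true when the same call key was seen less than `windowMs` ago.
// `seen` maps a call key to the timestamp of its most recent call.
function isDuplicateCall(
  seen: Map<string, number>,
  key: string,
  nowMs: number,
  windowMs = 60_000,
): boolean {
  const last = seen.get(key);
  seen.set(key, nowMs);
  return last !== undefined && nowMs - last < windowMs;
}
```
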
Distributable investigation skill teaching agents to debug production
issues from structured wide-format logs. Encodes the tenant-context +
domain-extras + timeline-reconstruction pattern.
Token estimator tests assumed ceil(length/4) heuristic but implementation
now uses js-tiktoken (cl100k_base). Subagent routing tests failed because
codex CLI is installed locally, causing isCodexAvailable() to short-circuit
multiProvider/batch/Kimi paths — fixed by spying on private methods.
Every MCP tool call now emits begin/finish traces to Raindrop Workshop
when RAINDROP_LOCAL_DEBUGGER env is set. Conditional — zero overhead
when env var is absent. Flush on SIGINT shutdown.
Replace async IIFE wrapping the tool dispatch switch with a named
handleTool() function. Removes one indentation level from ~500 lines,
makes the Raindrop tracing wrapper cleaner, and shrinks the bundle.
Three adapter modes behind a unified ScreenAdapter interface:
- TmuxAdapter: reads pane buffer as text, sends keys (CLI, no API needed)
- DesktopAdapter: macOS screenshots + AppleScript + Haiku VLM
- BrowserAdapter: Playwright DOM reads for claude.ai/code web app

Rule-based state machine detects: IDLE, WORKING, PERMISSION_PROMPT,
ERROR, RATE_LIMITED, STUCK, SESSION_ENDED from screen content.
LLM decision layer (Haiku) handles ambiguous states and generates
nudges for stuck sessions instead of blind restarts.

CLI: stackmemory operator start/stop/status/attach
- Drains master-tasks.md queue overnight on Max plan
- Auto-approves permission prompts, exponential backoff on rate limits
- JSONL logging + checkpoint file for monitoring
- 31 tests passing
- STUCK now nudges twice before marking blocked (was: immediate block)
- Track nudgeCount per task in checkpoint, reset on new task
- Fix ScreenshotAdapter adapterType to match union type
- Add index.ts barrel export for clean imports
- 32 tests passing
…→ apply)

Repackages the external instincts/continuous-learning concept into
StackMemory as a native feature backed by SQLite.

New modules:
- PatternStore: CRUD with confidence scoring, decay, pruning
- PatternObserver: extracts patterns from trace events at session end
  (tool sequences, error→fix pairs, tool preferences)
- PatternApplier: surfaces relevant patterns in context retrieval
- CLI: stackmemory patterns list|learn|stats|prune|export|import

Schema: adds `patterns` table with domain, trigger, action,
confidence scoring (0.3→0.85 based on observation count),
project-scoping, and weekly confidence decay.

16 tests passing.
…arity

Fills 3 feature gaps vs continuous-learning-v2:
- promote: project→global (manual or auto-detect cross-project patterns)
- projects: list projects with pattern counts
- evolve: cluster related patterns, identify skill/command candidates

PatternStore gains: promote(), projects(), promotionCandidates(), findClusters()

CL-v2 for stackmemory is now retired — observer was already disabled,
no hooks registered. Patterns system is the native replacement.
Tests already skipped when OPENROUTER_API_KEY env var is unset, but
still failed with 401 when the key existed but was expired or invalid.
Now catch auth errors (401/403) and skip gracefully via ctx.skip().
Replace in-memory webhook event queue with SQLite-backed delivery queue.
Failed webhook events are now retried up to 5 times with exponential
backoff (1s, 2s, 4s, 8s, 16s, capped at 300s) and jitter. Deliveries
persist across process restarts.
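
The retry schedule described above can be sketched as follows. This is only an illustration: `backoffDelayMs` and its parameters are hypothetical and do not reproduce the signature of `calculateBackoff()` in `core/errors/recovery.ts`, and the jitter strategy (a random extra of up to 10%) is an assumption, since the commit does not specify one.

```typescript
// Hypothetical helper, not the project's calculateBackoff().
function backoffDelayMs(
  attempt: number, // 0-based retry index
  baseMs = 1_000,
  capMs = 300_000,
  jitterRatio = 0.1, // assumption: up to +10% random jitter
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** attempt; // 1s, 2s, 4s, ...
  const capped = Math.min(exponential, capMs);
  const jitter = capped * jitterRatio * random();
  return Math.min(Math.round(capped + jitter), capMs);
}

// With jitter disabled, the five retries reproduce the documented schedule.
const schedule = [0, 1, 2, 3, 4].map((attempt) => backoffDelayMs(attempt, 1_000, 300_000, 0, () => 0));
```

With only five retries the longest base delay is 16s, so the 300s cap is never reached; it only matters if the retry limit is raised.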

- WebhookDeliveryQueue: SQLite persistence, atomic claim, background worker
- Reuses existing calculateBackoff() from core/errors/recovery.ts
- Health endpoint now returns delivery stats by status
- 11 tests covering enqueue, retry, dead letter, concurrency, stats
- Fix prettier formatting in hermes-sm, daemon-config, research-stream-service, telemetry-service
- Add eslint-disable for dynamic require() in operator, adapter-factory
- Replace inline require() with top-level imports in operator-logger, subagent-client
Redis is optional (only needed for remote tier). The hard import
crashed when redis wasn't installed, causing test suite failures.
Now uses lazy require() with graceful fallback.
Ensures every bash script that invokes stackmemory or node loads
the correct Node version from .nvmrc before executing. Prevents
better-sqlite3 NODE_MODULE_VERSION errors when hooks, wrappers,
or daemons run in contexts without nvm initialized (git GUIs,
IDE integrations, background processes, subprocesses).
@jonathanpeterwu jonathanpeterwu merged commit 4d22836 into main Jun 9, 2026
3 of 6 checks passed
@jonathanpeterwu jonathanpeterwu deleted the feature/webhook-retry-backoff branch June 9, 2026 18:33