Skip to content

Conversation

@alexluong
Copy link
Collaborator

@alexluong alexluong commented Jan 22, 2026

Moves event fetching from the delivery flow to the retry flow.

What changed

  • Retry scheduler now fetches full event data from logstore before publishing to delivery queue
  • Delivery handler no longer needs logstore dependency or retry detection logic

Why this design

Scheduled retry tasks only store event IDs (not full payloads) to keep queue messages small. Previously, the delivery handler had to detect retries via event.Time.IsZero() and conditionally fetch event data.

Moving this to the retry scheduler is cleaner because:

  1. The retry flow inherently knows it's handling retries - no heuristic detection needed
  2. Fails earlier - missing events are caught before publishing to delivery queue
  3. Removes logstore dependency from delivery handler - delivery handler always receives complete data and no longer needs to know about or access logstore. It simply processes what it's given, regardless of message origin.

Both entry points now consistently provide full event data to the delivery queue:

Flow Event data fetched at
Initial delivery Already has full event
Scheduled retry Retry scheduler (before publish)
Manual retry API handler (before publish)

@vercel
Copy link

vercel bot commented Jan 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
outpost-docs Ready Ready Preview, Comment Jan 26, 2026 5:39pm
outpost-website Ready Ready Preview, Comment Jan 26, 2026 5:39pm

Request Review

Copy link
Contributor

@alexbouchardd alexbouchardd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good refactor 👍

@alexluong alexluong force-pushed the model-delivery-event branch from 04c3ebd to 9277a3f Compare January 26, 2026 13:11
alexluong and others added 10 commits January 27, 2026 00:37
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends TestRetryDelivery to verify that the manual retry API publishes
a DeliveryTask with complete event data to deliveryMQ. This ensures
the deliverymq handler receives full event data for manual retries,
consistent with the scheduled retry flow which fetches event data
in the retry scheduler.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add unit and e2e tests verifying that retries are not lost when the
retry scheduler queries logstore before the event has been persisted.

Test scenario:
1. Initial delivery fails, retry is scheduled
2. Retry scheduler queries logstore for event data
3. Event is not yet persisted (logmq batching delay)
4. Retry should remain in queue and be reprocessed later

Tests added:
- TestRetryScheduler_RaceCondition_EventNotYetPersisted (unit test)
- TestE2E_Regression_RetryRaceCondition (e2e test)

Also adds:
- RetryVisibilityTimeoutSeconds config option
- WithRetryVisibilityTimeout scheduler option
- mockDelayedEventGetter for simulating delayed persistence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Return error instead of nil so the retry message stays in queue and
will be reprocessed after the visibility timeout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@alexluong alexluong force-pushed the deliverymq-logstore-deps branch from eb51fdb to b701223 Compare January 26, 2026 17:38
@alexluong alexluong merged commit da4bd25 into model-delivery-event Jan 26, 2026
2 of 4 checks passed
@alexluong alexluong deleted the deliverymq-logstore-deps branch January 26, 2026 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants