Fix race condition with continue-as-new by sophiatev · Pull Request #1303 · Azure/durabletask

sophiatev · 2026-02-23T21:58:11Z

Currently we have a subtle race condition that can occur when an orchestration continues-as-new. The flow is as follows:

1. An orchestration continues-as-new with a new execution ID, and the TaskOrchestrationDispatcher calls CompleteTaskOrchestrationWorkItemAsync.
2. In the completion call, outbound messages are committed. Say one of these is a TaskScheduled event to start a new Activity.
3. The Activity completes and sends a TaskCompleted event back to the orchestration, all before CompleteTaskOrchestrationWorkItemAsync has updated the orchestration's state in storage to reflect the new execution ID.
4. A call to LockNextTaskOrchestrationWorkItemAsync is made, which retrieves the TaskCompleted event. The TaskCompleted event is addressed to the new execution ID, but since the orchestration's state has not yet been updated in storage, there is no record or history for that execution ID. The call that checks for out-of-order messages should detect that this is potentially an "out of order" TaskCompleted message, since the instance does "not yet exist". However, IsOutOfOrderMessage decides that the message is okay, because this [condition](https://github.com/Azure/durabletask/blob/1b04239f01b8376e084d9b957bf15b546700dd64/src/DurableTask.AzureStorage/Messaging/OrchestrationSession.cs#L163) evaluates to true.
5. Later on, in the LockNextTaskOrchestrationWorkItemAsync method, when we attempt to retrieve information from storage about this orchestration instance with the new execution ID, we find none and fail at this point. We delete the TaskCompleted event, which leaves the orchestration permanently stuck in a running state.
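The interleaving in the steps above can be modeled with a toy in-memory store and queue. This is only an illustrative sketch in Python, not the C# code from the repository: `state_store`, `control_queue`, the helper function names, and the IDs `instance-1`, `exec-A`, and `exec-B` are all hypothetical stand-ins for the real storage, queue, and identifiers.

```python
from collections import deque

INSTANCE_ID = "instance-1"  # hypothetical instance ID

# Hypothetical stand-ins for durable storage and the control queue.
state_store = {INSTANCE_ID: "exec-A"}  # instance ID -> checkpointed execution ID
control_queue = deque()


def commit_outbound_messages(new_execution_id):
    # Models steps 2 and 3: the activity is scheduled, finishes at once,
    # and its TaskCompleted reply is addressed to the NEW execution ID.
    control_queue.append({
        "type": "TaskCompleted",
        "instance_id": INSTANCE_ID,
        "execution_id": new_execution_id,
    })


def checkpoint_state(new_execution_id):
    # Models the final part of the completion call: storage now
    # reflects the new execution ID.
    state_store[INSTANCE_ID] = new_execution_id


def lock_next_work_item():
    # Models steps 4 and 5: dequeue the reply and look up its execution ID.
    message = control_queue.popleft()
    has_record = state_store.get(message["instance_id"]) == message["execution_id"]
    return message, has_record


# The racy ordering: the reply is processed before the checkpoint lands.
commit_outbound_messages("exec-B")
message, has_record = lock_next_work_item()
checkpoint_state("exec-B")

print(has_record)  # False: no record existed for exec-B when the reply arrived
```

In this model the reply has already been consumed by the time the checkpoint is written, which mirrors how the deleted TaskCompleted event leaves the orchestration with nothing to wake it up.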

The core of the issue is that the checks in IsOutOfOrderMessage are not conservative enough. In this case, the checkpoint time of the session is indeed later than that of the TaskCompleted event, because the orchestration's state is retrieved after the message is received. This one condition evaluating to true should not, on its own, be enough to decide that the message is not out of order.

This PR changes the logic so that if any of the following conditions is met, we treat the message as out of order: this is a non-existent instance and the message hasn't yet been dequeued 5 times, the checkpoint is stale, or a scheduled event does not exist for a completion event.
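The difference between the two decision rules can be sketched as follows. This is a simplified Python model of the behaviour described in this PR, not the actual C# implementation: the `Checks` type, its field names, and both function names are hypothetical, the "before" function assumes the old logic returned early once the checkpoint check passed, and "hasn't yet been dequeued 5 times" is read here as a dequeue count below 5. How staleness is actually computed is left out and passed in as a flag.

```python
from dataclasses import dataclass

DEQUEUE_THRESHOLD = 5  # "dequeued 5 times", taken from the PR description


@dataclass
class Checks:
    """Hypothetical inputs to the out-of-order decision."""
    instance_missing: bool        # no record for the addressed execution ID
    dequeue_count: int            # how many times the message was dequeued
    checkpoint_stale: bool        # staleness is computed elsewhere (not modeled)
    is_completion_event: bool     # e.g. a TaskCompleted message
    scheduled_event_found: bool   # matching TaskScheduled found in history


def is_out_of_order_before(c: Checks) -> bool:
    # Assumed simplification of the old behaviour: a single passing
    # check (a fresh checkpoint) is enough to accept the message.
    if not c.checkpoint_stale:
        return False
    return True


def is_out_of_order_after(c: Checks) -> bool:
    # Behaviour described in this PR: ANY suspicious condition marks
    # the message as out of order.
    missing_instance = c.instance_missing and c.dequeue_count < DEQUEUE_THRESHOLD
    missing_schedule = c.is_completion_event and not c.scheduled_event_found
    return missing_instance or c.checkpoint_stale or missing_schedule


# The race from the description: new execution ID not yet in storage,
# first dequeue, checkpoint looks fresh, no TaskScheduled in history.
race = Checks(instance_missing=True, dequeue_count=1, checkpoint_stale=False,
              is_completion_event=True, scheduled_event_found=False)

print(is_out_of_order_before(race))  # False: message accepted, later deleted
print(is_out_of_order_after(race))   # True: message is deferred instead
```

Under this model, a message for an instance that stays missing stops being deferred by the first condition once it has been dequeued 5 times.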

Resolves #1302


Copilot

Pull request overview

Fixes a race condition in the Azure Storage backend when extended sessions are enabled and an orchestration performs ContinueAsNew, where activity responses can arrive before the new execution ID is checkpointed, causing a stuck orchestration.

Changes:

Moves session.UpdateRuntimeState(runtimeState) earlier in CompleteTaskOrchestrationWorkItemAsync so the in-memory session reflects the new execution ID before outbound messages are committed.


src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs


Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.


src/DurableTask.AzureStorage/Messaging/OrchestrationSession.cs

moved placement of session.UpdatedRuntimeState call to be before we commit any outbound messages

fe92118

Copilot AI review requested due to automatic review settings February 23, 2026 21:58

Copilot started reviewing on behalf of sophiatev February 23, 2026 21:59

Copilot AI reviewed Feb 23, 2026


src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs (outdated, resolved)

Sophia Tevosyan added 2 commits February 24, 2026 09:41

Merge branch 'main' into stevosyan/fix-continue-as-new-with-extended-sessions-race-condition

5c2c45e

removed the unnecessary changes from the orchestration service, due to initial misdiagnosis, and changed the IsOutOfOrder logic instead

20846e8

Copilot AI review requested due to automatic review settings February 24, 2026 17:56

Copilot started reviewing on behalf of sophiatev February 24, 2026 17:57

Copilot AI reviewed Feb 24, 2026


sophiatev changed the title ~~Fix the race condition for continue-as-new with extended sessions enabled~~ Fix race condition with continue-as-new Feb 24, 2026

fixed the endlessly abandoning nonexistent instances bug

3909fe7

sophiatev wants to merge 4 commits into main from stevosyan/fix-continue-as-new-with-extended-sessions-race-condition
