.NET: Fixing issue where OpenTelemetry span is never exported in .NET in-process workflow execution#4196

Open
alliscode wants to merge 1 commit into microsoft:main from alliscode:investigation/issue-4155-1771877003

@alliscode (Member) commented Feb 23, 2026

This pull request addresses an issue where workflow run telemetry spans (Activity objects) were not always stopped, and therefore not exported, particularly in streaming and lockstep execution environments. The changes ensure that workflow run activities are disposed as soon as the workflow reaches the Idle state or when the run loop exits, so telemetry data is not lost. Comprehensive regression tests are also added to verify correct activity lifecycle management.

Improvements to Activity Lifecycle Management:

  • Ensured that the workflow.run Activity is disposed immediately when the workflow reaches the Idle state, so telemetry spans are promptly exported rather than waiting for cancellation or disposal.
  • Added a safety net to dispose of the workflow.run Activity if it was not already stopped when the run loop exits, covering cancellation and error scenarios.
  • Removed the using statement from the activity initialization to allow manual control over the activity's disposal timing.
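The lifecycle described above (no using declaration, stop on Idle, safety-net disposal on exit) can be sketched outside the framework with plain System.Diagnostics types. This is a minimal, synchronous sketch under stated assumptions, not the PR's code: SketchRunStatus, RunLoopActivitySketch, the source name, and the "workflow.completed" event name are hypothetical stand-ins for RunStatus, RunLoopAsync, and EventNames.WorkflowCompleted, and the loop walks a fixed list of statuses instead of awaiting supersteps and new input.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for RunStatus; only Running and Idle matter here.
public enum SketchRunStatus { Running, Idle }

public static class RunLoopActivitySketch
{
    // Hypothetical source; the real loop gets its Activity from
    // TelemetryContext.StartWorkflowRunActivity().
    public static readonly ActivitySource Source = new("RunLoopActivitySketch");

    // Simplified, synchronous stand-in for RunLoopAsync: it walks a fixed list of
    // statuses instead of awaiting supersteps and new input.
    public static void RunLoop(IEnumerable<SketchRunStatus> statuses, Action<SketchRunStatus>? onStep = null)
    {
        // No 'using' declaration: the Activity's lifetime is managed by hand.
        Activity? activity = Source.StartActivity("workflow.run");
        try
        {
            foreach (SketchRunStatus status in statuses)
            {
                // Lets a caller observe each step, or throw to simulate an error path.
                onStep?.Invoke(status);

                if (activity is not null && status == SketchRunStatus.Idle)
                {
                    // Stop the span as soon as the workflow is Idle. This fires
                    // ActivityListener.ActivityStopped callbacks (which exporters hook
                    // into) while the loop is still alive; when the span is actually
                    // sent is up to the exporter.
                    activity.AddEvent(new ActivityEvent("workflow.completed"));
                    activity.Dispose();
                    activity = null;
                }
            }
        }
        finally
        {
            // Safety net for cancellation/error paths that never reached Idle.
            activity?.Dispose();
        }
    }
}
```

With an ActivityListener registered for the source, the ActivityStopped callback fires when the Idle status is processed, while the loop is still iterating; if a step throws before Idle, the finally block stops the Activity instead.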

Testing and Regression Coverage:

  • Added a new test file, WorkflowRunActivityStopTests.cs, which verifies that workflow run activities are always stopped and exported to telemetry backends in the lockstep, off-thread, and streaming execution environments, and that every started activity is stopped.

Closes #4155

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

…ity never stopped in streaming OffThread path

The WorkflowRunActivity_IsStopped_Streaming_OffThread test demonstrates that
the workflow.run OpenTelemetry Activity created in StreamingRunEventStream.RunLoopAsync
is started but never stopped when using the OffThread/Default streaming execution.
The background run loop keeps running after event consumption completes, so the
using Activity? declaration does not dispose the Activity until StopAsync() is
called explicitly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2. Fix workflow.run Activity never stopped in streaming OffThread execution (microsoft#4155)

The workflow.run OpenTelemetry Activity in StreamingRunEventStream.RunLoopAsync
was scoped to the method lifetime via 'using'. Since the run loop only exits on
cancellation, the Activity was never stopped/exported until explicit disposal.

Fix: Remove 'using' and explicitly dispose the Activity when the workflow reaches
Idle status (all supersteps complete). A safety-net disposal in the finally block
handles cancellation and error paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 23, 2026 22:10
@markwallace-microsoft markwallace-microsoft added .NET workflows Related to Workflows in agent-framework labels Feb 23, 2026
@github-actions github-actions bot changed the title Fixing issue where OpenTelemetry span is never exported in .NET in-process workflow execution .NET: Fixing issue where OpenTelemetry span is never exported in .NET in-process workflow execution Feb 23, 2026
await this._eventChannel.Writer.WriteAsync(new InternalHaltSignal(currentEpoch, capturedStatus), linkedSource.Token).ConfigureAwait(false);

// Stop the workflow.run Activity when the workflow reaches Idle so the span is
// exported to telemetry backends immediately, rather than waiting for the run loop
Contributor

nit: We can't determine when the spans are actually sent to the backend, but we do have to close them properly. I recommend we adjust this comment.

// Stop the workflow.run Activity when the workflow reaches Idle so the span is
// exported to telemetry backends immediately, rather than waiting for the run loop
// to be cancelled/disposed.
if (activity is not null && capturedStatus == RunStatus.Idle)
Contributor

Is this the only status in which the workflow waits for another run call?

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to ensure OpenTelemetry workflow-run spans (Activity) are reliably stopped/disposed (and therefore exported) during .NET in-process workflow execution, including streaming scenarios, and adds regression tests around activity lifecycle behavior.

Changes:

  • Updated StreamingRunEventStream.RunLoopAsync to manually manage the workflow-run Activity lifecycle (stop on Idle and ensure disposal on loop exit).
  • Added WorkflowRunActivityStopTests to assert workflow-run activities are started and stopped across multiple execution modes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs: Changes workflow-run Activity disposal timing to stop/export spans earlier and adds a safety-net disposal on exit.
  • dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/WorkflowRunActivityStopTests.cs: Adds regression coverage validating that workflow-run activities are stopped/disposed in lockstep, off-thread, and streaming usage.

Comment on lines +96 to +104
// Stop the workflow.run Activity when the workflow reaches Idle so the span is
// exported to telemetry backends immediately, rather than waiting for the run loop
// to be cancelled/disposed.
if (activity is not null && capturedStatus == RunStatus.Idle)
{
activity.AddEvent(new ActivityEvent(EventNames.WorkflowCompleted));
activity.Dispose();
activity = null;
}
Copilot AI Feb 23, 2026

RunLoopAsync disposes and nulls the workflow run Activity when the workflow reaches RunStatus.Idle, but the run loop continues and can process additional inputs (it calls WaitForInputAsync(...) and then sets _runStatus = RunStatus.Running). Since activity is never re-created after being set to null, subsequent turns will run without a workflow_invoke span, and child spans (e.g., executor.process) will lose their parent correlation. Consider creating a new workflow-run activity each time the run loop resumes from a halted state (and adding the corresponding WorkflowStarted/tags), or alternatively keep the activity open until the run is actually ended if the intent is to span the whole session.

Suggested change
// Stop the workflow.run Activity when the workflow reaches Idle so the span is
// exported to telemetry backends immediately, rather than waiting for the run loop
// to be cancelled/disposed.
if (activity is not null && capturedStatus == RunStatus.Idle)
{
activity.AddEvent(new ActivityEvent(EventNames.WorkflowCompleted));
activity.Dispose();
activity = null;
}
// Keep the workflow.run Activity open across Idle so that subsequent inputs
// processed by this run loop continue to be correlated under the same span.
// The Activity will be completed when the run loop is cancelled/disposed.

Copilot uses AI. Check for mistakes.
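The first alternative in the comment above, starting a fresh workflow-run Activity whenever the loop resumes from a halted state, might look roughly like the following. It is a hypothetical sketch, not the framework's code: each inner list stands for one turn's supersteps between receiving input and halting at Idle, and PerTurnActivitySketch, the source name, and the event names are illustrative assumptions.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Diagnostics;

public static class PerTurnActivitySketch
{
    // Hypothetical source standing in for the workflow's telemetry context.
    public static readonly ActivitySource Source = new("PerTurnActivitySketch");

    // Each inner list is one "turn": the supersteps run between receiving input
    // and halting at Idle (a hypothetical simplification of the resume/halt cycle).
    public static void RunTurns(IEnumerable<IReadOnlyList<string>> turns)
    {
        Activity? activity = null;
        try
        {
            foreach (IReadOnlyList<string> steps in turns)
            {
                // Resuming from a halted state: start a fresh workflow-run Activity
                // so this turn's child spans have a parent again.
                activity ??= Source.StartActivity("workflow.run");
                activity?.AddEvent(new ActivityEvent("workflow.started"));

                foreach (string step in steps)
                {
                    // Child span; StartActivity parents it to Activity.Current,
                    // which is the run Activity started above.
                    using Activity? child = Source.StartActivity("executor.process " + step);
                }

                // Reached Idle: end this turn's span so it can be exported.
                activity?.AddEvent(new ActivityEvent("workflow.completed"));
                activity?.Dispose();
                activity = null;
            }
        }
        finally
        {
            // Safety net if a turn throws or is cancelled before reaching Idle.
            activity?.Dispose();
        }
    }
}
```

Because the run Activity is Activity.Current while a turn's steps execute, each executor.process child is parented to that turn's workflow.run span, which is the correlation the comment warns would otherwise be lost after the first Idle.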
Comment on lines 57 to 59
using CancellationTokenSource errorSource = new();
CancellationTokenSource linkedSource = CancellationTokenSource.CreateLinkedTokenSource(errorSource.Token, cancellationToken);

Copilot AI Feb 23, 2026

linkedSource (created via CancellationTokenSource.CreateLinkedTokenSource) is never disposed. This can retain token registrations longer than needed. Wrap it in a using declaration or dispose it in the finally block alongside errorSource and the event unsubscription.
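The suggested fix amounts to giving the linked source the same using declaration as errorSource. A minimal sketch, with everything else in RunLoopAsync elided; LinkedTokenSketch and its method are hypothetical names used only for illustration:

```csharp
using System;
using System.Threading;

public static class LinkedTokenSketch
{
    // Returns true if cancelling the caller's token is observed through the linked
    // token. Both sources are released by their using declarations when the method
    // exits, on normal, error, and cancellation paths alike.
    public static bool CancellationFlowsThroughLinkedToken(CancellationToken callerToken, Action cancelCaller)
    {
        using CancellationTokenSource errorSource = new();
        using CancellationTokenSource linkedSource =
            CancellationTokenSource.CreateLinkedTokenSource(errorSource.Token, callerToken);

        // Simulate the caller cancelling while the loop is running.
        cancelCaller();
        return linkedSource.Token.IsCancellationRequested;
    }
}
```

Disposing a linked CancellationTokenSource unregisters it from its parent tokens, so a long-lived caller token does not keep a reference to it after the run loop exits.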

"workflow.run Activity should be stopped/disposed so it is exported to telemetry backends (issue #4155)");
}

/// <summary>
Copilot AI Feb 23, 2026

The tests validate that the first workflow_invoke activity is stopped when the workflow becomes idle, but they don’t cover a common follow-up scenario: resuming the same StreamingRun/session with another input after reaching Idle and then streaming again. Adding a regression test that sends a second input after the first idle halt (and asserts a second run activity is started/stopped) would both document the intended multi-turn behavior and catch cases where the activity isn’t re-created after being disposed.

Suggested change
/// <summary>
/// <summary>
/// Verifies that a new workflow.run activity is started and stopped for each
/// streaming invocation, even when using the same workflow in a multi-turn pattern.
/// </summary>
[Fact]
public async Task WorkflowRunActivity_IsStopped_Streaming_OffThread_MultiTurn()
{
// Arrange
using var testActivity = new Activity("WorkflowRunStopTest_Streaming_OffThread_MultiTurn").Start();
var workflow = CreateWorkflow();
// Act - first streaming run
await using (StreamingRun run1 = await InProcessExecution.OffThread.RunStreamingAsync(workflow, "Hello, World!"))
{
await foreach (WorkflowEvent evt in run1.WatchStreamAsync())
{
// Consume all events from first turn
}
}
// Act - second streaming run (multi-turn scenario with same workflow)
await using (StreamingRun run2 = await InProcessExecution.OffThread.RunStreamingAsync(workflow, "Second turn!"))
{
await foreach (WorkflowEvent evt in run2.WatchStreamAsync())
{
// Consume all events from second turn
}
}
// Assert - two workflow.run activities should have been started
var startedWorkflowRuns = this._startedActivities
.Where(a => a.RootId == testActivity.RootId &&
a.OperationName.StartsWith(ActivityNames.WorkflowRun, StringComparison.Ordinal))
.ToList();
startedWorkflowRuns.Should().HaveCount(2,
"each streaming invocation should start its own workflow.run Activity");
// Assert - both workflow.run activities should have been stopped
var stoppedWorkflowRuns = this._stoppedActivities
.Where(a => a.RootId == testActivity.RootId &&
a.OperationName.StartsWith(ActivityNames.WorkflowRun, StringComparison.Ordinal))
.ToList();
stoppedWorkflowRuns.Should().HaveCount(2,
"each workflow.run Activity should be stopped/disposed so it is exported to telemetry backends in multi-turn scenarios");
}
/// <summary>

this._stepRunner.OutgoingEvents.EventRaised += OnEventRaisedAsync;

- using Activity? activity = this._stepRunner.TelemetryContext.StartWorkflowRunActivity();
+ Activity? activity = this._stepRunner.TelemetryContext.StartWorkflowRunActivity();
Contributor

Wouldn't using dispose the object for us in cases of error and cancellation?


Labels

.NET workflows Related to Workflows in agent-framework

Projects

None yet

Development

Successfully merging this pull request may close these issues.

.NET: [Bug]: workflow.run OpenTelemetry span is never exported in .NET in-process workflow execution

3 participants