Fix process crash when testing Stderr#1449
Fix process crash when testing Stderr#1449ericstj wants to merge 15 commits intomodelcontextprotocol:mainfrom
Conversation
|
This may not be the root cause of the problem. We've now got dumps on both windows and linux. I'll have a look and see if the dumps reveal a better root-cause. |
|
Crashes fixed, but we have tests hanging. Could be due to the |
|
Yeah, WaitForExit is now hanging... I'm not convinced this is the right change. I'm wondering more "why aren't processes being cleaned up" and do we have a hang or test bug somewhere. Also - the fragility here where missing something means a crash seems busted. I think we should have a design that's resilient to not cleaning up external state. |
|
Found leaked StdioServerTransport objects in dumps, these kept handles open to StdIn and were causing threadpool starvation. -- at least that's the theory.
|
StdioServerTransport does not reliably close on linux, which causes a threadpool thread to be blocked on read. This can lead to threadpool starvation and application hangs. Since the tests in question do not require stdio transport, switch to using StreamServerTransport with null streams to avoid this issue.
src/ModelContextProtocol.Core/Client/StdioClientSessionTransport.cs
Outdated
Show resolved
Hide resolved
| if (!processRunning && HasExited(process)) | ||
| { | ||
| process.WaitForExit(); | ||
| } |
There was a problem hiding this comment.
Won't we now possibly miss some error messages being reported over stderr from the child process?
There was a problem hiding this comment.
The WaitForExit() that was removed from DisposeProcess was only called when !processRunning && HasExited(process) — i.e. when the process had already exited on its own before we got here. The stderr flushing for that case is now handled earlier in StdioClientSessionTransport.GetUnexpectedExitExceptionAsync (line 99-106), which calls WaitForExitAsync(ct) on .NET or WaitForExit(timeout) on Framework. So stderr is still flushed before we build the error message and completion details.
In the processRunning case (where we call KillTree), the old code never called WaitForExit() either — the !processRunning guard prevented it. So there's no behavior change for that path.
The reason for removing it from DisposeProcess: DisposeProcess is a static helper called from both the session transport cleanup and the ConnectAsync error path. The session transport cleanup already handles stderr flushing, and the ConnectAsync error path doesn't need it. Keeping the no-arg WaitForExit() in DisposeProcess risked hangs when called from contexts where the streams aren't fully set up.
There was a problem hiding this comment.
or WaitForExit(timeout) on Framework. So stderr is still flushed before we build the error message and completion details
WaitForExit(timeout) doesn't wait for EOF of the streams though, right?
There was a problem hiding this comment.
You're right — WaitForExit(timeout) does not wait for stream EOF.
I checked the .NET runtime source for WaitForExitAsync. It calls WaitUntilOutputEOF(cancellationToken) which does _error.EOF.WaitAsync(cancellationToken) — so it DOES wait for stream EOF, and the cancellation token IS respected for the stream-drain wait, not just the process-exit wait. So on .NET, WaitForExitAsync(ct) is strictly better than the old WaitForExit() — it flushes streams AND can be cancelled via CancelShutdown() if grandchild processes hold pipe handles open.
On .NET Framework there's no WaitForExitAsync, so we use WaitForExit(timeout) which doesn't guarantee stream flush. The risk of missing events is low in practice: by this point HasExited has returned true (line 93), the error handler is still attached and receiving events, and the ErrorDataReceived dispatching happens asynchronously — events that were already in the OS pipe buffer will still be delivered during the timeout window. We'd only miss events if there's a significant delay between the process exiting and the runtime dispatching buffered pipe data, which shouldn't happen when the process has cleanly exited. The alternative — no-arg WaitForExit() — has no cancellation mechanism on Framework and hangs indefinitely when grandchild processes hold pipe handles open.
When CleanupAsync is entered from ReadMessagesAsync (stdout EOF), the cancellation token is _shutdownCts.Token which hasn't been canceled yet (base.CleanupAsync runs later). If stderr EOF hasn't been delivered by the threadpool yet, WaitForExitAsync hangs indefinitely. Use a linked CTS with ShutdownTimeout to bound the wait. The process is already dead; we're just draining stderr pipe buffers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix #1448
Summary of fixes:
generating crash dumps that abort the parent test host. Wrapped the main logic in try/catch to report errors to
stderr instead.
propagate to the background ErrorDataReceived dispatch thread, crashing the process. Added try/catch with a
warning-level log.
ITestOutputHelper.WriteLine which throws after test completion. Stored handler delegates as named fields and detach
them during cleanup.
grandchild processes hold pipe handles open. Replaced with WaitForExitAsync(ct) on .NET and WaitForExit(timeout) on
.NET Framework.
caller returned without completing the message channel, causing await Completion to hang forever. The second caller
now calls SetDisconnected to ensure the channel completes.
instantiating StdioServerTransport, which permanently blocks a thread pool thread via Console.OpenStandardInput().
On 2-vCPU CI runners, multiple instances starved the pool. Changed these tests to use
StreamServerTransport(Stream.Null, Stream.Null)
@stephentoub