Add live audio transcription streaming support to Foundry Local C# SDK #485
Open
added 2 commits
March 10, 2026 18:09
Pull request overview
Adds a new C# SDK API for live/streaming audio transcription sessions (push PCM chunks, receive incremental/final text results) and includes a Windows microphone demo sample.
Changes:
- Introduces LiveAudioTranscriptionSession + result/error types for streaming ASR over Core interop.
- Extends Core interop to support audio stream start/push/stop (including binary payload routing).
- Adds a samples/cs/LiveAudioTranscription demo project and updates the audio client factory API.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| sdk_v2/cs/test/FoundryLocal.Tests/Utils.cs | Replaced prior test utilities with ad-hoc top-level streaming harness code (currently breaks test build). |
| sdk_v2/cs/test/FoundryLocal.Tests/ModelTests.cs | Adds trailing blank lines (formatting noise). |
| sdk_v2/cs/src/OpenAI/LiveAudioTranscriptionTypes.cs | Adds LiveAudioTranscriptionResult and a structured Core error type. |
| sdk_v2/cs/src/OpenAI/LiveAudioTranscriptionClient.cs | Adds LiveAudioTranscriptionSession implementation (channels, retry, stop semantics). |
| sdk_v2/cs/src/OpenAI/AudioClient.cs | Adds CreateLiveTranscriptionSession() and removes the public file streaming transcription API. |
| sdk_v2/cs/src/Detail/JsonSerializationContext.cs | Registers new audio streaming types for source-gen JSON. |
| sdk_v2/cs/src/Detail/ICoreInterop.cs | Adds interop structs + methods for audio stream start/push/stop. |
| sdk_v2/cs/src/Detail/CoreInterop.cs | Implements binary command routing via execute_command_with_binary and start/stop routing via execute_command. |
| sdk_v2/cs/src/AssemblyInfo.cs | Adds InternalsVisibleTo("AudioStreamTest"). |
| samples/cs/LiveAudioTranscription/README.md | Documentation for the live transcription demo sample. |
| samples/cs/LiveAudioTranscription/Program.cs | Windows microphone demo using NAudio + new session API. |
| samples/cs/LiveAudioTranscription/LiveAudioTranscription.csproj | Adds sample project dependencies and references the SDK project (path currently incorrect). |
Comment on lines +36 to +38

    // --- NEW: Audio streaming types ---
    [JsonSerializable(typeof(LiveAudioTranscriptionResult))]
    [JsonSerializable(typeof(CoreErrorResponse))]

    <Project Sdk="Microsoft.NET.Sdk">

      <ItemGroup>
        <ProjectReference Include="..\..\sdk_v2\cs\src\Microsoft.AI.Foundry.Local.csproj" />

Comment on lines +55 to +56

    var streamingClient = audioClient.CreateLiveTranscriptionSession();
    streamingClient.Settings.SampleRate = 16000;
    streamingClient.Settings.Channels = 1;
    streamingClient.Settings.BitsPerSample = 16;

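The settings in the snippet above describe 16 kHz, mono, 16-bit PCM. As a rough illustration of what one pushed chunk contains, here is a minimal sketch that sizes a chunk and converts its bytes to normalized floats. The `Pcm16` helper and its method names are hypothetical and not part of the SDK; little-endian signed samples and division by 32768 are assumptions, since this page does not show how Core decodes the bytes.

```csharp
using System;
using System.Buffers.Binary;

// Usage: three samples (0, 32767, -32768) encoded as little-endian 16-bit PCM.
byte[] chunk = { 0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80 };
float[] samples = Pcm16.ToFloat(chunk);
Console.WriteLine($"decoded {samples.Length} samples");

// 100 ms at 16000 Hz, 1 channel, 16 bits per sample.
Console.WriteLine(Pcm16.BytesPerChunk(16000, 1, 16, 100)); // prints 3200

// Hypothetical helper for illustration only (not an SDK type).
static class Pcm16
{
    // Converts signed 16-bit little-endian PCM (an assumption) to floats in [-1, 1).
    public static float[] ToFloat(ReadOnlySpan<byte> pcm)
    {
        if (pcm.Length % 2 != 0)
        {
            throw new ArgumentException("16-bit PCM needs an even byte count.", nameof(pcm));
        }

        var samples = new float[pcm.Length / 2];
        for (var i = 0; i < samples.Length; i++)
        {
            short value = BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(i * 2, 2));
            samples[i] = value / 32768f;
        }
        return samples;
    }

    // Number of bytes needed to hold the given duration of audio.
    public static int BytesPerChunk(int sampleRate, int channels, int bitsPerSample, int milliseconds)
        => sampleRate * channels * (bitsPerSample / 8) * milliseconds / 1000;
}
```

A microphone callback that delivers 100 ms buffers at these settings would therefore hand the session 3200 bytes per push.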
Comment on lines +98 to 101

    private async IAsyncEnumerable<AudioCreateTranscriptionResponse> TranscribeAudioStreamingImplAsync(
        string audioFilePath, [EnumeratorCancellation] CancellationToken ct)
    {

Comment on lines +9 to +14

    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;
    using System.Globalization;
    using System.Threading.Channels;
    using Microsoft.AI.Foundry.Local.Detail;
    using Microsoft.Extensions.Logging;

    ///
    /// Created via <see cref="OpenAIAudioClient.CreateLiveTranscriptionSession"/>.
    ///
    /// Thread safety: PushAudioAsync can be called from any thread (including high-frequency

Comment on lines +10 to 11

    [assembly: InternalsVisibleTo("AudioStreamTest")]
    [assembly: InternalsVisibleTo("DynamicProxyGenAssembly2")] // for Mock of ICoreInterop

Comment on lines +1 to +18

    using Microsoft.AI.Foundry.Local;
    using Microsoft.Extensions.Logging;

    using Microsoft.VisualStudio.TestPlatform.TestHost;
    var loggerFactory = LoggerFactory.Create(b => b.AddConsole().SetMinimumLevel(LogLevel.Debug));
    var logger = loggerFactory.CreateLogger("AudioStreamTest");

    using Moq;
    // Point to the directory containing Core + ORT DLLs
    var corePath = @"C:\Users\ruiren\Desktop\audio-stream-test\Microsoft.AI.Foundry.Local.Core.dll";

    internal static class Utils
    var config = new Configuration
    {
        internal struct TestCatalogInfo
        {
            internal readonly List<ModelInfo> TestCatalog { get; }
            internal readonly string ModelListJson { get; }

            internal TestCatalogInfo(bool includeCuda)
            {

                TestCatalog = Utils.BuildTestCatalog(includeCuda);
                ModelListJson = JsonSerializer.Serialize(TestCatalog, JsonSerializationContext.Default.ListModelInfo);
            }
        }

        internal static readonly TestCatalogInfo TestCatalog = new(true);

        [Before(Assembly)]
        public static void AssemblyInit(AssemblyHookContext _)
        {
            using var loggerFactory = LoggerFactory.Create(builder =>
            {
                builder
                    .AddConsole()
                    .SetMinimumLevel(LogLevel.Debug);
            });

            ILogger logger = loggerFactory.CreateLogger<Program>();

            // Read configuration from appsettings.Test.json
            logger.LogDebug("Reading configuration from appsettings.Test.json");
            var configuration = new ConfigurationBuilder()
                .SetBasePath(Directory.GetCurrentDirectory())
                .AddJsonFile("appsettings.Test.json", optional: true, reloadOnChange: false)
                .Build();

            var testModelCacheDirName = "test-data-shared";
            string testDataSharedPath;
            if (Path.IsPathRooted(testModelCacheDirName) ||
                testModelCacheDirName.Contains(Path.DirectorySeparatorChar) ||
                testModelCacheDirName.Contains(Path.AltDirectorySeparatorChar))
            {
                // It's a relative or complete filepath, resolve from current directory
                testDataSharedPath = Path.GetFullPath(testModelCacheDirName);
            }
            else
            {
                // It's just a directory name, combine with repo root parent
                testDataSharedPath = Path.GetFullPath(Path.Combine(GetRepoRoot(), "..", testModelCacheDirName));
            }

            logger.LogInformation("Using test model cache directory: {testDataSharedPath}", testDataSharedPath);

            if (!Directory.Exists(testDataSharedPath))
            {
                throw new DirectoryNotFoundException($"Test model cache directory does not exist: {testDataSharedPath}");

            }

            var config = new Configuration
            {
                AppName = "FoundryLocalSdkTest",
                LogLevel = Local.LogLevel.Debug,
                Web = new Configuration.WebService
                {
                    Urls = "http://127.0.0.1:0"
                },
                ModelCacheDir = testDataSharedPath,
                LogsDir = Path.Combine(GetRepoRoot(), "sdk_v2", "cs", "logs")
            };

            // Initialize the singleton instance.
            FoundryLocalManager.CreateAsync(config, logger).GetAwaiter().GetResult();

            // standalone instance for testing individual components that skips the 'initialize' command
            CoreInterop = new CoreInterop(logger);
        }

        internal static ICoreInterop CoreInterop { get; private set; } = default!;

        internal static Mock<ILogger> CreateCapturingLoggerMock(List<string> sink)
        {
            var mock = new Mock<ILogger>();
            mock.Setup(x => x.Log(
                It.IsAny<LogLevel>(),
                It.IsAny<EventId>(),
                It.IsAny<It.IsAnyType>(),
                It.IsAny<Exception?>(),
                (Func<It.IsAnyType, Exception?, string>)It.IsAny<object>()))
                .Callback((LogLevel level, EventId id, object state, Exception? ex, Delegate formatter) =>
                {
                    var message = formatter.DynamicInvoke(state, ex) as string;
                    sink.Add($"{level}: {message}");
                });

            return mock;
        }

        internal sealed record InteropCommandInterceptInfo
        {
            public string CommandName { get; init; } = default!;
            public string? CommandInput { get; init; }
            public string ResponseData { get; init; } = default!;
            public string? ResponseError { get; init; }
        }

        internal static Mock<ICoreInterop> CreateCoreInteropWithIntercept(ICoreInterop coreInterop,
            List<InteropCommandInterceptInfo> intercepts)
        {
            var mock = new Mock<ICoreInterop>();
            var interceptNames = new HashSet<string>(StringComparer.InvariantCulture);

            foreach (var intercept in intercepts)
            {
                if (!interceptNames.Add(intercept.CommandName))
                {
                    throw new ArgumentException($"Duplicate intercept for command {intercept.CommandName}");
                }

                mock.Setup(x => x.ExecuteCommand(It.Is<string>(s => s == intercept.CommandName), It.IsAny<CoreInteropRequest?>()))
                    .Returns(new ICoreInterop.Response
                    {
                        Data = intercept.ResponseData,
                        Error = intercept.ResponseError
                    });

                mock.Setup(x => x.ExecuteCommandAsync(It.Is<string>(s => s == intercept.CommandName),
                    It.IsAny<CoreInteropRequest?>(),
                    It.IsAny<CancellationToken?>()))
                    .ReturnsAsync(new ICoreInterop.Response
                    {
                        Data = intercept.ResponseData,
                        Error = intercept.ResponseError
                    });
            }

            mock.Setup(x => x.ExecuteCommand(It.Is<string>(s => !interceptNames.Contains(s)),
                It.IsAny<CoreInteropRequest?>()))
                .Returns((string commandName, CoreInteropRequest? commandInput) =>
                    coreInterop.ExecuteCommand(commandName, commandInput));

            mock.Setup(x => x.ExecuteCommandAsync(It.Is<string>(s => !interceptNames.Contains(s)),
                It.IsAny<CoreInteropRequest?>(),
                It.IsAny<CancellationToken?>()))
                .Returns((string commandName, CoreInteropRequest? commandInput, CancellationToken? ct) =>
                    coreInterop.ExecuteCommandAsync(commandName, commandInput, ct));

            return mock;
        }

        internal static bool IsRunningInCI()
        AppName = "AudioStreamTest",
        LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Debug,
        AdditionalSettings = new Dictionary<string, string>
        {
            var azureDevOps = Environment.GetEnvironmentVariable("TF_BUILD");
            var githubActions = Environment.GetEnvironmentVariable("GITHUB_ACTIONS");
            var isCI = string.Equals(azureDevOps, "True", StringComparison.OrdinalIgnoreCase) ||
                       string.Equals(githubActions, "true", StringComparison.OrdinalIgnoreCase);

            return isCI;
            { "FoundryLocalCorePath", corePath }
        }
    };

Here's the updated PR description based on the latest changes (renamed types, CoreInterop routing fix, mermaid updates):
Title: Add live audio transcription streaming support to Foundry Local C# SDK
Description:
Adds real-time audio streaming support to the Foundry Local C# SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).
The existing OpenAIAudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included

New files
- src/OpenAI/LiveAudioTranscriptionClient.cs: Streaming session with StartAsync(), AppendAsync(), GetTranscriptionStream(), StopAsync()
- src/OpenAI/LiveAudioTranscriptionTypes.cs: LiveAudioTranscriptionResult and CoreErrorResponse types

Modified files
- src/OpenAI/AudioClient.cs: Added CreateLiveTranscriptionSession() factory method
- src/Detail/ICoreInterop.cs: Added StreamingRequestBuffer struct and the StartAudioStream, PushAudioData, StopAudioStream interface methods
- src/Detail/CoreInterop.cs: Routes audio commands through existing execute_command / execute_command_with_binary native entry points (no separate audio exports needed)
- src/Detail/JsonSerializationContext.cs: Registered LiveAudioTranscriptionResult for AOT compatibility
- test/FoundryLocal.Tests/Utils.cs: Updated to use CreateLiveTranscriptionSession()

Documentation

API surface

Design highlights
- Channel<T> serializes audio pushes from any thread (safe for mic callbacks) with backpressure
- StartAsync() and immutable during the session
- StopAsync always calls native stop even if cancelled, preventing native session leaks
- CancellationTokenSource, decoupled from the caller's token
- StartAudioStream and StopAudioStream route through execute_command; PushAudioData routes through execute_command_with_binary (no new native entry points required)

Core integration (neutron-server)
The Core side (AudioStreamingSession.cs) uses StreamingProcessor + Generator + Tokenizer + TokenizerStream from onnxruntime-genai to perform real-time RNNT decoding. The native commands (audio_stream_start/push/stop) are handled as cases in NativeInterop.ExecuteCommandManaged / ExecuteCommandWithBinaryManaged.

Verified working
- StreamingProcessor pipeline verified with WAV file (correct transcript)
- TranscribeChunk byte[] PCM path matches reference float[] path exactly
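
The Channel<T> and StopAsync points listed under Design highlights can be illustrated with a small sketch. It is not the SDK's implementation, which this page does not show: `INativeStream`, `FakeNativeStream`, and `PushQueueSketch` are hypothetical names, and the native start/push/stop commands are replaced by an in-memory fake. It shows one plausible shape for a bounded channel that applies backpressure to producers, a session-owned CancellationTokenSource that is separate from the caller's token, and a stop path that reaches the native stop call even when the caller cancels.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Usage: push three chunks from the caller, then stop.
var native = new FakeNativeStream();
var session = new PushQueueSketch(native, capacity: 4);
for (var i = 0; i < 3; i++)
{
    await session.PushAsync(new byte[3200]);
}
await session.StopAsync();
Console.WriteLine($"pushed={native.Pushed.Count}, stopped={native.Stopped}"); // pushed=3, stopped=True

// Hypothetical stand-in for the native push/stop commands.
interface INativeStream
{
    void Push(ReadOnlyMemory<byte> pcm);
    void Stop();
}

sealed class FakeNativeStream : INativeStream
{
    public List<int> Pushed { get; } = new();
    public bool Stopped { get; private set; }
    public void Push(ReadOnlyMemory<byte> pcm) => Pushed.Add(pcm.Length);
    public void Stop() => Stopped = true;
}

sealed class PushQueueSketch
{
    private readonly INativeStream _native;
    private readonly Channel<ReadOnlyMemory<byte>> _queue;
    private readonly CancellationTokenSource _sessionCts = new(); // owned by the session
    private readonly Task _pump;

    public PushQueueSketch(INativeStream native, int capacity)
    {
        _native = native;
        _queue = Channel.CreateBounded<ReadOnlyMemory<byte>>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait, // writers wait when the queue is full
            SingleReader = true,                    // a single pump keeps chunks in order
        });
        _pump = Task.Run(PumpAsync);
    }

    // Safe to call from any thread; completes once the chunk is queued.
    public ValueTask PushAsync(ReadOnlyMemory<byte> pcm, CancellationToken ct = default)
        => _queue.Writer.WriteAsync(pcm, ct);

    private async Task PumpAsync()
    {
        await foreach (var chunk in _queue.Reader.ReadAllAsync(_sessionCts.Token))
        {
            _native.Push(chunk);
        }
    }

    public async Task StopAsync(CancellationToken ct = default)
    {
        _queue.Writer.TryComplete(); // accept no further chunks
        try
        {
            await _pump.WaitAsync(ct); // drain the queue unless the caller cancels (.NET 6+)
        }
        catch (OperationCanceledException)
        {
            _sessionCts.Cancel(); // abandon queued chunks, then wait for the pump to exit
            try { await _pump; } catch (OperationCanceledException) { }
        }
        finally
        {
            _native.Stop(); // always reached, so the native session is not leaked
        }
    }
}
```

In this sketch a cancelled StopAsync swallows the cancellation after cleaning up; whether the real session rethrows it is not stated in the description.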