RemoteMedia SDK

A multi-runtime, multi-transport pipeline framework for real-time media.

Build streaming audio, video, and text pipelines that run locally with native Rust performance or offload to remote servers and Docker containers. The same manifest works across four transports (gRPC, HTTP/SSE, WebRTC, FFI) and three runtimes (native Rust, multiprocess Python, WASM).


Featured: Real-Time Avatar Speech-to-Speech

End-to-end avatar pipeline running entirely on a workstation:

mic → VAD → Whisper STT → Qwen3.6-27B → Kokoro TTS → Character-Creator avatar with lipsync, streamed over WebRTC. All body movements are diffusion-generated on the fly by KimodoMotionNode: the LLM emits a text-prompt tool call ("a person waves the right hand"), and Kimodo's motion-diffusion model synthesizes per-frame SMPL skeletal poses live.

Demo video: avatar-demo.mp4

Nothing pre-recorded, scripted, or keyframed. See scripts/run-avatar-s2s.sh in the main repo.


What's distinctive

  • Native Rust audio nodes — VAD, resampling, Whisper STT run 2–16× faster than Python equivalents.
  • Zero-copy IPC — process-isolated Python nodes communicate via iceoryx2 shared memory; no serialization, no socket overhead.
  • Transport-agnostic manifests — the same pipeline JSON runs over gRPC, HTTP/SSE, WebRTC, or Python FFI without changes.
  • Capability resolution — automatic format negotiation (sample rate, channels, dtype) at pipeline construction time, with actionable error messages and auto-inserted converters where possible.
  • Loadable plugins — ship a node as a single .so/.dylib/.dll with abi_stable FFI; works across host-vs-plugin dep-version conflicts (e.g. ort=rc.10 plugin against ort=rc.12 host).
  • Per-node managed Python venvs — declare deps via PEP 723 + @python_requires; the runtime provisions a content-addressed uv venv per node so transformers<5.0 and transformers>=5.0 nodes coexist in the same pipeline.

Quick start

# CLI — fastest path
cargo install --git https://github.com/RemoteMedia-SDK/remotemedia-sdk \
              --path examples/cli/remotemedia-cli
remotemedia nodes list                     # 41+ built-in nodes
remotemedia run pipeline.yaml -i audio.wav

# Python
pip install remotemedia
python -c "import remotemedia; print(remotemedia.is_rust_runtime_available())"

# Node.js
npm install @matbee/remotemedia-native

Full reference in remotemedia-sdk's main README.

Repository map

Core SDK

| Repo | Purpose |
| --- | --- |
| remotemedia-sdk | Main monorepo: Rust runtime, Python/Node clients, gRPC server, CLI, transports, examples. |

Loadable plugins (single-.so distribution)

| Repo | Backend | What it ships |
| --- | --- | --- |
| lfm25-audio-onnx | ONNX Runtime | LFM2.5-Audio-1.5B speech-to-speech (Path 3 dep-conflict isolation). |
| moss-tts-realtime | HuggingFace Transformers | MOSS-TTS realtime streaming TTS with vendored sub-package. |
| whisper | Candle | OpenAI Whisper STT, dual-emit loadable. |
| silero-vad | ONNX Runtime | Silero voice-activity detection. |
| speaker-diarization | Pyannote | Speaker diarization. |
| pyannote-rs | | Workspace fork pinned to ort=rc.12 (host-compatible). |
| llama-cpp | llama.cpp | GGUF LLM generation. |
| whisper-to-vad | Composite | Whisper+VAD chain. |

Avatar / rendering

| Repo | Purpose |
| --- | --- |
| audio2face | NVIDIA Audio2Face lipsync integration. |
| cc-render | Character Creator (CC4/CC5) avatar renderer with wgpu. |
| live2d-render | Live2D Cubism avatar renderer. |

Plugin templates

| Repo | Purpose |
| --- | --- |
| echo-python-loadable | Smallest valid Python loadable plugin; copy it as a starting point. |
| echo-python-source | Source-node variant for input-driving plugins. |

Documentation

Start here if you're new:

Reference docs (in the main repo):

Requirements

| Component | Version |
| --- | --- |
| Rust | 1.87+ |
| Python | 3.10+ (for Python nodes) |
| Node.js | 18+ (for JS/TS clients) |
| FFmpeg | 7.x (for media I/O) |
| CUDA / cuDNN | 12.x / 9 (optional, for GPU nodes) |

License

The open-source runtime and plugins are licensed under Apache 2.0. A commercial license is also available; see COMMERCIAL-LICENSE.md in the main repo.

Contributing

Issues, discussions, and pull requests are welcome on each repo. For substantial changes, please open an issue first to discuss the approach; most non-trivial work goes through an OpenSpec change proposal.
