Open-source TypeScript SDK for real-time voice and vision AI apps

LLMRTC Docs is the official documentation for LLMRTC, a TypeScript SDK for building real-time voice and vision applications. The SDK unifies WebRTC-based audio/video streaming with large language models (LLMs), speech-to-text (STT), and text-to-speech (TTS) through a single, provider-agnostic API. It focuses on low-latency, bidirectional streaming suitable for interactive, conversational systems.
The documentation targets developers who need reliable infrastructure for conversational AI, including voice assistants, multimodal agents, and customer support flows. LLMRTC abstracts complex runtime concerns such as session management, provider orchestration, voice activity detection, and reconnection so teams can focus on application logic.
LLMRTC is organized into three packages that separate core logic, server capabilities, and browser integration:
| Package | Purpose |
|---|---|
| @llmrtc/llmrtc-core | Shared types, orchestrators, tools, and hooks |
| @llmrtc/llmrtc-backend | Node.js server with WebRTC, VAD, and provider integrations |
| @llmrtc/llmrtc-web-client | Browser SDK for audio/video capture and playback |
At runtime, the system streams user audio to the server, detects speech boundaries with VAD, and converts speech to text. The transcript is processed by an LLM, which can optionally call developer-defined tools described with JSON Schema. Responses are converted to speech and streamed back to the client. Sentence-boundary detection enables early, natural-sounding TTS playback before generation completes.
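The sentence-boundary step above can be sketched in plain TypeScript. This is an illustrative implementation, not LLMRTC's actual internals: a generator that buffers streaming LLM tokens and releases each complete sentence as soon as it is closed by punctuation, so TTS synthesis can begin before the full response is generated.

```typescript
// Illustrative sketch only — not LLMRTC's documented API.
// Buffers streaming tokens and yields complete sentences early,
// so TTS can start speaking before LLM generation finishes.
function* sentenceChunks(tokens: Iterable<string>): Generator<string> {
  let buffer = "";
  for (const token of tokens) {
    buffer += token;
    // Flush every complete sentence currently in the buffer:
    // text ending in . ! or ? followed by whitespace.
    let m: RegExpMatchArray | null;
    while ((m = buffer.match(/^([\s\S]*?[.!?])\s+([\s\S]*)$/)) !== null) {
      yield m[1].trim();
      buffer = m[2];
    }
  }
  // Trailing partial sentence is flushed once the stream ends.
  if (buffer.trim().length > 0) yield buffer.trim();
}

// Tokens arrive incrementally; the first sentence is released
// as soon as it is complete, while generation continues.
const tokens = ["Hello the", "re. ", "How can I ", "help you today?"];
const sentences = [...sentenceChunks(tokens)];
```

In a real pipeline each yielded sentence would be handed to the TTS provider immediately, which is what makes playback feel responsive despite the LLM still generating.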
Developers can switch or combine providers without changing application code, enabling configurations such as one provider for LLM, another for STT, and a third for TTS. The SDK exposes 20+ hook points for logging, debugging, and custom behaviors, along with built-in metrics. Session resilience covers reconnection and state continuity. For production, a TURN server is required to ensure reliable WebRTC connectivity; the docs include guidance on configuration.
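To make the mix-and-match idea concrete, here is a hypothetical configuration shape. The field names and values below are assumptions for illustration, not LLMRTC's documented configuration API; the point is that each pipeline role (LLM, STT, TTS) carries its own provider selection, so swapping one does not touch application code.

```typescript
// Hypothetical configuration sketch — field names are assumptions,
// not LLMRTC's documented API. Each role selects its provider independently.
interface ProviderConfig {
  llm: { provider: string; model: string };
  stt: { provider: string; language?: string };
  tts: { provider: string; voice?: string };
}

const config: ProviderConfig = {
  llm: { provider: "provider-a", model: "some-chat-model" },
  stt: { provider: "provider-b", language: "en" },
  tts: { provider: "provider-c", voice: "default" },
};
```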
LLMRTC supports a range of conversational and multimodal use cases, including:
- Voice assistants with natural turn-taking driven by VAD
- Multimodal agents that combine audio and video input
- Customer support flows that use tool calling to act on user requests
Key advantages include a provider-agnostic architecture, a unified TypeScript API with comprehensive types, validated tool calling, and operational visibility through hooks and metrics. The architecture’s streaming pipeline and sentence-aware TTS reduce perceived latency, while session management and reconnection improve reliability in real-world network conditions.
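To illustrate what validated tool calling looks like, here is a sketch of a developer-defined tool described with JSON Schema. The object shape and `handler` field are assumptions for this example, not LLMRTC's actual tool interface; the schema is what lets the runtime validate the LLM's arguments before executing the handler.

```typescript
// Illustrative sketch — the tool shape is an assumption, not
// LLMRTC's documented interface. The JSON Schema in `parameters`
// describes the arguments the LLM is allowed to pass.
const weatherTool = {
  name: "get_weather",
  description: "Look up the current weather for a city",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name" },
    },
    required: ["city"],
  },
  // Invoked only after the LLM's arguments pass schema validation.
  handler: async (args: { city: string }) => {
    // Stubbed result for the sketch; a real tool would call a weather API.
    return { city: args.city, tempC: 21 };
  },
};
```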