LLMRTC Docs
Open-source TypeScript SDK for real-time voice and vision AI apps

About LLMRTC Docs
LLMRTC Docs is the official documentation for LLMRTC, a TypeScript SDK for building real-time voice and vision applications. The SDK unifies WebRTC-based audio/video streaming with large language models (LLMs), speech-to-text (STT), and text-to-speech (TTS) through a single, provider-agnostic API. It focuses on low-latency, bidirectional streaming suitable for interactive, conversational systems.
The documentation targets developers who need reliable infrastructure for conversational AI, including voice assistants, multimodal agents, and customer support flows. LLMRTC abstracts complex runtime concerns such as session management, provider orchestration, voice activity detection, and reconnection so teams can focus on application logic.
Key Takeaways
- Real-time audio/video streaming via WebRTC with sub-second latency
- Server-side voice activity detection (VAD) and barge-in for natural turn-taking
- Provider-agnostic design across cloud and local models; mix LLM, STT, and TTS providers
- Tool calling with JSON Schema for validated function execution
- Playbooks for multi-stage conversations with configurable transitions
- End-to-end streaming pipeline (STT → LLM → TTS) with sentence-boundary detection
- Hooks and metrics for observability, including TTFT, token counts, and durations
- Session resilience with automatic reconnection and conversation history preservation
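To make the tool-calling takeaway concrete, here is a minimal sketch of a JSON Schema-described tool with argument validation before execution. The names `ToolDefinition`, `weatherTool`, and `dispatchTool` are illustrative assumptions, not LLMRTC's actual API; they only show the pattern of pairing a schema with an executable function.

```typescript
// Hypothetical sketch: a tool described with JSON Schema, plus a minimal
// dispatcher that checks required arguments before execution.
// These names are illustrative, not LLMRTC APIs.

interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

const weatherTool: ToolDefinition = {
  name: "get_weather",
  description: "Look up current weather for a city",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name" },
    },
    required: ["city"],
  },
  // Stub implementation; a real tool would call an external service.
  execute: async (args) => ({ city: args.city, tempC: 21 }),
};

async function dispatchTool(
  tool: ToolDefinition,
  args: Record<string, unknown>,
): Promise<unknown> {
  // Reject calls that omit a required argument before running the tool.
  for (const key of tool.parameters.required ?? []) {
    if (!(key in args)) {
      throw new Error(`Missing required argument: ${key}`);
    }
  }
  return tool.execute(args);
}
```

In a real deployment, a full JSON Schema validator would replace the hand-rolled required-key check; the sketch only illustrates why schemas make tool execution safe to automate.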
How LLMRTC Works
LLMRTC is organized into three packages that separate core logic, server capabilities, and browser integration:
| Package | Purpose |
|---|---|
| @llmrtc/llmrtc-core | Shared types, orchestrators, tools, and hooks |
| @llmrtc/llmrtc-backend | Node.js server with WebRTC, VAD, and provider integrations |
| @llmrtc/llmrtc-web-client | Browser SDK for audio/video capture and playback |
At runtime, the system streams user audio to the server, detects speech boundaries with VAD, and converts speech to text. The transcript is processed by an LLM, which can optionally call developer-defined tools described with JSON Schema. Responses are converted to speech and streamed back to the client. Sentence-boundary detection enables early, natural-sounding TTS playback before generation completes.
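The sentence-boundary step can be sketched as follows. This is an illustrative buffer, not the SDK's internal implementation: it accumulates streamed LLM tokens and emits each complete sentence as soon as a boundary punctuation mark appears, so TTS can begin speaking before generation finishes.

```typescript
// Illustrative sketch (not LLMRTC's internals): buffer streamed text chunks
// and emit complete sentences at punctuation boundaries for early TTS playback.

const BOUNDARY = /([.!?])\s+/;

class SentenceBuffer {
  private buffer = "";

  // Feed one streamed chunk; return any complete sentences it unlocked.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const sentences: string[] = [];
    let match: RegExpExecArray | null;
    while ((match = BOUNDARY.exec(this.buffer)) !== null) {
      const end = match.index + match[1].length;
      sentences.push(this.buffer.slice(0, end).trim());
      this.buffer = this.buffer.slice(end).trimStart();
    }
    return sentences;
  }

  // Flush any trailing text when the stream ends.
  flush(): string | null {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? rest : null;
  }
}
```

A production splitter would also need to handle abbreviations, numbers, and markup, which is why sentence-boundary detection is worth delegating to the SDK rather than a bare regex.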
Developers can switch or combine providers without changing application code, enabling configurations such as one provider for LLM, another for STT, and a third for TTS. The SDK exposes 20+ hook points for logging, debugging, and custom behaviors, along with built-in metrics. Session resilience covers reconnection and state continuity. For production, a TURN server is required to ensure reliable WebRTC connectivity; the docs include guidance on configuration.
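As a sketch of the observability idea, the recorder below computes the metrics the docs mention: time-to-first-token (TTFT), token counts, and total duration. The hook names `onRequestStart`, `onToken`, and `onRequestEnd` are assumptions for illustration, not LLMRTC's actual hook API.

```typescript
// Hypothetical metrics recorder; hook names are illustrative, not LLMRTC's API.
// Timestamps are passed in explicitly (e.g. from Date.now()) to keep it testable.

interface TurnMetrics {
  ttftMs?: number; // time from request start to first token
  tokens: number; // tokens received this turn
  totalMs?: number; // total turn duration
}

class MetricsRecorder {
  private startedAt = 0;
  private firstTokenAt: number | null = null;
  readonly metrics: TurnMetrics = { tokens: 0 };

  onRequestStart(now: number): void {
    this.startedAt = now;
    this.firstTokenAt = null;
    this.metrics.tokens = 0;
  }

  onToken(now: number): void {
    if (this.firstTokenAt === null) {
      this.firstTokenAt = now;
      this.metrics.ttftMs = now - this.startedAt;
    }
    this.metrics.tokens += 1;
  }

  onRequestEnd(now: number): void {
    this.metrics.totalMs = now - this.startedAt;
  }
}
```

Wiring a recorder like this into per-turn hooks gives the TTFT and duration numbers needed to compare providers or catch latency regressions.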
Core Benefits and Applications
LLMRTC supports a range of conversational and multimodal use cases:
- Voice assistants: Build assistants with low-latency, interruptible speech and domain tools
- Customer support: Guide users through authentication, triage, and resolution via playbooks
- Multimodal agents: Combine speech with camera or screen input for context-aware assistance
- On-device AI: Run locally with providers such as Ollama, Faster-Whisper, and Piper for privacy and cost control
Key advantages include a provider-agnostic architecture, a unified TypeScript API with comprehensive types, validated tool calling, and operational visibility through hooks and metrics. The architecture’s streaming pipeline and sentence-aware TTS reduce perceived latency, while session management and reconnection improve reliability in real-world network conditions.