Open-source TypeScript SDK for real-time voice and vision AI apps

LLMRTC Docs is the official documentation for LLMRTC, a TypeScript SDK for building real-time voice and vision applications. The SDK unifies WebRTC-based audio/video streaming with large language models (LLMs), speech-to-text (STT), and text-to-speech (TTS) through a single, provider-agnostic API. It focuses on low-latency, bidirectional streaming suitable for interactive, conversational systems.
The documentation targets developers who need reliable infrastructure for conversational AI, including voice assistants, multimodal agents, and customer support flows. LLMRTC abstracts complex runtime concerns such as session management, provider orchestration, voice activity detection, and reconnection so teams can focus on application logic.
LLMRTC is organized into three packages that separate core logic, server capabilities, and browser integration:
| Package | Purpose |
|---|---|
| @llmrtc/llmrtc-core | Shared types, orchestrators, tools, and hooks |
| @llmrtc/llmrtc-backend | Node.js server with WebRTC, VAD, and provider integrations |
| @llmrtc/llmrtc-web-client | Browser SDK for audio/video capture and playback |
At runtime, the system streams user audio to the server, detects speech boundaries with VAD, and converts speech to text. The transcript is processed by an LLM, which can optionally call developer-defined tools described with JSON Schema. Responses are converted to speech and streamed back to the client. Sentence-boundary detection enables early, natural-sounding TTS playback before generation completes.
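The sentence-boundary step above can be sketched in plain TypeScript. This is an illustrative implementation, not LLMRTC's actual internals: a generator that buffers streaming LLM tokens and releases each complete sentence as soon as it is closed by punctuation, so TTS synthesis can begin before the full response is generated.

```typescript
// Illustrative sketch only — not LLMRTC's documented API.
// Buffers streaming tokens and yields complete sentences early,
// so TTS can start speaking before LLM generation finishes.
function* sentenceChunks(tokens: Iterable<string>): Generator<string> {
  let buffer = "";
  for (const token of tokens) {
    buffer += token;
    // Flush every complete sentence currently in the buffer:
    // text ending in . ! or ? followed by whitespace.
    let m: RegExpMatchArray | null;
    while ((m = buffer.match(/^([\s\S]*?[.!?])\s+([\s\S]*)$/)) !== null) {
      yield m[1].trim();
      buffer = m[2];
    }
  }
  // Trailing partial sentence is flushed once the stream ends.
  if (buffer.trim().length > 0) yield buffer.trim();
}

// Tokens arrive incrementally; the first sentence is released
// as soon as it is complete, while generation continues.
const tokens = ["Hello the", "re. ", "How can I ", "help you today?"];
const sentences = [...sentenceChunks(tokens)];
```

In a real pipeline each yielded sentence would be handed to the TTS provider immediately, which is what makes playback feel responsive despite the LLM still generating.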
Developers can switch or combine providers without changing application code, enabling configurations such as one provider for LLM, another for STT, and a third for TTS. The SDK exposes 20+ hook points for logging, debugging, and custom behaviors, along with built-in metrics. Session resilience covers reconnection and state continuity. For production, a TURN server is required to ensure reliable WebRTC connectivity; the docs include guidance on configuration.
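To make the mix-and-match idea concrete, here is a hypothetical configuration shape. The field names and values below are assumptions for illustration, not LLMRTC's documented configuration API; the point is that each pipeline role (LLM, STT, TTS) carries its own provider selection, so swapping one does not touch application code.

```typescript
// Hypothetical configuration sketch — field names are assumptions,
// not LLMRTC's documented API. Each role selects its provider independently.
interface ProviderConfig {
  llm: { provider: string; model: string };
  stt: { provider: string; language?: string };
  tts: { provider: string; voice?: string };
}

const config: ProviderConfig = {
  llm: { provider: "provider-a", model: "some-chat-model" },
  stt: { provider: "provider-b", language: "en" },
  tts: { provider: "provider-c", voice: "default" },
};
```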
LLMRTC supports a range of conversational and multimodal use cases, including:
- Voice assistants with natural turn-taking driven by VAD
- Multimodal agents that combine audio and video input
- Customer support flows that use tool calling to act on user requests
Key advantages include a provider-agnostic architecture, a unified TypeScript API with comprehensive types, validated tool calling, and operational visibility through hooks and metrics. The architecture’s streaming pipeline and sentence-aware TTS reduce perceived latency, while session management and reconnection improve reliability in real-world network conditions.
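To illustrate what validated tool calling looks like, here is a sketch of a developer-defined tool described with JSON Schema. The object shape and `handler` field are assumptions for this example, not LLMRTC's actual tool interface; the schema is what lets the runtime validate the LLM's arguments before executing the handler.

```typescript
// Illustrative sketch — the tool shape is an assumption, not
// LLMRTC's documented interface. The JSON Schema in `parameters`
// describes the arguments the LLM is allowed to pass.
const weatherTool = {
  name: "get_weather",
  description: "Look up the current weather for a city",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name" },
    },
    required: ["city"],
  },
  // Invoked only after the LLM's arguments pass schema validation.
  handler: async (args: { city: string }) => {
    // Stubbed result for the sketch; a real tool would call a weather API.
    return { city: args.city, tempC: 21 };
  },
};
```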