Test TTS & STT models in your browser. No server required.
TTSLab is a browser-based application that enables local execution and comparison of text-to-speech (TTS) and speech-to-text (STT) models without relying on remote servers, API keys, or cloud infrastructure. It leverages WebGPU and WebAssembly (WASM) to perform on-device inference, ensuring full data privacy and low-latency interaction. The tool is designed for developers evaluating model performance, researchers conducting reproducible benchmarks, and product teams comparing voice characteristics across models.
The application supports multiple open-source models—including Kokoro 82M, Whisper Base and Small, Moonshine Base, and Supertonic 2—with each model downloaded once and cached locally in the browser. Users can run side-by-side voice comparisons, execute inference benchmarks, or interact with a fully client-side Voice Agent—all without transmitting audio or text externally.
TTSLab operates through a three-stage workflow. First, users select one or more models from the integrated directory—each labeled by type (TTS or STT), architecture, parameter count, and size. Upon selection, model weights are fetched over HTTPS and stored in the browser’s cache (e.g., IndexedDB or Cache API); subsequent use loads weights directly from local storage without re-downloading.
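The download-once, cache-forever behavior can be sketched as a small loader with an injectable store. The names (`loadWeights`, `WeightStore`) are illustrative, not TTSLab's actual API; in a browser build the store would be backed by the Cache API (`caches.open(...)`) or IndexedDB:

```typescript
// Illustrative store interface: anything that can persist weight bytes by key.
interface WeightStore {
  get(key: string): Promise<ArrayBuffer | undefined>;
  put(key: string, data: ArrayBuffer): Promise<void>;
}

// Fetch weights over HTTPS on first use; afterwards serve them from the
// local store so the model is never re-downloaded.
async function loadWeights(
  url: string,
  store: WeightStore,
  fetchFn: (url: string) => Promise<ArrayBuffer>,
): Promise<ArrayBuffer> {
  const cached = await store.get(url);
  if (cached !== undefined) return cached; // cache hit: no network request
  const fresh = await fetchFn(url);        // cache miss: download once
  await store.put(url, fresh);             // persist for subsequent sessions
  return fresh;
}
```

Injecting the store and fetch function keeps the caching logic independent of any particular browser storage API.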
Second, inference is executed entirely within the browser context. WebGPU provides hardware-accelerated tensor computation where supported; otherwise, WASM serves as a portable fallback runtime. Input text is synthesized into speech (for TTS), or audio is transcribed into text (for STT), with all intermediate data remaining in memory.
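The WebGPU-first, WASM-fallback selection amounts to a capability check. A minimal sketch, with an illustrative function name; in a real build the flags would come from probing `navigator.gpu` (e.g., whether `requestAdapter()` returns an adapter) and `WebAssembly` support:

```typescript
type Backend = "webgpu" | "wasm";

// Prefer hardware-accelerated WebGPU when available; otherwise fall back
// to the portable WASM runtime. Throws only if neither runtime exists.
function pickBackend(hasWebGPU: boolean, hasWasm: boolean): Backend {
  if (hasWebGPU) return "webgpu";
  if (hasWasm) return "wasm";
  throw new Error("No supported inference runtime in this browser");
}
```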
Third, results are rendered in the UI—audio playback for TTS, transcribed text for STT, or conversational responses for the Voice Agent. No network requests occur during inference, and no persistent identifiers or usage analytics are collected.
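To play synthesized speech without any network round trip, the raw PCM samples a TTS model emits can be wrapped in a WAV container and handed to an `<audio>` element via a local object URL. A minimal mono 16-bit encoder, as an illustrative sketch rather than TTSLab's actual rendering path:

```typescript
// Wrap mono float PCM samples in a 44-byte RIFF/WAV header so the browser
// can play them directly, with no server involvement.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const dataSize = samples.length * 2; // 16-bit PCM: 2 bytes per sample
  const buf = new ArrayBuffer(44 + dataSize);
  const v = new DataView(buf);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  v.setUint32(4, 36 + dataSize, true);
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  v.setUint32(16, 16, true);             // fmt chunk size
  v.setUint16(20, 1, true);              // audio format: PCM
  v.setUint16(22, 1, true);              // channels: mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true); // byte rate
  v.setUint16(32, 2, true);              // block align
  v.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  v.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    v.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```

In the browser, `URL.createObjectURL(new Blob([wav], { type: "audio/wav" }))` then yields a playable source for an `<audio>` tag, keeping the audio entirely on-device.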
TTSLab enables privacy-sensitive evaluation of speech AI, particularly valuable for domains with strict compliance requirements—such as healthcare documentation, legal transcription, or internal enterprise communications—where sending sensitive content to third-party APIs is prohibited. Researchers benefit from standardized, reproducible inference environments across hardware configurations, facilitating fair model comparisons and benchmark reporting. Developers use it to prototype voice interfaces, validate model behavior before backend integration, or test multilingual support without provisioning cloud resources. Product teams leverage side-by-side voice previews to assess naturalness, latency, and language coverage when selecting TTS voices for end-user applications.