Studio-quality voice AI that runs locally on your desktop.

Vois is a desktop application for text-to-speech (TTS) synthesis that operates entirely offline. Designed as an integrated voice production studio, it enables users to script, generate, arrange, master, and export speech audio without relying on cloud services. The software targets content creators—including podcasters, audiobook authors, YouTube narrators, documentary producers, e-learning developers, and game designers—who require high-quality, expressive, and privacy-respecting voice generation.
Unlike cloud-based TTS platforms, Vois generates all audio locally on the user's machine. This eliminates per-character billing and usage caps, and removes the need to upload sensitive scripts to third-party servers. Vois is built in Rust for performance and safety, and on Apple Silicon hardware it generates speech at up to 6× real-time speed.
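To make the speed figure concrete, the sketch below reads "6×" as a real-time factor: seconds of audio produced per second of processing. That reading is an interpretation, and 6× is stated as a best case on Apple Silicon, so actual throughput will vary with hardware and voice. Under that assumption, estimating generation time is a single division. `generation_time_seconds` is a hypothetical helper, not part of Vois.

```python
def generation_time_seconds(audio_seconds: float, speed_factor: float = 6.0) -> float:
    """Estimate wall-clock synthesis time for `audio_seconds` of speech.

    Assumes `speed_factor` is a real-time factor: 6.0 means six seconds of
    audio are produced per second of processing (the stated best case).
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# A 60-minute audiobook chapter at the best-case 6x rate:
chapter_minutes = generation_time_seconds(60 * 60) / 60
print(f"~{chapter_minutes:.0f} minutes of processing")  # ~10 minutes of processing
```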
Vois follows a linear yet flexible four-stage workflow: writing, casting, generating and arranging, and mastering and exporting. Users begin by writing or importing a script (typed directly in the app, imported from a document, or fetched from a URL) and tagging speakers for multi-character dialogue. Next, they assign voices from the built-in library or their own cloned voices to each speaker or section. Generation runs locally, with no generation limits on paid plans; the resulting clips appear as editable segments on a multi-track timeline, where users can adjust timing, add transitions, layer ambient audio, and reorder speakers.
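The arranging step can be pictured as placing generated clips on a timeline in script order. The sketch below is a hypothetical model: `Segment` and `lay_out` are illustrative names, not Vois's internal format or API. It assigns each clip a start time and leaves a fixed pause between lines. In the app itself, timing is adjusted interactively and clips can sit on separate tracks.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One generated clip on the timeline (hypothetical model)."""
    speaker: str
    text: str
    duration: float      # seconds of generated audio
    start: float = 0.0   # filled in by lay_out()


def lay_out(segments: list[Segment], gap: float = 0.35) -> list[Segment]:
    """Place segments back to back in script order, `gap` seconds apart."""
    cursor = 0.0
    for seg in segments:
        seg.start = cursor
        cursor += seg.duration + gap  # next clip starts after this one plus a pause
    return segments


script = [
    Segment("HOST", "Welcome back to the show.", 1.8),
    Segment("GUEST", "Thanks for having me.", 1.2),
    Segment("HOST", "Let's dive in.", 0.9),
]
for seg in lay_out(script):
    print(f"{seg.start:5.2f}s  {seg.speaker:<5}  {seg.text}")
```

Keeping start times derived from durations means that regenerating one line with a different length only requires re-running the layout, rather than manually nudging every later clip.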
The final stage applies mastering effects, such as loudness normalization, frequency balancing, and dynamic range control, before export. Export presets automatically configure settings for specific platform standards (e.g., ACX's requirement that audiobook files measure between -23 and -18 dB RMS, with peaks no higher than -3 dB). All project files, audio assets, and cloned voices are stored locally and remain fully accessible even after a subscription is cancelled.
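Loudness normalization comes down to measuring a level and applying the gain that moves it to a target. The sketch below assumes float samples in [-1, 1] and uses a hypothetical target of -20 dB RMS, chosen inside ACX's published window of -23 to -18 dB RMS, with a -3 dB peak ceiling. It measures plain, unweighted RMS, not LUFS, which per ITU-R BS.1770 also requires K-weighting and gating. It illustrates the arithmetic only and is not Vois's mastering chain.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """Unweighted RMS level in dBFS for float samples in [-1.0, 1.0]."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def peak_dbfs(samples: list[float]) -> float:
    """Sample-peak level in dBFS (not true-peak; no oversampling)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def normalize_rms(samples: list[float], target_db: float = -20.0,
                  peak_ceiling_db: float = -3.0) -> tuple[list[float], float]:
    """Apply one static gain so the RMS level reaches `target_db`.

    The gain is reduced if it would push the loudest sample above
    `peak_ceiling_db`; a real mastering chain would use a limiter instead.
    Returns the processed samples and the gain applied in dB.
    """
    if not any(samples):  # empty or all-silent input: nothing to normalize
        return list(samples), 0.0
    gain_db = target_db - rms_dbfs(samples)
    gain_db = min(gain_db, peak_ceiling_db - peak_dbfs(samples))
    gain = 10 ** (gain_db / 20)  # convert dB to a linear amplitude factor
    return [s * gain for s in samples], gain_db


# 1 s of a 1 kHz sine at amplitude 0.1 (about -23 dBFS RMS), 48 kHz sample rate
sr = 48_000
tone = [0.1 * math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr)]
out, applied = normalize_rms(tone)
print(f"gain {applied:+.2f} dB, RMS {rms_dbfs(out):.2f} dBFS, peak {peak_dbfs(out):.2f} dBFS")
```

For speech with occasional loud transients, the peak ceiling can win over the RMS target, which is why mastering chains typically pair normalization with compression or limiting.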
Vois replaces multiple disjointed tools, including cloud TTS APIs, digital audio workstations (DAWs), and mastering plugins, with a single, unified application:

- Podcasters can produce multi-voice episodes solo, complete with intros, outros, and guest impersonations.
- Audiobook authors can convert manuscripts into professionally mastered, platform-compliant audio in hours rather than days.
- YouTube creators get consistent, scalable narration for faceless or tutorial content.
- Documentary makers can use multilingual narration and character-specific voices to localize content across regions.
- Game developers can generate NPC dialogue in bulk while preserving stylistic consistency and linguistic accuracy.
- Educational content creators can produce training modules and meditation guides with tailored vocal tone and pacing.

In every case, creators maintain full data sovereignty and avoid recurring per-use costs.
| Plan | Price | Billing | Key Terms |
|---|---|---|---|
| Free Tier | $0 | None | Limited to 10 generations per day |
| Annual Subscription | $9/month | Billed annually ($108/year) | Unlimited use of all features; includes a 40% launch discount |
| Monthly Subscription | $29/month | Billed monthly | Unlimited use of all features; no long-term commitment |
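The table's prices reduce the annual-versus-monthly choice to simple arithmetic. This sketch uses only the figures above (USD, before any taxes) and assumes a full 12 months of use. The 40% launch discount is not recomputed, because the price it is measured against is not stated.

```python
import math

MONTHLY_PLAN = 29      # USD per month, billed monthly
ANNUAL_PLAN = 9 * 12   # USD per year: $9/month, billed annually

monthly_for_a_year = MONTHLY_PLAN * 12
savings = monthly_for_a_year - ANNUAL_PLAN
# Number of monthly payments at which monthly billing first exceeds the annual price
break_even_months = math.ceil(ANNUAL_PLAN / MONTHLY_PLAN)

print(f"Monthly plan for 12 months: ${monthly_for_a_year}")             # $348
print(f"Annual plan:                ${ANNUAL_PLAN}")                    # $108
print(f"Annual saves ${savings} ({savings / monthly_for_a_year:.0%})")  # $240 (69%)
print(f"Break-even: {break_even_months} months")                        # 4 months
```

In other words, anyone expecting to use Vois for four months or more pays less on the annual plan.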