
Speak to code. AI-powered dictation for developers.

Voca is a desktop application designed for developers and technical professionals who require accurate, context-aware voice-to-text transcription. It addresses common limitations in general-purpose speech recognition tools—particularly their inability to correctly interpret technical terminology such as 'React', 'navbar', or 'props'. Built with developer workflows in mind, Voca integrates real-time AI correction directly into the typing environment, minimizing context switching and preserving focus during coding, documentation, or communication tasks.
The application processes speech locally or via user-provided API keys, ensuring privacy and data control. It supports both Windows and macOS and is distributed under the MIT open-source license. Voca is not a cloud-only service; it enables on-device operation where possible and gives users full ownership of their speech processing pipeline.
Voca operates as a lightweight desktop agent. When activated via a configurable global shortcut, it begins listening and sends audio to the selected speech-to-text (STT) engine (Deepgram Nova-3 or Groq Whisper). The resulting transcript is passed through an AI layer powered by Gemini 2.0 Flash, which corrects grammar, syntax, and domain-specific vocabulary, especially technical jargon relevant to software development. The corrected text is then automatically pasted at the current cursor location in any application.
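
The flow described above (transcribe, correct, paste) can be pictured as three pluggable stages. The following is a minimal sketch, not Voca's actual implementation: `DictationPipeline`, `fake_stt`, and `fake_corrector` are hypothetical names, and the stand-in functions replace the real STT and correction services so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; Voca's real internals are not shown in this document.
Transcriber = Callable[[bytes], str]  # audio bytes -> raw transcript
Corrector = Callable[[str], str]      # raw transcript -> corrected text
Paster = Callable[[str], None]        # corrected text -> side effect at the cursor


@dataclass
class DictationPipeline:
    transcribe: Transcriber
    correct: Corrector
    paste: Paster

    def run(self, audio: bytes) -> str:
        """Run one dictation pass and return the text that was pasted."""
        raw = self.transcribe(audio)
        fixed = self.correct(raw)
        self.paste(fixed)
        return fixed


# Stand-ins so the sketch runs without network access or API keys.
def fake_stt(audio: bytes) -> str:
    return "add a nav bar component that takes props"


def fake_corrector(text: str) -> str:
    # A real correction layer would use a language model; this only fixes one term.
    return text.replace("nav bar", "navbar")


pasted: list[str] = []  # records what would have been pasted at the cursor
pipeline = DictationPipeline(fake_stt, fake_corrector, pasted.append)
result = pipeline.run(b"\x00\x01")
```

Keeping each stage behind a plain callable is one way to let the STT engine be swapped without touching the correction or paste steps.
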
Users can optionally enable translation before pasting and choose a Formal, Casual, or Developer tone mode. In Developer mode, technical terms remain in English while the surrounding content is translated. Add-ons handle numeric conversion and list generation. All processing respects user privacy: no audio is stored or transmitted to Voca’s servers unless explicitly routed through the user’s own API keys.
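
One common way to keep technical terms in English during translation is to mask them with placeholders, translate the rest, and then restore the originals. The sketch below illustrates that general technique only; how Voca implements Developer mode is not stated here. `translate_preserving_terms` and `toy_translate` are hypothetical names, and the word-for-word lookup stands in for a real translation service.

```python
import re
from typing import Callable, Iterable


def translate_preserving_terms(
    text: str,
    terms: Iterable[str],
    translate: Callable[[str], str],
) -> str:
    """Translate `text` while leaving every whole-word match of `terms` untouched."""
    # Longest terms first so that overlapping terms match greedily.
    term_list = sorted(set(terms), key=len, reverse=True)
    if not term_list:
        return translate(text)

    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in term_list) + r")\b"
    )
    kept: list[str] = []

    def mask(match: re.Match) -> str:
        # Swap each protected term for an indexed placeholder.
        kept.append(match.group(0))
        return f"__TERM{len(kept) - 1}__"

    masked = pattern.sub(mask, text)
    translated = translate(masked)
    # Put the original terms back where their placeholders ended up.
    return re.sub(r"__TERM(\d+)__", lambda m: kept[int(m.group(1))], translated)


# Toy lookup standing in for a real translator (assumption, not a real service).
TOY_DICTIONARY = {"pass": "pasa", "to": "a", "the": "la", "props": "accesorios"}


def toy_translate(text: str) -> str:
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


plain = toy_translate("pass props to the navbar")
protected = translate_preserving_terms(
    "pass props to the navbar", {"props", "navbar"}, toy_translate
)
```

Here `plain` has `props` replaced by the toy dictionary, while `protected` keeps it in English. The approach assumes the translator passes placeholders through unchanged, which a real service may not guarantee.
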
Voca streamlines repetitive text input for developers: writing code comments, drafting documentation, composing technical emails or Slack messages, and creating issue tickets or PR descriptions. Because it preserves technical terms accurately, it reduces manual editing time and improves consistency in written output. The translation capability supports multilingual teams by letting them compose localized documentation or user-facing content as they dictate. Because text is pasted at the cursor, with no copy-paste or tab switching, Voca works inside the user’s existing IDEs, editors, browsers, and collaboration platforms. Its open-source code and local-first architecture also make it suitable for regulated or security-sensitive environments that require data residency and transparency.

| Plan | Monthly Cost | Included Credits | Max Recording Size | Features |
|---|---|---|---|---|
| Pro | $3 | $3 | 10 MB | All STT engines, translation, tone modes, numeric & planning add-ons |
| Max | $10 | $10 | 25 MB | Same as Pro, with higher file size limit |

Both plans include access to all features; there are no tiered feature restrictions.
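
As a rough illustration of how the recording-size limits in the table could be checked, the sketch below encodes the two plans and picks the smallest one that fits a given recording. The `Plan` class and helper names are hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption, since the unit is not defined here.

```python
from dataclasses import dataclass
from typing import Optional

BYTES_PER_MB = 1_000_000  # Assumption: decimal megabytes; the table does not say.


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_cost_usd: int
    included_credits_usd: int
    max_recording_mb: int


# Values copied from the pricing table.
PLANS = {
    "pro": Plan("Pro", 3, 3, 10),
    "max": Plan("Max", 10, 10, 25),
}


def fits_plan(recording_bytes: int, plan: Plan) -> bool:
    """Return True if a recording of this size is within the plan's limit."""
    return recording_bytes <= plan.max_recording_mb * BYTES_PER_MB


def smallest_plan_for(recording_bytes: int) -> Optional[Plan]:
    """Return the plan with the lowest size limit that still fits, or None."""
    for plan in sorted(PLANS.values(), key=lambda p: p.max_recording_mb):
        if fits_plan(recording_bytes, plan):
            return plan
    return None
```

For example, under these assumptions `smallest_plan_for(12_000_000)` selects the Max plan, and a 30 MB recording fits neither plan.
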