DictatorFlow
Control your computer and work with AI agents by voice

About DictatorFlow
Introduction to DictatorFlow
DictatorFlow is a voice interface platform that lets users control their computers and interact with AI agents by speaking. It is available both as a native desktop application and as a developer-facing API, and supports real-time speech-to-text transcription, voice command execution, and voice-driven text editing. It is aimed at users who need precision, speed, and privacy, such as developers, writers, engineers, and people who rely on voice input for accessibility. DictatorFlow runs natively on macOS (Apple Silicon and Intel), Windows, and Linux, without Electron or similar heavyweight frameworks.
The system is built around custom acoustic models trained for low-latency, high-accuracy transcription. In offline mode it runs entirely on local hardware, so audio never leaves the user’s device. It supports 99+ languages, with automatic language detection and cross-language translation, which suits multilingual workflows while keeping data under the user’s control.
Key Takeaways
- Native desktop applications for macOS, Windows, and Linux, written in Zig for minimal resource usage
- 1.2% word error rate (WER) on LibriSpeech test-clean, outperforming Whisper (Large), Google Cloud STT, and AWS Transcribe
- 150 ms latency to first token, lower than Deepgram, AssemblyAI, and the Whisper API
- Fully offline mode: all models run locally on GPU or CPU; no audio is transmitted to or stored on remote servers
- Voice command editing: select text in any application and issue spoken instructions to rewrite, translate, summarize, or refactor content
- REST API and WebSocket access, accepting PCM, WAV, WebM, MP3, and OGG audio
- Browser-embeddable widget with live waveform visualization, Enter to submit, and Esc or X to cancel
- Usage-based API pricing at $0.004 per second of audio, plus multi-provider fallback and speaker diarization
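Because the API accepts several audio formats, a client has to label each upload with a matching Content-Type. The following is a minimal Python sketch of that mapping. The MIME strings are commonly used ones (`audio/wav`, `audio/webm`, `audio/mpeg`, `audio/ogg`), but which exact strings DictatorFlow accepts is an assumption, as is labelling raw PCM with `audio/L16` (the RFC 2586 type for 16-bit linear PCM) plus `rate` and `channels` parameters. The helper `content_type_for` is hypothetical.

```python
from pathlib import Path

# Commonly used MIME types for the container formats listed above.
# Whether DictatorFlow accepts exactly these strings is an assumption.
_MIME_BY_SUFFIX = {
    ".wav": "audio/wav",
    ".webm": "audio/webm",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
}


def content_type_for(path: str, pcm_rate: int = 16000, pcm_channels: int = 1) -> str:
    """Return a Content-Type value for an audio file (hypothetical helper).

    Raw PCM has no container header, so sample rate and channel count must
    travel with the request; here they are encoded as parameters of
    audio/L16 (RFC 2586), which describes 16-bit big-endian linear PCM.
    """
    suffix = Path(path).suffix.lower()
    if suffix in (".pcm", ".raw"):
        return f"audio/L16; rate={pcm_rate}; channels={pcm_channels}"
    try:
        return _MIME_BY_SUFFIX[suffix]
    except KeyError:
        # Fail early instead of uploading audio the server cannot decode.
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}") from None
```

For example, `content_type_for("meeting.MP3")` returns `"audio/mpeg"`, and `content_type_for("clip.pcm", pcm_rate=48000)` returns `"audio/L16; rate=48000; channels=1"`.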
How DictatorFlow Works
DictatorFlow offers two primary interaction modes: local desktop control and programmatic API integration. In desktop mode, users speak into their microphone to trigger system actions or to edit selected text in any application, including IDEs, browsers, and text editors. The engine processes the audio locally with optimized acoustic models and then applies the resulting edits directly in the host application.
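The desktop flow described above (capture speech, interpret it as an instruction, apply it to the current selection) can be illustrated with a small dispatcher. This is a hypothetical sketch only: the `EditCommand` type, the `parse_command` function, and its keyword rules are invented for illustration, and the document does not describe how DictatorFlow actually interprets spoken instructions. The verbs come from the editing actions listed in Key Takeaways.

```python
from dataclasses import dataclass
from typing import Optional

# Verbs taken from the editing actions the document lists; the matching
# rules below are a hypothetical illustration, not DictatorFlow's grammar.
ACTIONS = ("rewrite", "translate", "summarize", "refactor")


@dataclass(frozen=True)
class EditCommand:
    action: str              # one of ACTIONS
    argument: Optional[str]  # e.g. a target language for "translate"
    selection: str           # text that was selected when the user spoke


def parse_command(transcript: str, selection: str) -> Optional[EditCommand]:
    """Map a transcribed instruction onto an edit of the selected text.

    Returns None when the utterance does not start with a known verb,
    so a caller could fall back to inserting it as plain dictation.
    """
    words = transcript.strip().rstrip(".!?").split()
    if not words:
        return None
    verb = words[0].lower()
    if verb not in ACTIONS:
        return None
    rest = words[1:]
    # "translate to French" / "translate into French" -> argument "French"
    if verb == "translate" and len(rest) >= 2 and rest[0].lower() in ("to", "into"):
        return EditCommand(verb, " ".join(rest[1:]), selection)
    # Anything after the verb is kept as a free-form argument, if present.
    return EditCommand(verb, " ".join(rest) or None, selection)
```

Returning `None` for unrecognized utterances keeps command handling separate from ordinary dictation, which is one plausible way to let the same microphone input serve both purposes.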
For developers, DictatorFlow exposes a low-latency API endpoint that accepts raw audio bytes. It can be called from cURL, JavaScript, Python, Go, or any other stack that can make HTTP requests. Clients send the audio with an authorization header and a Content-Type matching the audio format; the response contains the transcribed text and duration metadata. For frontend integration, the browser widget mounts a self-contained speech modal beside any <textarea>, <input>, or contenteditable element and handles recording, visualization, and text insertion automatically.
The platform detects the spoken language automatically and can translate as it transcribes (for example, speaking French to produce English text), so users do not need to select a language explicitly. All processing can run entirely offline in the native app or in a self-hosted API deployment.
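The request described above (raw audio bytes, an authorization header, a Content-Type, and a response with text and duration) can be sketched with Python's standard library. Several details are assumptions because the document does not state them: the endpoint URL is a placeholder, the `Bearer` scheme for the Authorization header is a guess, and the JSON field names `text` and `duration` are hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholder: the document does not give the endpoint URL.
API_URL = "https://api.example.com/v1/transcribe"


def build_transcription_request(audio: bytes, content_type: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes.

    The Bearer scheme is an assumption; the document only says an
    authorization header is required.
    """
    return urllib.request.Request(
        API_URL,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


def parse_transcription_response(body: bytes) -> tuple[str, float]:
    """Extract the transcript and duration from a JSON response body.

    The field names "text" and "duration" (seconds) are assumptions.
    """
    payload = json.loads(body)
    return payload["text"], float(payload["duration"])


def transcribe(audio: bytes, content_type: str, api_key: str) -> tuple[str, float]:
    """Send the request and parse the reply (requires network access)."""
    req = build_transcription_request(audio, content_type, api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_transcription_response(resp.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to test offline; only `transcribe` touches the network.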
Core Benefits and Applications
DictatorFlow enables hands-free computer interaction across a range of use cases. Writers and editors use voice commands to revise prose, adjust tone, or restructure paragraphs without switching context. Software developers refactor code, explain logic, or translate comments with natural-language prompts. Accessibility users benefit from offline operation, in which audio is processed on the device rather than in the cloud, reducing reliance on internet connectivity and third-party services.
Developers integrate DictatorFlow into internal tools, CLI utilities, cron-driven transcription pipelines, and customer-facing applications that need real-time voice input. Its low latency suits interactive systems such as voice-controlled dashboards, meeting note assistants, and multilingual documentation tools. Speaker diarization and support for multiple audio formats also make the API useful for enterprise call-center analytics and academic research.
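Speaker diarization output is typically a sequence of timed segments, each tagged with a speaker label. The sketch below merges consecutive segments from the same speaker into readable turns, as a call-center or meeting-notes pipeline might do. The `Segment` shape (`speaker`, `start`, `end`, `text`) is a hypothetical schema; the document does not specify DictatorFlow's diarization output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    speaker: str   # diarization label, e.g. "S1" (hypothetical schema)
    start: float   # seconds from the start of the audio
    end: float
    text: str


def merge_turns(segments: Iterable[Segment]) -> List[Segment]:
    """Merge consecutive segments spoken by the same speaker into one turn.

    Segments are sorted by start time first, so out-of-order input is handled.
    """
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            last.end = max(last.end, seg.end)
            last.text = f"{last.text} {seg.text}".strip()
        else:
            # Copy so later merges never mutate the caller's segments.
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


def format_transcript(turns: Iterable[Segment]) -> str:
    """Render turns as '[mm:ss] SPEAKER: text' lines."""
    lines = []
    for t in turns:
        minutes, seconds = divmod(int(t.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {t.speaker}: {t.text}")
    return "\n".join(lines)
```

For instance, two back-to-back `S1` segments followed by an `S2` segment collapse into two turns, each rendered on its own timestamped line.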
Pricing
| Tier | Price | Includes |
|---|---|---|
| Pro | $9/month | 10 hours/month of cloud transcription, highest-accuracy models, free offline mode, continuous updates |
| Pro Lifetime | $99 one-time | Native apps for all platforms, $99 in API credits, unlimited local transcription, lifetime updates |
| API Credits | $0.004 per second of audio | REST and WebSocket access, 99.99% uptime SLA, speaker diarization, priority support |
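At $0.004 per second, one hour of audio (3,600 s) costs $14.40 through the API, and the $99 in credits bundled with Pro Lifetime covers 24,750 s, or 6.875 hours. The small sketch below reproduces that arithmetic with `Decimal` to avoid floating-point rounding in currency values. It assumes billing is strictly linear in whole seconds; how partial seconds or minimums are billed is not stated in the document.

```python
from decimal import Decimal

# Rate from the pricing table: $0.004 per second of audio.
PRICE_PER_SECOND = Decimal("0.004")


def api_cost(seconds: int) -> Decimal:
    """Dollar cost of `seconds` of audio, assuming linear per-second billing."""
    return PRICE_PER_SECOND * seconds


def seconds_covered(credit_dollars: Decimal) -> Decimal:
    """Number of audio seconds a given credit balance pays for."""
    return credit_dollars / PRICE_PER_SECOND


hourly = api_cost(3600)                        # Decimal('14.400')
lifetime_hours = seconds_covered(Decimal("99")) / 3600
print(f"1 hour of audio: ${hourly:.2f}")       # 1 hour of audio: $14.40
print(f"$99 of credits: {lifetime_hours} h")   # $99 of credits: 6.875 h
```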