Real-time transcription, translation, and summarization, in private

Echosy is a macOS application for real-time, on-device audio transcription, dictation, and summarization. Speech recognition runs entirely offline, so audio never leaves the device. Designed for professionals who handle sensitive or confidential content, including researchers, journalists, legal practitioners, educators, and developers, Echosy captures system audio (e.g., meetings, podcasts, videos) and microphone input at the same time.
The application supports multilingual transcription across 99+ languages and integrates multiple local automatic speech recognition (ASR) models, including Qwen3-ASR and MLX-optimized Whisper variants. Users can transcribe live sessions, dictate system-wide, enhance text with punctuation or translation, generate AI summaries via configurable LLM backends, and manage recordings with full session history. Audio is never sent to remote servers, and transcripts stay on the device unless the user points summarization at a cloud-hosted LLM backend rather than a local one.
Echosy operates as a native macOS application that leverages Apple’s ScreenCaptureKit framework to capture system audio from any running application—including Zoom, Teams, YouTube, and Spotify—as well as microphone input. Audio streams are routed directly to on-device ASR models (e.g., Qwen3-ASR or MLX Whisper) running via Metal-accelerated inference, producing timestamped transcripts in real time. Users may select from multiple ASR models based on hardware constraints (e.g., memory, chip architecture) and language needs.
Transcripts can be enhanced interactively: punctuation and grammar are auto-corrected, translations are applied per segment, and custom prompts refine output style. For summarization and analysis, Echosy connects to user-configured LLM endpoints—including OpenAI, Gemini, Claude, Groq, OpenRouter, or fully local Ollama instances—streaming transcript chunks for low-latency processing. All generated outputs (transcripts, summaries, chat responses) remain stored exclusively on the device unless manually exported.
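
The idea of streaming transcript chunks to an LLM endpoint can be illustrated with a minimal sketch. Echosy's actual chunking strategy is not documented here, so the function name `chunk_transcript`, the character budget, and the space-joined output are all assumptions made for illustration: consecutive segments are grouped until a size limit would be exceeded, and each group could then be sent to whichever endpoint the user configured.

```python
def chunk_transcript(segments: list[str], max_chars: int = 2000) -> list[str]:
    """Group consecutive transcript segments into chunks of at most max_chars.

    Hypothetical helper: a single segment longer than max_chars is kept
    whole in its own chunk rather than being split mid-sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for segment in segments:
        # +1 accounts for the joining space when the chunk is non-empty.
        added = len(segment) + (1 if current else 0)
        if current and current_len + added > max_chars:
            # Budget exceeded: flush the current chunk and start a new one.
            chunks.append(" ".join(current))
            current, current_len = [], 0
            added = len(segment)
        current.append(segment)
        current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Grouping on segment boundaries keeps each chunk readable on its own, which matters when summaries are produced incrementally while a session is still being recorded.
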
File transcription follows the same local workflow: imported audio or video files are decoded and processed by the selected ASR model without cloud dependency. Session history stores metadata, raw audio references, full transcripts, and associated summaries in a local database, enabling search, replay, and export. The Free tier exports to MD and TXT; Pro adds SRT, VTT, DOCX, and PDF.
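
To make the subtitle export formats concrete, here is a small sketch of how timestamped segments map onto SRT and WebVTT text. The `Segment` class and function names are hypothetical and do not reflect Echosy's internal data model; only the two output formats themselves are standard (SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped transcript segment (hypothetical structure)."""
    start: float  # seconds from the start of the recording
    end: float
    text: str


def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg.start, ',')} --> {format_timestamp(seg.end, ',')}"
        blocks.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """Build WebVTT text: a WEBVTT header followed by unnumbered cues."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        timing = f"{format_timestamp(seg.start, '.')} --> {format_timestamp(seg.end, '.')}"
        blocks.append(f"{timing}\n{seg.text}\n")
    return "\n".join(blocks)
```

Calling `to_srt([Segment(0.0, 1.5, "Hello")])` yields a single cue numbered `1` with the timing line `00:00:00,000 --> 00:00:01,500`.
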
Echosy serves use cases requiring strict data privacy, low-latency responsiveness, and adaptability across diverse audio sources. Legal professionals use it to transcribe client consultations or deposition recordings without exposing sensitive information to third-party services. Researchers and academics transcribe interviews or lecture recordings while preserving participant confidentiality. Developers leverage vocabulary biasing to improve recognition of technical terminology during code walkthroughs or internal demos.
Educators create accessible lecture notes with synchronized timestamps and multilingual translations. Journalists capture and summarize press conferences or podcast interviews in real time, then refine outputs using custom prompts. Remote workers use system-wide dictation to compose emails, documentation, or messages hands-free—especially useful when multitasking across applications. Batch file transcription supports archival workflows, such as converting legacy meeting recordings or lecture libraries into searchable text.
Its offline-first design also benefits users in air-gapped environments, regions with unreliable connectivity, or organizations with strict data residency policies. Hardware flexibility—from Intel Macs with 8 GB RAM to Apple Silicon systems running large quantized models—allows deployment across heterogeneous device fleets without compromising core functionality.

| Feature | Free | Pro |
|---|---|---|
| Maximum recording length | 15 minutes per session | 4 hours per session |
| Available ASR models | Qwen3-ASR 0.6B only | All models (Qwen3-ASR 0.6B/1.7B, MLX Whisper variants, standard Whisper) |
| AI summaries | 3 per day | Unlimited |
| AI chat with transcripts | 3 per day | Unlimited |
| Real-time translation | Not available | Available |
| Auto-punctuation & correction | Available | Available |
| Custom prompts | Not available | Available |
| Export formats | MD, TXT | MD, TXT, SRT, VTT, DOCX, PDF |
| File transcription | Available | Available |
| Session history | Unlimited | Unlimited |
| Licensed devices | Not applicable | Up to 3 devices |
| License scope | Personal, non-commercial use only | Personal, non-commercial use only |

Enterprise licensing is required for business, team, or commercial deployment and includes custom deployment options, priority support, and dedicated onboarding.