X (Twitter)

About Echosy

Introduction to Echosy

Echosy is a macOS application for real-time, on-device audio transcription, dictation, and summarization. It processes audio entirely offline—without internet connectivity—ensuring complete privacy and data sovereignty. Designed for professionals who handle sensitive or confidential content—including researchers, journalists, legal practitioners, educators, and developers—Echosy enables secure capture of both system audio (e.g., meetings, podcasts, videos) and microphone input simultaneously.

The application supports multilingual transcription across 99+ languages and integrates multiple local automatic speech recognition (ASR) models, including Qwen3-ASR and MLX-optimized Whisper variants. Users can transcribe live sessions, dictate system-wide, enhance text with punctuation or translation, generate AI summaries via configurable LLM backends, and manage recordings with full session history—all without sending audio or transcripts to remote servers.

Key Takeaways

100% offline operation: All audio processing, transcription, and summarization occur locally on the Mac; no data leaves the device
Dual audio capture: Records both system audio (e.g., Zoom, Spotify, Safari) and microphone input concurrently using ScreenCaptureKit
Multiple on-device ASR models: Includes Qwen3-ASR (0.6B and 1.7B), MLX Whisper (Tiny to Large V3 Turbo), and standard Whisper variants—each with distinct trade-offs in size, speed, accuracy, and memory requirements
System-wide dictation: Customizable hotkey activates voice-to-text input that pastes directly at the cursor in any macOS application
Real-time transcript enhancement: Supports auto-punctuation, multilingual translation, custom prompt application, and streaming AI summaries using OpenAI-compatible APIs or local LLMs (e.g., Ollama)
File-based transcription: Drag-and-drop support for WAV, MP3, MP4, MOV, and M4A files with batch processing and progress tracking
Session management: Full history browser with searchable transcripts, replayable audio, and persistent summary storage
Vocabulary biasing: Pro feature allowing users to add domain-specific terms or proper nouns to improve ASR accuracy for technical or specialized content

How Echosy Works

Echosy operates as a native macOS application that leverages Apple’s ScreenCaptureKit framework to capture system audio from any running application—including Zoom, Teams, YouTube, and Spotify—as well as microphone input. Audio streams are routed directly to on-device ASR models (e.g., Qwen3-ASR or MLX Whisper) running via Metal-accelerated inference, producing timestamped transcripts in real time. Users may select from multiple ASR models based on hardware constraints (e.g., memory, chip architecture) and language needs.

Transcripts can be enhanced interactively: punctuation and grammar are auto-corrected, translations are applied per segment, and custom prompts refine output style. For summarization and analysis, Echosy connects to user-configured LLM endpoints—including OpenAI, Gemini, Claude, Groq, OpenRouter, or fully local Ollama instances—streaming transcript chunks for low-latency processing. All generated outputs (transcripts, summaries, chat responses) remain stored exclusively on the device unless manually exported.

File transcription follows the same local workflow: imported audio or video files are decoded and processed by the selected ASR model without cloud dependency. Session history maintains metadata, raw audio references, full transcripts, and associated summaries in a local database, enabling search, replay, and export to MD, TXT, SRT, VTT, DOCX, or PDF (Pro tier).

Core Benefits and Applications

Echosy serves use cases requiring strict data privacy, low-latency responsiveness, and adaptability across diverse audio sources. Legal professionals use it to transcribe client consultations or deposition recordings without exposing sensitive information to third-party services. Researchers and academics transcribe interviews or lecture recordings while preserving participant confidentiality. Developers leverage vocabulary biasing to improve recognition of technical terminology during code walkthroughs or internal demos.

Educators create accessible lecture notes with synchronized timestamps and multilingual translations. Journalists capture and summarize press conferences or podcast interviews in real time, then refine outputs using custom prompts. Remote workers use system-wide dictation to compose emails, documentation, or messages hands-free—especially useful when multitasking across applications. Batch file transcription supports archival workflows, such as converting legacy meeting recordings or lecture libraries into searchable text.

Its offline-first design also benefits users in air-gapped environments, regions with unreliable connectivity, or organizations with strict data residency policies. Hardware flexibility—from Intel Macs with 8 GB RAM to Apple Silicon systems running large quantized models—allows deployment across heterogeneous device fleets without compromising core functionality.

Feature	Free	Pro
Maximum recording length	15 minutes per session	4 hours per session
Available ASR models	Qwen3-ASR 0.6B only	All models (Qwen3-ASR 0.6B/1.7B, MLX Whisper variants, standard Whisper)
AI summaries	3 per day	Unlimited
AI chat with transcripts	3 per day	Unlimited
Real-time translation	Not available	Available
Auto-punctuation & correction	Available	Available
Custom prompts	Not available	Available
Export formats	MD, TXT	MD, TXT, SRT, VTT, DOCX, PDF
File transcription	Available	Available
Session history	Unlimited	Unlimited
Licensed devices	Not applicable	Up to 3 devices
License scope	Personal, non-commercial use only	Personal, non-commercial use only

Enterprise licensing is required for business, team, or commercial deployment and includes custom deployment options, priority support, and dedicated onboarding.

Echosy

About Echosy

Introduction to Echosy

Key Takeaways

How Echosy Works

Core Benefits and Applications

Get Started

Categories

Tags