Free, private transcription and speaker ID in the browser

Transcrisper is a client-side AI transcription application that runs entirely in the web browser. It is designed for users who need accurate, private transcription of audio and video recordings without cloud-based processing. The tool is particularly suited to professionals who handle sensitive content, such as legal practitioners, journalists, researchers, and educators, and who need their recordings to stay confidential and on their own devices.
Unlike cloud-dependent transcription services, Transcrisper performs all computation on the user’s device. Source files are never sent over the network, and no audio data is stored or processed remotely. The application supports long-form media, including multi-hour recordings, and includes built-in optimizations for efficiency and usability, such as skipping silent gaps.
On first launch, Transcrisper downloads two neural models into the browser’s persistent storage: parakeet-tdt-0.6b-v3 for automatic speech recognition (ASR) and sortformer_4spk-v2.1 for speaker diarization. These models stay local and are reused across sessions unless the user clears them manually. Audio files are loaded directly from the user’s device and processed in memory using either WebAssembly (WASM) or WebGPU acceleration, depending on system capabilities and configuration.
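The choice between WebGPU and WASM described above can be sketched roughly as follows. This is a minimal illustration, not Transcrisper's actual code: the names `detectWebGpu`, `chooseBackend`, `GpuLike`, and the `"auto"` preference are all hypothetical. The only real API assumed is the browser's `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists.

```typescript
// Hypothetical sketch of backend selection; names are illustrative only.
type Backend = "webgpu" | "wasm";
type Preference = Backend | "auto";

// Minimal structural view of navigator.gpu, so the sketch compiles
// without the WebGPU type definitions installed.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Resolves to true only when WebGPU is exposed AND an adapter is available.
async function detectWebGpu(gpu: GpuLike | undefined): Promise<boolean> {
  if (!gpu) return false;
  try {
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false; // treat any failure as "WebGPU not available"
  }
}

// Pure decision step: an explicit "wasm" setting always wins; otherwise
// WebGPU is used when present, with WASM as the fallback.
function chooseBackend(hasWebGpu: boolean, preference: Preference = "auto"): Backend {
  if (preference === "wasm") return "wasm";
  return hasWebGpu ? "webgpu" : "wasm";
}
```

In a browser, the result of `detectWebGpu(navigator.gpu)` would be passed to `chooseBackend` together with whatever user setting the application exposes. Keeping the decision step pure makes the fallback behavior easy to check without a GPU.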
The transcription pipeline begins with silence detection and segmentation, followed by ASR to generate raw text. Concurrently, the diarization model analyzes acoustic features to assign speaker labels. The final output aligns the transcribed text with precise timestamps and speaker identifiers. Export options include plain text, Markdown, and industry-standard subtitle formats with frame-accurate timing.
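The final alignment and export step can be sketched as below. This is an illustration under stated assumptions, not Transcrisper's implementation: it assumes the ASR stage yields words with start and end times in seconds, that the diarization stage yields speaker turns, and that SRT is one of the subtitle formats (the description above does not name them). All type and function names are hypothetical.

```typescript
// Illustrative types; times are in seconds.
interface Word { text: string; start: number; end: number }
interface SpeakerTurn { speaker: string; start: number; end: number }
interface Cue { speaker: string; start: number; end: number; text: string }

// Assign a word to the speaker turn it overlaps the most.
function speakerFor(word: Word, turns: SpeakerTurn[]): string {
  let best = "unknown";
  let bestOverlap = 0;
  for (const t of turns) {
    const overlap = Math.min(word.end, t.end) - Math.max(word.start, t.start);
    if (overlap > bestOverlap) {
      bestOverlap = overlap;
      best = t.speaker;
    }
  }
  return best;
}

// Merge consecutive words from the same speaker into a single cue.
function buildCues(words: Word[], turns: SpeakerTurn[]): Cue[] {
  const cues: Cue[] = [];
  for (const w of words) {
    const speaker = speakerFor(w, turns);
    const last = cues[cues.length - 1];
    if (last && last.speaker === speaker) {
      last.end = w.end;
      last.text += " " + w.text;
    } else {
      cues.push({ speaker, start: w.start, end: w.end, text: w.text });
    }
  }
  return cues;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Render cues as SRT blocks: index, time range, text, blank separator line.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${srtTime(c.start)} --> ${srtTime(c.end)}\n${c.speaker}: ${c.text}\n`)
    .join("\n");
}
```

Choosing the turn with the greatest overlap is one simple way to resolve words that straddle a speaker change. A real pipeline might also split long cues by duration or character count, which this sketch omits.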
Transcrisper supports secure transcription workflows in regulated environments where data residency and privacy are mandatory, such as healthcare consultations, confidential interviews, or academic fieldwork. Its silent-gap skipping improves the readability of transcripts from low-energy recordings (e.g., lecture halls or remote meetings). Educators can generate accessible subtitles for instructional videos; researchers can annotate multi-speaker focus group sessions without external dependencies. Because it requires no account creation, API keys, or backend infrastructure, it serves as a lightweight, zero-configuration utility for one-off or recurring transcription tasks.