AI Models

Information about the Whisper AI models used for speech recognition.

About Whisper

LocalWhisper uses OpenAI’s Whisper, an open-source automatic speech recognition (ASR) model trained on 680,000 hours of multilingual data.

All processing runs locally on your Mac. No audio is ever sent to external servers.

Available Models

Model	Size	Relative Speed	Best For
tiny	~75 MB	Fastest	Quick notes, low-powered devices
base	~150 MB	Fast	Daily use, good balance
small	~500 MB	Medium	Better accuracy
medium	~1.5 GB	Slower	High accuracy needs

Choosing a Model

Recommended: Base

For most users, the base model offers the best balance of speed and accuracy: it is fast enough for real-time transcription and accurate enough for everyday dictation.

For Apple Silicon Macs

Apple Silicon Macs (M1/M2/M3/M4) can run the medium model comfortably thanks to Metal GPU acceleration. If you have an Apple Silicon Mac and prioritize accuracy over speed, try the medium model.

For Intel Macs

Intel Macs work best with tiny or base models for responsive performance. Larger models may be slow without GPU acceleration.
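The guidance above can be condensed into a small lookup. The following Python sketch is illustrative only and is not part of LocalWhisper: the function name recommend_model and the three priority labels are hypothetical. It assumes the architecture string comes from platform.machine(), which reports "arm64" on Apple Silicon and "x86_64" on Intel Macs (and also "x86_64" when Python itself runs under Rosetta).

```python
import platform


def recommend_model(machine: str, priority: str = "balanced") -> str:
    """Map a CPU architecture and a user priority to a model name.

    `priority` is a hypothetical setting: "speed", "balanced", or "accuracy".
    """
    if priority not in ("speed", "balanced", "accuracy"):
        raise ValueError(f"unknown priority: {priority!r}")
    if priority == "speed":
        return "tiny"
    if priority == "balanced":
        # base is the recommended default on every Mac.
        return "base"
    # priority == "accuracy"
    if machine == "arm64":
        # Apple Silicon: Metal acceleration makes medium practical.
        return "medium"
    # Intel: stay with the lighter models to remain responsive.
    return "base"


# Example: ask for the most accurate model this machine should handle well.
print(recommend_model(platform.machine(), "accuracy"))
```

The small model is deliberately left out of the automatic choice here; it remains a reasonable manual pick when base is not accurate enough and medium is too slow.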

Model Download

Models are downloaded automatically from Hugging Face on first use. After the initial download, LocalWhisper works completely offline.

Download sizes:

tiny: ~75 MB
base: ~150 MB
small: ~500 MB
medium: ~1.5 GB
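Before the first download, it can be worth confirming that the disk has room for the chosen model. The Python sketch below is a hedged illustration using the approximate sizes listed above; the has_room_for helper and the 20% headroom factor are assumptions, not LocalWhisper behavior.

```python
import shutil

# Approximate download sizes from the list above, in megabytes.
MODEL_SIZE_MB = {"tiny": 75, "base": 150, "small": 500, "medium": 1500}


def has_room_for(model: str, free_bytes: int, headroom: float = 1.2) -> bool:
    """Return True if free_bytes covers the model plus some headroom.

    The 1.2 headroom factor is an arbitrary safety margin (an assumption).
    """
    needed_bytes = MODEL_SIZE_MB[model] * 1_000_000 * headroom
    return free_bytes >= needed_bytes


# Check every model against the free space on the root volume.
free = shutil.disk_usage("/").free
for name in MODEL_SIZE_MB:
    print(name, "fits" if has_room_for(name, free) else "needs more space")
```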

Language Support

Whisper supports 99 languages with automatic language detection. You can also manually specify a language for slightly better accuracy.

Supported languages include: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, and Yoruba.
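To illustrate the difference between automatic detection and a manually specified language, here is a minimal sketch built on the reference openai-whisper Python package. LocalWhisper exposes this choice through its own settings, which are not shown here; the resolve_language helper, the "auto" keyword, and the short name-to-code table are hypothetical. In the reference package, passing language=None to transcribe requests automatic detection.

```python
from typing import Optional

# Illustrative subset only: language names mapped to ISO 639-1 codes.
LANGUAGE_CODES = {
    "english": "en",
    "german": "de",
    "french": "fr",
    "spanish": "es",
    "japanese": "ja",
}


def resolve_language(setting: str) -> Optional[str]:
    """Return a language code, or None to request automatic detection."""
    key = setting.strip().lower()
    if key in ("", "auto"):
        return None
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    if key in LANGUAGE_CODES.values():
        return key
    raise ValueError(f"unknown language setting: {setting!r}")


def transcribe(path: str, model_name: str = "base", setting: str = "auto") -> str:
    """Transcribe an audio file with the reference openai-whisper package."""
    import whisper  # lazy import; requires `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language=resolve_language(setting))
    return result["text"]
```

A call such as transcribe("memo.wav", setting="German") would skip detection and decode as German, where "memo.wav" is a placeholder file name.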

Technical Details

Metal Acceleration

On Apple Silicon Macs, LocalWhisper uses Metal for GPU-accelerated inference. This provides significant speed improvements over CPU-only processing.

Memory Usage

Memory usage varies by model:

tiny/base: minimal additional RAM
small: ~1–2 GB additional RAM
medium: ~2–4 GB additional RAM

If you experience slowdowns, try a smaller model or close memory-intensive applications.
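The "try a smaller model" advice can be expressed as a simple step-down rule. The Python sketch below is an illustration under stated assumptions: it uses the upper end of the RAM ranges listed above, treats tiny and base as always fitting because no figure is given for them, and expects the caller to supply the amount of available memory. The step_down name is hypothetical.

```python
from typing import Optional

MODELS_SMALLEST_FIRST = ["tiny", "base", "small", "medium"]

# Upper end of the ranges above, in GB. None means no figure is documented.
EXTRA_RAM_GB = {"tiny": None, "base": None, "small": 2.0, "medium": 4.0}


def fits(model: str, available_gb: float) -> bool:
    """Return True if the model's estimated extra RAM fits in available_gb."""
    needed: Optional[float] = EXTRA_RAM_GB[model]
    return needed is None or available_gb >= needed


def step_down(model: str, available_gb: float) -> str:
    """Return the largest model, no larger than `model`, that fits."""
    index = MODELS_SMALLEST_FIRST.index(model)
    while index > 0 and not fits(MODELS_SMALLEST_FIRST[index], available_gb):
        index -= 1
    return MODELS_SMALLEST_FIRST[index]


# Example: the preferred model is medium, but only about 3 GB is free.
print(step_down("medium", 3.0))
```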

Comparison with Other Solutions

vs. Apple Dictation

Apple Dictation may send audio to Apple’s servers, depending on your Mac, language, and settings. LocalWhisper always processes everything locally, and is often more accurate for technical terms, names, and non-English languages.

vs. Cloud Transcription Services

Cloud services require an internet connection and send your audio to remote servers. LocalWhisper works offline and keeps all audio on your device, which makes it well suited to sensitive content.