AI Models
Last updated: January 2026
About Whisper
LocalWhisper uses OpenAI’s Whisper, an open-source automatic speech recognition (ASR) model trained on 680,000 hours of multilingual data.
All processing runs locally on your Mac. No audio is ever sent to external servers.
Available Models
| Model | Size | Relative Speed | Best For |
|---|---|---|---|
| tiny | ~75 MB | Fastest | Quick notes, low-powered devices |
| base | ~150 MB | Fast | Daily use, good balance |
| small | ~500 MB | Medium | Better accuracy |
| medium | ~1.5 GB | Slower | High accuracy needs |
| large | ~3 GB | Slowest | Maximum accuracy |
Choosing a Model
Recommended: Base
For most users, the base model offers the best balance of speed and accuracy. It’s fast enough for real-time transcription while maintaining good accuracy.
For Apple Silicon Macs
Apple Silicon Macs (M1/M2/M3/M4) can easily run the large model with Metal GPU acceleration. If you have an Apple Silicon Mac and prioritize accuracy over speed, try the large model.
For Intel Macs
Intel Macs work best with tiny or base models for responsive performance. Larger models may be slow without GPU acceleration.
Model Download
Models are downloaded automatically from HuggingFace on first use. After the initial download, LocalWhisper works completely offline.
Download sizes:
- tiny: ~75 MB
- base: ~150 MB
- small: ~500 MB
- medium: ~1.5 GB
- large: ~3 GB
Language Support
Whisper supports 99 languages with automatic language detection. You can also manually specify a language for slightly better accuracy.
Supported languages include: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, and Yoruba.
Technical Details
Metal Acceleration
On Apple Silicon Macs, LocalWhisper uses Metal for GPU-accelerated inference. This provides significant speed improvements over CPU-only processing.
Memory Usage
Memory usage varies by model:
- tiny/base: Minimal impact on system
- small: ~1-2 GB additional RAM
- medium/large: ~2-4 GB additional RAM
If you experience slowdowns, try a smaller model or close memory-intensive applications.
Comparison with Other Solutions
vs. Apple Dictation
Apple Dictation sends audio to Apple’s servers. LocalWhisper processes everything locally, often with better accuracy for technical terms, names, and non-English languages.
vs. Cloud Transcription Services
Cloud services require internet and send your audio to remote servers. LocalWhisper works offline and keeps all audio on your device, ideal for sensitive content.