AI Models

Last updated: January 2026

About Whisper

LocalWhisper uses OpenAI’s Whisper, an open-source automatic speech recognition (ASR) model trained on 680,000 hours of multilingual data.

All processing runs locally on your Mac. No audio is ever sent to external servers.

Available Models

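Model  | Size    | Relative Speed | Best For
-------+---------+----------------+---------------------------------
tiny   | ~75 MB  | Fastest        | Quick notes, low-powered devices
base   | ~150 MB | Fast           | Daily use, good balance
small  | ~500 MB | Medium         | Better accuracy
medium | ~1.5 GB | Slower         | High accuracy needs
large  | ~3 GB   | Slowest        | Maximum accuracy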

Choosing a Model

For most users, the base model offers the best balance of speed and accuracy: it is fast enough for real-time transcription and accurate enough for daily use.

For Apple Silicon Macs

Apple Silicon Macs (M1/M2/M3/M4) can run the large model with Metal GPU acceleration. If you prioritize accuracy over speed, try the large model.

For Intel Macs

Intel Macs work best with tiny or base models for responsive performance. Larger models may be slow without GPU acceleration.
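
The guidance above can be written down as a small decision function. This is only an illustrative sketch in Python, not LocalWhisper's own code: the function names recommend_model and is_apple_silicon are hypothetical, and the choice of base as the Intel default is an assumption taken from the "tiny or base" advice.

```python
import platform


def recommend_model(apple_silicon: bool, prioritize_accuracy: bool = False) -> str:
    """Return a model name following the guidance in this section (hypothetical helper)."""
    if apple_silicon:
        # With Metal acceleration, the large model is practical when accuracy matters most.
        return "large" if prioritize_accuracy else "base"
    # Intel Macs: stay with a small model for responsive performance.
    return "base"


def is_apple_silicon() -> bool:
    """Best-effort check. A Python build running under Rosetta reports 'x86_64'."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


if __name__ == "__main__":
    print(recommend_model(is_apple_silicon(), prioritize_accuracy=True))
```

On an Intel Mac where even base feels slow, tiny remains the faster fallback.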

Model Download

Models are downloaded automatically from Hugging Face on first use. After the initial download, LocalWhisper works completely offline.

Download sizes:

  • tiny: ~75 MB
  • base: ~150 MB
  • small: ~500 MB
  • medium: ~1.5 GB
  • large: ~3 GB
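
The download-on-first-use behavior can be pictured as a simple cache check. The Python sketch below is an assumption about the general pattern, not LocalWhisper's actual implementation: ensure_model, the fetch callback, and the "<name>.bin" file layout are all hypothetical.

```python
from pathlib import Path
from typing import Callable

# Approximate download sizes from the list above, in megabytes.
MODEL_SIZES_MB = {"tiny": 75, "base": 150, "small": 500, "medium": 1500, "large": 3000}


def ensure_model(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the cached model path, calling fetch(name) only if the file is missing."""
    if name not in MODEL_SIZES_MB:
        raise ValueError(f"unknown model: {name}")
    path = cache_dir / f"{name}.bin"  # hypothetical file layout
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(name))  # the only step that needs a network connection
    return path
```

The first call for a given model invokes fetch; every later call finds the file on disk and returns immediately, which is why an app built this way can work offline after the initial download.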

Language Support

Whisper supports about 100 languages with automatic language detection. You can also manually specify a language for slightly better accuracy.

Supported languages include: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, and Yoruba.
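
For readers curious what automatic detection versus a manually specified language looks like in code, here is a minimal sketch using the open-source openai-whisper Python package. This page does not say which engine LocalWhisper uses internally, so treat this as an illustration of the reference package only; transcribe_options and transcribe are hypothetical helper names.

```python
from typing import Optional


def transcribe_options(language: Optional[str] = None) -> dict:
    """Build keyword arguments for a transcribe call (hypothetical helper).

    Leaving the language out lets the model detect it automatically.
    """
    options = {}
    if language is not None:
        options["language"] = language  # a language code such as "en" or "de"
    return options


def transcribe(path: str, model_name: str = "base", language: Optional[str] = None) -> str:
    """Transcribe an audio file with the reference package."""
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **transcribe_options(language))
    return result["text"]
```

Calling transcribe("memo.m4a") would rely on detection, while transcribe("memo.m4a", language="de") fixes the language to German; the file name is a placeholder.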

Technical Details

Metal Acceleration

On Apple Silicon Macs, LocalWhisper uses Metal for GPU-accelerated inference. This provides significant speed improvements over CPU-only processing.

Memory Usage

Memory usage varies by model:

  • tiny/base: Minimal impact on system
  • small: ~1-2 GB additional RAM
  • medium/large: ~2-4 GB additional RAM

If you experience slowdowns, try a smaller model or close memory-intensive applications.
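
As a rough illustration of the advice above, the Python sketch below lists which models fit a given amount of free RAM. It uses the upper end of each range listed in this section; no figure is assumed for tiny and base, which are described only as having minimal impact. The function name models_within_budget is hypothetical.

```python
# Upper bounds of the additional-RAM ranges listed above, in gigabytes.
# tiny and base are listed as "minimal impact", so they are always included.
EXTRA_RAM_GB = {"small": 2.0, "medium": 4.0, "large": 4.0}
MODELS_SMALLEST_FIRST = ["tiny", "base", "small", "medium", "large"]


def models_within_budget(free_ram_gb: float) -> list[str]:
    """Return the models whose worst-case extra RAM fits within free_ram_gb."""
    return [
        model
        for model in MODELS_SMALLEST_FIRST
        if EXTRA_RAM_GB.get(model, 0.0) <= free_ram_gb
    ]
```

With about 2 GB free this returns tiny, base, and small, which matches the suggestion to drop to a smaller model when memory is tight.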

Comparison with Other Solutions

vs. Apple Dictation

Apple Dictation may send audio to Apple’s servers, depending on your Mac, language, and macOS version. LocalWhisper always processes audio locally, often with better accuracy for technical terms, names, and non-English languages.

vs. Cloud Transcription Services

Cloud services require internet and send your audio to remote servers. LocalWhisper works offline and keeps all audio on your device, ideal for sensitive content.