Can Distilled Whisper Models Serve as Direct Replacements for OpenAI Whisper?

I’m currently using a local OpenAI Whisper model for transcribing videos, and I’m interested in transitioning to the distilled variant called “distil-small.en,” which is meant to be quicker and more efficient.

import whisper  # the openai-whisper package

def transcribe_video(self):
    video_path = "/path/to/my/video.mp4"

    whisper_model = whisper.load_model("small.en")          # This functions correctly
    whisper_model = whisper.load_model("distil-small.en")   # This one fails

    transcript = whisper_model.transcribe(video_path, word_timestamps=True)
    print(transcript["text"])

However, I’m facing this error:

RuntimeError: Model distil-small.en not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']

My dependencies are managed with Poetry, and here’s how I set them up:

[tool.poetry.dependencies]
python = "^3.11"
openai-whisper = "*"
transformers = "*" # for distilled models
accelerate = "*" # for distilled models
datasets = { version = "*", extras = ["audio"] } # for distilled models

I’ve found that the GitHub documentation for Distil-Whisper appears to suggest a different installation method. Is it feasible to use a distilled model as a drop-in replacement for a regular Whisper model?

You’re mixing two implementations that don’t work together: the distilled models need Hugging Face’s transformers library, not the openai-whisper package. I had to completely rewrite my transcription pipeline when I switched last month. Use pipeline('automatic-speech-recognition', model='distil-whisper/distil-small.en') and pass your audio file path. The output is different too: you get a dictionary with a 'text' key instead of whisper’s full metadata structure. I saw roughly 6x faster performance on my hardware, but accuracy dropped 2-3% compared to standard small.en. Word-level timestamps also work differently in transformers, so you’ll need the return_timestamps parameter if you rely on them.
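Here’s a minimal sketch of what that looks like, assuming ffmpeg is installed (the transformers pipeline uses it to decode the file, much like openai-whisper does) and a transformers version recent enough to support word-level timestamps for Whisper checkpoints:

import torch
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-small.en",
    device=0 if torch.cuda.is_available() else -1,  # GPU if available, else CPU
)

# return_timestamps="word" requests word-level timestamps; if your
# transformers version rejects it for this checkpoint, fall back to
# return_timestamps=True for chunk-level timestamps instead.
result = transcriber("/path/to/my/video.mp4", return_timestamps="word")

print(result["text"])    # plain transcript string
print(result["chunks"])  # [{"text": ..., "timestamp": (start, end)}, ...]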

The distilled Whisper models won’t work with the standard openai-whisper package, which is why you’re getting that error. You need the transformers library instead, since these models live on Hugging Face. I hit the same problem when optimizing my transcription setup. Use transformers’ ASR pipeline (the AutomaticSpeechRecognitionPipeline class) with “distil-whisper/distil-small.en” instead of whisper.load_model(). The API is a bit different, but the speed boost makes the refactoring worth it. Heads up: you might need to tweak your audio preprocessing, since transformers handles it differently than native whisper.
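For reference, something along these lines is close to what the Distil-Whisper README shows (a sketch, assuming a recent transformers release and an optional CUDA GPU; the model ID and the 15-second chunk length come from that README):

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-small.en"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

transcriber = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=15,   # Distil-Whisper recommends 15 s chunks for long audio
    torch_dtype=torch_dtype,
    device=device,
)

print(transcriber("/path/to/my/video.mp4")["text"])

Once this works, you can drop openai-whisper from your Poetry dependencies entirely; transformers plus accelerate (which low_cpu_mem_usage relies on) covers the distilled models.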