Using Distilled Whisper as a Direct Substitute for Standard OpenAI Whisper Models

I built a video transcription system that works great with the standard OpenAI Whisper model running locally. Now I want to switch to a distilled version (specifically "distil-small.en") because it's supposed to be faster and use less memory.

import whisper

def process_audio(self):
    video_path = "/my/video/file.mp4"

    # This line works fine
    whisper_model = whisper.load_model("small.en")

    # This line throws an error
    whisper_model = whisper.load_model("distil-small.en")

    result = whisper_model.transcribe(audio=video_path, word_timestamps=True)
    return result["text"]

When I try to load the distilled model, I get this error message:

RuntimeError: Model distil-small.en not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']

My project dependencies are set up like this:

[tool.poetry.dependencies]
python = "^3.11"
openai-whisper = "*"
transformers = "*"
accelerate = "*"
datasets = { version = "*", extras = ["audio"] }

I noticed that the Distil-Whisper documentation shows a completely different way to load and use these models. Can I use distilled models the same way as regular Whisper models without changing my code?

No, you can't just drop a distilled model in like that. The openai-whisper package only recognizes the official checkpoints listed in your error message; the Distil-Whisper checkpoints are published on the Hugging Face Hub and are loaded through the transformers library instead. You'll need to rewrite your loading code around AutoModelForSpeechSeq2Seq and the other transformers components (or the higher-level pipeline helper), because it's a completely different API, unfortunately. The upside is that your pyproject.toml already pulls in transformers and accelerate, so no dependency changes are needed.
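For reference, here is roughly what the transformers version looks like, adapted from the Distil-Whisper model card. Treat it as a sketch rather than a finished replacement for your method: distil-whisper/distil-small.en is the checkpoint id on the Hugging Face Hub, and I'm assuming ffmpeg can extract the audio track from your mp4 (the transformers pipeline decodes file paths with ffmpeg, just like openai-whisper does).

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Checkpoint id on the Hugging Face Hub, not a name openai-whisper knows about
model_id = "distil-whisper/distil-small.en"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the feature extractor
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# return_timestamps="word" is the pipeline's equivalent of word_timestamps=True
result = pipe("/my/video/file.mp4", return_timestamps="word")
print(result["text"])

For long videos the model card also recommends passing chunk_length_s=15 (plus a batch_size) to the pipeline so the file is transcribed in chunks rather than in a single pass.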
