I’m having trouble with my Wit.ai speech recognition implementation in Python. The API call seems to work but I’m getting an empty response with no recognized text.
from wit import Wit
audio_client = Wit("MY_ACCESS_TOKEN")
with open('recording.mp3', 'rb') as audio_file:
    result = audio_client.speech(audio_file, {'Content-Type': 'audio/wav'})
print('Response received: ' + str(result))
The output I get is: {'entities': {}, 'intents': [], 'text': '', 'traits': {}} - notice how the text field is completely empty. What could be causing this issue? Is there a problem with my audio file format or the way I’m making the API request?
Hit this same issue with a transcription service last year. Yeah, check the Content-Type like others said, but also look at your audio file’s sample rate and bit depth. Wit.ai works best with 16-bit PCM at 8kHz or 16kHz. File encoding tripped me up too: just because the extension says mp3 or wav doesn’t mean that’s what the file actually contains. Open the file in a media player first and make sure it plays normally. Also check the file size in bytes before sending. I’ve seen files that looked fine but were corrupted or cut off during recording, so Wit.ai received little or no usable audio and had nothing to transcribe.
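Here’s a rough sketch of how I’d check this in Python with only the standard library. The function names are mine, the magic-byte check is a heuristic rather than a full parser, and the wave module only reads uncompressed PCM WAV files, so treat it as a starting point.

```python
import wave


def sniff_audio_format(header: bytes) -> str:
    """Guess the container from the first bytes of a file (heuristic only)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"OggS":
        return "ogg"
    # MP3 files usually start with an ID3 tag or an MPEG frame sync
    # (eleven set bits: 0xFF followed by a byte whose top three bits are set).
    if header[:3] == b"ID3":
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def describe_wav(path: str) -> dict:
    """Report channel count, sample rate and bit depth of a PCM WAV file.

    wave.open raises wave.Error if the file is not a PCM WAV.
    """
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,
        }
```

To use it, read the first 12 bytes of the file (`open(path, 'rb').read(12)`) and pass them to `sniff_audio_format`. If that says "wav", call `describe_wav(path)` and compare the result against the 16-bit, 8kHz or 16kHz target.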
Yeah, sounds like your audio format’s mismatched: you’re uploading an mp3 but saying it’s wav in the Content-Type. Just change that to ‘audio/mpeg3’ (that’s how Wit.ai’s docs list the MP3 type) or convert the file to wav. Good luck, hope it helps!
Had this exact issue a few months back building a voice assistant prototype. You’ve got a content type mismatch like Mandy said.
But Wit.ai’s also picky about audio quality. I’ve seen it return empty text when audio’s too quiet or has background noise.
Try this:
with open('recording.mp3', 'rb') as audio_file:
    # Wit.ai's docs list 'audio/mpeg3' as the content type for MP3 audio
    result = audio_client.speech(audio_file, {'Content-Type': 'audio/mpeg3'})
Doesn’t work? Convert to WAV first. I use FFmpeg:
ffmpeg -i recording.mp3 recording.wav
Then use ‘audio/wav’ as content type.
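If you’d rather drive the conversion from Python, here’s a minimal sketch that shells out to FFmpeg. It assumes ffmpeg is on your PATH; the function names are mine, and the mono / 16kHz / 16-bit PCM settings are my guess at a safe target based on what usually works for speech, not something Wit.ai requires.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list:
    """Build an ffmpeg command that writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input file
        "-ac", "1",              # downmix to one channel
        "-ar", str(sample_rate), # resample to the target rate
        "-c:a", "pcm_s16le",     # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str, sample_rate: int = 16000) -> str:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

After `convert_to_wav('recording.mp3', 'recording.wav')`, open recording.wav and send it with the ‘audio/wav’ content type.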
Check your file duration and size too. Wit.ai limits files to 20 seconds and 5MB, and anything over either limit can fail.
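A quick pre-flight check along those lines could look like the sketch below. It only handles PCM WAV (via the standard wave module), the function name is mine, and I’m treating 5MB as 5 * 1024 * 1024 bytes, which is an assumption about how the limit is counted.

```python
import os
import wave

MAX_SECONDS = 20               # duration limit mentioned above
MAX_BYTES = 5 * 1024 * 1024    # assumption: "5MB" counted in binary megabytes


def check_limits(path: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    size = os.path.getsize(path)
    if size == 0:
        # Nothing to parse, so stop before wave.open fails on it.
        return ["file is empty"]

    problems = []
    if size > MAX_BYTES:
        problems.append(f"size {size} bytes exceeds {MAX_BYTES} bytes")

    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_SECONDS:
        problems.append(f"duration {duration:.1f}s exceeds {MAX_SECONDS}s")
    return problems
```

Run `check_limits('recording.wav')` before calling the API and print whatever it returns. If the clip is too long, trim or split it before sending.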
One more thing - make sure your access token has speech recognition permissions. I’ve debugged cases where tokens were only set up for text processing.
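And for the “too quiet” case I mentioned earlier, here’s a small sketch that measures the peak level of a 16-bit PCM WAV with the standard library. The function name is mine, and I don’t know of a loudness threshold published by Wit.ai, so any cutoff you pick is a judgment call.

```python
import array
import sys
import wave


def peak_level(path: str) -> float:
    """Return peak amplitude as a fraction of full scale (0.0 to 1.0).

    Only handles 16-bit PCM WAV; raises ValueError for other bit depths.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()              # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0
```

A result close to 0.0 means the recording is nearly silent. In that case I’d re-record closer to the mic or normalize the audio before sending it.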