I’m trying to transcribe an audio file with the Google Speech to Text API. The file details are:
- Format: Raw Audio
- Sample Rate: 16000 Hz
- Bit Depth: 16
- Channel: Mono
I used the following Python code to produce the transcript:
import json

# `api` is the API client object, built earlier (construction not shown)
request = api.speech().async_recognize(
    data={
        'configuration': {
            'format': 'LINEAR16',        # raw 16-bit signed little-endian samples
            'sample_rate_hertz': 16000,  # 16 kHz
            'language': 'en-US',         # BCP-47 language tag
        },
        'recording': {
            'uri': 'gs://somebucket/audio.raw'
        }
    })
result = request.execute()
print(json.dumps(result))
The code appears to run correctly, but the transcription covers only one minute of audio and ignores the rest. Can anyone explain why?
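For anyone who wants to confirm how much audio the file really contains, its length follows directly from its size, since raw PCM has no header. This is only a minimal sketch assuming the parameters listed above (16000 Hz, 16-bit, mono); the function name is made up for illustration and is not part of any Google library.

```python
def raw_pcm_duration_seconds(num_bytes, sample_rate_hz=16000, bit_depth=16, channels=1):
    """Return the playback length in seconds of headerless PCM audio."""
    # Each second holds sample_rate_hz samples per channel, bit_depth // 8 bytes each.
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return num_bytes / bytes_per_second


# One minute at 16 kHz, 16-bit, mono is 32000 bytes/s * 60 s = 1,920,000 bytes.
print(raw_pcm_duration_seconds(1_920_000))  # 60.0
```

With a local copy of the file, passing os.path.getsize(path) as num_bytes gives the total length, so a file much larger than about 1.9 MB holds more than one minute of audio.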