I’m building a music recognition app similar to Shazam for my class assignment. I can record audio and convert it to base64 format like the API requires, but the song detection isn’t working. I think there might be an issue with how I’m recording or encoding the audio data.
The API documentation says it needs raw audio data in a specific format: 44100Hz sample rate, mono channel, 16-bit PCM little endian, encoded as base64, and under 500KB (around 3-5 seconds of audio).
Your code has a basic mismatch - you’re sending 3GP files with AMR compression, but Shazam’s API wants raw PCM samples. No file headers, no compression.
I’ve hit this same wall with audio recognition APIs. The 3GP container adds metadata headers that mess up the recognition algorithms completely. Even pulling raw bytes from the 3GP file gives you compressed AMR data mixed with file structure junk.
Your delay() timing is also problematic. Audio recording doesn’t guarantee the buffer fills exactly when your delay ends. I’ve seen hardware take longer to initialize, giving you shorter samples than you think you’re getting.
Here’s what I’d try: write your raw PCM data to a WAV file first and play it back. If it sounds off or has the wrong pitch, your sample rate or bit depth conversion is broken. The API is strict about that 44.1kHz mono 16-bit spec.
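For the playback test, you just need to prepend a minimal 44-byte RIFF/WAV header to the raw PCM so any player will open it. A sketch in plain Kotlin (the function name writeWavForDebug and the output path are my placeholders):

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Wraps raw 16-bit mono PCM in a minimal 44-byte WAV header for debug playback.
fun writeWavForDebug(pcm: ByteArray, path: String, sampleRate: Int = 44100) {
    val header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN)
    val byteRate = sampleRate * 2              // mono, 16-bit => 2 bytes per frame
    header.put("RIFF".toByteArray())
    header.putInt(36 + pcm.size)               // RIFF chunk size
    header.put("WAVE".toByteArray())
    header.put("fmt ".toByteArray())
    header.putInt(16)                          // fmt chunk size for PCM
    header.putShort(1.toShort())               // audio format 1 = PCM
    header.putShort(1.toShort())               // channels: mono
    header.putInt(sampleRate)
    header.putInt(byteRate)
    header.putShort(2.toShort())               // block align: channels * bytes per sample
    header.putShort(16.toShort())              // bits per sample
    header.put("data".toByteArray())
    header.putInt(pcm.size)                    // data chunk size
    File(path).outputStream().use {
        it.write(header.array())
        it.write(pcm)
    }
}
```

If the file plays back at chipmunk or slow-motion pitch, your sample rate is wrong; if it's pure static, the bit depth or endianness is off.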
Also check your base64 encoding isn’t adding line breaks. Some implementations add newlines every 76 characters, which breaks the API parsing.
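You can see the wrapping behavior on the JVM with java.util.Base64 - the MIME encoder inserts a line break every 76 characters, much like Android's Base64.DEFAULT, while the basic encoder matches Base64.NO_WRAP:

```kotlin
import java.util.Base64

fun main() {
    val data = ByteArray(120) { it.toByte() }  // encodes to 160 chars, past the 76-char wrap point
    val wrapped = Base64.getMimeEncoder().encodeToString(data)  // inserts CRLF line breaks
    val plain = Base64.getEncoder().encodeToString(data)        // single line, no breaks
    println(wrapped.contains("\r\n"))  // true: these embedded newlines break API parsing
    println(plain.contains("\r\n"))    // false: this is what the API should receive
}
```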
Your problem is you’re recording 3GP with AMR encoding, but the API wants raw PCM data. 3GP files have metadata and compressed audio - not the raw samples the API expects.
Ditch MediaRecorder and use AudioRecord to grab raw PCM data:
val bufferSize = AudioRecord.getMinBufferSize(44100, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT)
val audioRecord = AudioRecord(MediaRecorder.AudioSource.MIC, 44100, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize)
val audioData = ShortArray(44100 * 4) // 4 seconds; 5 seconds overshoots the 500KB cap once base64-encoded
audioRecord.startRecording()
val shortsRead = audioRecord.read(audioData, 0, audioData.size) // read() on a ShortArray returns a count of shorts, not bytes
audioRecord.stop()
audioRecord.release()
// Convert to a little-endian byte array and encode (2 bytes per 16-bit sample)
val byteArray = ByteArray(shortsRead * 2)
ByteBuffer.wrap(byteArray).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(audioData, 0, shortsRead)
val base64Audio = Base64.encodeToString(byteArray, Base64.NO_WRAP)
This gets you the exact PCM format the API needs without container format bloat.
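It's worth checking the payload size before sending: base64 expands data by 4/3, so 5 seconds of this format already overshoots a 500KB cap. A quick back-of-envelope calculation in plain Kotlin (payloadBytes is just an illustrative helper):

```kotlin
// Raw and base64-encoded sizes for N seconds of 44.1kHz mono 16-bit PCM.
fun payloadBytes(seconds: Int): Pair<Int, Int> {
    val raw = 44100 * 2 * seconds        // 2 bytes per sample, one channel
    val encoded = 4 * ((raw + 2) / 3)    // base64: 4 output chars per 3 input bytes
    return raw to encoded
}

fun main() {
    println(payloadBytes(4))  // (352800, 470400) - under 500KB
    println(payloadBytes(5))  // (441000, 588000) - over the limit
}
```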
Honestly though, audio format conversions and API integrations turn into a mess quickly. I’ve built similar recognition systems and found automation platforms handle format conversion and API management way cleaner. Latenode has built-in audio processing that handles sample rate conversion and encoding automatically, plus manages API calls with proper error handling.
This is super common with audio recognition APIs. The previous answer nails the MediaRecorder issue, but endianness can still bite you even after switching to AudioRecord: Java's ByteBuffer defaults to big-endian, so if you forget .order(ByteOrder.LITTLE_ENDIAN) during the conversion, every sample comes out byte-swapped and recognition fails silently. I hit exactly that last year.

Double-check your audio permissions too - RECORD_AUDIO in the manifest plus the runtime request. I wasted hours thinking it was a format issue when AudioRecord wasn't capturing anything because the permission was missing.

Watch your file size after base64 encoding. That 500KB limit is tighter than it looks: 5 seconds of 44.1kHz 16-bit mono is ~441KB raw, and base64 inflates it by a third to ~588KB. Try 3-4 seconds instead to stay safely under the limit.

And don't use delay(5000) for timing. Monitor the recording state and loop on read() until the buffer actually holds the sample count you expect.
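The buffer-fill point can be handled by looping on read() until the target sample count arrives, rather than trusting a single call. Sketched here with the reader injected as a lambda so the loop logic runs off-device (on a real device you'd pass audioRecord::read between startRecording() and stop(); fillBuffer is my own helper name):

```kotlin
// Keeps reading until `target` shorts are collected or the source reports
// an error. `read` mirrors AudioRecord.read(buffer, offset, size) and
// returns the number of shorts read, or a negative error code.
fun fillBuffer(buffer: ShortArray, target: Int, read: (ShortArray, Int, Int) -> Int): Int {
    var filled = 0
    while (filled < target) {
        val n = read(buffer, filled, target - filled)
        if (n <= 0) break  // error or no data; caller must check the returned count
        filled += n
    }
    return filled
}
```

If the returned count is short of the target, reject the capture and re-record instead of sending a truncated clip.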