MediaRecorder integration with Shazam API via RapidAPI not working properly

I’m building a music recognition app for a class assignment that works similarly to Shazam. Right now I can record audio and convert it to base64 as the API requires, but song detection always fails. I think there might be an issue with how I’m recording or encoding the audio data.

The RapidAPI documentation says I need base64 encoded raw audio data under 500KB (3-5 seconds is enough). The audio must be 44100Hz, mono channel, signed 16 bit PCM little endian format. No other formats like mp3 or wav work.

Here’s my sound recording class:

override fun startRecording(targetFile: File) {
    buildRecorder().apply {
        setAudioSource(MediaRecorder.AudioSource.DEFAULT)
        setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP)
        setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB)
        setAudioChannels(1)
        setAudioSamplingRate(44100)
        setAudioEncodingBitRate(16 * 44100)
        setOutputFile(targetFile.absolutePath)

        prepare()
        start()

        mediaRecorder = this
    }
}

override fun stopRecording() {
    mediaRecorder?.stop()
    mediaRecorder?.reset()
    mediaRecorder = null
}

fun getAudioBytes(targetFile: File): ByteArray {
    return targetFile.readBytes()
}

This is my recording handler function:

private suspend fun handleAudioCapture(recorder: AudioRecorder, appContext: Context): String {
    Log.d("SongDetectionActivity", "Starting audio capture process")
    
    File(appContext.cacheDir, "recorded_audio.3gp").also {
        recorder.startRecording(it)
        capturedFile = it
    }

    delay(5000)

    recorder.stopRecording()

    val rawAudioBytes = recorder.getAudioBytes(capturedFile!!)
    val encodedString = encodeToBase64(rawAudioBytes)
    
    return encodedString
}

And my base64 encoding function:

private fun encodeToBase64(rawBytes: ByteArray): String {
    return Base64.encodeToString(rawBytes, Base64.DEFAULT)
}

The API connection works fine when I test manually. What am I doing wrong with the audio recording or conversion process?

yeah, the issue’s clear: MediaRecorder is giving you compressed audio, but Shazam needs raw PCM samples. that 3GP file is AMR-encoded and wrapped in container headers, so what you base64 encode isn’t PCM at all and the API can’t match it. switch to AudioRecord instead and read the raw bytes straight from the mic buffer. no need to go through a file.
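
if you want to confirm this on your own recording, a quick sanity check is to look at the first few bytes of what you’re about to encode. this is only a rough sketch: describeAudioPayload is a made-up helper name, and it only knows three common signatures (ISO containers like 3GP/MP4 carry "ftyp" at offset 4, AMR files start with "#!AMR", WAV files start with "RIFF"). not finding a header doesn’t prove the bytes are valid PCM, it just rules out the obvious wrappers.

```kotlin
// Hypothetical diagnostic helper (not part of the original code).
// Reports whether a byte array starts with a well-known container signature.
fun describeAudioPayload(bytes: ByteArray): String {
    // True if the ASCII tag appears at the given byte offset.
    fun tagAt(offset: Int, tag: String): Boolean =
        bytes.size >= offset + tag.length &&
            tag.indices.all { bytes[offset + it] == tag[it].code.toByte() }

    return when {
        tagAt(4, "ftyp") -> "ISO container (3GP/MP4)"
        tagAt(0, "#!AMR") -> "AMR file"
        tagAt(0, "RIFF") -> "WAV/RIFF file"
        else -> "no known container header"
    }
}
```

calling it on the bytes from your getAudioBytes() should report the ISO container case for a THREE_GPP recording.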

Your issue is using MediaRecorder with THREE_GPP format and AMR_NB codec. You’re recording compressed audio in a container format, then feeding that whole file (headers, metadata and all) to Shazam like it’s raw PCM data. Shazam wants pure audio samples, not wrapped container files. AMR_NB is also a narrowband speech codec built around an 8 kHz sample rate, so asking for 44100 doesn’t get you 44100Hz audio either.

Switch to AudioRecord instead. Set it up with AudioFormat.ENCODING_PCM_16BIT, AudioFormat.CHANNEL_IN_MONO, and your 44100 sample rate. Read straight into a byte buffer while recording. That’s how you get the raw PCM format the API actually wants.
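
Here’s a minimal sketch of that setup, not a drop-in implementation. The function name recordRawPcm and the 4 second default are my own choices for illustration. It assumes the RECORD_AUDIO permission has already been granted at runtime, and that the device is little-endian (as Android hardware typically is), so the 16 bit samples come out in the byte order the API asks for.

```kotlin
import android.annotation.SuppressLint
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import java.io.ByteArrayOutputStream

// Hypothetical helper: captures raw 44100Hz mono 16-bit PCM from the mic.
// Assumes RECORD_AUDIO has already been granted by the user.
@SuppressLint("MissingPermission")
fun recordRawPcm(durationMs: Long = 4000L): ByteArray {
    val sampleRate = 44100
    val channelConfig = AudioFormat.CHANNEL_IN_MONO
    val encoding = AudioFormat.ENCODING_PCM_16BIT

    val minBuffer = AudioRecord.getMinBufferSize(sampleRate, channelConfig, encoding)
    require(minBuffer > 0) { "44100Hz mono 16-bit PCM is not supported on this device" }

    val recorder = AudioRecord(
        MediaRecorder.AudioSource.MIC, sampleRate, channelConfig, encoding, minBuffer * 2
    )

    // Whole samples only: 2 bytes per 16-bit mono sample.
    val targetBytes = (sampleRate * durationMs / 1000).toInt() * 2
    val out = ByteArrayOutputStream(targetBytes)
    val buffer = ByteArray(minBuffer)

    try {
        check(recorder.state == AudioRecord.STATE_INITIALIZED) { "AudioRecord failed to initialize" }
        recorder.startRecording()
        while (out.size() < targetBytes) {
            val wanted = minOf(buffer.size, targetBytes - out.size())
            val read = recorder.read(buffer, 0, wanted)
            if (read <= 0) break // error code or no data: stop early
            out.write(buffer, 0, read)
        }
        recorder.stop()
    } finally {
        recorder.release()
    }
    return out.toByteArray()
}
```

The read loop blocks, so in your suspend handler you’d want to call it off the main thread, for example with withContext(Dispatchers.IO) { recordRawPcm() }, and then pass the result to your existing encodeToBase64().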

I hit this exact problem with audio fingerprinting services last year. The encoded file approach doesn’t work - you can’t mix container formats with raw audio expectations. AudioRecord’s trickier to code but it’s the only way to get clean PCM data for recognition APIs.

Your issue is the 3GP format with AMR_NB encoding - the API wants raw PCM data, not compressed audio with container headers. When you convert that 3GP file to base64, you’re basically sending garbage to the API.

MediaRecorder can’t output raw PCM, so you’ll need to switch to AudioRecord instead. Set it up with ENCODING_PCM_16BIT, CHANNEL_IN_MONO, and 44100 sample rate. This gives you clean raw audio without any compression or container mess that’s killing your recognition.
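
Once you have raw PCM, it’s worth checking the size against the 500KB limit from the docs, because raw PCM is much bigger than a compressed 3GP file. Below is a rough sketch of the arithmetic. The function names are mine, and I’m using java.util.Base64, which on Android needs API 26 or higher. If you stay with android.util.Base64, note that Base64.DEFAULT inserts line breaks into the output, so Base64.NO_WRAP is probably the safer flag for a request body. The docs you quoted don’t say whether the limit applies before or after encoding, so treat that part as something to test.

```kotlin
import java.util.Base64

const val SAMPLE_RATE = 44_100
const val BYTES_PER_SAMPLE = 2 // signed 16-bit, mono

// Raw PCM size for a clip of the given length.
fun pcmByteCount(seconds: Int): Int = SAMPLE_RATE * BYTES_PER_SAMPLE * seconds

// Length of the padded base64 text for a given number of input bytes.
fun base64Length(byteCount: Int): Int = (byteCount + 2) / 3 * 4

// Encodes without line breaks.
fun encodePcm(pcm: ByteArray): String = Base64.getEncoder().encodeToString(pcm)

fun main() {
    for (seconds in 3..5) {
        val raw = pcmByteCount(seconds)
        println("$seconds s: $raw raw bytes, ${base64Length(raw)} base64 chars")
    }
}
```

At 5 seconds that works out to 441,000 raw bytes and 588,000 base64 characters, so if the limit turns out to apply to the encoded string, 3 to 4 seconds is the safer clip length.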

I hit this exact same problem with audio APIs before - switching to AudioRecord fixed it instantly.