MediaRecorder with Shazam RapidAPI integration not working properly

I’m building a music recognition app for a class assignment that works similarly to Shazam. I can record audio and convert it to base64 as the API documentation requires, but the song detection always fails. I think there might be an issue with how I’m recording or encoding the audio data.

The RapidAPI documentation says I need:
Base64 encoded byte array from raw audio data under 500KB (3-5 second clips work best). The audio must be 44100Hz sample rate, mono channel, signed 16-bit PCM little endian format. No other formats like mp3 or wav are accepted.

Here’s my sound recording class:

override fun startCapture(targetFile: File) {
    buildRecorder().apply {
        setAudioSource(MediaRecorder.AudioSource.DEFAULT)
        setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP)
        setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB)
        setAudioChannels(1)
        setAudioSamplingRate(44100)
        setAudioEncodingBitRate(16 * 44100)
        setOutputFile(targetFile.absolutePath)

        prepare()
        start()

        mediaRecorder = this
    }
}

override fun stopCapture() {
    mediaRecorder?.stop()
    mediaRecorder?.reset()
    mediaRecorder = null
}

fun extractAudioBytes(audioFile: File): ByteArray {
    return audioFile.readBytes()
}

My recording function:

private suspend fun captureAudioSample(recorder: SoundRecorder, ctx: Context): String {
    File(ctx.cacheDir, "sample.3gp").also {
        recorder.startCapture(it)
        recordedFile = it
    }

    delay(5000)
    recorder.stopCapture()

    val rawData = recorder.extractAudioBytes(recordedFile!!)
    val encodedData = encodeToBase64(rawData)
    
    return encodedData
}

Base64 conversion:

private fun encodeToBase64(rawAudio: ByteArray): String {
    return Base64.encodeToString(rawAudio, Base64.DEFAULT)
}

What am I doing wrong with the audio format or encoding process?

Yeah, it’s definitely the 3GP format causing issues. You’re sending compressed audio with container headers instead of raw PCM data. MediaRecorder always encodes what it records and writes it in a container or stream format, and Shazam’s API can’t handle that. Use AudioRecord instead - it gives you direct access to the raw audio buffer without encoding or file wrapping.
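If you want to confirm this on your own recording, a quick check along these lines can help. It is only a sketch: looksLikeIsoMediaFile is a name I made up, and it relies on 3GP/MP4 files normally starting with an "ftyp" box (4 bytes of box size, then the ASCII type). Raw PCM has no header at all, so a hit here means you are uploading a container file.

```kotlin
/**
 * Hypothetical helper: returns true if the data starts with an ISO base
 * media "ftyp" box, which is how 3GP and MP4 files normally begin.
 */
fun looksLikeIsoMediaFile(bytes: ByteArray): Boolean {
    if (bytes.size < 8) return false
    // Bytes 0..3 hold the box size; bytes 4..7 hold the box type in ASCII.
    return String(bytes, 4, 4, Charsets.US_ASCII) == "ftyp"
}
```

Calling it on recordedFile.readBytes() before the upload should return true for the sample.3gp file from the question, which is a sign the bytes are not raw PCM.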

I hit this exact same issue last semester. Your problem is the THREE_GPP output format: it produces a container file with headers and compressed data, not the raw PCM samples Shazam needs. Even worse, AMR_NB is a narrowband speech codec that works at 8 kHz, so you won’t get 44100 Hz audio out of it no matter what you pass to setAudioSamplingRate(). Either way, the output doesn’t meet the API requirements.

Ditch MediaRecorder and switch to AudioRecord with AudioFormat.ENCODING_PCM_16BIT instead. Set up a buffer to grab raw samples directly, then convert to base64. AudioRecord gives you clean, unprocessed audio straight from the mic. MediaRecorder always compresses and wraps everything in containers the API can’t handle.
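Roughly what that looks like, as a minimal sketch rather than a drop-in replacement. recordPcm and the constants are names I picked for illustration. It assumes the RECORD_AUDIO permission has already been granted and that the device supports 44100 Hz mono input. As far as I know, 16-bit samples come back in native byte order, which is little endian on typical Android hardware.

```kotlin
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

private const val SAMPLE_RATE = 44_100
private const val BYTES_PER_SAMPLE = 2 // 16-bit PCM, one channel

// Hypothetical helper. The caller must already hold RECORD_AUDIO.
fun recordPcm(seconds: Int = 5): ByteArray {
    val minBuffer = AudioRecord.getMinBufferSize(
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT
    )
    check(minBuffer > 0) { "Unsupported audio configuration: $minBuffer" }

    val recorder = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        minBuffer * 2 // some headroom over the minimum
    )
    check(recorder.state == AudioRecord.STATE_INITIALIZED) {
        "AudioRecord failed to initialize"
    }

    // Exactly `seconds` worth of mono 16-bit samples.
    val pcm = ByteArray(SAMPLE_RATE * BYTES_PER_SAMPLE * seconds)
    var offset = 0
    try {
        recorder.startRecording()
        while (offset < pcm.size) {
            // Blocking read; returns bytes read or a negative error code.
            val read = recorder.read(pcm, offset, pcm.size - offset)
            if (read <= 0) break
            offset += read
        }
    } finally {
        if (recorder.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
            recorder.stop()
        }
        recorder.release()
    }
    // Trim in case the loop ended early.
    return pcm.copyOf(offset)
}
```

The read loop blocks for the whole clip, so in a suspend function like captureAudioSample you would probably wrap the call in withContext(Dispatchers.IO) instead of using delay(5000).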

The issue appears to stem from using the 3GP format with AMR_NB encoding, which is not compatible with the Shazam API that requires raw PCM data. When you call readBytes() on the .3gp file, you get the entire compressed file, including headers and metadata, rather than just the audio samples. I recommend switching from MediaRecorder to AudioRecord, which lets you obtain raw PCM samples directly. Configure AudioRecord with a 44100Hz sample rate, mono channel (CHANNEL_IN_MONO), and 16-bit encoding (ENCODING_PCM_16BIT). Read audio data directly into a ByteArray and encode it to base64. This approach avoids file compression and delivers precisely what the API needs.
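To make the size limit and the encoding step concrete, here is a small sketch. The names (pcmByteCount, encodePcm, MAX_RAW_BYTES) are mine, and reading "under 500KB" as 500,000 bytes is an assumption on the conservative side. It uses java.util.Base64, which needs API 26+ on Android; android.util.Base64 with the NO_WRAP flag should behave the same way. Note that Base64.DEFAULT in android.util inserts line breaks into the output, which may or may not be accepted by the API.

```kotlin
import java.util.Base64

const val PCM_SAMPLE_RATE_HZ = 44_100
const val PCM_BYTES_PER_SAMPLE = 2      // signed 16-bit, mono
const val MAX_RAW_BYTES = 500_000       // assumption: "500KB" = 500,000 bytes

/** Raw PCM size in bytes for a mono 16-bit clip of the given length. */
fun pcmByteCount(seconds: Int): Int =
    PCM_SAMPLE_RATE_HZ * PCM_BYTES_PER_SAMPLE * seconds

/** Encodes raw PCM bytes as a single-line base64 string. */
fun encodePcm(pcm: ByteArray): String {
    require(pcm.size < MAX_RAW_BYTES) { "Clip too large: ${pcm.size} bytes" }
    require(pcm.size % PCM_BYTES_PER_SAMPLE == 0) { "Incomplete 16-bit sample" }
    // The basic encoder emits no line breaks.
    return Base64.getEncoder().encodeToString(pcm)
}
```

With these numbers a 5 second clip is 441,000 bytes of raw PCM, which fits under the limit, while 6 seconds (529,200 bytes) does not.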