MediaRecorder with Shazam-like API integration issues via RapidAPI

I’m building a music recognition app for my class assignment. I can record audio and convert it to base64 format like the API requires, but song detection always fails. I think there’s something wrong with how I’m recording or encoding the audio data.

The API needs raw PCM data at 44100Hz, mono channel, signed 16-bit little-endian format, encoded as base64 under 500KB.

Here’s my SoundRecorder class:

```kotlin
override fun startRecording(destination: File) {
    buildRecorder().apply {
        setAudioSource(MediaRecorder.AudioSource.MIC)
        setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP)
        setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB)
        setAudioChannels(1)
        setAudioSamplingRate(44100)
        setAudioEncodingBitRate(16 * 44100)
        setOutputFile(destination.absolutePath)

        prepare()
        start()

        mediaRecorder = this
    }
}

override fun stopRecording() {
    mediaRecorder?.stop()
    mediaRecorder?.reset()
    mediaRecorder = null
}

fun getAudioBytes(audioFile: File): ByteArray {
    return audioFile.readBytes()
}
```

This function manages the recording process:

```kotlin
private suspend fun captureAudioSample(recorder: SoundRecorder, ctx: Context): String {
    val recordingFile = File(ctx.cacheDir, "sample.3gp")

    recorder.startRecording(recordingFile)
    savedFile = recordingFile

    delay(5000)

    recorder.stopRecording()

    val rawAudio = recorder.getAudioBytes(savedFile!!)
    val encodedData = encodeToBase64(rawAudio)

    return encodedData
}
```

Base64 encoding method:

```kotlin
private fun encodeToBase64(rawData: ByteArray): String {
    return Base64.encodeToString(rawData, Base64.DEFAULT)
}
```

Your problem is pairing THREE_GPP with AMR_NB encoding when the API wants raw PCM data. You’re recording compressed audio and sending that compressed file to the API as if it were raw PCM samples. I faced a similar issue with audio recognition APIs. Switch to the AudioRecord class, which captures raw PCM directly. MediaRecorder can’t get you there: every encoder it offers (AMR, AAC, Opus and so on) produces compressed audio, so changing to OutputFormat.DEFAULT or OutputFormat.MPEG_4 would still not give you PCM, and it has no WAV output either. AudioRecord is the better fit because it hands you PCM samples with no codec or container in the way. Also watch the size: the 500KB base64 limit allows about 375KB of raw audio, but 5 seconds of 44100Hz mono 16-bit PCM is 441,000 bytes (about 588KB once base64 encoded), so keep the capture to roughly 4 seconds.
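The size arithmetic is easy to get wrong, so here is a small sketch that checks it. It assumes 1 KB = 1000 bytes and a base64 string without line breaks or other padding beyond the standard `=`; the helper names are made up for illustration.

```kotlin
// Sketch: how much raw PCM fits under a base64 size limit.
// Assumes 1 KB = 1000 bytes; if the API means 1024, pass 500 * 1024 instead.
const val SAMPLE_RATE = 44_100      // Hz
const val CHANNELS = 1              // mono
const val BYTES_PER_SAMPLE = 2      // signed 16-bit

/** Raw PCM bytes produced per second of audio. */
fun pcmBytesPerSecond(): Int = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE

/** Length of unwrapped, padded base64 output for [rawBytes] input bytes. */
fun base64Length(rawBytes: Long): Long = 4 * ((rawBytes + 2) / 3)

/** Largest raw PCM payload whose base64 encoding stays within [limitBytes]. */
fun maxRawBytes(limitBytes: Long): Long {
    val raw = limitBytes / 4 * 3
    // Trim to whole 16-bit mono frames so no sample is cut in half.
    return raw - raw % (CHANNELS * BYTES_PER_SAMPLE)
}

fun main() {
    val fiveSeconds = 5L * pcmBytesPerSecond()
    println("5 s of PCM: $fiveSeconds bytes -> ${base64Length(fiveSeconds)} base64 chars")
    val maxRaw = maxRawBytes(500_000)
    println("Max raw under 500 KB: $maxRaw bytes (~%.2f s)".format(maxRaw.toDouble() / pcmBytesPerSecond()))
}
```

Running it shows that 5 seconds (441,000 bytes) encodes to 588,000 characters, while the largest payload that fits is 375,000 raw bytes, a little over 4.25 seconds of audio.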

you’re using THREE_GPP but the API needs raw PCM. that format is compressed, not raw. there’s no MediaRecorder.OutputFormat.PCM (every encoder MediaRecorder offers compresses), so check out the AudioRecord class for direct access to raw samples without compression.
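A minimal AudioRecord sketch of that idea, assuming the RECORD_AUDIO permission is already granted and the call runs on a background thread, since `read()` blocks. `recordPcm` is a hypothetical helper name. The AudioRecord docs describe 44100 Hz as the rate guaranteed to work on all devices, but the sketch still checks `getMinBufferSize` at runtime.

```kotlin
import android.Manifest
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import androidx.annotation.RequiresPermission

// Sketch: capture raw signed 16-bit mono PCM at 44.1 kHz with AudioRecord.
// Assumes RECORD_AUDIO is granted and this is called off the main thread.
private const val SAMPLE_RATE = 44_100

@RequiresPermission(Manifest.permission.RECORD_AUDIO)
fun recordPcm(durationMs: Int): ShortArray {
    val minBuf = AudioRecord.getMinBufferSize(
        SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT
    )
    require(minBuf > 0) { "44.1 kHz mono PCM_16BIT not supported on this device" }

    val recorder = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        minBuf * 2  // some headroom over the minimum internal buffer
    )
    check(recorder.state == AudioRecord.STATE_INITIALIZED) { "AudioRecord failed to initialize" }

    val samples = ShortArray(SAMPLE_RATE * durationMs / 1000)  // one short per mono sample
    var offset = 0
    try {
        recorder.startRecording()
        // Blocking read() may return fewer shorts than requested; loop until full.
        while (offset < samples.size) {
            val n = recorder.read(samples, offset, samples.size - offset)
            if (n < 0) error("AudioRecord.read failed with code $n")
            offset += n
        }
    } finally {
        recorder.stop()
        recorder.release()
    }
    return samples
}
```

For example, `recordPcm(4_000)` returns 176,400 samples (352,800 bytes), which stays under a 500KB base64 limit; the 5000 ms used in the question would not.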

Your problem is the 3GP format with AMR-NB encoding: you’re sending compressed audio with container headers, but the API wants raw PCM data. I’ve hit this same issue before. You need to either decode your 3GP file back to PCM samples or capture PCM in the first place. The API can’t handle the 3GP container or the AMR codec inside it. Try AudioRecord instead of MediaRecorder for direct PCM capture. If you’re stuck with MediaRecorder, you’ll have to unpack the container and decode the AMR stream to get raw samples. FFmpeg handles this conversion well, though it’ll complicate your project, and since AMR-NB only carries 8kHz narrowband audio you’d also have to resample it to 44100Hz without getting any of the missing high frequencies back. Alternatively, if you already have a WAV file from some other source, strip the header to get pure PCM data, then base64 encode it; MediaRecorder itself can’t write WAV.
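If you do end up with a WAV file, its header isn’t always exactly 44 bytes: some writers insert extra chunks (such as LIST) before the audio. This sketch walks the RIFF chunks to find `data` and also checks that the `fmt ` chunk matches what the API asks for. `extractPcmFromWav` is a hypothetical name, and it assumes a plain little-endian RIFF/WAVE file well under 2 GB with format code 1 (integer PCM).

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/**
 * Sketch: return the raw PCM bytes from a RIFF/WAVE file by walking its
 * chunks to the "data" chunk instead of assuming a fixed 44-byte header.
 * Fails unless the fmt chunk says 16-bit integer PCM, mono, 44100 Hz.
 */
fun extractPcmFromWav(wav: ByteArray): ByteArray {
    val buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN)
    fun tag(at: Int) = String(wav, at, 4, Charsets.US_ASCII)

    require(wav.size >= 12 && tag(0) == "RIFF" && tag(8) == "WAVE") { "not a RIFF/WAVE file" }

    var pos = 12
    while (pos + 8 <= wav.size) {
        val id = tag(pos)
        val size = buf.getInt(pos + 4)           // chunk body length in bytes
        require(size >= 0) { "chunk too large for this sketch" }
        val body = pos + 8
        when (id) {
            "fmt " -> {
                val format = buf.getShort(body).toInt()         // 1 = integer PCM
                val channels = buf.getShort(body + 2).toInt()
                val sampleRate = buf.getInt(body + 4)
                val bits = buf.getShort(body + 14).toInt()
                require(format == 1 && channels == 1 && sampleRate == 44_100 && bits == 16) {
                    "expected 16-bit mono PCM at 44100 Hz"
                }
            }
            "data" -> return wav.copyOfRange(body, minOf(body + size, wav.size))
        }
        pos = body + size + (size and 1)       // odd-sized chunks carry a pad byte
    }
    error("no data chunk found")
}
```

The returned bytes are already signed 16-bit little-endian samples, so they can go straight to base64.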

Your problem’s simple: you’re recording in 3GP with AMR encoding, but the API wants raw PCM data. It can’t handle compressed formats.

I hit this same wall building a voice command system for our office. Wasted hours debugging until I caught the format mismatch.

Switch to AudioRecord instead of MediaRecorder to grab raw PCM samples directly. MediaRecorder always compresses.

But honestly? Handling all the audio recording, format conversion, API calls, and error handling manually is a nightmare. You’ll spend more time fighting audio formats than building your actual app.

I’d just use Latenode for this. Set up a workflow that takes the audio through webhooks, auto-converts to PCM, hits the Shazam API, and sends back results. No messing with AudioRecord buffers or conversions.

Flow goes: mobile app sends audio → Latenode processes/converts → calls API → returns song data. Takes 10 minutes to set up vs hours of audio programming headaches.

Check it out: https://latenode.com

Your problem is the recording format. You’re using 3GP with AMR_NB encoding, but the API needs raw PCM data. When you pull bytes from the 3GP file, you’re getting a compressed stream inside a container instead of the raw audio samples the API expects. Switch to AudioRecord: it gives you PCM samples directly without any wrapper. Set it up with ENCODING_PCM_16BIT, CHANNEL_IN_MONO, and a 44100 sample rate. Record straight into a short array, then convert it to little-endian bytes. If you must stick with MediaRecorder, note that it can’t write WAV, so you’d have to decode its compressed output back to PCM yourself. Fair warning though: that’s going to be messier.
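For the conversion step, here is a sketch that packs the shorts as signed 16-bit little-endian bytes and base64-encodes them. It uses `java.util.Base64` (available on Android from API 26) because its basic encoder emits no line breaks. By contrast, `android.util.Base64.DEFAULT`, as used in the question, wraps the output into lines, and `Base64.NO_WRAP` is the flag that avoids that. Whether this particular API rejects wrapped input is an assumption worth testing. `shortsToS16le` and `encodeForApi` are hypothetical names.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.util.Base64

/** Sketch: pack 16-bit samples as signed little-endian bytes (s16le). */
fun shortsToS16le(samples: ShortArray): ByteArray {
    val buf = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    buf.asShortBuffer().put(samples)  // the view writes into buf and inherits its byte order
    return buf.array()
}

/** Sketch: s16le bytes encoded as a single-line base64 string. */
fun encodeForApi(samples: ShortArray): String =
    // java.util.Base64's basic encoder never inserts line breaks.
    Base64.getEncoder().encodeToString(shortsToS16le(samples))
```

With samples from an AudioRecord capture, `encodeForApi(samples)` gives the string to send; for example, `shortArrayOf(1, -2, 0x1234)` packs to bytes `01 00 FE FF 34 12`.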