Trouble with audio capture and Shazam-like API integration

I’m stuck on a school assignment where I’m making a Shazam clone. I can record audio and turn it into a base64 string, but the API isn’t recognizing any songs. I think I’m messing up either the recording or the encoding.

The API wants a Base64 string, under 500 KB, encoded from the byte array of a 3–5 second sample. The audio must be 44100 Hz, mono, signed 16-bit PCM, little-endian.

Here’s my AudioRecorder class:

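```kotlin
import android.media.MediaRecorder
import java.io.File

class AudioRecorder {
  private var recorder: MediaRecorder? = null

  // Records microphone input into a 3GP file using the AMR-NB codec
  fun startRecording(file: File) {
    recorder = MediaRecorder().apply {
      setAudioSource(MediaRecorder.AudioSource.MIC)
      setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP)
      setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB)
      setAudioChannels(1)
      setSampleRate(44100)
      setEncodingBitRate(16 * 44100)
      setOutputFile(file.absolutePath)
      prepare()
      start()
    }
  }

  fun stopRecording() {
    recorder?.stop()
    // release() frees the native recorder; reset() only returns it to the idle state
    recorder?.release()
    recorder = null
  }

  // Returns the bytes of the recorded file, container included
  fun getAudioBytes(file: File): ByteArray {
    return file.readBytes()
  }
}
```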

My recording function:

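```kotlin
import android.content.Context
import kotlinx.coroutines.delay
import java.io.File

// Records for 5 seconds into the app's cache directory, then Base64-encodes the file
suspend fun captureAudio(recorder: AudioRecorder, ctx: Context): String {
  val audioFile = File(ctx.cacheDir, "sound.3gp")
  recorder.startRecording(audioFile)
  delay(5000) // record for 5 seconds
  recorder.stopRecording()
  val audioBytes = recorder.getAudioBytes(audioFile)
  return encodeToBase64(audioBytes)
}
```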

And my base64 encoder:

```kotlin
import android.util.Base64

// Encodes the recorded bytes as a Base64 string
fun encodeToBase64(audioBytes: ByteArray): String {
  return Base64.encodeToString(audioBytes, Base64.DEFAULT)
}
```

Any ideas what I’m doing wrong?

I encountered a similar issue in my project. The problem most likely lies in your audio encoding: the API requires signed 16-bit PCM, but you're recording AMR-NB (a compressed 8 kHz speech codec) inside a 3GP container. MediaRecorder can't produce what the API wants, because none of its audio encoders output raw PCM (and OutputFormat.RAW_AMR is still AMR). Use AudioRecord instead, configured with AudioFormat.ENCODING_PCM_16BIT, AudioFormat.CHANNEL_IN_MONO and a 44100 Hz sample rate; it also gives you more precise control over the audio parameters. If you save the samples to a file, use a .pcm extension instead of .3gp so the name matches the contents. If you're still having trouble, double-check that the byte order is little-endian and that the sample rate is exactly 44100 Hz.
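If you read from AudioRecord into a ShortArray (one element per 16-bit sample), you have to serialize the samples into bytes yourself, and that is where the little-endian requirement comes in. A minimal sketch, assuming the samples are already in a ShortArray; `samplesToLittleEndianBytes` is a hypothetical helper name, not part of any Android API:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper (not an Android API): writes signed 16-bit PCM samples
// as little-endian bytes, low byte first, two bytes per sample.
fun samplesToLittleEndianBytes(samples: ShortArray): ByteArray {
    val buffer = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    // The ShortBuffer view inherits the little-endian order and writes into `buffer`
    buffer.asShortBuffer().put(samples)
    return buffer.array()
}
```

On the device you would fill the ShortArray with AudioRecord.read(shortArray, 0, shortArray.size) and pass only the samples that call reports as read.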

hey there! i struggled with this too. your issue might be the audio format. THREE_GPP and AMR_NB aren’t compatible with the API requirements. try using OutputFormat.MPEG_4 and AudioEncoder.AAC instead. also, make sure you’re not exceeding the 500KB limit. hope this helps! let me know if you need more info
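To check the limit before sending, it helps to work out the sizes. At 44,100 samples per second, mono, 2 bytes per sample, one second of raw PCM is 88,200 bytes, and Base64 turns every 3 input bytes into 4 characters. A minimal sketch; the function names are hypothetical, and whether the 500 KB limit applies to the raw bytes or to the encoded string (and whether KB means 1,000 or 1,024 bytes) is an assumption to confirm in the API docs:

```kotlin
// Format required by the API: 44,100 Hz, mono, signed 16-bit PCM
val SAMPLE_RATE_HZ = 44_100
val CHANNELS = 1
val BYTES_PER_SAMPLE = 2

// Hypothetical helper: size in bytes of a raw PCM clip of the given length
fun pcmByteCount(seconds: Int): Int = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * seconds

// Hypothetical helper: length of the padded, single-line Base64 encoding of n bytes
// (every 3 input bytes become 4 output characters, rounded up)
fun base64Length(byteCount: Int): Int = 4 * ((byteCount + 2) / 3)

// Hypothetical helper: whether the encoded clip stays under a limit given in bytes
// (assumes the limit applies to the Base64 string and 1 KB = 1,000 bytes)
fun fitsLimit(seconds: Int, limitBytes: Int = 500_000): Boolean =
    base64Length(pcmByteCount(seconds)) < limitBytes
```

With these numbers, 3 seconds is 264,600 raw bytes (352,800 Base64 characters), 4 seconds is 352,800 (470,400) and 5 seconds is 441,000 (588,000). So if the limit applies to the encoded string, 4 seconds is the longest whole-second clip that fits.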

I have encountered similar issues before. The core problem seems to be that the audio format does not meet the API requirements. Instead of using MediaRecorder, consider switching to AudioRecord, which lets you capture mono audio at a 44100 Hz sample rate as 16-bit PCM. First, set up AudioRecord with those parameters and capture the raw PCM data. Then convert the data to a byte array in little-endian byte order before encoding it to Base64. At these settings a 5-second sample is 44,100 × 2 × 5 = 441,000 bytes of raw audio, so the raw data stays within the 500 KB limit and meets the API criteria.
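The last step of that approach, turning the little-endian PCM bytes into the request string, can be sketched in plain Kotlin. This is a sketch under assumptions: `buildPayload` and its parameters are hypothetical names, `maxSeconds` defaults to 4 on the assumption that the 500 KB limit applies to the encoded string, and the API is assumed to want the Base64 on a single line. java.util.Base64's basic encoder never inserts line breaks; on Android, android.util.Base64 with Base64.NO_WRAP behaves the same way, whereas Base64.DEFAULT (used in the question) wraps the output across multiple lines.

```kotlin
import java.util.Base64

// Hypothetical helper: trims little-endian 16-bit mono PCM to at most
// maxSeconds of audio and Base64-encodes it as a single line.
fun buildPayload(pcm: ByteArray, maxSeconds: Int = 4, sampleRate: Int = 44_100): String {
    val bytesPerSecond = sampleRate * 2 // mono, 2 bytes per sample
    val capped = minOf(pcm.size, maxSeconds * bytesPerSecond)
    // Drop a trailing odd byte so the clip ends on a whole 16-bit sample
    val length = capped - capped % 2
    // java.util.Base64's basic encoder never inserts line breaks
    return Base64.getEncoder().encodeToString(pcm.copyOf(length))
}
```

Trimming before encoding, rather than recording for exactly the right time, keeps the size guarantee in one place even if the recording loop overshoots slightly.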