Implementing MediaRecorder with Shazam API via RapidAPI

Question Overview

I’m working on a project that mimics the functionality of Shazam. I’ve managed to record audio and transform it into a base64 string as required by the API; however, it fails to identify any songs. This issue likely stems from either the recording or decoding process being incorrect. Currently, I am inputting my data directly into the API, eliminating the possibility of connection problems.

API Requirements

The API specifies that I need to provide an encoded base64 string from raw audio data that’s under 500KB. Sample durations of 3-5 seconds are sufficient for song recognition. The required specifications are:

  • 44,100Hz sample rate
  • Mono channel
  • 16-bit signed PCM little endian

Media types like mp3, wav, etc., are not supported.

Audio Recorder Implementation

I’ve built an AudioRecorder class to capture audio:

override fun initiate(outputFile: File) {
    audioRecorder().apply {
        setAudioSource(MediaRecorder.AudioSource.MIC)
        setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP)
        setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB)
        setAudioChannels(1)
        setAudioSamplingRate(44100)
        setAudioEncodingBitRate(16 * 44100)
        setOutputFile(outputFile.absolutePath)

        prepare()
        start()

        recorder = this
    }
}

override fun halt() {
    recorder?.stop()
    recorder?.reset()
    recorder = null
}

fun extractAudioData(file: File): ByteArray {
    return file.readBytes()
}

I also have a function that triggers the recording at the appropriate moment:

private suspend fun beginAudioRecording(audioRecorder: AudioRecorder, context: Context): String {
    Log.d("your.package.name", "Starting audio recording")
    // Begin recording
    File(context.cacheDir, "recorded_audio.3gp").also {
        audioRecorder.initiate(it)
        recordedFile = it
    }

    // Record for 5 seconds
    delay(5000)

    // End the recording
    audioRecorder.halt()

    // Convert recorded audio to Base64 string
    val recordedData = audioRecorder.extractAudioData(recordedFile!!)
    return transformToBase64(recordedData)
}

Base64 Conversion Function

Finally, here is my function for converting to base64:

private fun transformToBase64(audioData: ByteArray): String {
    return Base64.encodeToString(audioData, Base64.DEFAULT)
}

Hi CreativePainter33,

I see you're working on improving your audio processing for better integration with Shazam's API. Here's how you can ensure your recording meets the API requirements more accurately:

1. Audio Format Adjustment

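Your code currently uses the THREE_GPP output format with the AMR_NB encoder, so the file you read back holds compressed AMR audio wrapped in a 3GP container, not the raw PCM the API expects. AMR-NB is also a narrowband codec that works at 8 kHz, so the 44,100 Hz setting cannot take effect with it. Switching to another output format such as MediaRecorder.OutputFormat.RAW_AMR does not help either, because the result is still AMR-encoded. MediaRecorder has no output format that writes raw PCM, so the PCM data has to be captured another way.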
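A quick way to confirm what you are actually sending is to look at the first bytes of the data. The sketch below is plain Kotlin with a hypothetical helper name (`looksLikeIsoContainer`), and it relies on the assumption that 3GP files, like MP4 files, normally start with an "ftyp" box. Raw PCM has no header, so it would not match.

```kotlin
// Returns true when the data starts with an ISO base media "ftyp" box,
// which is how 3GP and MP4 files normally begin. Raw PCM has no header.
fun looksLikeIsoContainer(data: ByteArray): Boolean {
    if (data.size < 8) return false
    // Bytes 0..3 hold the box size, bytes 4..7 hold the box type.
    return data.decodeToString(4, 8) == "ftyp"
}
```

Running this on the bytes of your recorded_audio.3gp file should tell you whether the data is a container rather than raw samples.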

2. Manual PCM Handling

To ensure you're processing PCM audio data correctly, consider capturing raw audio with AudioRecord instead. Here's a basic structure:


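val sampleRate = 44100
val bufferSize = AudioRecord.getMinBufferSize(sampleRate,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT)

// Requires the RECORD_AUDIO permission to be granted at runtime
val recorder = AudioRecord(MediaRecorder.AudioSource.MIC,
    sampleRate,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    bufferSize)

// 5 seconds of 16-bit mono audio: 44,100 samples/s * 2 bytes * 5 s = 441,000 bytes
val audioData = ByteArray(sampleRate * 2 * 5)
var offset = 0
recorder.startRecording()
// read() may return fewer bytes than requested, so loop until the array is full
while (offset < audioData.size) {
    val read = recorder.read(audioData, offset, audioData.size - offset)
    if (read <= 0) break // zero or a negative error code: stop reading
    offset += read
}
recorder.stop()
recorder.release()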

Because audioData holds headerless 16-bit PCM samples, it can be encoded to Base64 and submitted directly, which matches the raw PCM format the API requires.

3. Verify Sample Size

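Ensure your raw audio doesn't exceed 500KB before encoding, as oversized data can lead to API issues. At 44,100 Hz, mono, 16-bit, each second of audio takes 88,200 bytes, so a 5-second sample comes to 441,000 bytes.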
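If you want to check the size for other durations, the arithmetic can be sketched in plain Kotlin. The function names here are made up for illustration, and I am assuming that "500KB" means 500,000 bytes of raw audio, which is the conservative reading.

```kotlin
// Size in bytes of raw PCM audio for a given duration.
// Defaults match the required format: 44,100 Hz, mono, 16-bit.
fun pcmByteCount(
    seconds: Int,
    sampleRateHz: Int = 44_100,
    channels: Int = 1,
    bytesPerSample: Int = 2
): Int = seconds * sampleRateHz * channels * bytesPerSample

// Assumption: the limit is 500,000 bytes and applies to the raw audio.
fun fitsApiLimit(seconds: Int, limitBytes: Int = 500_000): Boolean =
    pcmByteCount(seconds) <= limitBytes
```

With these defaults, 3 and 5 seconds both fit, while 6 seconds (529,200 bytes) does not.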

After making these adjustments, you should see an improvement in the API's ability to recognize the songs.

Happy coding!
David

The recognition fails mainly because the audio you send is AMR-encoded inside a 3GP container, while the API expects raw PCM. Let's address this step by step.

1. Use AudioRecord for Raw PCM Data

MediaRecorder always writes encoded audio into a container file and cannot output raw PCM, which the Shazam API requires. Use AudioRecord instead to access the raw audio:


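import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

val sampleRate = 44100
val audioFormat = AudioFormat.ENCODING_PCM_16BIT
val channelConfig = AudioFormat.CHANNEL_IN_MONO

val bufferSize = AudioRecord.getMinBufferSize(sampleRate, channelConfig, audioFormat)

// Requires the RECORD_AUDIO permission
val recorder = AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate, channelConfig, audioFormat, bufferSize)

// 5 s of 16-bit mono PCM = 44,100 * 2 * 5 = 441,000 bytes
val audioData = ByteArray(sampleRate * 2 * 5)
var offset = 0
recorder.startRecording()
// Keep reading until the whole array is filled; one read() may return less
while (offset < audioData.size) {
    val read = recorder.read(audioData, offset, audioData.size - offset)
    if (read <= 0) break // zero or a negative error code: stop reading
    offset += read
}
recorder.stop()
recorder.release()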

This ensures that the audio data you retrieve is in the required PCM format.
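AudioRecord also has a read overload that fills a ShortArray instead of a ByteArray. If you end up using that one, the samples have to be turned into bytes yourself, and it is worth making the byte order explicit. A minimal sketch in plain Kotlin, with a hypothetical helper name:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Packs 16-bit samples into bytes, low byte first (little endian).
fun shortsToLittleEndianBytes(samples: ShortArray, count: Int = samples.size): ByteArray {
    val buffer = ByteBuffer.allocate(count * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (i in 0 until count) buffer.putShort(samples[i])
    return buffer.array()
}
```

The count parameter is there so you can pass the number of samples that read() actually returned rather than the full array length.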

2. Base64 Encoding the Correct Data

With the raw PCM data captured through AudioRecord, you can now safely convert it to a base64 string which will meet the API's expectations:


private fun transformToBase64(audioData: ByteArray): String {
    return Base64.encodeToString(audioData, Base64.DEFAULT)
}

This function still works once the input is the raw PCM byte array. One caveat: Base64.DEFAULT inserts line breaks into the output, so if the API rejects the string, try Base64.NO_WRAP, which produces a single unbroken line.
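Whether the encoded string contains line breaks can matter when it is sent as a request body. The sketch below shows the difference outside Android, using java.util.Base64 as a stand-in for android.util.Base64 (that substitution and the function names are my own): the basic encoder emits one unbroken line, while the MIME encoder wraps the output in a way similar in spirit to Base64.DEFAULT.

```kotlin
import java.util.Base64

// Single-line Base64 with no line separators
// (comparable to android.util.Base64.NO_WRAP).
fun encodeSingleLine(pcm: ByteArray): String =
    Base64.getEncoder().encodeToString(pcm)

// MIME-style Base64, which breaks the output into lines of at most
// 76 characters separated by CRLF.
fun encodeWrapped(pcm: ByteArray): String =
    Base64.getMimeEncoder().encodeToString(pcm)
```

If the API turns out to accept wrapped input, either form is fine; the single-line form is simply the safer default for an HTTP body.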

3. Size Management

After capturing the raw audio, make sure the byte array stays within the 500KB limit before you encode it. The simplest way is to record for a fixed duration and size the array to match: at 88,200 bytes per second, 5 seconds comes to 441,000 bytes.
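If a recording does come out too long, one option is to trim it before encoding. This is only a sketch in plain Kotlin: the helper name is hypothetical, and the 500,000-byte default is my reading of the 500KB limit.

```kotlin
// Trims raw 16-bit mono PCM so it fits within maxBytes.
// The cut is aligned to a 2-byte boundary so no sample is split in half.
fun trimPcmToLimit(pcm: ByteArray, maxBytes: Int = 500_000): ByteArray {
    if (pcm.size <= maxBytes) return pcm
    val alignedLimit = maxBytes - (maxBytes % 2)
    return pcm.copyOf(alignedLimit)
}
```

Data that already fits is returned unchanged, so it is safe to call this on every recording.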

By switching from MediaRecorder to AudioRecord, you'll align your recording process with the needed format, which should improve API accuracy in recognizing the song.

Feel free to experiment with these suggestions, and best of luck integrating with the Shazam API!

Hey CreativePainter33,

You're headed in the right direction, but MediaRecorder can't produce the raw PCM the API needs. Use AudioRecord instead:

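val sampleRate = 44100
val audioFormat = AudioFormat.ENCODING_PCM_16BIT
val channelConfig = AudioFormat.CHANNEL_IN_MONO

val bufferSize = AudioRecord.getMinBufferSize(sampleRate, channelConfig, audioFormat)

val recorder = AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate, channelConfig, audioFormat, bufferSize)

// Room for 5 seconds: 44,100 samples/s * 2 bytes per sample * 5 s
val audioData = ByteArray(sampleRate * 2 * 5)
var offset = 0
recorder.startRecording()
// Loop because a single read() may deliver only part of the requested bytes
while (offset < audioData.size) {
    val read = recorder.read(audioData, offset, audioData.size - offset)
    if (read <= 0) break // zero or a negative error code: stop reading
    offset += read
}
recorder.stop()
recorder.release()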

This setup ensures you're getting the PCM format that the API requires. Convert this audioData to Base64 before sending it to the API using your transformToBase64 function.

Ensure your data size is under 500KB and record for only 3-5 seconds to stay within limits.

Good luck with your project!