How to simulate DTMF button presses through OpenAI Realtime API integration with Twilio?

I’ve got a phone system set up where my Twilio number connects to OpenAI’s Realtime API using websockets. Everything works fine for voice conversations, but I’m stuck on one thing. I need the AI to be able to simulate pressing phone buttons (like when you hear ‘press 3 for billing’ or ‘press 9 for technical help’).

I know Twilio has ways to generate DTMF tones and the OpenAI Realtime API can call functions, but I can’t figure out how to make them work together. Has anyone managed to get the Realtime API to send button press signals? I’ve been searching but haven’t found any working examples of this setup.

I built a custom websocket handler that connects OpenAI’s function calls to Twilio’s call modification endpoints. When the AI wants to press a button, I catch it through a function schema and fire off a POST request to Twilio’s Calls API that updates the live call with TwiML to play the digits (as far as I can tell, SendDigits is only accepted when a call is first created, not on an update). The tricky part was the audio stream - you’ve got to pause the realtime connection while DTMF tones play, or you’ll get messy audio conflicts. Also learned to add a short delay between digit presses since some IVR systems are slow to react.
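For anyone who wants a starting point, here is a minimal sketch of the function-schema half of that approach. The tool name `press_digits`, its parameter layout, and the helper names are all made up for illustration; the event shapes (`session.update`, `response.output_item.done`, `conversation.item.create`, `response.create`) follow my understanding of the Realtime API and should be checked against the current docs.

```python
import json

# Hypothetical tool name and schema; nothing in this thread specifies them.
PRESS_DIGITS_TOOL = {
    "type": "function",
    "name": "press_digits",
    "description": "Press keypad buttons on the current phone call, e.g. '3' for billing.",
    "parameters": {
        "type": "object",
        "properties": {
            "digits": {"type": "string", "description": "Keys to press: 0-9, * or #."}
        },
        "required": ["digits"],
    },
}


def session_update_event():
    """Build the event that registers the tool with the Realtime session."""
    return {
        "type": "session.update",
        "session": {"tools": [PRESS_DIGITS_TOOL], "tool_choice": "auto"},
    }


def handle_server_event(event):
    """Return (digits, reply_events) for a press_digits call, else (None, [])."""
    if event.get("type") != "response.output_item.done":
        return None, []
    item = event.get("item", {})
    if item.get("type") != "function_call" or item.get("name") != "press_digits":
        return None, []
    # Function arguments arrive as a JSON-encoded string.
    digits = json.loads(item.get("arguments") or "{}").get("digits", "")
    ok = bool(digits) and all(ch in "0123456789*#" for ch in digits)
    replies = [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps({"ok": ok}),
            },
        },
        {"type": "response.create"},  # ask the model to continue afterwards
    ]
    return (digits if ok else None), replies
```

The handler only decides what to press and what to send back over the OpenAI websocket. Actually playing the tones on the call is a separate step on the Twilio side, and you may prefer to send the `function_call_output` only after that step succeeds.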

I built this exact thing a few months back. Use function calling from the Realtime API to trigger DTMF on Twilio’s side. Create a function the AI calls when it needs to press buttons, then hit Twilio’s REST API to send DTMF tones during the call. The tricky bit is keeping call state while sending tones - I added brief pauses so the receiving system actually recognizes the DTMF signals. Watch your timing because some IVR systems are picky about tone duration.
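On the pauses: Twilio digit strings accept `w` as a half-second wait, so one way to pace the presses is to weave `w` characters into the string before sending it. A small sketch, where the function name and the default pause counts are my own choices and would need tuning per IVR:

```python
VALID_KEYS = set("0123456789*#")


def pace_digits(digits, lead_pauses=1, gap_pauses=1):
    """Insert 'w' (0.5 s wait) before the first key and between keys."""
    if not digits or any(ch not in VALID_KEYS for ch in digits):
        raise ValueError(f"unsupported DTMF sequence: {digits!r}")
    gap = "w" * gap_pauses
    return "w" * lead_pauses + gap.join(digits)


print(pace_digits("39"))                               # w3w9
print(pace_digits("123", lead_pauses=0, gap_pauses=2))  # 1ww2ww3
```

Validating the keys up front also keeps the model from passing something like "three" straight through to the call.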

For sure! Using Twilio’s <Play> verb with its digits attribute works well for generating DTMF tones. Just make sure your URLs are configured correctly to avoid any issues. Got my timing down after a few tries, and it runs smoothly now!
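A rough sketch of the <Play> route, assuming the call is bridged with `<Connect><Stream>` and that replacing the call's TwiML drops the current media stream, so the new TwiML reconnects it after the tones. The stream URL, function names, and credentials are placeholders; `calls(...).update(twiml=...)` is the Twilio Python helper's call-update method.

```python
import xml.etree.ElementTree as ET


def build_dtmf_twiml(digits, stream_url):
    """TwiML that plays DTMF digits, then reconnects the media stream."""
    response = ET.Element("Response")
    ET.SubElement(response, "Play", digits=digits)
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=stream_url)
    return ET.tostring(response, encoding="unicode")


def press_on_live_call(account_sid, auth_token, call_sid, digits, stream_url):
    """Apply the TwiML to an in-progress call (needs the `twilio` package)."""
    from twilio.rest import Client  # imported lazily so the builder works without it

    client = Client(account_sid, auth_token)
    return client.calls(call_sid).update(twiml=build_dtmf_twiml(digits, stream_url))


# Placeholder URL; use your own websocket endpoint.
twiml = build_dtmf_twiml("w3", "wss://example.com/media-stream")
```

Because the stream reconnects after the tones, your websocket handler should expect a fresh `start` message from Twilio and re-attach it to the existing OpenAI session rather than opening a new one.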
