Integrating GPT-4o Realtime API with Assistant Vector Stores for Voice Chat

I’m developing a mobile app with voice chat built on the GPT-4o Realtime API, with Whisper handling speech recognition. I’ve already carried over the system prompt and greeting message from my Assistant configuration without any issues.

However, I can’t get the Realtime API to use the Vector Store attached to my Assistant. Even after uploading an updated knowledge base, the voice chat answers still come only from the model’s built-in knowledge (the December 17, 2024 model snapshot) rather than from the new information I provided.

I want the voice chat to answer from my Vector Store knowledge base and, later, to add web search using OpenAI’s Web Search. Has anyone implemented a similar setup? Any advice would be greatly appreciated.

Hit the same wall last month building a customer service bot. The Realtime API won’t pull from Assistant vector stores automatically, because it doesn’t run file search for you the way the Assistants API does. Here’s what worked: I query the vector store separately through its search endpoint, then inject the results into the conversation before sending anything to the Realtime API. In other words, your app handles the vector search instead of the API. Put the retrieved knowledge base context in your system message (the session instructions). It’s more work upfront, but you get much better control over which information gets prioritized.
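A minimal sketch of that manual retrieval step in Python. It assumes the vector store search endpoint (`POST /v1/vector_stores/{vector_store_id}/search` with `query` and `max_num_results` in the body) and a response containing a `data` list of hits with `filename`, `score` and `content` text parts. Verify the path, headers and response shape against the current API reference. `build_instructions` and the 4,000-character budget are hypothetical choices for illustration.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com/v1"


def search_vector_store(vector_store_id: str, query: str, api_key: str,
                        max_results: int = 5) -> dict:
    """POST a query to the vector store search endpoint (assumed path and body)."""
    body = json.dumps({"query": query, "max_num_results": max_results}).encode("utf-8")
    request = urllib.request.Request(
        f"{API_BASE}/vector_stores/{vector_store_id}/search",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def build_instructions(base_prompt: str, search_page: dict, max_chars: int = 4000) -> str:
    """Append the best-scoring retrieved chunks to the base system prompt."""
    hits = sorted(search_page.get("data", []),
                  key=lambda hit: hit.get("score", 0.0), reverse=True)
    snippets, used = [], 0
    for hit in hits:
        # Each hit carries a list of content parts; keep only the text ones.
        text = " ".join(part.get("text", "") for part in hit.get("content", [])
                        if part.get("type") == "text").strip()
        if not text:
            continue
        snippet = f"[{hit.get('filename', 'unknown')}] {text}"
        if used + len(snippet) > max_chars:
            break  # stay within a rough size budget for the instructions
        snippets.append(snippet)
        used += len(snippet)
    if not snippets:
        return base_prompt
    context = "\n".join(snippets)
    return f"{base_prompt}\n\nUse this knowledge base context when it is relevant:\n{context}"
```

The string returned by `build_instructions` is what you would place in the Realtime session's `instructions`. Sorting by score and capping the length keeps the most relevant chunks when the knowledge base returns more than fits comfortably in a prompt.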

Yeah, this is a very common issue with the Realtime API. I ran into the same thing building a voice app. The problem is that the Realtime API works quite differently from regular Assistant calls and doesn’t read from your Assistant’s vector store. Here’s what worked for me: I built a hybrid setup where I query the vector store separately first to grab the relevant chunks, then feed that context into the conversation history or system prompt before starting the Realtime session. The trick is to run your vector queries upfront for the topics you expect, rather than retrieving in real time during the voice chat. Also double-check that your vector store files are actually indexed, not just uploaded. There’s usually a processing delay, and until indexing finishes, searches won’t return the new content, so answers fall back to the model’s base knowledge.
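A sketch of both checks. It assumes the vector store object (as returned by `GET /v1/vector_stores/{vector_store_id}`) carries a `file_counts` object with `in_progress` and `completed` fields. `fetch_store` and `search` are hypothetical injected callables, so you can plug in your own HTTP client. `prefetch_chunks` runs the expected-topic queries once before the session starts and drops duplicate chunks. Failed files are not treated as an error here, so check `failed` as well if that matters to you.

```python
import time
from typing import Callable, Iterable


def is_indexed(store: dict) -> bool:
    """True once no files are still processing and at least one has completed."""
    counts = store.get("file_counts", {})
    return counts.get("in_progress", 0) == 0 and counts.get("completed", 0) > 0


def wait_until_indexed(fetch_store: Callable[[], dict],
                       timeout_s: float = 60.0, poll_s: float = 2.0) -> dict:
    """Poll the vector store until indexing finishes or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        store = fetch_store()
        if is_indexed(store):
            return store
        if time.monotonic() >= deadline:
            raise TimeoutError("vector store files are still being processed")
        time.sleep(poll_s)


def prefetch_chunks(topics: Iterable[str], search: Callable[[str], list[str]],
                    limit: int = 20) -> list[str]:
    """Run one search per expected topic up front, keeping unique chunks in order."""
    seen = set()
    chunks = []
    for topic in topics:
        for chunk in search(topic):
            if chunk in seen:
                continue
            seen.add(chunk)
            chunks.append(chunk)
            if len(chunks) >= limit:
                return chunks
    return chunks
```

Prefetching trades freshness for speed. Nothing is searched while the user is speaking, so the session starts with context already in place, but questions outside the expected topics get no retrieved context.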

The Realtime API doesn’t support vector stores yet, which is super annoying. I built a workaround where I search my vector DB when the user starts talking, then pass those results as context in the session config. It works, but it adds latency, which defeats the whole point of realtime.
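A sketch of that flow as a pure event handler. It assumes input audio transcription is enabled, so that the server emits `conversation.item.input_audio_transcription.completed` with a `transcript`. It also assumes automatic response creation is turned off, so the app sends `response.create` itself after updating `instructions` through `session.update`. Event names follow the Realtime beta documentation and may differ in newer API versions. `search` stands in for whatever vector store query you use, and `handle_server_event` and `BASE_PROMPT` are hypothetical names.

```python
import json
from typing import Callable

BASE_PROMPT = "You are a helpful voice assistant."  # hypothetical base instructions


def handle_server_event(event: dict, search: Callable[[str], list[str]],
                        base_prompt: str = BASE_PROMPT) -> list[str]:
    """Map one Realtime server event to the client events (JSON strings) to send back."""
    # React only once the user's speech has been transcribed.
    if event.get("type") != "conversation.item.input_audio_transcription.completed":
        return []
    transcript = event.get("transcript", "").strip()
    if not transcript:
        return []
    chunks = search(transcript)  # your vector store query; this is where latency is added
    instructions = base_prompt
    if chunks:
        instructions += "\n\nRelevant knowledge base excerpts:\n" + "\n".join(
            f"- {chunk}" for chunk in chunks)
    session_update = {"type": "session.update", "session": {"instructions": instructions}}
    # Ask for a reply only after the new instructions are in place.
    return [json.dumps(session_update), json.dumps({"type": "response.create"})]
```

Waiting for the transcript before requesting a response is exactly where the extra latency comes from. Returning fewer results from `search`, or giving it a short timeout, keeps the delay smaller.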