How to manage user sessions efficiently in a real-time LangChain voice application with FastAPI?

Hi there! I need some guidance on handling multiple users at the same time in my voice chat app.

What I’m using:

  • Server: FastAPI with WebSocket connections
  • Voice processing: Gemini Live API for speech recognition and text-to-speech
  • AI setup: LangChain with two separate agents:
    1. A “Router” agent (StreamAgent) that processes the voice data and handles simple function calls
    2. A main “Core Agent” (ChatAgent) that gets called by the router, has database tools, and keeps track of chat history with ConversationBufferMemory

My main problem: Handling user data properly

Right now when someone connects via WebSocket, I make a brand new ChatAgent object for them. This keeps each user’s data separate, including their language preferences and chat memory.
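To make the question concrete, here's a minimal sketch of the per-connection pattern you're describing (all names are illustrative stand-ins, not your real code; `ChatAgent` here just echoes instead of calling LangChain):

```python
from dataclasses import dataclass, field

@dataclass
class ChatAgent:
    """Stand-in for the real LangChain agent; `history` plays the
    role of ConversationBufferMemory."""
    user_id: str
    language: str = "en"
    history: list = field(default_factory=list)

    def respond(self, text: str) -> str:
        self.history.append(("user", text))
        reply = f"[{self.language}] echo: {text}"  # real agent call goes here
        self.history.append(("ai", reply))
        return reply

# One agent object per WebSocket connection: isolation is trivial,
# but every connected user holds history in process memory.
active_agents: dict[str, ChatAgent] = {}

def on_connect(user_id: str, language: str = "en") -> ChatAgent:
    agent = ChatAgent(user_id=user_id, language=language)
    active_agents[user_id] = agent
    return agent

def on_disconnect(user_id: str) -> None:
    # History is lost here unless it was persisted somewhere else first.
    active_agents.pop(user_id, None)
```

The `on_connect`/`on_disconnect` pair would be called from the WebSocket accept/close handlers; the memory concern is exactly that `active_agents` grows with concurrent connections.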

But I’m wondering: Is creating one agent per user connection the right way to do this for a real app?

I’m worried about RAM usage when lots of people are connected at once, each having their own agent running in memory.

What other ways could I structure this? Maybe:

  • Use something like Redis to store user chat history separately, and have shared agent workers that don’t keep state?
  • What do most people do to keep user conversations private and separate in WebSocket LangChain apps?

I want to get the architecture right before I launch this thing. Any tips from people who have done this before would be awesome!

Just serialize the conversation state to your DB after each exchange and kill the agent instance right away. Way simpler than dealing with pools or Redis caching - memory usage stays flat no matter how many users you have. When the user sends their next message, deserialize the state, spin up a fresh agent, process, then save again. Sure, there's a bit of latency, but it beats getting OOM crashes when you scale up.
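A minimal sketch of this serialize-and-discard approach, assuming a JSON-serializable history (`fake_db` stands in for a real database table keyed by user id; `Agent` is illustrative):

```python
import json

fake_db: dict[str, str] = {}  # stand-in for a real persistence layer

class Agent:
    """Short-lived, illustrative agent; real code would rebuild a
    LangChain agent and its memory from the stored history."""
    def __init__(self, history=None):
        self.history = history or []

    def respond(self, text: str) -> str:
        self.history.append(("user", text))
        reply = "ok: " + text  # real LLM call goes here
        self.history.append(("ai", reply))
        return reply

def handle_message(user_id: str, text: str) -> str:
    # 1. Deserialize prior state (empty on the user's first message).
    raw = fake_db.get(user_id, "[]")
    agent = Agent(history=[tuple(t) for t in json.loads(raw)])
    # 2. Process with a fresh agent that lives only for this exchange.
    reply = agent.respond(text)
    # 3. Persist and drop the agent; process memory stays flat.
    fake_db[user_id] = json.dumps(agent.history)
    return reply
```

The trade-off is exactly the one named above: a deserialize/serialize round trip per message in exchange for no long-lived per-user objects.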

You’re right to question one-agent-per-user. I hit the same memory wall around 200 concurrent users. Fixed it with a hybrid setup - a lightweight session manager tracks user state and conversation context, but the actual LangChain agents are pooled and stateless. Moved all user-specific stuff (chat history, preferences) to PostgreSQL with proper indexing for quick lookups. Request comes in, grab the user context, pull an agent from the pool, inject the context, process, then toss the agent back. Now I handle thousands of users with 10-20 agent instances. DB lookup time is nothing compared to the memory you save, plus you get real persistence when users disconnect.
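The borrow/inject/return cycle described here can be sketched like this (a simplified single-process version; `PooledAgent` and the in-memory `user_store` are illustrative - the real version would load context from PostgreSQL and invoke an actual LangChain agent):

```python
import queue
from contextlib import contextmanager

class PooledAgent:
    """Holds no user state; everything user-specific arrives as `context`."""
    def run(self, context: dict, text: str) -> str:
        context["history"].append(("user", text))
        reply = f"reply in {context.get('language', 'en')}: {text}"
        context["history"].append(("ai", reply))
        return reply

POOL_SIZE = 4
pool: "queue.Queue[PooledAgent]" = queue.Queue()
for _ in range(POOL_SIZE):
    pool.put(PooledAgent())

user_store: dict[str, dict] = {}  # stand-in for the PostgreSQL tables

@contextmanager
def borrow_agent():
    agent = pool.get()      # blocks if all agents are busy
    try:
        yield agent
    finally:
        pool.put(agent)     # toss the agent back for the next request

def handle_request(user_id: str, text: str) -> str:
    ctx = user_store.setdefault(user_id, {"language": "en", "history": []})
    with borrow_agent() as agent:
        reply = agent.run(ctx, text)
    user_store[user_id] = ctx  # persist the updated context
    return reply
```

The key property is that agent count is fixed by `POOL_SIZE` rather than by concurrent users, so memory stays bounded while the DB (here, `user_store`) scales with users instead.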

Been there - creating agents per user is a nightmare. I switched to stateless agents with Redis for session data and it's so much better. Just pass the user context with each request instead of storing it in memory. Scales way cleaner.

Skip the agent pools and manual session stuff - just automate everything. I built something similar last year and wasted time with Redis before realizing I was making it way too complicated.

Automate the whole session lifecycle. Set up workflows that spin up user contexts when needed, handle the handoff between router and core agents, and clean up when sessions end.

For your voice app, you'd have triggers that create lightweight session containers, route requests to available agents, and save conversation data. You don't manage that complexity yourself, and the memory issue goes away because contexts are created on demand and torn down when sessions end.

This pattern works great for real-time apps. Automation scales up and down with load, manages database connections for chat history, and keeps user data isolated without persistent agent overhead.

Your FastAPI endpoints become simple triggers for automated workflows. No more memory leaks or connection headaches.

Latenode handles this session automation perfectly. It’s built for these workflow orchestration challenges: https://latenode.com