Your memory system treats every interaction the same, and that’s the problem. Recent turns should carry way more weight than something from 10 turns back.
I’ve hit this same wall in production. Here’s what actually worked: tiered memory. Keep the last 4-5 exchanges verbatim in immediate memory and inject them into every prompt. No retrieval needed for those; save the vector store for anything older.
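Here’s a rough sketch of that short-term tier, assuming you build the prompt as a plain string. `ShortTermMemory`, `build_prompt`, and the prompt layout are names I made up for illustration, not any library’s API, and the default of 5 exchanges is just the upper end of the range above.

```python
from collections import deque
from typing import List


class ShortTermMemory:
    """Holds the last few exchanges verbatim (hypothetical helper)."""

    def __init__(self, max_exchanges: int = 5):
        # deque with maxlen silently drops the oldest exchange when full
        self.exchanges = deque(maxlen=max_exchanges)

    def add(self, user_msg: str, assistant_msg: str) -> None:
        self.exchanges.append((user_msg, assistant_msg))

    def render(self) -> str:
        lines = []
        for user_msg, assistant_msg in self.exchanges:
            lines.append(f"User: {user_msg}")
            lines.append(f"Assistant: {assistant_msg}")
        return "\n".join(lines)


def build_prompt(memory: ShortTermMemory, retrieved: List[str], new_message: str) -> str:
    """Assemble a prompt: retrieved long-term context first, recent turns last."""
    parts = []
    if retrieved:
        parts.append("Relevant earlier context:\n" + "\n".join(retrieved))
    recent = memory.render()
    if recent:
        parts.append("Recent conversation:\n" + recent)
    parts.append(f"User: {new_message}")
    return "\n\n".join(parts)
```

The recent turns go in on every call whether or not retrieval returns anything, so a follow-up like "what about the second one?" still has its context.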
For vector storage, don’t store individual messages. Chunk conversations by topic or time windows instead. When users reference earlier stuff, you’ll grab full context instead of random fragments.
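One way to do the time-window version, as a minimal sketch: start a new chunk whenever there’s a long pause or the chunk gets too big, then embed each chunk as one document. The `Message` type, the 10-minute gap, and the 8-message cap are all assumptions for illustration; tune them to your traffic. It also assumes messages arrive sorted by timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Message:
    role: str
    text: str
    timestamp: datetime


def chunk_by_time_gap(
    messages: List[Message],
    max_gap: timedelta = timedelta(minutes=10),  # assumed threshold
    max_messages: int = 8,                       # assumed cap per chunk
) -> List[str]:
    """Group consecutive messages into chunks, splitting on long pauses."""
    chunks: List[List[Message]] = []
    current: List[Message] = []
    for msg in messages:
        if current:
            gap_too_long = msg.timestamp - current[-1].timestamp > max_gap
            chunk_full = len(current) >= max_messages
            if gap_too_long or chunk_full:
                chunks.append(current)
                current = []
        current.append(msg)
    if current:
        chunks.append(current)
    # Each chunk becomes one text blob to embed and store
    return ["\n".join(f"{m.role}: {m.text}" for m in chunk) for chunk in chunks]
```

Topic-based chunking needs some kind of topic-shift detection on top of this, which I’ve left out here.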
A temperature of 0.8 is also too high for memory tasks. Lower values make sampling less random, so the model sticks closer to the context you give it. I stick to 0.3-0.5 when consistency beats creativity.
Add recency scoring to retrieval too. Weight recent conversations higher even if they’re less semantically similar. Stops the system from pulling ancient content when users clearly mean recent exchanges.
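A minimal sketch of one way to blend the two signals, assuming your vector store already gives you a similarity in the 0 to 1 range and you know each chunk’s age. The exponential half-life decay, the 24-hour half-life, and the 0.3 recency weight are my own illustrative choices, not fixed values; the function names are hypothetical too.

```python
from typing import List, Tuple


def recency_weighted_score(
    similarity: float,
    age_hours: float,
    half_life_hours: float = 24.0,  # assumed: recency halves every 24h
    recency_weight: float = 0.3,    # assumed: 30% recency, 70% similarity
) -> float:
    """Blend semantic similarity with an exponentially decaying recency term."""
    recency = 0.5 ** (age_hours / half_life_hours)
    return (1 - recency_weight) * similarity + recency_weight * recency


def rerank(candidates: List[Tuple[str, float, float]], top_k: int = 3) -> List[str]:
    """candidates are (text, similarity, age_hours) tuples from retrieval."""
    ranked = sorted(
        candidates,
        key=lambda c: recency_weighted_score(c[1], c[2]),
        reverse=True,
    )
    return [text for text, _, _ in ranked[:top_k]]
```

With these defaults, a chunk from an hour ago at 0.70 similarity outranks a ten-day-old chunk at 0.80, which is the behavior you want when the user says "like we discussed".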
This setup’s been solid across multiple conversational systems I’ve built. Remember - conversation memory isn’t just semantic search. It’s about keeping things coherent.