n8n RAG Agent Bypasses Tool Execution Every 7th Request - Need Solution

Hi there,

I’ve got n8n running in a Docker container on my local machine, and I’m running into a weird problem with my RAG setup: the agent refuses to execute its tools on every 7th user request.

Here’s what’s happening:

  • I have an agent with three different tools configured
  • For the first 6 queries, everything works perfectly and all tools get called as expected
  • On the 7th query, the agent just uses cached responses from memory instead of running the tools
  • This exact same pattern keeps happening over and over

Even when I clear the memory session completely, the pattern comes back exactly the same way: the 7th request is always answered from memory alone.

I’m using GPT-4o-mini as the agent’s model. Has anyone seen this kind of issue before? Any ideas on what might be causing such a consistent pattern?

I ran into something similar with n8n agents, and it came down to the memory buffer configuration. Once the buffer reached its limit after several interactions, my agent started prioritizing conversation history over tool execution: the context retained between requests was enough for the model to answer from history instead of calling the tools. I’d review your memory node settings for any conversation buffer limit, and also check for execution constraints on the tools themselves, or any conditional/circuit-breaker logic in the workflow that could fall back to a cached response after a set number of calls.
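
To make the idea concrete, here’s a rough sketch of how a fixed-size window memory behaves. This is not n8n’s actual implementation, and the window size of six question/answer pairs is only a guess chosen to line up with your 7-request pattern; if you’re on the Window Buffer Memory node, the setting to compare against is its Context Window Length.

```js
// Toy model of a sliding-window conversation memory, similar in spirit to
// n8n's Window Buffer Memory node. This is NOT n8n's real implementation;
// the window size of 6 question/answer pairs is a guess chosen only to line
// up with the 7-request pattern.
class WindowMemorySketch {
  constructor(contextWindowLength = 6) {
    this.maxMessages = contextWindowLength * 2; // one user + one assistant message per turn
    this.messages = [];
  }

  add(message) {
    this.messages.push(message);
    // Once the buffer is over the limit, the oldest messages are silently dropped.
    if (this.messages.length > this.maxMessages) {
      this.messages = this.messages.slice(-this.maxMessages);
    }
  }
}

// Simulate 7 user requests. The window only starts dropping history on
// request 7 - if the agent's behaviour depends on what is (or isn't) left in
// the window, that is exactly where a change would show up.
const memory = new WindowMemorySketch(6);
for (let i = 1; i <= 7; i++) {
  memory.add({ role: 'user', content: `question ${i}` });
  memory.add({ role: 'assistant', content: `answer ${i}` });
  const dropped = !memory.messages.some((m) => m.content === 'question 1');
  console.log(`request ${i}: ${memory.messages.length} messages, oldest dropped: ${dropped}`);
}
```

If your window length happens to sit right at that boundary, bumping it up, or switching to a different memory strategy entirely, is a cheap experiment.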

that’s a weird pattern. check your workflow execution settings - n8n sometimes has hidden configs that kick in after a certain number of runs. also look for any conditional logic counting executions. docker logs might show what’s happening when it hits that 7th request.
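
if you want hard numbers on that, one thing you can try (just a plain Code node, nothing official) is keeping a run counter in workflow static data and logging it. console.log output from a Code node shows up in docker logs, so you get a clean marker to grep for around that 7th request:

```js
// diagnostic Code node to drop right before the agent: counts executions in
// workflow static data and logs the count, so you can line it up with the
// request where tools stop firing. caveat: static data only persists between
// executions of an *active* workflow, not manual test runs.
const staticData = $getWorkflowStaticData('global');

staticData.ragRunCount = (staticData.ragRunCount || 0) + 1;
console.log(`[rag-debug] run #${staticData.ragRunCount} at ${new Date().toISOString()}`);

// pass the incoming items through unchanged
return $input.all();
```

if the counter shows run 7 arriving normally but the tools still don't fire, you've at least ruled out anything upstream swallowing the request.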

This looks like some kind of throttling or caching kicking in. I had a similar problem with my RAG workflows, and it turned out to be execution throttling settings in n8n, so check whether you’ve got any execution limits or queue-mode configs that could explain it. Also worth checking your Docker container’s memory allocation; if it’s too low, the workflow can start misbehaving under load. It could also be GPT-4o-mini leaning more heavily on conversation context once the history builds up, so try a different chat model and see if the pattern survives. The fact that it consistently happens on the 7th request does point to some kind of configuration threshold rather than a model quirk.
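
On the memory-allocation point, a quick way to check whether the n8n process itself is under pressure around that 7th request is to log process.memoryUsage() from a Code node. This is just a sanity check, not an n8n feature, and depending on your n8n version the Code node sandbox may not expose process at all; it also reports the Node.js process, not the Docker limit:

```js
// Quick memory sanity check: put this in a Code node and compare the numbers
// on request 6 vs request 7. process.memoryUsage() is standard Node.js and
// reports the n8n process, not the container's memory limit. Depending on the
// n8n version / sandbox settings, `process` may not be accessible here.
const usage = process.memoryUsage();
const toMb = (bytes) => Math.round(bytes / 1024 / 1024);

console.log(
  `[rag-debug] rss=${toMb(usage.rss)}MB ` +
    `heapUsed=${toMb(usage.heapUsed)}MB heapTotal=${toMb(usage.heapTotal)}MB`
);

return $input.all();
```

If the numbers climb sharply right before the 7th request, raising the container’s memory limit is worth a try; if they stay flat, memory probably isn’t the culprit.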