I’m running into a frustrating issue with my conversational AI setup. I’ve created MCP tools with detailed descriptions that clearly explain what each tool does, what inputs they expect, and what outputs they return. I even added example usage scenarios in my system prompt to help guide the model.
But here’s the problem - my LangChain agent keeps choosing the wrong tools for user requests. Even when a query seems like an obvious match for a specific tool, it either picks something completely different or doesn’t use any tool at all.
I’m using Ollama with the llama3.2:1b model running locally, Python for the backend, and LangChain to build the agent. Has anyone else faced similar tool selection issues? What debugging steps or configuration changes helped you improve tool selection accuracy?
Any working code samples or troubleshooting tips would be greatly appreciated!
Tool selection breaks down because you’re asking one system to juggle too much - parsing language, understanding intent, matching tools, and executing everything at once.
I fixed this with a preprocessing layer that analyzes requests before they reach the agent. Just simple pattern matching and keyword detection to narrow tool choices or route directly when I’m confident about the intent.
Build decision trees for different request types. Data analysis questions? Show only analysis tools. File operations? Filter out everything else. Takes the guesswork away from your 1b model.
Separate your routing logic from conversation handling. Keep the local model for chat, but get reliable tool selection through proper request classification.
Set up conditional branches that check user input patterns, then either route straight to tools or give your agent a filtered subset. No more random tool picking from smaller models.
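A minimal sketch of that pre-routing layer in plain Python (the tool names, keyword patterns, and routes are just placeholders for your own):

```python
import re

# Hypothetical tool registries; swap in your actual LangChain tool lists.
ANALYSIS_TOOLS = ["summarize_data", "plot_metrics"]
FILE_TOOLS = ["read_file", "write_file"]
ALL_TOOLS = ANALYSIS_TOOLS + FILE_TOOLS

# Each route maps a keyword pattern to the subset of tools the agent may see.
ROUTES = [
    (re.compile(r"\b(analy[sz]e|average|trend|chart)\b", re.I), ANALYSIS_TOOLS),
    (re.compile(r"\b(file|open|save|directory)\b", re.I), FILE_TOOLS),
]

def select_tool_subset(user_input: str) -> list[str]:
    """Return a filtered tool list based on simple keyword detection."""
    for pattern, tools in ROUTES:
        if pattern.search(user_input):
            return tools
    return ALL_TOOLS  # fall back to everything when intent is unclear

print(select_tool_subset("Can you analyze this sales trend?"))
```

You then construct the agent with only the returned subset instead of the full toolkit, so the 1b model is never choosing between more than a couple of options.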
I’ve automated this for several production systems and it works great. The workflow handles decision logic while your model sticks to what it’s good at.
Latenode makes building these routing workflows really easy: https://latenode.com
Check if your tool descriptions overlap too much - that's what broke mine. Same problem with 1b models: they can't tell apart similar-sounding tools. Make each description genuinely distinct and ditch any tools that basically do the same thing.
I’ve encountered a similar problem with smaller models. The llama3.2:1b you’re using may simply lack the capacity for reliable tool selection - in my experience, models under 7B often struggle with the reasoning needed to differentiate between multiple tools. A few things that improved accuracy for me: make sure tool names clearly indicate their function, and limit the number of tools presented at once. A two-step process also helps: first determine the user’s intent with a straightforward classification prompt, then expose only the relevant tools. That simplifies the decision the model has to make. Finally, re-evaluate your prompt design - overly detailed tool descriptions can confuse smaller models, so trim them down to the essential function and required parameters.
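The two-step approach fits in a few lines; here `ask_llm` stands in for whatever call you make to Ollama (e.g. `ollama.chat(...)`), and the intent labels and tool names are illustrative:

```python
# Step 1: classify intent with a tiny prompt. Step 2: expose only matching tools.
# The LLM call is injected as `ask_llm` so the routing logic stays testable.

INTENT_TOOLS = {
    "search": ["web_search"],   # illustrative tool names
    "calculate": ["calculator"],
    "chat": [],                 # no tools needed for plain conversation
}

CLASSIFY_PROMPT = (
    "Classify the user request as one of: search, calculate, chat.\n"
    "Answer with the single word only.\nRequest: {request}"
)

def pick_tools(request: str, ask_llm) -> list[str]:
    intent = ask_llm(CLASSIFY_PROMPT.format(request=request)).strip().lower()
    return INTENT_TOOLS.get(intent, [])  # unknown label -> no tools

# Usage with a stub in place of the model:
fake_llm = lambda prompt: "calculate"
print(pick_tools("What is 17 * 23?", fake_llm))
```

Because the classification prompt asks for a single word, even a 1b model can usually answer it reliably, and the hard part of tool selection is reduced to a dictionary lookup.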
Yeah, model size is a real issue, but there’s a way around it that doesn’t trap you with local model limits.
I’ve hit this tool selection mess in production before. The real problem isn’t just model capacity - you’re making one component juggle conversation AND tool routing. That creates a nightmare decision tree that even big models screw up.
Split it into separate workflows instead. One handles conversation, another does tool selection with structured rules, and a third coordinates them. Set up conditional logic that routes requests by keywords or patterns before the LLM even sees tool options.
This kills the guesswork completely. No more hoping the model picks right - you get deterministic routing that hits the correct tool every time. Keep your local model for conversations while getting bulletproof tool selection.
I’ve automated this exact setup for multiple projects and it’s rock solid. Handles all the orchestration, manages conditional logic, and you can add fallback chains for unexpected stuff.
Latenode makes building and managing these workflows dead simple: https://latenode.com
Tool selection issues usually come from bad agent setup, not model problems. I had this exact headache building a document processor last year and fixed it by ditching LangChain’s default executor and writing a custom tool selector that validates inputs first. Here’s what worked:

- Add validation logic that checks whether the user’s request contains all the required parameters before letting the agent pick a tool.
- Put few-shot examples directly in your tool descriptions - don’t just stick them in the system prompt. The model needs to see concrete examples right when it’s deciding which tool to use.
- Turn on verbose logging in LangChain to see the agent’s reasoning. You’ll probably find it’s getting tripped up by confusing parameter names, or by tools that do similar things but aren’t clearly differentiated in your descriptions.
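The validation idea can be sketched like this - tool names and the required-parameter patterns are hypothetical, and real tools would carry stricter checks:

```python
import re

# For each tool, patterns that must appear in the request before the tool
# is even offered to the agent. Tool names and patterns are illustrative.
REQUIRED_HINTS = {
    "lookup_order": [re.compile(r"\border\s*#?\d+\b", re.I)],  # needs an order id
    "email_customer": [re.compile(r"[\w.]+@[\w.]+")],          # needs an address
}

def usable_tools(user_input: str, candidates: list[str]) -> list[str]:
    """Keep only tools whose required inputs actually appear in the request."""
    ok = []
    for name in candidates:
        hints = REQUIRED_HINTS.get(name, [])  # tools without hints always pass
        if all(h.search(user_input) for h in hints):
            ok.append(name)
    return ok

print(usable_tools("Where is order #1234?", ["lookup_order", "email_customer"]))
```

A tool the agent literally cannot call successfully (because a required parameter is missing from the request) should never be in its choice set in the first place.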
This is probably a LangChain issue with how it builds the agent’s decision context. I had the exact same problem with local models - it turned out the agent wasn’t getting clean tool schemas during execution. LangChain can mangle tool descriptions when converting them for the model, especially with function calling on smaller models. What fixed it for me:

- Define your tools with explicit Pydantic schemas instead of docstrings. This forces cleaner schema generation.
- Turn on debug mode and check the agent’s intermediate steps. You’ll likely see the model getting garbled tool info or incomplete parameter descriptions.
- If you’re using a function-calling agent, switch to a ReAct agent. ReAct handles tool selection much better with smaller models - the reasoning traces make debugging easier, and the step-by-step approach works better when you don’t have much model capacity.
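A minimal example of the explicit-schema approach (Pydantic v2 syntax; the tool and its fields are hypothetical). In LangChain you would pass a model like this as the tool’s `args_schema` rather than relying on a docstring:

```python
from pydantic import BaseModel, Field

class GetWeatherArgs(BaseModel):
    """Arguments for a hypothetical get_weather tool."""
    city: str = Field(description="City name, e.g. 'Berlin'")
    unit: str = Field(default="celsius", description="'celsius' or 'fahrenheit'")

# This is (roughly) the JSON schema the model will see -- print it and check
# that nothing is missing or mangled before blaming the model.
schema = GetWeatherArgs.model_json_schema()
print(schema["properties"]["city"]["description"])
```

Inspecting the generated schema directly is the quickest way to confirm whether the model is being handed clean tool definitions or truncated ones.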
Debug your agent’s tool execution step by step. Had the same problem - LangChain was ignoring parts of my tool descriptions when parsing schemas. Fixed it by using single verbs for tool names and removing any overlap between what each tool does.
Try lowering your temperature to 0.1 first - had the same problem and that fixed tool selection consistency for me. Also check if your system prompt’s too long; shorter prompts usually work better with smaller models.
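For reference, with Ollama the temperature lives in the request’s `options` block - the payload has the same shape whether you hit the REST API directly or use the `ollama` Python client:

```python
# Raw chat request payload for Ollama's /api/chat endpoint (sketch only --
# sending it requires a running Ollama server).
request = {
    "model": "llama3.2:1b",
    "messages": [{"role": "user", "content": "Which tool should I use?"}],
    "options": {"temperature": 0.1},  # low temperature -> consistent tool picks
}
print(request["options"]["temperature"])
```

If you construct the model through LangChain instead, pass the same low temperature to the model wrapper when you create it.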
Tool selection problems happen when you force one agent to do everything. It’s an architecture issue.
I hit this building a data pipeline manager - agents kept grabbing file processors for API calls. Sound familiar?
Don’t fix your current setup, redesign it. Build a request classifier that figures out what operation you need before tools get involved. Then route to specialized mini-agents that only handle their specific stuff.
For you: build input analysis that categorizes requests first. Text processing? Data retrieval? File operations? Each gets its own tools and logic.
This kills the guessing game. Your 1b model never sees conflicting options because routing happens upstream with deterministic rules.
I’ve built this pattern multiple times - it beats traditional agent setups on reliability. Preprocessing handles the complex decisions while your local model just focuses on conversation.
Set up conditional workflows that analyze request patterns, classify intent, then auto-route to the right tools. No more hoping your model picks correctly.
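Stripped of any workflow tooling, the pattern is just classify-then-dispatch; the categories, keywords, and handlers below are made up for illustration:

```python
def classify(request: str) -> str:
    """Crude keyword classifier -- replace with whatever rules fit your domain."""
    text = request.lower()
    if any(w in text for w in ("file", "save", "open")):
        return "file_ops"
    if any(w in text for w in ("fetch", "api", "download")):
        return "data_retrieval"
    return "text_processing"

# Each handler stands in for a mini-agent that owns only its category's tools.
def handle_file_ops(request: str) -> str:
    return f"[file agent] {request}"

def handle_data_retrieval(request: str) -> str:
    return f"[retrieval agent] {request}"

def handle_text(request: str) -> str:
    return f"[text agent] {request}"

HANDLERS = {
    "file_ops": handle_file_ops,
    "data_retrieval": handle_data_retrieval,
    "text_processing": handle_text,
}

def route(request: str) -> str:
    return HANDLERS[classify(request)](request)

print(route("save these notes to a file"))
```

Because routing happens before any model call, the wrong-category tools are structurally unreachable rather than merely discouraged.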
Latenode handles all this workflow orchestration and routing perfectly: https://latenode.com
The model doesn’t have enough context when it needs to pick a tool. Llama3.2:1b gets overwhelmed when you throw multiple similar tools at it all at once. I fixed this by adding a scoring system that ranks tools by keyword matches before the agent even sees them. Map your user inputs to tool priorities, then only show the top 2-3 relevant tools instead of dumping the whole toolkit on it. Also, check your tool descriptions for vague language - I switched from “handles customer information” to “retrieves customer data” and saw way better accuracy. Just reduce the cognitive load while keeping everything functional.
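A rough sketch of the scoring idea (the tool names and keyword sets are illustrative - build yours from your real tool descriptions):

```python
# Keyword sets per tool; a tool's score is its overlap with the request.
TOOL_KEYWORDS = {
    "retrieve_customer_data": {"customer", "account", "profile"},
    "generate_report": {"report", "summary", "export"},
    "send_notification": {"notify", "alert", "email"},
}

def top_tools(user_input: str, k: int = 2) -> list[str]:
    """Rank tools by keyword overlap and return only the top k matches."""
    words = set(user_input.lower().split())
    scored = sorted(
        TOOL_KEYWORDS,
        key=lambda name: len(TOOL_KEYWORDS[name] & words),
        reverse=True,
    )
    # Drop tools with zero matches so the agent never sees irrelevant ones.
    return [t for t in scored[:k] if TOOL_KEYWORDS[t] & words]

print(top_tools("pull up the customer account profile"))
```

The agent is then built with just the returned two or three tools, which keeps the model’s decision space small without removing any functionality from the system.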