How can I disable the reasoning output when using thinking models with Ollama through LangChain?

I’m currently using reasoning models with the LangChain Ollama package and have observed that they generate their internal thought processes along with the final output. This extra reasoning makes the responses lengthy and difficult to understand.

I’m looking for a method to configure ChatOllama to prevent this reasoning output so that I receive only the final responses. I want to continue utilizing these reasoning models but without the extra details from their thought processes.

Has anyone discovered a way to exclude the reasoning output while preserving the effectiveness of the reasoning models? Tips on configuration or methods to process the output afterwards would be greatly appreciated.

Try lowering the temperature setting and adding a custom instruction like “give only the final answer, no explanation.” It helps reduce all that extra reasoning. It’s not 100%, but it def helps cut down on the lengthy thoughts.
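
A rough sketch of that setup, assuming the langchain-ollama package is installed and an Ollama server is running locally. The model name `deepseek-r1`, the prompt wording, and the helper names are placeholders for illustration, not something confirmed in this thread.

```python
# Sketch only: the model name and the prompt wording are assumptions.
SYSTEM_PROMPT = "Give only the final answer, no explanation."


def build_messages(question: str) -> list[tuple[str, str]]:
    """Pair the 'answer only' instruction with the user's question."""
    return [("system", SYSTEM_PROMPT), ("human", question)]


def ask(question: str, model: str = "deepseek-r1", temperature: float = 0.1) -> str:
    """Send the question to a local Ollama model with a low temperature."""
    # Imported lazily so build_messages works without langchain-ollama installed.
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model=model, temperature=temperature)
    return llm.invoke(build_messages(question)).content
```

Calling `ask("What is 17 * 23?")` would then return the model's text. As the post says, this reduces the reasoning rather than reliably removing it.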

Had the exact same issue with reasoning models in LangChain. Here’s what actually worked for me: I use a two-step process instead of trying to fix it with prompts. First, let ChatOllama give you the full output. Then run a regex to grab everything after the last “Therefore,” “In conclusion,” or “The answer is.” These models are pretty predictable about how they transition from thinking to answering. One other trick - try lowering the num_predict parameter. It caps the number of tokens generated, so responses come out shorter overall, but set it too low and the output gets cut off before the model reaches its answer. Basically, reasoning models want to show their work no matter what you tell them. Post-processing is way more reliable than trying to shut them up during generation. You’ll probably need to tweak the regex patterns depending on which model you’re using.
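
The regex step described above might look something like this. It is a minimal sketch using only the three phrases mentioned in the post; the function name is made up, and the pattern list will likely need extending for a given model.

```python
import re

# Transition phrases from the post above; extend per model as needed.
TRANSITION = re.compile(r"Therefore,|In conclusion,|The answer is", re.IGNORECASE)


def extract_conclusion(text: str) -> str:
    """Return the text after the last transition phrase, or all of it if none is found."""
    matches = list(TRANSITION.finditer(text))
    if not matches:
        # No known phrase: return the output unchanged rather than guess.
        return text.strip()
    return text[matches[-1].end():].strip()
```

You would pass it the string content of the ChatOllama response, for example `extract_conclusion(response.content)`.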

In dealing with similar issues, I’ve found that modifying the prompt can help. Specifically, instruct the model to clearly delineate the reasoning and final output with specific phrases like ‘Final Answer:’ to aid in post-processing. Another approach involves implementing output parsing techniques to extract just the final response by identifying specific delimiters in the output. Experiment with system prompts, as they can influence how the model structures its output. This way, you retain the reasoning process without compromising clarity in the final output.
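
As an illustration of the delimiter approach, here is a small sketch that keeps both halves, so the reasoning stays available for logging while only the answer is shown. The system prompt wording and the function name are assumptions, and it only works if the model actually follows the instruction.

```python
import re

# Hypothetical system prompt asking the model to mark its answer.
DELIMITER_PROMPT = (
    "Think step by step, then write your answer on a new line "
    "starting with 'Final Answer:'."
)

MARKER = re.compile(r"final answer:", re.IGNORECASE)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) at the last 'Final Answer:' marker."""
    matches = list(MARKER.finditer(text))
    if not matches:
        # Marker missing: treat the whole output as the answer.
        return "", text.strip()
    last = matches[-1]
    return text[:last.start()].strip(), text[last.end():].strip()
```

Splitting on the last marker rather than the first guards against the model mentioning the phrase partway through its reasoning.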

Yeah, regex and prompt tricks work sometimes, but they break when models change their output format or start using different transition phrases.

I deal with this constantly at work across different reasoning models. Each one structures thoughts vs conclusions differently.

What fixed it for me was building an automated pipeline that parses intelligently. Instead of hardcoded patterns, I use a system that learns to spot reasoning sections vs final answers no matter what phrases they use.

The automation processes your ChatOllama responses in real time, strips the thinking parts, and gives you clean answers. Works with any reasoning model and adapts when they change style.

Set it up once and you’re done. No more tweaking regex or adjusting prompts every time you switch models.

This scales way better than manual parsing, especially with multiple reasoning models or production systems.

Check out https://latenode.com.

I’ve hit this exact issue building automated workflows with reasoning models.

Most solutions need constant manual tweaking and still give inconsistent results. I ended up creating an automation pipeline that handles this cleanly.

My workflow grabs the raw Ollama output, finds the reasoning sections vs the final answer, and strips everything except what I need. The trick is pattern recognition to spot where the real answer starts, no matter how the model formats its thinking.

This beats trying to prompt-engineer around it. Reasoning models are built to show their work - fighting that with prompts is swimming upstream.

The automation does all the parsing, so you get clean, consistent outputs every time without losing the reasoning power. Send your question, get exactly what you need back.

Check out https://latenode.com for building this type of output processing pipeline.

Just wrap your ChatOllama call in a function that splits on phrases like “final answer” or “conclusion”. Way easier than tweaking temps and prompts. I use response.content.split('final answer:')[-1].strip() and it works most of the time (the split is case-sensitive, so match the casing your model actually emits).
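
A sketch of that kind of wrapper, made case-insensitive and covering both phrases. The function name and the marker list are assumptions; `llm` can be a ChatOllama instance or anything else with an `invoke` method.

```python
import re
from typing import Any

# Phrases to split on; an optional colon or comma after the phrase is dropped too.
MARKERS = re.compile(r"(?:final answer|conclusion)\s*[:,]?", re.IGNORECASE)


def final_answer_only(llm: Any, prompt: str) -> str:
    """Invoke a chat model and keep only the text after the last known marker."""
    response = llm.invoke(prompt)
    # Chat models return a message object with .content; plain LLMs return a string.
    text = response if isinstance(response, str) else response.content
    matches = list(MARKERS.finditer(text))
    if not matches:
        return text.strip()
    return text[matches[-1].end():].strip()
```

If no marker shows up, the full output is returned unchanged, so a format change in the model fails visibly instead of silently dropping text.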