How to retrieve token usage statistics from LangChain LLM responses

I’m working with LangChain and need to track how many tokens are being used in my LLM calls. Most AI libraries make this really easy by returning token counts right in the response object.

But with LangChain I can’t seem to find a clear way to get the input token count and output token count after making a call to my language model. I’ve been looking through the docs but nothing jumps out as obvious.

Does anyone know the right way to access these token metrics? I’m hoping there’s a simple property or method I can use to get both the prompt tokens and completion tokens from my LLM responses.

Any working examples would be super helpful since I need to monitor usage for billing purposes.

Hit this exact problem last year when optimizing our ML pipeline costs. The generation_info approach works, but there’s a cleaner way.

Use get_num_tokens_from_messages() for input counting and enable callbacks for automatic tracking. Set up a callback handler that captures token usage in real time instead of digging through response objects later.
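For the input side, here's a minimal sketch, assuming the langchain-openai package; the model name is just an example, swap in whatever chat model you actually use:

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # example model name

messages = [HumanMessage(content="Summarize our Q3 sales report in one paragraph.")]

# Estimate prompt tokens locally, before spending anything on the call -
# handy for pre-flight cost checks or context-window guards.
input_tokens = llm.get_num_tokens_from_messages(messages)
print(f"Input tokens: {input_tokens}")
```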

Here’s what I do - initialize your LLM with callbacks=[StdOutCallbackHandler()] or create a custom callback class. The callback fires with detailed usage stats: tokens, model info, timing data.
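A sketch of the custom-handler route. TokenUsageHandler is a name I made up, and the llm_output["token_usage"] shape is what OpenAI-backed models report; other providers structure this differently, so check what your backend actually returns:

```python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from langchain_openai import ChatOpenAI

class TokenUsageHandler(BaseCallbackHandler):
    """Accumulates token counts across calls so totals survive for billing."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        # OpenAI-backed models report usage in llm_output["token_usage"];
        # guard with .get() since other providers use different keys.
        usage = (response.llm_output or {}).get("token_usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

handler = TokenUsageHandler()
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler])

llm.invoke("Name three prime numbers.")
llm.invoke("Name three composite numbers.")

# Running totals across both calls.
print(f"prompt={handler.prompt_tokens}, completion={handler.completion_tokens}")
```

If you're all-in on OpenAI, the get_openai_callback context manager (in langchain_community.callbacks) does this same accumulation for you, plus a cost estimate, without writing a handler.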

This scales way better once you're making hundreds of calls. We switched and our billing tracking got much more reliable. Plus the usage data flows in as each call completes, instead of you parsing every response after the fact.

Callbacks also work consistently across LangChain versions. I’ve seen generation_info structure change between updates, which broke our monitoring several times.

Token usage data is buried in the response metadata - here's where to find it. Dig into your LangChain LLM response object: depending on the provider, the counts sit in the generation_info attribute or in llm_output on the LLMResult, as a dictionary with prompt tokens, completion tokens, and total tokens. I only figured this out by poking through the response structure in a debugger since the docs weren't much help.

The key names also change depending on your LLM provider. OpenAI reports a token_usage dict, and it lives in llm_output on an LLMResult (or response_metadata on a chat message) rather than in generation_info.

One more tip: use the newer chat models instead of the old completion ones. The metadata structure is way more consistent.
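To illustrate with a chat model, a minimal sketch assuming the langchain-openai package and a reasonably recent LangChain release (field names vary by provider and version, so verify against your own response object):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("What is the capital of France?")

# Raw provider metadata; for OpenAI this includes a token_usage dict
# like {'prompt_tokens': 8, 'completion_tokens': 7, 'total_tokens': 15}.
print(response.response_metadata.get("token_usage"))

# Recent versions also expose a provider-agnostic view on the message:
# {'input_tokens': 8, 'output_tokens': 7, 'total_tokens': 15}
print(response.usage_metadata)
```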