I’m having trouble confirming that my LangChain setup is actually using the correct model. I configured a pipeline with Mistral-7B through HuggingFace but when I check the chain details, it shows a different model name.
Here’s my setup code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferWindowMemory

# Load the tokenizer and model
my_tokenizer = AutoTokenizer.from_pretrained(
    'mistralai/Mistral-7B-Instruct-v0.1',
    cache_dir='/cache/path',
    use_auth_token=my_token
)
llm_model = AutoModelForCausalLM.from_pretrained(
    'mistralai/Mistral-7B-Instruct-v0.1',
    cache_dir='/cache/path',
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map='cuda',
    use_auth_token=my_token
)

# Create the text-generation pipeline
text_pipeline = pipeline(
    'text-generation',
    model=llm_model,
    tokenizer=my_tokenizer,
    max_length=512,
    temperature=0.1,
    do_sample=True,
    repetition_penalty=1.2
)
# Wrap in LangChain
my_llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={'temperature': 0.1})

# Build the chain I inspect later as my_chain
prompt = PromptTemplate(
    input_variables=['history', 'question'],
    template='{history}\n{question}'
)
memory = ConversationBufferWindowMemory(k=3, memory_key='history')
my_chain = LLMChain(llm=my_llm, prompt=prompt, memory=memory)
But when I inspect my_chain, the model_id field shows gpt2 instead of Mistral. Is this normal behavior or am I missing something in my configuration? How can I make sure the right model is actually being used?
Yeah, that gpt2 label is misleading but totally harmless - it’s just the default model_id LangChain falls back to when you hand it a pre-built HuggingFace pipeline. Your Mistral setup is working fine.
Want to double-check you’re actually running Mistral? Compare the tokenizer vocabulary size with len(my_tokenizer.get_vocab()): Mistral has about 32,000 tokens, GPT-2 has 50,257. That’ll give you solid proof of what’s loaded. You can also verify the architecture itself - my_llm.pipeline.model.config.architectures should report ‘MistralForCausalLM’, not a GPT-2 class.
I’ve seen this same identifier issue with other custom models. The actual model runs perfectly, but I’d recommend adding a comment with the real model name in your code so you don’t get confused later.
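A quick sanity check along those lines, reusing the my_tokenizer and my_llm names from your snippet:

# Vocabulary size: ~32,000 for Mistral, 50,257 for GPT-2
print(len(my_tokenizer.get_vocab()))

# Architecture class that was actually loaded
print(my_llm.pipeline.model.config.architectures)  # expect ['MistralForCausalLM']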
Mistral-7B has around 7 billion parameters. GPT-2 has far fewer - the base model is about 124 million, and even the largest GPT-2 variant is 1.5 billion. If you count roughly 7 billion parameters, you’re good.
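A rough parameter count (using the my_llm wrapper from the question) makes that concrete:

# Total parameter count of the wrapped model
n_params = sum(p.numel() for p in my_llm.pipeline.model.parameters())
print(f'{n_params / 1e9:.2f}B parameters')  # roughly 7.2B for Mistral-7B, ~0.12B for GPT-2 base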
Also check the model’s generation behavior. Feed it something like “Explain quantum physics” and compare responses. Mistral gives much more detailed, structured answers than GPT-2.
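For example, something like this (with older LangChain versions you can call the wrapped LLM directly; newer ones use invoke):

print(my_llm('Explain quantum physics in two sentences.'))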
One thing - you’re setting temperature twice (once on the transformers pipeline, once in model_kwargs). With a pre-built pipeline, the model_kwargs value may simply be ignored by HuggingFacePipeline, so stick with the pipeline’s temperature setting and drop the duplicate.
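A minimal sketch of the deduplicated wrapper, keeping all generation settings on the transformers pipeline:

# temperature, do_sample, etc. already live on text_pipeline
my_llm = HuggingFacePipeline(pipeline=text_pipeline)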
Check memory usage with torch.cuda.memory_allocated() before and after loading - Mistral-7B in float16 needs roughly 14-15 GB of VRAM, while GPT-2 fits in well under 1 GB. Also print my_llm.pipeline.model.__class__.__name__ to see which model class you’re actually running; it should show something Mistral-related, not GPT-2.
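Roughly like this (a sketch; exact numbers depend on dtype and device):

before = torch.cuda.memory_allocated()
# ... run the from_pretrained / pipeline setup here ...
after = torch.cuda.memory_allocated()
print(f'model VRAM: {(after - before) / 1e9:.1f} GB')  # ~14-15 GB for Mistral-7B in float16
print(my_llm.pipeline.model.__class__.__name__)  # expect 'MistralForCausalLM'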
I’ve hit this exact same issue with custom models in LangChain. That ‘gpt2’ model_id you’re seeing? It’s just LangChain’s default fallback identifier when it wraps HuggingFace pipelines. Your Mistral-7B is actually running fine.
Want to double-check which model’s loaded? Try my_llm.pipeline.model.config.name_or_path - that’ll show you the real model path. Or just run a test prompt and look at the output. Mistral and GPT-2 responses are totally different, so you’ll know right away.
One more tip: add model_id='mistralai/Mistral-7B-Instruct-v0.1' when you create the HuggingFacePipeline wrapper. Won’t change how it works, but it overrides that confusing default identifier and makes debugging way easier.
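Both checks together, reusing the text_pipeline from your setup (model_id is just metadata here, so generation behavior doesn’t change):

# Where the weights were actually loaded from
print(my_llm.pipeline.model.config.name_or_path)  # 'mistralai/Mistral-7B-Instruct-v0.1'

# Re-create the wrapper with an explicit label instead of the default 'gpt2'
my_llm = HuggingFacePipeline(
    pipeline=text_pipeline,
    model_id='mistralai/Mistral-7B-Instruct-v0.1'
)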
Yeah, that gpt2 identifier is just LangChain’s HuggingFacePipeline wrapper being weird with labels. I hit the same issue when I switched from OpenAI to local models.
Check what actually got loaded - run print(my_llm.pipeline.model.config) and look at the model_type and transformers_version fields. You’ll see model_type reported as ‘mistral’, not ‘gpt2’.
Another easy tell is response quality and speed. Mistral-7B is way slower than GPT-2 because it’s much bigger, and the writing style is totally different. If you’re getting solid, detailed responses with noticeable lag, that’s Mistral running fine. I throw in a quick test prompt after setup, something like “What model are you?” - not perfect, but different models usually answer meta questions in their own way.
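For reference, the two config fields mentioned above:

cfg = my_llm.pipeline.model.config
print(cfg.model_type)            # 'mistral' for Mistral, 'gpt2' for GPT-2
print(cfg.transformers_version)  # transformers version the checkpoint was saved with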