Azure OpenAI chat bot ignoring document context and using external knowledge

I’m working on a document-based chatbot using Azure OpenAI services. The bot uses gpt-35-turbo for chat and text-embedding-ada-002 for embeddings. The weird thing is that when I use regular OpenAI models, the bot only answers from my uploaded documents like it should. But with Azure OpenAI, it keeps pulling answers from its training data instead of sticking to my PDF files.

import os

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

ai_client = AzureChatOpenAI(
    api_key=os.getenv("AZURE_KEY"),
    api_version=os.getenv("API_VERSION"),
    azure_endpoint=os.getenv("AZURE_ENDPOINT"),
    azure_deployment="gpt-35-turbo"
)

vector_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2024-04-01-preview",
    openai_api_type="azure",
    api_key=os.getenv("AZURE_KEY"),
    azure_endpoint=os.getenv("AZURE_ENDPOINT"),
    azure_deployment="text-embedding-ada-002",
    chunk_size=500
)

doc_splitter = RecursiveCharacterTextSplitter(chunk_size=15, chunk_overlap=3)
chunks = doc_splitter.split_documents(pdf_documents)

vector_db = Chroma.from_documents(documents=chunks, embedding=vector_embeddings)
doc_retriever = vector_db.as_retriever()
context_prompt = (
    "Based on chat history and current user question, "
    "create a standalone question that makes sense "
    "without needing the conversation history. "
    "Don't answer it, just rephrase if necessary."
)

context_template = ChatPromptTemplate.from_messages([
    ("system", context_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

history_retriever = create_history_aware_retriever(
    ai_client, doc_retriever, context_template
)

answer_prompt = (
    "You help users find information from documents. "
    "Only use the context provided below to answer questions. "
    "If the answer isn't in the context, say you don't know. "
    "Never use external knowledge beyond the provided documents. "
    "Keep responses under five sentences.\n\n{context}"
)

qa_template = ChatPromptTemplate.from_messages([
    ("system", answer_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

doc_chain = create_stuff_documents_chain(ai_client, qa_template)
full_chain = create_retrieval_chain(history_retriever, doc_chain)

chat_store = {}

def get_chat_history(session_id: str):
    if session_id not in chat_store:
        chat_store[session_id] = ChatMessageHistory()
    return chat_store[session_id]

final_chain = RunnableWithMessageHistory(
    full_chain,
    get_chat_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer"
)

Why does Azure OpenAI ignore my instructions to only use document context? The same setup works fine with regular OpenAI.

Hit this exact problem migrating from OpenAI to Azure OpenAI a few months ago. The models are nominally the same, but a couple of defaults (temperature, deployment-level settings) can make Azure behave as if it's handling your system prompt differently.

Your chunk size of 15 is way too small - that’s barely a sentence. Bump it to 1000-1500 characters minimum. Tiny chunks mean the retriever can’t find meaningful context, so the model just uses its training data instead.

You’re not setting temperature on your AzureChatOpenAI client, so you get the library default, which leaves the model room to improvise. Add temperature=0.1 (or 0) to make it stick to your context.
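Assuming your deployment is named gpt-35-turbo (swap in whatever name you gave it in the portal), the client would look something like:

```python
import os
from langchain_openai import AzureChatOpenAI

# temperature near 0 reduces improvisation; azure_deployment is the
# deployment name from the Azure portal (assumed to be "gpt-35-turbo" here).
ai_client = AzureChatOpenAI(
    api_key=os.getenv("AZURE_KEY"),
    api_version=os.getenv("API_VERSION"),
    azure_endpoint=os.getenv("AZURE_ENDPOINT"),
    azure_deployment="gpt-35-turbo",
    temperature=0.1,
)
```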

Be more aggressive with your system prompt. Skip “Only use the context provided” and try “STRICTLY answer only from the context below. If information isn’t explicitly stated in the context, respond with ‘I don’t have that information in the provided documents.’”
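Dropped into your existing answer_prompt, the stricter wording might look like this (the exact phrasing is just a suggestion):

```python
# Stricter system prompt; {context} is still filled by create_stuff_documents_chain.
answer_prompt = (
    "You help users find information from documents. "
    "STRICTLY answer only from the context below. "
    "If information isn't explicitly stated in the context, respond with "
    "'I don't have that information in the provided documents.' "
    "Keep responses under five sentences.\n\n{context}"
)

# Sanity check: the template must keep its {context} slot for the stuffed documents.
assert "{context}" in answer_prompt
```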

Regular OpenAI and Azure OpenAI can also serve different model snapshots: a deployment pinned to an older gpt-35-turbo version may follow system instructions less strictly than the latest hosted release. Check which model version your deployment actually runs.


I hit this same issue with Azure OpenAI on a document QA system. Your chunk_size=15 is way too small - that’s literally 2-3 words per chunk. Your retriever’s probably returning useless fragments that don’t give the model enough context to work with. When there’s not enough relevant info, Azure OpenAI just pulls from its training data instead.

Bump your chunk_size to 800-1200 characters with chunk_overlap around 200. Also, print out what your retriever’s actually returning - I bet the chunks aren’t as relevant as you think.
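A quick way to inspect retrieval, reusing the doc_retriever from your question (this snippet assumes that object is already built):

```python
# Ask a question you expect the PDFs to answer and print what comes back.
query = "your test question here"
retrieved = doc_retriever.invoke(query)  # returns a list of Document objects
for i, doc in enumerate(retrieved):
    print(f"--- chunk {i} ({len(doc.page_content)} chars) ---")
    print(doc.page_content[:300])
```

If the printed chunks are tiny fragments or off-topic, the problem is retrieval, not the model.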

One more thing - check your Azure deployment settings in the portal. Sometimes those override your API parameters in weird ways. Make sure your gpt-35-turbo deployment doesn’t have content filtering or custom settings messing with context adherence.