Comparing two methods for implementing RAG with Azure OpenAI and AI Search using Python

I’m trying to figure out which approach works better for building a RAG system with Azure OpenAI and AI Search using Python. I’ve tested both and each gives good results, but I’m not sure when to use one over the other.

Method 1: Manual search query handling

from openai import AzureOpenAI
from azure.search.documents import SearchClient

# OPENAI_ENDPOINT, SEARCH_ENDPOINT, auth_provider, search_creds, and MODEL_NAME
# are defined elsewhere in my config
az_openai = AzureOpenAI(
    api_version="2024-05-01-preview",
    azure_endpoint=OPENAI_ENDPOINT,
    azure_ad_token_provider=auth_provider
)

search_service = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name="products-index",
    credential=search_creds
)

CONTEXT_TEMPLATE = """You are a helpful product recommendation assistant.
User Question: {user_query}
Relevant Information:\n{context_data}"""

user_question = "What laptops do you have with good battery life?"

# Retrieve the top matching documents from the search index
results = search_service.search(
    search_text=user_question,
    top=3,
    select=["ProductName", "Features", "Specifications"]
)

# Flatten the search hits into a single context string for the prompt
context_text = "\n".join(
    f'{item["ProductName"]}: {item["Features"]}: {item["Specifications"]}'
    for item in results
)

# Ask the model to answer using the retrieved context
answer = az_openai.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": CONTEXT_TEMPLATE.format(user_query=user_question, context_data=context_text)
        }
    ],
    model=MODEL_NAME
)

Method 2: Built-in search integration

import os
import openai

service_endpoint = os.getenv("OPENAI_SERVICE_URL")
service_key = os.getenv("OPENAI_ACCESS_KEY")
model_name = os.getenv("OPENAI_MODEL_NAME")

ai_client = openai.AzureOpenAI(
    azure_endpoint=service_endpoint,
    api_key=service_key,
    api_version="2024-03-01-preview"
)

# The service queries the Azure AI Search index and grounds the response
# in the retrieved documents, all within this single call
response = ai_client.chat.completions.create(
    model=model_name,
    messages=[
        {
            "role": "user",
            "content": "Show me available insurance options"
        }
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.getenv("SEARCH_SERVICE_URL"),
                    "index_name": os.getenv("SEARCH_INDEX_NAME"),
                    "authentication": {
                        "type": "api_key",
                        "key": os.getenv("SEARCH_SERVICE_KEY")
                    }
                }
            }
        ]
    }
)

The second method seems much simpler, but I’m wondering if there are important benefits to the first approach that I’m missing. When would you choose one over the other?

It’s all about control vs. convenience. I’ve used both methods on different projects, and here’s what I’ve found. Method 1 (manual handling) is a must when you need complex filtering or custom retrieval logic: when you have business rules to apply before results reach the LLM, or you’re doing semantic search with custom embeddings, the manual approach gives you that flexibility (a rough sketch of that kind of filtered retrieval is below). Method 2 works great for simple cases where the default search behavior does the job, but I’ve hit walls trying to customize retrieval or debug problems; that black box makes troubleshooting a pain. Manual handling also lets you swap search strategies easily or pull from multiple sources, which has saved me in enterprise projects where requirements keep shifting.
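To make the filtering point concrete, here’s a rough sketch of what that pre-LLM retrieval logic can look like with the manual approach. The index fields (Category, StockStatus) and the business rules are made up for illustration; the filter string uses Azure AI Search’s OData filter syntax.

from azure.search.documents import SearchClient

def retrieve_filtered_context(search_client: SearchClient, question: str) -> str:
    # Business rule applied at retrieval time: only in-stock laptops.
    # "Category" and "StockStatus" are hypothetical index fields.
    results = search_client.search(
        search_text=question,
        filter="Category eq 'Laptops' and StockStatus eq 'InStock'",
        top=5,
        select=["ProductName", "Features", "Specifications"]
    )

    # Apply any further rules in Python before anything reaches the LLM
    chunks = []
    for item in results:
        if "discontinued" in item["Features"].lower():  # another made-up rule
            continue
        chunks.append(f'{item["ProductName"]}: {item["Features"]}: {item["Specifications"]}')
    return "\n".join(chunks)

None of this is possible with the integrated data_sources route, because the retrieval step never passes through your own code.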

Performance and error handling are where these approaches really differ. Method 2’s built-in integration adds latency: you’re making one call that handles search and generation internally, and it can time out on complex queries. I’ve hit situations where search works fine but generation fails, and you get a cryptic error without knowing which stage broke. Method 1 lets you implement retry logic for each step separately and cache search results so you’re not making redundant calls (see the sketch below). The manual approach also gives you better token control: you can truncate or prioritize context chunks before hitting the limit, while the integrated method can fail silently or return an incomplete response when the context gets too large.
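To illustrate, here’s a rough sketch of the per-step handling the manual approach allows. The retry counts, the in-memory cache, and the character budget used for truncation are arbitrary choices for illustration; a real implementation would count tokens rather than characters.

import time

search_cache = {}  # naive in-memory cache keyed by the query text

def search_with_retry(search_client, question, attempts=3):
    if question in search_cache:
        return search_cache[question]  # skip redundant search calls
    for attempt in range(attempts):
        try:
            results = list(search_client.search(search_text=question, top=3))
            search_cache[question] = results
            return results
        except Exception:
            if attempt == attempts - 1:
                raise  # you know exactly which stage failed
            time.sleep(2 ** attempt)  # simple exponential backoff

def build_context(results, max_chars=8000):
    # Truncate context before hitting the model's limit; a character
    # budget is a crude stand-in for real token counting.
    chunks, used = [], 0
    for item in results:
        chunk = f'{item["ProductName"]}: {item["Features"]}: {item["Specifications"]}'
        if used + len(chunk) > max_chars:
            break
        chunks.append(chunk)
        used += len(chunk)
    return "\n".join(chunks)

def generate_with_retry(client, model, prompt, attempts=3):
    for attempt in range(attempts):
        try:
            return client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}]
            )
        except Exception:
            if attempt == attempts - 1:
                raise  # generation failed independently of search
            time.sleep(2 ** attempt)

With the steps split out like this, a failed generation call doesn’t force you to redo the search, and cached results can be reused across retries.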