I’ve been struggling to figure out how to send multiple requests at once to Azure’s OpenAI service using Python. I want to call the GPT models in batches instead of one by one to make my code more efficient. I searched everywhere but couldn’t find any good examples or guides that show how to do this properly. I’m not really a professional developer so I’m having trouble understanding the documentation. Can someone help me understand how to structure these batch calls correctly? I tried a few different approaches but they don’t seem to work as expected.
I went with a simple queue system using threading instead of async. Created a ThreadPoolExecutor, submitted all requests at once, then collected results as they finished. Key thing is chunking requests into reasonable batches - 10-15 concurrent requests worked best without hammering rate limits. Set up your Azure OpenAI client with retry logic and proper timeouts. Just remember responses come back out of order, so track them with request IDs or enumerate your original prompts. Got about 5x speedup vs running everything sequentially.
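In case a concrete sketch helps: here's roughly the pattern described above, with the generic helper separated out so it works with any request function. The helper name `run_batched` and the Azure bits in the comments (deployment name, client setup) are my own placeholders, not anything official.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_batched(fn, items, max_workers=10):
    """Run fn over items concurrently, returning results in input order."""
    results = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its original index, since completions
        # arrive out of order.
        futures = {pool.submit(fn, item): i for i, item in enumerate(items)}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                results[i] = fut.result()
            except Exception as exc:
                # Store the exception so one failed request doesn't
                # lose the rest of the batch.
                results[i] = exc
    return results

# Hypothetical Azure OpenAI usage (requires `pip install openai`; the
# deployment name and endpoint below are placeholders):
#
# from openai import AzureOpenAI
# client = AzureOpenAI(azure_endpoint="https://my-resource.openai.azure.com",
#                      api_key="...", api_version="2024-02-01")
# def ask(prompt):
#     resp = client.chat.completions.create(
#         model="my-gpt-deployment",  # your Azure deployment name
#         messages=[{"role": "user", "content": prompt}],
#     )
#     return resp.choices[0].message.content
#
# answers = run_batched(ask, prompts, max_workers=10)
```

The index-tracking dict is what handles the "responses come back out of order" problem mentioned above: `as_completed` yields futures in completion order, but results land in their original slots.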
I use asyncio with the Azure OpenAI client. Install the `openai` package first (the Azure client classes, including `AsyncAzureOpenAI`, ship inside it), then write an async function for concurrent requests. Set up your Azure endpoint and API key, then use asyncio.gather() to hit multiple prompts at once. I made the mistake of looping through requests one by one initially - don’t do that, it’s painfully slow. Add error handling for each request since some will fail while others work fine. Azure’s rate limits are different from regular OpenAI’s, so you might need delays between batches depending on your subscription tier.
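Roughly what that looks like - a generic semaphore-limited gather (the helper name `gather_limited` and the Azure details in the comments are my placeholders):

```python
import asyncio

async def gather_limited(coros, limit=10):
    """Await coroutines concurrently, at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # return_exceptions=True means one failed request comes back as an
    # exception object instead of cancelling the whole batch.
    return await asyncio.gather(*(run(c) for c in coros),
                                return_exceptions=True)

# Hypothetical Azure OpenAI usage (requires `pip install openai`;
# endpoint and deployment name are placeholders):
#
# from openai import AsyncAzureOpenAI
# client = AsyncAzureOpenAI(azure_endpoint="https://my-resource.openai.azure.com",
#                           api_key="...", api_version="2024-02-01")
# async def ask(prompt):
#     resp = await client.chat.completions.create(
#         model="my-gpt-deployment",
#         messages=[{"role": "user", "content": prompt}],
#     )
#     return resp.choices[0].message.content
#
# results = asyncio.run(gather_limited([ask(p) for p in prompts], limit=10))
```

The semaphore gives you the "delays between batches" effect without explicit sleeps: only `limit` requests run at a time, and the rest queue up behind them.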
hey alexr1990, i totally get the confusion! one heads-up though: `openai.Completion.create()` is the legacy pre-1.0 API - with current versions of the `openai` package you create an `AzureOpenAI` client and call `client.chat.completions.create()` instead. either async or threads works for batching. have your api key ready and keep an eye on rate limits so you don’t get throttled.
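Since every answer in this thread mentions rate limits, here's a small retry-with-backoff sketch you can wrap around any request function. It's generic on purpose: `with_backoff` and its parameters are my own names, and with the `openai` package installed you'd narrow `retry_on` to the rate-limit exception your version exposes (e.g. `openai.RateLimitError` in recent releases - check your installed version).

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Sleep 1s, 2s, 4s, ... plus random jitter so parallel
            # workers don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Combined with either the thread or asyncio approach above, this is usually enough to ride out 429 responses without babysitting the batch.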