How to send images to Azure OpenAI API for analysis?

I’ve been using the ChatGPT web interface where you can upload pictures and ask questions about them. This works great on the website. But when I try to do the same thing using the @azure/openai library in my code, I can’t figure out how to include images in my requests.

The chat completion methods I’m working with seem to only accept text messages. I’ve looked through the documentation but haven’t found a clear way to attach image files to my API calls.

Does anyone know if the Azure OpenAI service actually supports image inputs through their API? If it does, what’s the correct way to format the request to include both text and images?

Totally! Azure OpenAI can process images too. Just convert your image to base64 and include it as a content part like this: {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,YOUR_BASE64_STRING"}} alongside your text part. Hope it helps!
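To make the shape concrete, here's a minimal sketch of what such a content part looks like in Node.js. The image bytes and question text are placeholders, not values from the original post:

```javascript
// A sketch of one multimodal content part, assuming the OpenAI-style
// "image_url" format. The bytes below stand in for a real image file.
const base64 = Buffer.from("fake image bytes").toString("base64");

const imagePart = {
  type: "image_url",
  image_url: { url: `data:image/jpeg;base64,${base64}` },
};

// The accompanying text goes in a separate part of the same message.
const textPart = { type: "text", text: "What is in this image?" };

console.log(imagePart.image_url.url.startsWith("data:image/jpeg;base64,")); // true
```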

Yeah, Azure OpenAI supports vision through their API, but there are a few things to watch out for. You'll need a vision-enabled model like GPT-4 Vision. In your messages array, each message can have multiple content parts - text and image. For images, set the type to "image_url" and use either a direct URL or a base64 data URL. I got confused at first too, since the basic chat examples only show text, but the vision model docs explain the multi-modal format. Just double-check that your Azure deployment actually supports vision - not all regions or model versions have it.
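Here's a sketch of what that multi-part messages array looks like. The question text, image URL, and deployment name are all made-up placeholders:

```javascript
// Sketch of a vision-style messages array: one user message whose
// content is an array mixing a text part and an image_url part.
const messages = [
  {
    role: "user",
    content: [
      { type: "text", text: "Describe this picture." },
      {
        type: "image_url",
        image_url: { url: "https://example.com/photo.jpg" },
      },
    ],
  },
];

// With @azure/openai you would then pass this array to the client,
// e.g. (not executed here; endpoint, key, and deployment are assumptions):
// const result = await client.getChatCompletions("my-vision-deployment", messages);
```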

Yeah, Azure OpenAI handles image analysis - you just need the right setup. Use the chat/completions endpoint with gpt-4-vision-preview. In the content parameter, mix text and image_url objects in an array. Worked for me once I switched from regular gpt-4.
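If you're hitting the REST endpoint directly rather than going through the SDK, the request body looks roughly like this. The question, image URL, resource name, and API version are placeholders, not verified values:

```javascript
// Sketch of the raw JSON body for the chat/completions REST endpoint,
// assuming a gpt-4-vision-preview style deployment.
const body = JSON.stringify({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What does this chart show?" },
        { type: "image_url", image_url: { url: "https://example.com/chart.png" } },
      ],
    },
  ],
  max_tokens: 300, // vision responses can be cut short without this
});

// The actual call would be something like (placeholders, not executed):
// fetch("https://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=<api-version>", {
//   method: "POST",
//   headers: { "Content-Type": "application/json", "api-key": "<key>" },
//   body,
// });
```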

Hit this same problem last month switching from the web UI to the API. The thing that tripped me up was that you can't just pass a string - you need an array for the message content. For images, your message object needs a content array with both text and image objects. Set the image object's type to "image_url" and put your actual image data in the image_url field. Make sure you're on a newer @azure/openai version - older ones barely support multimodal stuff. Fair warning: images eat way more tokens than text, so watch your costs.