Can Azure OpenAI's o1 model handle visual content like GPT-4 Vision does?

I’ve been working with Azure OpenAI services and I’m curious about the o1 model’s capabilities. I know that some GPT models can process and analyze images, but I’m not sure if the o1 model has these same visual processing features.

I’m trying to build an application that needs to examine pictures and describe what’s in them. Here’s a code example of what I’m trying to achieve:

import os
from openai import AzureOpenAI

# Azure OpenAI uses its own client class; endpoint and key come from the portal.
ai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

result = ai_client.chat.completions.create(
    model="gpt-4-turbo",  # the deployment name in my Azure resource
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Can you describe what you see in this picture?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/sample-photo.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=250,
)

print(result.choices[0].message.content)

Is this type of image processing supported by Azure’s o1 model, or do I need to use a different model for visual tasks?

Azure's o1 model doesn't have vision capabilities at all. It's built purely for complex reasoning tasks - no image processing whatsoever. You'll need GPT-4 Vision or GPT-4o for image descriptions. I hit this same wall last month building something similar and had to completely rework my approach. The o1 models are great for logic and math problems, but they're text-only right now. Your code looks fine though - just swap the model to "gpt-4-vision-preview" or "gpt-4o" and you're good to go with Azure OpenAI.
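If it helps, here's a minimal sketch of that swap using the `AzureOpenAI` client. The deployment name ("gpt-4o" below), endpoint variables, and API version are assumptions - match them to whatever is actually deployed in your resource.

```python
import os

def build_vision_messages(prompt: str, image_url: str) -> list:
    """Build the mixed text + image_url content parts for a vision chat call."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

# Only attempt a live call when Azure credentials are actually configured.
if os.environ.get("AZURE_OPENAI_API_KEY") and os.environ.get("AZURE_OPENAI_ENDPOINT"):
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",  # assumed; use a version your resource supports
    )
    result = client.chat.completions.create(
        model="gpt-4o",  # your Azure *deployment* name, not the base model name
        messages=build_vision_messages(
            "Can you describe what you see in this picture?",
            "https://example.com/sample-photo.jpg",
        ),
        max_tokens=250,
    )
    print(result.choices[0].message.content)
```

One Azure-specific gotcha: the `model` parameter is the name you gave the deployment when you created it, which may differ from the underlying model name.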

The o1 models don't have vision capabilities - they're built purely for advanced reasoning. You'll need to stick with GPT-4 Vision or GPT-4o through Azure OpenAI for your image description project. I hit this same wall recently building a document analysis tool and ended up running separate endpoints for text reasoning and visual processing. Your code structure looks right; just make sure the model parameter points at one of the vision-enabled deployments. Microsoft's hinted they might add multimodal features to future o1 versions, but no official timeline yet.

hit this same issue yesterday - o1 doesn't handle images at all. had to switch back to gpt-4o mid-project. super frustrating because o1's reasoning blows gpt-4o away, but you're stuck with text only.

From my experience with Azure OpenAI, the o1 models don't handle images at all - they're text-only. You can't pass image URLs or base64 images to o1-preview or o1-mini through Azure's API. Only GPT-4 Vision and GPT-4o variants can process images. I ran into this exact issue when migrating projects to o1 for better reasoning. Your code looks fine, but switch the model parameter to "gpt-4" with vision or "gpt-4o" depending on what's deployed in your Azure setup. Just a heads up - Azure's model availability varies by region, so double-check which vision models you can actually access before you commit to anything.

You’re right to ask about this. The o1 model only handles text - no images whatsoever. It’s built for reasoning tasks.

But here’s what I’ve learned building similar apps - you don’t have to pick between o1’s reasoning power and vision. Chain them together.

I built an image analysis workflow where GPT-4 Vision describes the image first, then passes that description to o1 for deeper analysis or complex reasoning.

The key is automating this so you’re not manually shuttling data between models. I use Latenode for the entire pipeline. It processes images with GPT-4 Vision, grabs the description, then automatically feeds it to o1 when I need advanced reasoning about the visual content.
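If you'd rather not bring in a workflow tool, a hand-rolled version of that chain is straightforward: the vision model describes the image, then the description is fed to o1 as plain text. The deployment names ("gpt-4o", "o1-mini") and API version below are assumptions, and note that o1-family deployments take `max_completion_tokens` rather than `max_tokens`.

```python
import os

def reasoning_prompt(description: str, question: str) -> str:
    """Turn a vision model's image description into a text-only prompt for o1."""
    return (
        "A vision model described an image as follows:\n\n"
        f"{description}\n\n"
        f"Using only that description, answer this question: {question}"
    )

# Only attempt live calls when Azure credentials are actually configured.
if os.environ.get("AZURE_OPENAI_API_KEY") and os.environ.get("AZURE_OPENAI_ENDPOINT"):
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-12-01-preview",  # assumed
    )

    # Step 1: the vision-capable deployment describes the image.
    vision = client.chat.completions.create(
        model="gpt-4o",  # assumed deployment name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample-photo.jpg"}},
            ],
        }],
        max_tokens=400,
    )
    description = vision.choices[0].message.content

    # Step 2: the o1 deployment reasons over the text description.
    answer = client.chat.completions.create(
        model="o1-mini",  # assumed deployment name; text-only input
        messages=[{
            "role": "user",
            "content": reasoning_prompt(description, "What stands out in this scene?"),
        }],
        max_completion_tokens=1000,  # o1 models use this instead of max_tokens
    )
    print(answer.choices[0].message.content)
```

The obvious trade-off versus a single multimodal call is that o1 only ever sees the description, so anything the vision model leaves out is invisible to the reasoning step.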

Best of both worlds without writing complex orchestration code. Your existing code works fine - just plug it into a bigger automated flow.

Latenode makes multi-model workflows like this dead simple: https://latenode.com

nope, the o1 models don't have vision yet. you'll need to use gpt-4-vision or gpt-4o for anything with images. o1's all about reasoning right now but can't process visuals.