Using JSON response format with the GPT-4 Vision model - validation error issue

I’m facing a challenge when trying to enable the JSON response format for the GPT-4 Vision model. Each time I add the response_format option in my request, I encounter a validation error indicating that extra fields are not allowed. If I leave this parameter out, the request goes through without any errors.

Here’s the code I’m currently using:

import requests

# api_key, user_input, and base64_encoded_image are defined earlier in my script
request_headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

request_payload = {
    "model": "gpt-4-vision-preview",
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant. Please respond in JSON format."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": user_input
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_encoded_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 1000,
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=request_headers, json=request_payload)
print(response.json())

The validation error message I’m getting is:

{'error': {'message': '1 validation error for Request\nbody -> response_format\n  extra fields not permitted (type=value_error.extra)', 'type': 'invalid_request_error', 'param': None, 'code': None}}

Is there an alternative method to turn on JSON mode for vision models, or is this feature currently unsupported?

Yeah, the vision model doesn't support response_format at all. Hit the same wall a few weeks ago building an image classifier; the API just doesn't recognize that parameter when the model is gpt-4-vision-preview. What saved me was getting super specific in my system message about the JSON structure I wanted. I throw a concrete example right into the prompt, like: Return JSON with exactly this structure: {"result": "your_analysis", "confidence": 0.95}. Also worth adding some JSON validation on your side, since without native JSON mode the model sometimes spits out broken JSON or adds random text around it.
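For reference, here's roughly what that looks like in the payload (just a sketch: "result" and "confidence" are placeholder keys for whatever structure you actually need, and the rest of the request is the same as yours minus response_format):

system_message = (
    "You are a helpful assistant. Respond ONLY with a JSON object using exactly "
    'this structure: {"result": "<your analysis>", "confidence": <number between 0 and 1>}. '
    "Do not include any text before or after the JSON object."
)

request_payload = {
    "model": "gpt-4-vision-preview",
    # no "response_format" here, the API rejects it for this model
    "messages": [
        {"role": "system", "content": system_message},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_input},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_encoded_image}"}
                }
            ]
        }
    ],
    "max_tokens": 1000,
}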

Yeah, that validation error is totally expected - GPT-4 Vision doesn’t support the response_format parameter for structured JSON responses. It’s a limitation that only hits the vision models, while regular GPT-4 handles it fine. I hit this same wall about two months ago working on an image analysis project. My workaround is getting really explicit in the system prompt about the JSON structure you want. Skip the response_format parameter entirely and just spell it out: “Always respond with a JSON object containing these keys: analysis, confidence, details” - then throw in an example. It’s not as bulletproof as native JSON mode, but works pretty well if you’re specific. You’ll probably want some client-side validation too since the model doesn’t always stick to the format perfectly.
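For the validation part, something along these lines has worked for me. It's a rough sketch that handles the usual failure modes (the model wrapping the JSON in markdown code fences or adding a sentence before/after it), and extract_json is just a helper name I made up:

import json
import re

def extract_json(model_output):
    """Try to pull a JSON object out of the model's text reply.

    Handles markdown code fences around the JSON and stray prose before or
    after it. Returns None if nothing parseable is found.
    """
    # Strip ```json ... ``` fences if the model added them
    cleaned = re.sub(r"```(?:json)?", "", model_output).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Fall back to grabbing the first {...} span in the text
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None

# With the response from your code:
# data = response.json()
# parsed = extract_json(data["choices"][0]["message"]["content"])
# if parsed is None: retry the request or log the raw content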