Extracting specific data from OpenAI responses in n8n workflow with JSON formatting

I need help setting up an OpenAI node in n8n that can pull out certain information from user messages and format it as structured JSON.

Basically I want to grab things like customer email and location from their text and get back something like:

```json
{
  "message": "Thanks for that info! What else can I help with? 🙂",
  "extracted_data": [
    {
      "field_type": "Email",
      "results": [{ "content": "[email protected]" }]
    },
    {
      "field_type": "Location",
      "results": [{ "content": "New York" }]
    }
  ]
}
```

What prompt instructions work best for this? How do I set up the OpenAI node settings to get consistent JSON output every time? Looking for ways to make sure the response is always in the right format for processing.

Set max tokens to 500-800 - without enough headroom, OpenAI cuts off JSON responses partway through. Also put 2-3 sample input/output pairs in your prompt before the actual request. Few-shot examples are way more effective than just describing the format.
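To make the few-shot idea concrete, here's a sketch of what those sample pairs could look like as a chat messages array (the sample emails, locations, and wording are illustrative, not from the original thread):

```javascript
// Sketch: 2 few-shot input/output pairs ahead of the real request.
// In n8n you'd paste the system/user text into the OpenAI node's message fields.
const fewShotMessages = [
  {
    role: "system",
    content:
      "Extract customer data and return as JSON with keys " +
      '"message" and "extracted_data". Respond only with valid JSON.',
  },
  // Example 1: email extraction
  { role: "user", content: "Hi, I'm Dana, reach me at dana@example.com" },
  {
    role: "assistant",
    content: JSON.stringify({
      message: "Thanks for that info! What else can I help with? 🙂",
      extracted_data: [
        { field_type: "Email", results: [{ content: "dana@example.com" }] },
      ],
    }),
  },
  // Example 2: location extraction
  { role: "user", content: "I'm based in New York" },
  {
    role: "assistant",
    content: JSON.stringify({
      message: "Thanks for that info! What else can I help with? 🙂",
      extracted_data: [
        { field_type: "Location", results: [{ content: "New York" }] },
      ],
    }),
  },
];
```

The real user message then goes in as the final `user` entry after these examples.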

hey omarR_85, i totally feel you on this! make sure to include “respond only in valid json format” at the end of your prompt. keeping the temp around 0.1 has helped me a lot for consistency. also, try wrapping your prompt like “extract the following data and return as json:” then list your fields.
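Putting those settings together, this is roughly the Chat Completions payload the node settings map to (the model name and sample message are assumptions; `response_format: { type: "json_object" }` is OpenAI's JSON mode, which the node may or may not expose depending on your n8n version):

```javascript
// Sketch of the underlying API payload - illustrative values throughout.
const payload = {
  model: "gpt-4o-mini",            // assumed model; use whatever your node is set to
  temperature: 0.1,                // low temp for consistent structure
  max_tokens: 800,                 // headroom so the JSON isn't truncated
  response_format: { type: "json_object" }, // JSON mode (prompt must mention "JSON")
  messages: [
    {
      role: "system",
      content:
        "Extract the following data and return as json: Email, Location. " +
        "Respond only in valid json format.",
    },
    { role: "user", content: "Hi, it's omar@example.com from New York" },
  ],
};
```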

JSON schema validation in your prompt is the key. That exact example structure you showed works way better than just asking for JSON format. Use function calling mode on the OpenAI node if you can, or throw a JSON parse node right after the OpenAI response to validate it. For prompts, I start with “Parse this message and extract data using this exact JSON structure” then paste your format. Keep temperature super low - I use 0.05 for data extraction, even lower than Alice suggested. Don’t forget a fallback like “return empty results array if no data found” to handle weird edge cases.
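Here's a minimal sketch of that validate-with-fallback step, written as plain JavaScript you could drop into an n8n Code node (the function name and the empty-fallback shape are my own choices, not from the thread):

```javascript
// Validate the OpenAI node's raw text output and fall back to an empty
// results structure when parsing or structural checks fail.
function validateExtraction(rawResponse) {
  const fallback = { message: "", extracted_data: [] };
  let parsed;
  try {
    parsed = JSON.parse(rawResponse);
  } catch (err) {
    return fallback; // model returned non-JSON text
  }
  // Minimal structural checks against the expected schema
  if (typeof parsed.message !== "string" || !Array.isArray(parsed.extracted_data)) {
    return fallback;
  }
  // Keep only well-formed entries
  parsed.extracted_data = parsed.extracted_data.filter(
    (d) => typeof d.field_type === "string" && Array.isArray(d.results)
  );
  return parsed;
}
```

Anything downstream (CRM push, routing, etc.) can then rely on `extracted_data` always being an array.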

Data extraction workflows turn into a nightmare when you’re juggling multiple APIs and parsing logic. Been there way too many times.

Everyone obsesses over prompt engineering, but that’s missing the point. You’re manually babysitting all these moving pieces. What about validation? Sending data to your CRM? Handling failures?

I ditched the manual approach and automated the whole thing. Set triggers for user messages, auto-run extraction, validate JSON, and route clean data wherever it goes. No more prompt babysitting or workflow debugging.

Extraction works exactly like you want - emails, locations, any fields you need. Plus you get error handling, validation, and tool connections without custom code.

Your JSON structure’s perfect for automation. Define extraction rules once, let the system handle OpenAI calls and formatting.

For OpenAI node setup, enable JSON mode in the settings if it's there. I structure my prompts like this: "You're a data extraction assistant. Analyze this text and return the info in JSON format:" then add your schema. Be specific - "put emails in Email field_type" and "put locations in Location field_type."

What really helped me was adding validation rules right in the prompt. I specify what counts as valid data - "valid email addresses only" for emails, "city or state names only" for locations. Throw in a catch-all field for other useful data you might need later.

The OpenAI node likes to drop JSON wrappers sometimes, so I always add "Make sure your response starts with { and ends with }" to my prompts. Way more reliable than trying to fix it afterward.
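If you'd rather fix it afterward anyway, a belt-and-braces sketch is to slice out the outermost `{…}` before parsing, which also handles responses wrapped in a markdown code fence (helper name is illustrative):

```javascript
// Salvage the JSON object when the model wraps it in extra text,
// e.g. a leading sentence or a ```json fence.
function extractJsonObject(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) {
    throw new Error("No JSON object found in response");
  }
  return JSON.parse(text.slice(start, end + 1));
}
```

Note this assumes there's exactly one top-level object in the response; it won't help if the JSON itself is truncated, which is why the max-tokens advice above still matters.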