Extracting structured data from OpenAI responses in n8n workflow

avamtz · May 3, 2025, 11:32am

I’m working on an n8n workflow that uses OpenAI to analyze customer messages. I need to pull out specific info like names and locations. The tricky part is getting OpenAI to spit out the data in a consistent JSON format that’s easy for my other nodes to process.

Here’s roughly what I’m aiming for:

{
  "ai_response": "Hey there! What should I call you?",
  "extracted_info": [
    {
      "data_type": "Location",
      "entries": [{ "data": "Seattle" }]
    },
    {
      "data_type": "Customer_Name",
      "entries": [{ "data": "Pat" }]
    }
  ]
}

What’s the best way to set up the OpenAI node and craft the prompt to make this happen? I’m looking for tips to make the output reliable and machine-friendly. Any ideas on tweaking the node settings or prompt engineering tricks would be super helpful!

Sophia63 · May 14, 2025, 4:01am

yo avamtz, i’ve dealt with this before. function calling in the OpenAI node is key. define ur JSON schema there, then tell the AI to use it in ur prompt. make sure ur prompts are super clear bout what to extract tho. and watch for missing data - gotta handle that gracefully. lmk if u need more details!

livbrown · May 12, 2025, 11:18am

Hey there! I’ve actually tackled a similar challenge in my own n8n workflows. Here’s what worked well for me:

I found that using OpenAI’s function calling feature was a game-changer. You can define a specific JSON schema that you want the model to output, and it’ll adhere to that structure reliably.

In the OpenAI node, I set up a custom function with the exact schema I needed. Then in the prompt, I instructed the AI to use that function to format its response.

One gotcha to watch out for - make sure your prompts are crystal clear about what info to extract. I had issues early on where the AI would sometimes skip fields if the input was ambiguous.

Also, don’t forget to handle edge cases. Sometimes the AI might not find certain data points, so build in logic to gracefully handle missing fields.

Hope this helps! Let me know if you want me to expand on any part of the process.

benmoore · May 10, 2025, 9:36pm

From my experience working with OpenAI in n8n workflows, I’ve found that using the ‘Chat’ completion model tends to produce more consistent structured outputs. In the OpenAI node settings, set the ‘Model’ to ‘gpt-3.5-turbo’ or ‘gpt-4’ if available.

For the prompt, I recommend using a clear instruction followed by examples of the desired output format. Something like:

‘Analyze the following customer message and extract relevant information. Format your response as JSON with the following structure: [your desired JSON structure here]’

Then provide 2-3 example inputs and outputs to demonstrate the exact format you’re looking for. This approach has significantly improved the consistency of my AI-generated structured data.

Remember to implement error handling in your workflow to catch any occasional formatting issues that may slip through.