Validating JSON Input Schema for BigQuery with Generative AI

I’m seeking assistance with constructing the correct JSON input for batch processing with BigQuery and Generative AI. I’m following the official documentation, but the input I build doesn’t seem to match the expected schema.

The documentation provides this example format:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Can you suggest a recipe for chocolate chip cookies?"
        }
      ]
    }
  ],
  "system_instruction": {
    "parts": [
      {
        "text": "You act like a master chef."
      }
    ]
  }
}

However, when I review the generateContent API documentation, it appears that certain fields like system_instruction do not align with the expected schema. This raises concerns about whether I’m using the appropriate format.

I’ve executed multiple batch prediction jobs, but I’m unsure if all my parameters are being utilized correctly. For example, it seems that my generationConfig field isn’t functioning as intended.

How can I confirm the complete schema and ensure my JSON format is accurate? What steps can I take to verify that BigQuery is correctly processing all my parameters during batch operations?

Been down this exact rabbit hole before. The schema mismatch happens because BigQuery batch predictions use a different format than the real-time generateContent API.

For BigQuery batch jobs, structure your JSON like this:

{
  "instances": [
    {
      "prompt": "Can you suggest a recipe for chocolate chip cookies?",
      "system_instruction": "You act like a master chef."
    }
  ],
  "parameters": {
    "temperature": 0.7,
    "maxOutputTokens": 1024,
    "topP": 0.8
  }
}

That generationConfig issue you mentioned? BigQuery expects parameters instead. I wasted way too many hours debugging this last month.
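One way to catch this class of mistake before submitting a job is to lint each request row locally. Here's a minimal stdlib-only sketch; the key sets below are assumptions based on the example above, not an authoritative schema, so adjust them to match your model's docs:

```python
import json

# Hypothetical validator for the batch request shape shown above.
# Key names mirror the example; confirm them against your model's docs.
REQUIRED_INSTANCE_KEYS = {"prompt"}
ALLOWED_PARAM_KEYS = {"temperature", "maxOutputTokens", "topP", "topK"}

def validate_request(raw: str) -> list:
    """Return a list of problems found in one JSON request (empty = OK)."""
    problems = []
    try:
        req = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    for i, inst in enumerate(req.get("instances", [])):
        missing = REQUIRED_INSTANCE_KEYS - inst.keys()
        if missing:
            problems.append(f"instance {i}: missing {sorted(missing)}")
    unknown = set(req.get("parameters", {})) - ALLOWED_PARAM_KEYS
    if unknown:
        problems.append(f"unknown parameters: {sorted(unknown)}")
    return problems

# A request carrying the stray generationConfig key gets flagged:
request = json.dumps({
    "instances": [{"prompt": "Suggest a cookie recipe."}],
    "parameters": {"temperature": 0.7, "generationConfig": {}},
})
print(validate_request(request))  # → ["unknown parameters: ['generationConfig']"]
```

A silently ignored parameter is exactly the failure mode you described, so flagging unknown keys loudly before submission is the whole point of the check.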

Check the job logs in the BigQuery console to verify your schema works. Failed parameter parsing shows up as warnings there. Run a small test batch first - 10 rows max - before processing your full dataset.
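If your batch input lives in a JSONL file rather than a BigQuery table, carving out that 10-row test batch is a one-liner-ish script. A stdlib-only sketch - the file names are placeholders:

```python
import itertools
import json
from pathlib import Path

def make_test_batch(src: str, dst: str, n: int = 10) -> int:
    """Copy the first n JSONL request rows into a small test file,
    skipping rows that aren't valid JSON. Returns rows written."""
    written = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in itertools.islice(fin, n):
            try:
                json.loads(line)
            except json.JSONDecodeError:
                continue  # a malformed row would fail the batch job anyway
            fout.write(line)
            written += 1
    return written

# Demo with a synthetic 100-row input file:
Path("full_input.jsonl").write_text(
    "\n".join(json.dumps({"prompt": f"row {i}"}) for i in range(100)) + "\n"
)
print(make_test_batch("full_input.jsonl", "test_batch.jsonl"))  # → 10
```

If the input is a BigQuery table instead, the same idea applies with a `LIMIT 10` copy into a scratch table before pointing the batch job at it.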

One more thing - parameter names might vary slightly depending on which Vertex AI model you’re using through BigQuery. Double-check the specific model docs for your use case.