I’m struggling to get request and response logging working properly with my Vertex AI endpoint setup. When I was using the older AI Platform endpoints, everything worked fine and I could turn on logging through gcloud commands. The logs would show up in my BigQuery table without any problems.
Now with Vertex AI endpoints, I have to use the REST API to enable logging instead. I followed the docs and sent a PATCH request to configure the logging. The request went through successfully and I can see the updated config, but my BigQuery table stays empty. No logs are showing up at all.
I also discovered something weird. When I make prediction calls using the rawPredict endpoint, no logs get created. But if I switch to the regular predict endpoint, the logs appear in BigQuery. The problem is that the predict endpoint completely changes the format of my response data.
Has anyone else run into this issue? I need the logging to work but I also need my response format to stay the same. Any ideas on how to fix this would be really helpful.
Yeah I hit this exact same thing 6 months ago when we migrated our prediction pipeline. The rawPredict vs predict logging difference is documented, just buried deep in the docs.
rawPredict doesn’t support request/response logging at all. Google designed it that way because rawPredict works with custom model containers where the request/response format can be anything. The logging system can’t parse arbitrary formats.
Here’s what I did to keep our response format intact while getting logs:
I kept using predict for logging but added a response transformation layer. Write a small proxy service that calls the predict endpoint, extracts the actual model output from the wrapped response, and returns just the data you need.
Something like this in your proxy:
def proxy_predict(endpoint, instances):
    # Call the standard predict endpoint so request/response logging still fires
    response = vertex_client.predict(endpoint=endpoint, instances=instances)
    # Unwrap the predict envelope and return just the model output
    return {"predictions": list(response.predictions)}  # or whatever format you need
If you absolutely need rawPredict, you can implement custom logging in your model container. We did this for one model where the response transformation was too complex.
Just log the request/response data directly to BigQuery from inside your container using the BigQuery client library. More work but gives you full control over the log format.
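Here's roughly what that in-container logging can look like, assuming the google-cloud-bigquery client and a placeholder table with timestamp, request, and response columns (the table id and field names are just examples, match them to your actual schema):

# Rough sketch: stream one log row per prediction from inside the serving container.
import json
from datetime import datetime, timezone
from google.cloud import bigquery

bq_client = bigquery.Client()
LOG_TABLE = "my-project.prediction_logs.raw_predict_logs"  # placeholder table id

def log_prediction(request_payload, response_payload):
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request": json.dumps(request_payload),
        "response": json.dumps(response_payload),
    }
    # insert_rows_json does a streaming insert and returns a list of per-row errors
    errors = bq_client.insert_rows_json(LOG_TABLE, [row])
    if errors:
        print(f"BigQuery logging failed: {errors}")

Call log_prediction from your prediction handler after you compute the response; if you're worried about adding latency, push the rows onto a queue and flush them in a background thread instead.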
The predict endpoint wrapper adds maybe 50ms overhead max, so the proxy approach works fine for most cases.
I hit this exact problem during our Vertex AI migration. The rawPredict logging limitation totally caught me off guard too. After testing a bunch of approaches, using Cloud Run as a lightweight wrapper worked way better than messing with the model container.
Here’s what I did: deployed a simple Cloud Run service that takes your original request format, calls the Vertex predict endpoint internally, and strips the response wrapper before sending it back. You keep your existing client integration but get the logging you need.
The Cloud Run service handles format conversion behind the scenes. Your clients send the same requests and get the same responses, but everything goes through predict internally for proper logging.
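For reference, here's a minimal sketch of that wrapper, assuming Flask and the google-cloud-aiplatform SDK; the endpoint resource name, route, and response shape are placeholders you'd swap for your own:

from flask import Flask, jsonify, request
from google.cloud import aiplatform

app = Flask(__name__)
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"  # placeholder
)

@app.route("/predict", methods=["POST"])
def proxy_predict():
    body = request.get_json()
    # Forward through the standard predict endpoint so request/response logging happens
    result = endpoint.predict(instances=body["instances"])
    # Strip the predict wrapper and return only what clients expect
    return jsonify({"predictions": list(result.predictions)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)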
Watch out for this - make sure your BigQuery dataset is in the same region as your Vertex endpoint. Cross-region logging configs fail silently all the time. Also double-check your service account has BigQuery Data Editor permissions on the target table, not just the dataset.
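You can sanity-check the region and table from a quick script, assuming the google-cloud-bigquery client (the dataset and table names below are placeholders):

from google.cloud import bigquery
from google.api_core.exceptions import NotFound

client = bigquery.Client()
dataset = client.get_dataset("my-project.prediction_logs")  # placeholder dataset
print("Dataset location:", dataset.location)  # must match your Vertex endpoint's region

try:
    table = client.get_table("my-project.prediction_logs.request_response_logs")  # placeholder
    print("Table schema:", [(f.name, f.field_type) for f in table.schema])
except NotFound:
    print("Logging table does not exist yet")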
Overhead’s minimal since Cloud Run scales to zero when idle, and latency impact is usually under 100ms for format transformation.
Hit this same logging issue when I moved from AI Platform last year. The rawPredict limitation sucks, but there's a workaround that doesn't need a proxy service: modify your model serving container to handle both formats. When predict requests come in, extract the instances array and process it the same way you'd handle rawPredict data, then wrap your output back into predict format. In your prediction handler, check whether the request has the predict wrapper structure; if it does, unwrap it, process normally, and rewrap the response (there's a rough sketch of that branching at the end of this answer). You get logging without touching client code or adding infrastructure.

Also heads up - the BigQuery table schema has to match exactly what Vertex expects or logs get dropped silently. Your table needs the prediction_log schema with timestamp, request, and response fields properly typed. Table creation is super picky about field types.

For quick debugging, turn on Cloud Logging alongside BigQuery logging to see if requests are even getting processed. Sometimes it's just table permissions, not the logging config.
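Here's the handler sketch I mentioned, with run_inference standing in for whatever inference code your container already runs (the names and output shape are placeholders):

def run_inference(instances):
    # placeholder for your existing model code
    return [{"score": 0.0} for _ in instances]

def handle_prediction_request(body):
    if isinstance(body, dict) and "instances" in body:
        # predict-style request: unwrap, run inference, rewrap the response
        return {"predictions": run_inference(body["instances"])}
    # rawPredict-style payload: process it as-is
    return run_inference(body)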