Custom Docker container fails to run on Vertex AI but works on Cloud Run

I’m having trouble getting my custom machine learning container to work properly on Vertex AI, even though it runs fine on Cloud Run.

What I’ve done so far:

  • Created a Docker image with a PyTorch-based object detection model
  • Tested the container locally and it works great
  • Successfully deployed the same image on Cloud Run without any problems
  • The API uses FastAPI and processes image data sent as base64

My setup includes:

  • PyTorch 2.2.0 base image with CUDA 12.1 support
  • FastAPI server with Uvicorn and Gunicorn (see the simplified sketch after this list)
  • Target machine type: n1-standard-4 with an NVIDIA Tesla P4 GPU
  • Container starts with a shell script that loads the model and launches the server
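
For context, the serving code is shaped roughly like the sketch below (heavily simplified; the route name, request field, and the elided inference call are illustrative, not the actual implementation):

import base64

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class DetectionRequest(BaseModel):
    image: str  # base64-encoded image bytes


@app.post("/detect")
def detect(request: DetectionRequest):
    image_bytes = base64.b64decode(request.image)
    # ...run the PyTorch detector on image_bytes...
    return {"detections": []}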

Deployment process:

- name: Create Detection Model
  run: |
    MODEL_EXISTS=$(gcloud ai models list --region=$LOCATION --filter="displayName=object-detector" --format="value(name)" --limit=1)
    if [ -z "$MODEL_EXISTS" ]; then
      gcloud ai models upload --region=$LOCATION --display-name=object-detector --container-image-uri=$CONTAINER_URI
    else
      echo "Detection model exists, skipping creation."
    fi

- name: Check Model Registration Status
  run: |
    max_wait=$WAIT_TIME
    start_time=$(date +%s)
    while true; do
      DETECTOR_MODEL=$(gcloud ai models list --region=$LOCATION --filter="displayName=object-detector" --format="value(name)" --limit=1)
      if [ -n "$DETECTOR_MODEL" ]; then
        echo "Model Ready: $DETECTOR_MODEL"
        echo "DETECTOR_MODEL=$DETECTOR_MODEL" >> $GITHUB_ENV
        break
      fi
      if [ $(( $(date +%s) - start_time )) -gt $max_wait ]; then
        echo "Registration timeout exceeded." && exit 1
      fi
      sleep 15
    done

Endpoint deployment:

- name: Setup Detection Endpoint
  run: |
    DETECTION_ENDPOINT=$(gcloud ai endpoints list --region=$LOCATION --filter="displayName=detector-endpoint" --format="value(name)" --limit=1)
    if [ -z "$DETECTION_ENDPOINT" ]; then
      DETECTION_ENDPOINT=$(gcloud ai endpoints create --region=$LOCATION --display-name=detector-endpoint --format="value(name)")
    fi
    gcloud ai endpoints deploy-model $DETECTION_ENDPOINT \
      --region=$LOCATION \
      --model=$DETECTOR_MODEL \
      --display-name=detection-service \
      --machine-type=$INSTANCE_TYPE \
      --accelerator=count=1,type=$ACCELERATOR_TYPE \
      --min-replica-count=$MIN_INSTANCES \
      --enable-access-logging \
      --autoscaling-metric-specs=$SCALING_CONFIG

The problem:
The model deployment fails on Vertex AI, and I can't find any useful logs in the endpoint monitoring. The exact same container image works perfectly when I deploy it to Cloud Run. Has anyone experienced similar issues with custom containers on Vertex AI?

I hit the same issue migrating from Cloud Run to Vertex AI last year. The main difference: Vertex AI expects your container to implement its prediction protocol, while Cloud Run just wants a working HTTP server.

Since you're using FastAPI with base64 image processing, you'll need to tweak your endpoint handlers. Vertex AI sends prediction requests as a JSON body with an "instances" array and expects responses wrapped in the AI Platform prediction format, i.e. a "predictions" array. Your FastAPI app also has to serve the predict and health routes Vertex AI calls; you configure those when you upload the model (--container-predict-route and --container-health-route), and Vertex AI passes them to the container as the AIP_PREDICT_ROUTE and AIP_HEALTH_ROUTE environment variables.

Health checks tripped me up too. Vertex AI health-checks your container differently than Cloud Run does, so make sure it responds properly during startup. If model loading takes too long, your startup script might be timing out before the health route ever answers.

Finally, check your container logs through the model deployment logs instead of endpoint monitoring. When a deployment fails during container initialization, those logs end up in a different place than runtime logs.
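
Here's a rough sketch of how the FastAPI side can be adapted. The AIP_* environment variables and the instances/predictions wrapping come from Vertex AI's custom-container contract; load_model, run_detection, and the "image" field are placeholders standing in for whatever your container actually does:

import base64
import os

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

# Vertex AI tells the container which routes to serve via environment
# variables; fall back to simple defaults for local testing / Cloud Run.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

model = None


def load_model():
    # Placeholder: replace with the real PyTorch model-loading code.
    return object()


def run_detection(detector, image_bytes):
    # Placeholder: replace with the real inference call.
    # Must return something JSON-serializable.
    return {"detections": [], "bytes_received": len(image_bytes)}


@app.on_event("startup")
def startup():
    global model
    model = load_model()


@app.get(HEALTH_ROUTE)
def health():
    # Vertex AI polls this route; only answer 200 once the model is usable.
    if model is None:
        return JSONResponse({"status": "loading"}, status_code=503)
    return {"status": "ok"}


@app.post(PREDICT_ROUTE)
async def predict(request: Request):
    body = await request.json()
    # Vertex AI wraps inputs as {"instances": [...], "parameters": {...}}.
    results = []
    for instance in body.get("instances", []):
        image_bytes = base64.b64decode(instance["image"])  # field name assumed
        results.append(run_detection(model, image_bytes))
    # Responses must be wrapped as {"predictions": [...]}.
    return {"predictions": results}

One more thing: Vertex AI sends traffic to the port in AIP_HTTP_PORT (8080 by default), while Cloud Run uses PORT, so make sure your startup script binds Gunicorn/Uvicorn accordingly.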