I’m struggling with generating a proper JSONL file that includes references to images stored in my Google Cloud Storage bucket. I need this for running batch inference using Vertex AI.
What I’m trying to do:
- Extract file paths from my GCS bucket
- Format them appropriately for Vertex AI batch processing
My current approach:
- First, I list all files:
gsutil ls gs://my-data-bucket/images > file_list.txt
- Then, I manually convert the txt file to JSONL format like this:
{"content": "gs://my-data-bucket/images/photo1.jpg", "mimeType": "image/jpeg"}
{"content": "gs://my-data-bucket/images/photo2.jpg", "mimeType": "image/jpeg"}
The problem:
When I submit my batch prediction job, I keep getting an error indicating that the file “cannot be parsed as JSONL.”
I suspect there might be formatting issues with my JSONL structure. Has anyone faced this before? Is there a more straightforward way to export bucket contents into the correct JSONL format that Vertex AI requires?