I’m working on an image classification project and trying to import my dataset into Vertex AI. I prepared a CSV file with my image labels, made sure there are no duplicate rows or formatting issues, and verified that none of the labels contain stray whitespace. However, when I try to import the CSV into Vertex AI, I keep getting an error about annotation deduplication, and only a small portion of my images gets imported instead of the complete dataset. Has anyone encountered this issue before? What might be causing this deduplication error during the import process?
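For reference, here’s roughly what my CSV looks like (the bucket and file names below are placeholders; it follows the one-row-per-image, single-label layout):

```
gs://my-bucket/images/cat_001.jpg,cat
gs://my-bucket/images/cat_002.jpg,cat
gs://my-bucket/images/dog_001.jpg,dog
```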
Yes, this usually comes down to how Vertex AI handles duplicate annotations to protect training-data quality: it drops entries it perceives as duplicates during import. This frequently happens when multiple labels are attached to the same image, or when distinct filenames point to identical image content. Review your CSV for rows that reference the same image with different labels; Vertex AI can merge these and discard some annotations. Exact duplicate images stored in multiple locations or formats can trigger it too. Enabling verbose logging during the import will help you identify which files are being flagged as duplicates. You may also need to reshape your CSV into a multi-label format instead of creating a separate row for each label on an image.
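If that’s what is happening, collapsing repeated rows into multi-label rows is straightforward. Here’s a minimal Python sketch, assuming the standard `gs://URI,label` layout; the file names `annotations.csv` and `annotations_multilabel.csv` are hypothetical:

```python
import csv
from collections import OrderedDict

# Collapse rows that repeat the same image URI with different labels
# into one multi-label row: gs://.../img.jpg,label1,label2,...
labels_by_uri = OrderedDict()
with open("annotations.csv", newline="") as f:
    for row in csv.reader(f):
        if not row:
            continue
        uri, labels = row[0], row[1:]
        labels_by_uri.setdefault(uri, [])
        for label in labels:
            label = label.strip()
            if label and label not in labels_by_uri[uri]:
                labels_by_uri[uri].append(label)

with open("annotations_multilabel.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for uri, labels in labels_by_uri.items():
        writer.writerow([uri] + labels)
```

Run this before import and each image ends up on exactly one row, so Vertex AI has nothing to deduplicate at the annotation level.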
ugh, so frustrating! vertex ai flags images as duplicates even when they’re completely different sometimes. double-check for similar photos or multiple resolutions of the same image - that’ll trigger it. also verify the gcs bucket paths in your csv are right. wrong paths mess up imports in weird ways.
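quick way to check the paths without eyeballing them - a rough sketch using the google-cloud-storage client (the csv file name is a placeholder, and you need gcloud credentials set up):

```python
import csv
from urllib.parse import urlparse

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()

# Report every CSV row whose gs:// URI doesn't resolve to a real object.
with open("annotations.csv", newline="") as f:
    for row in csv.reader(f):
        if not row or not row[0].startswith("gs://"):
            continue
        parsed = urlparse(row[0])  # netloc = bucket, path = object name
        bucket = client.bucket(parsed.netloc)
        if not bucket.blob(parsed.path.lstrip("/")).exists():
            print("missing:", row[0])
```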
I’ve hit this exact problem before. Vertex AI doesn’t just check your CSV - it’s also looking at the actual image content and metadata.
You might have duplicate images with different filenames, or images similar enough that Vertex AI flags them as dupes. It also scans EXIF data and other metadata.
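If you want to catch exact byte-level duplicates yourself before importing, a content hash is enough. Here’s a minimal sketch over a local folder (the directory name is a placeholder; near-duplicates at different resolutions would need a perceptual hash, e.g. the imagehash package, instead):

```python
import hashlib
from pathlib import Path

# Group local images by SHA-256 of their bytes; identical content with
# different filenames lands in the same group.
hashes = {}
for path in Path("images").rglob("*"):
    if not path.is_file() or path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        continue
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    hashes.setdefault(digest, []).append(path)

for digest, paths in hashes.items():
    if len(paths) > 1:
        print("duplicates:", [str(p) for p in paths])
```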
Honestly, doing this manually is a pain. I always use Latenode to automate the whole ML data prep process.
With Latenode, I build flows that validate images before upload, catch real duplicates using image hashing, clean metadata, and handle Vertex AI imports with proper error handling. You can set it to auto-retry failed imports or split huge datasets into smaller chunks.
Set this up for our team last month and it killed all these import headaches. The visual workflow shows exactly what’s happening at each step.
Check it out at https://latenode.com