Extremely slow image loading from Google Drive in Colab notebook

I’m working on a machine learning project and stored my image dataset on Google Drive. When I mount the drive in my Colab notebook and try to load images, the performance is terrible. My local machine processes dozens of images per second, but Colab can only handle 2-3 images in the same timeframe.

The weird thing is that model training with TensorFlow runs fine, so it’s specifically an issue with reading image files from the mounted drive. Has anyone found a workaround for this bottleneck?

Here’s the function I’m using to load my dataset:

import os

import numpy as np
from PIL import Image
from skimage import color


def loadImageData(folder_path, class_labels, img_size=64, split_ratio=0.7):
    images = []
    labels = []

    # walk every class sub-folder under the dataset root
    for directory, _, filenames in os.walk(folder_path):
        for filename in filenames:
            full_path = os.path.join(directory, filename)
            category = os.path.basename(directory)

            try:
                # read, resize, and force three channels (grayscale -> RGB)
                img = Image.open(full_path)
                resized_img = np.array(img.resize((img_size, img_size)))
                final_img = resized_img if resized_img.ndim == 3 else color.gray2rgb(resized_img)
                images.append(final_img)
                labels.append(class_labels[category])
            except Exception as e:
                print(f"Failed to process {filename}: {e}")

    images = np.asarray(images, dtype=np.float32)
    labels = np.asarray(labels, dtype=np.int16).reshape(1, -1)

    # splitData is defined elsewhere in the notebook
    return splitData(images, labels, split_ratio)

Google Drive access is what’s killing your performance: every image read is a separate network request to Drive (plus possible API throttling), which adds huge overhead compared to local disk access.

I hit this exact issue on a computer vision project last year. Don’t copy everything upfront - use chunked preprocessing instead. Load batches of 100-200 images from the mounted drive, process them completely, then save the preprocessed arrays as numpy files to Colab’s local storage.

You pay the Drive read cost once per image during that single preprocessing pass, and every future training run loads the preprocessed arrays straight from local disk. It takes longer up front but saves hours on repeated experiments. Your code’s fine - it’s the filesystem layer that’s the bottleneck, not your implementation.
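
Roughly, something like this (a minimal sketch - the preprocess_in_chunks name, CHUNK_SIZE, and the /content/preprocessed cache path are placeholders for your own setup):

import os

import numpy as np
from PIL import Image

CHUNK_SIZE = 200                       # images per cached chunk - tune to your RAM
LOCAL_CACHE = "/content/preprocessed"  # Colab's fast local disk

def preprocess_in_chunks(folder_path, img_size=64):
    os.makedirs(LOCAL_CACHE, exist_ok=True)

    # collect all image paths up front
    paths = []
    for directory, _, filenames in os.walk(folder_path):
        paths.extend(os.path.join(directory, f) for f in filenames)

    for i in range(0, len(paths), CHUNK_SIZE):
        chunk_file = os.path.join(LOCAL_CACHE, f"chunk_{i // CHUNK_SIZE:04d}.npy")
        if os.path.exists(chunk_file):
            continue  # already cached by a previous run
        batch = []
        for p in paths[i:i + CHUNK_SIZE]:
            img = Image.open(p).convert("RGB").resize((img_size, img_size))
            batch.append(np.asarray(img, dtype=np.float32))
        np.save(chunk_file, np.stack(batch))

# later runs skip Drive entirely and load the cached chunks from local disk:
# chunks = [np.load(os.path.join(LOCAL_CACHE, f)) for f in sorted(os.listdir(LOCAL_CACHE))]
# images = np.concatenate(chunks)

You’d save labels alongside each chunk the same way (e.g. a parallel labels_XXXX.npy per chunk), then rebuild your train/val split from the local files.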

Been there with the same exact headache. Mounted Drive makes Colab fetch every single image over the network one by one. That’s why your performance is trash.

You need to batch download everything first, then process locally on the Colab instance. But doing this manually every time sucks.

I fixed this with an automated pipeline that syncs my datasets from Drive to Colab storage before training starts. It runs on its own, so I don’t have to remember to copy files or write download scripts every session.

The automation handles initial sync, monitors for dataset updates, and cleans up old files to save storage. Once everything’s local on the Colab instance, image loading goes back to normal speeds.
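
If you want to script the core of that yourself, the sync step is just a copy from the mounted drive to local storage before training. A rough sketch, assuming rsync is available in the runtime (apt-get install it if not) and with placeholder paths:

import subprocess

DRIVE_DATASET = "/content/drive/MyDrive/dataset/"  # source on the mounted drive (placeholder)
LOCAL_DATASET = "/content/dataset/"                # fast local Colab disk

def sync_dataset():
    # rsync only transfers new or changed files, and --delete removes
    # local files that were deleted on Drive, so re-running is cheap
    subprocess.run(
        ["rsync", "-a", "--delete", DRIVE_DATASET, LOCAL_DATASET],
        check=True,
    )
    return LOCAL_DATASET

local_path = sync_dataset()
# point your loader at local_path instead of the mounted Drive folder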

Your code will work fine once the bottleneck’s gone. The real fix is automating the data pipeline so you never deal with mounted-drive slowness again.

Check out Latenode for setting up this kind of workflow: https://latenode.com

yeah, gdrive mounting is painfully slow for datasets. copy everything to /content/ first with !cp -r /content/drive/MyDrive/your_dataset /content/ - way faster than reading from the mounted drive every time.