Automated batch upload of PDF documents to Google Drive with resume functionality

I’ve got a massive collection of PDF documents stored locally that I need to move to Google Drive. The total size is about 50GB spread across multiple directories and subdirectories in my digital library folder.

What I’m looking for:

  • A script (preferably Python or PHP) that can automatically upload all PDFs from my local folders to Google Drive
  • Something similar to how rsync works on Unix systems - smart syncing capabilities
  • Most importantly, I need resume functionality since my computer needs frequent restarts

The main challenge: When the upload process gets interrupted (due to system restarts), I want the script to pick up exactly where it left off instead of starting over. This would save me from having to manually track which files have already been uploaded.

Has anyone implemented something like this before? Any suggestions for handling the resume logic would be greatly appreciated.

The Google Drive desktop app is a great solution! Just set it to sync your PDF folder and it'll take care of resuming uploads after restarts. Way simpler than coding your own unless you need a specific folder structure.

I did something similar with 60GB of PDFs - used rclone instead of coding my own solution. It’s like rsync but for cloud storage, works great with Google Drive. The resume feature is bulletproof - had several power outages during upload and it picked up right where it left off every time. Keeps your folder structure intact and skips files that are already there. Takes 15 minutes to set up with their Google Drive guide. Love the bandwidth controls and progress reports. Way more reliable than anything I could’ve built myself, plus it’s got solid community support. Still use it for regular backups - zero problems.
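If you go the rclone route, the workflow is roughly the sketch below. The remote name "gdrive" and the folder paths are placeholders, so adjust them to whatever you name things in the config step:

```bash
# One-time interactive setup: creates a Google Drive remote (here called "gdrive")
rclone config

# Copy every PDF, keeping the directory tree. Re-running the same command
# after a restart skips files that already exist on the remote.
rclone copy ~/digital-library gdrive:digital-library \
  --include "*.pdf" --bwlimit 4M --progress
```

There's also `rclone sync` if you want deletions mirrored as well, but be careful with it - sync removes files on the remote that no longer exist locally.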

I went through this exact thing migrating my research archive last year. You need a local state file to track upload progress. I built a Python solution with the Google Drive API that creates a JSON log - it records each successful upload with file paths and Drive file IDs. Before uploading anything, the script checks this log and skips files that are already done.

For big files that get interrupted, I used chunked uploads with checksum verification. The Drive API handles resumable uploads natively - just use MediaFileUpload with resumable=True. I also added exponential backoff for API rate limits.

One thing that bit me: duplicate detection. Always query Drive by filename and parent folder ID before uploading, or you'll get duplicates if your log gets messed up. Took about a week for my 40GB collection but ran solid with automatic restarts.
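Here's a stripped-down sketch of that approach, not my exact script: the token.json path, the log filename, and PARENT_FOLDER_ID are placeholders, and for simplicity it drops everything into a single Drive folder instead of mirroring the directory tree.

```python
# Minimal sketch of the resume-by-log approach, assuming the
# google-api-python-client and google-auth packages. "token.json",
# upload_log.json and PARENT_FOLDER_ID are placeholders for your own setup.
import json
import os
import time

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaFileUpload

LOG_PATH = "upload_log.json"          # local state file tracking finished uploads
PARENT_FOLDER_ID = "your-folder-id"   # placeholder: target Drive folder


def load_log():
    """Return the dict of already-uploaded files (local path -> Drive file ID)."""
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH) as f:
            return json.load(f)
    return {}


def save_log(log):
    """Persist the log after every successful upload so a restart loses nothing."""
    with open(LOG_PATH, "w") as f:
        json.dump(log, f, indent=2)


def find_on_drive(service, name, parent_id):
    """Check Drive for an existing file with this name in the parent folder."""
    safe_name = name.replace("'", "\\'")  # escape quotes for the query string
    query = f"name = '{safe_name}' and '{parent_id}' in parents and trashed = false"
    result = service.files().list(q=query, fields="files(id)").execute()
    files = result.get("files", [])
    return files[0]["id"] if files else None


def upload_pdf(service, path, parent_id):
    """Upload one PDF in 5 MB chunks, retrying with exponential backoff."""
    metadata = {"name": os.path.basename(path), "parents": [parent_id]}
    media = MediaFileUpload(path, mimetype="application/pdf",
                            resumable=True, chunksize=5 * 1024 * 1024)
    request = service.files().create(body=metadata, media_body=media, fields="id")
    response, delay = None, 1
    while response is None:
        try:
            _status, response = request.next_chunk()
            delay = 1  # reset backoff after each successful chunk
        except HttpError as err:
            if err.resp.status in (429, 500, 502, 503) and delay <= 64:
                time.sleep(delay)
                delay *= 2
            else:
                raise
    return response["id"]


def sync_folder(local_root):
    # token.json is assumed to come from your own OAuth flow
    creds = Credentials.from_authorized_user_file("token.json")
    service = build("drive", "v3", credentials=creds)
    log = load_log()
    for dirpath, _dirs, filenames in os.walk(local_root):
        for name in sorted(filenames):
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            if path in log:
                continue  # already uploaded on a previous run: skip it
            file_id = (find_on_drive(service, name, PARENT_FOLDER_ID)
                       or upload_pdf(service, path, PARENT_FOLDER_ID))
            log[path] = file_id
            save_log(log)
```

The key detail is saving the JSON log after every single file, so a restart at any point only costs you the file that was in flight. If you need the local folder structure preserved on Drive, you'd extend this to create folders through the same files().create call (with the folder MIME type) and cache their IDs in the log too.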