Bulk export of Google Documents from the command line

I figured out how to grab a single Google document from the command line, but now I want to create a batch script that pulls down all my documents as text files and combines them into one big file.

Right now I have this basic script that works for individual files:

#!/bin/bash
# ClientLogin returns SID=, LSID= and Auth= lines; cut keeps just the three values
authToken=$(curl -s https://www.google.com/accounts/ClientLogin -d Email=you@example.com -d Passwd=mypassword -d accountType=GOOGLE -d service=writely -d Gdata-version=3.0 | cut -d "=" -f 2)
# word-split the three values into $1 $2 $3; $3 is the Auth token
set $authToken
# documentId, fileName and i are filled in by hand for each document right now
wget --header "Gdata-Version: 3.0" --header "Authorization: GoogleLogin auth=$3" "https://docs.google.com/feeds/download/documents/Export?docID=${documentId}&exportFormat=txt" -O /tmp/${fileName[$i]}.txt

The problem is that I have to specify each documentId manually. Is there a way to get all my documents at once, like when you use Google Takeout in the browser? Or do I need to loop through a list of document IDs somehow? Any suggestions to make this more efficient would be great.

Start by hitting the Documents List API to grab all your document IDs. Make a GET request to https://docs.google.com/feeds/default/private/full with your auth token - you’ll get back an XML feed with all documents and metadata. Parse out the document IDs with grep or xmllint, then feed them into your download loop. I did this a couple years ago and the XML parsing was the biggest pain. The document IDs are tucked inside entry elements as part of the self link URLs. Once you’ve got those IDs in an array, just loop through with your current wget setup. Watch out for rate limiting though - Google will throttle you on bulk requests.
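Rough sketch of that flow in bash, reusing the $3 auth token from the question's script; the grep/sed pattern for pulling IDs out of the feed XML is only illustrative (where the ID shows up depends on the feed version), so adjust it to whatever your feed actually returns:

# fetch the full documents feed with the same auth token ($3 from "set $authToken")
feed=$(curl -s --header "GData-Version: 3.0" --header "Authorization: GoogleLogin auth=$3" "https://docs.google.com/feeds/default/private/full")

# pull document IDs out of the self links; assumes they appear as "document%3A<id>"
ids=$(echo "$feed" | grep -o 'document%3A[^"<]*' | sed 's/document%3A//' | sort -u)

# download each one as plain text, pausing briefly so bulk requests don't get throttled
for documentId in $ids; do
    wget --header "GData-Version: 3.0" --header "Authorization: GoogleLogin auth=$3" "https://docs.google.com/feeds/download/documents/Export?docID=${documentId}&exportFormat=txt" -O "/tmp/${documentId}.txt"
    sleep 1
done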

Been there with bulk document exports - manual work sucks when you’ve got hundreds of files.

API route works but it’s overkill for most cases. You’ll deal with authentication headaches, rate limits, XML parsing, and network timeout errors. Plus Google keeps changing their auth requirements.

I’ve automated similar workflows using Latenode and it’s much cleaner. Set up a flow that connects to Google Drive, pulls all documents automatically, converts to whatever format you need, and combines into one file. No command line or auth token juggling.

Best part? Schedule it to run whenever you want. Your document backups stay current without manual work. I use similar automation for quarterly compliance exports and save hours every time.

Way simpler than wrestling with deprecated APIs and bash scripts: https://latenode.com

Totally agree, Google Takeout is super easy! Just grab the zip and unarchive it. Way less hassle than using the API and worrying about auth issues. Plus you get all the formats, so it's a win-win!

Had this exact problem when backing up hundreds of compliance docs. The document feed approach works, but here's a cleaner way: skip parsing XML and use curl with basic text processing instead. Grab your document list from the feed endpoint, then pipe it through grep and cut to extract just the IDs - something like curl [your_feed_url] | grep -o 'docid=[^&]*' | cut -d= -f2 works great. Feed those IDs into a for loop with your wget command, as in the sketch below. Watch out for special characters in filenames though - sanitize them when creating output files. And definitely add error handling, since network timeouts are way more common with large batches than you'd think.
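Here's roughly what that looks like end to end, with the sanitization and error check bolted on; the docid= pattern is the one from above, and the paths and names are just assumptions:

# list the document IDs using the pipeline described above
ids=$(curl -s --header "Gdata-Version: 3.0" --header "Authorization: GoogleLogin auth=$3" "https://docs.google.com/feeds/default/private/full" | grep -o 'docid=[^&"]*' | cut -d= -f2 | sort -u)

for documentId in $ids; do
    # keep only filesystem-safe characters when building the output name
    safeName=$(printf '%s' "$documentId" | tr -cd 'A-Za-z0-9._-')
    # wget exits non-zero on timeouts and failed downloads, so log the failure and keep going
    if ! wget -q --header "Gdata-Version: 3.0" --header "Authorization: GoogleLogin auth=$3" "https://docs.google.com/feeds/download/documents/Export?docID=${documentId}&exportFormat=txt" -O "/tmp/${safeName}.txt"; then
        echo "download failed for ${documentId}" >&2
    fi
done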

You’ll need to query the Google Docs API first to get a list of all your documents before bulk downloading them. I hit this same problem when migrating my docs a few years back. The trick is using the document feed endpoint to grab metadata on all your files first. Once you’ve got your auth token, make a GET request to https://docs.google.com/feeds/default/private/full with the same authorization headers. This gives you an XML feed with all your documents - IDs and titles included. Parse that XML to pull out the document IDs, then loop through them with your current download code. Watch out for Google’s rate limits though. If you’ve got tons of documents, throw in some sleep delays between downloads. Also heads up - that older ClientLogin method got deprecated years ago and might stop working eventually. OAuth2 is way more reliable for automated scripts.
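For what it's worth, the tail end of the loop with the sleep delays, plus the combine step the question asked about, could look like this; it assumes $ids already holds the IDs pulled from the feed as described above, and the two-second delay and /tmp paths are just assumptions, not anything Google specifies:

# download each ID with a pause between requests, then stitch everything together
for documentId in $ids; do
    wget --header "Gdata-Version: 3.0" --header "Authorization: GoogleLogin auth=$3" "https://docs.google.com/feeds/download/documents/Export?docID=${documentId}&exportFormat=txt" -O "/tmp/${documentId}.txt"
    sleep 2   # a couple of seconds between downloads keeps you under the rate limits
done
cat /tmp/*.txt > /tmp/all-documents-combined.txt   # the single combined file from the question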