I’m wondering if there’s a way to fetch data from publicly available Google Docs without going through the authentication process. What I’m trying to achieve is pretty straightforward - I want to build something where I can input a specific user identifier and get back a list of all their public documents that have certain tags or are part of a specific collection.
I’ve been looking into this for a while but can’t seem to find clear documentation on whether this is even possible. Has anyone managed to do something similar? I know that public docs can be accessed directly through their URLs, but I’m not sure if the API supports querying them without auth tokens.
Any guidance would be really helpful!
Unfortunately, you can’t do this with Google’s current API setup. The Docs API requires authentication for everything, even public documents. There’s no way to search or list someone’s docs without proper OAuth credentials. I hit this exact wall two years ago on a content project. The only workaround I found was using Google’s Custom Search API to search for public docs, but it’s unreliable and won’t give you the structured data you actually need. You can’t filter by tags or collections either, since that metadata doesn’t show up in search results. You’re better off just implementing the auth flow and having users grant access to their documents. It’s more work upfront, but you’ll get proper access to all the metadata and search features you’re after.
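For anyone who wants to try the Custom Search route anyway, here is a rough sketch of how the request URL could be put together. It assumes the Custom Search JSON API endpoint with its key, cx and q parameters; the function name, the site: restriction and the placeholder credentials are my own choices for illustration, and the code only builds the URL without sending anything.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Custom Search JSON API.
CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_doc_search_url(api_key: str, engine_id: str, terms: str) -> str:
    """Build a search URL restricted to Google Docs pages (hypothetical helper)."""
    params = {
        "key": api_key,    # API key from your own Google Cloud project
        "cx": engine_id,   # ID of your own Programmable Search Engine
        # The site: restriction is an assumption about how to narrow results.
        "q": f"site:docs.google.com/document {terms}",
    }
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder credentials; replace with real values before fetching.
    print(build_doc_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "budget template"))
```

Results come back as ordinary web search hits, so you would still only get titles, links and snippets, not tags or collections.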
The Google Drive API has a partial solution that might work. The Docs API needs authentication, but you can use the Drive API’s files.list endpoint to find publicly shared documents. You’ll still need API credentials, but a service account works without requiring user login. I built something like this last year for a research project. Set up a service account through Google Cloud Console, then call files.list with a search expression in the q parameter, such as visibility = 'anyoneCanFind' or visibility = 'anyoneWithLink'. You can’t search by specific user though - Google blocks querying other people’s files for privacy. The metadata is more limited than with user-authorized requests, but you’ll get basic file info and can filter by document type. Not perfect, but it gets you closer without dealing with an interactive OAuth consent flow.
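Here is a minimal sketch of what that files.list request could look like, assuming the Drive API v3 endpoint and its q, fields and pageSize parameters. The helper names are made up for illustration, and the code only builds the URL: the service account credentials (an Authorization header) are left out, and what comes back will depend on which files those credentials can actually see.

```python
from urllib.parse import urlencode

# Assumed Drive API v3 endpoint for listing files.
DRIVE_FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files"
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"


def build_public_docs_query(visibility: str = "anyoneWithLink") -> str:
    """Return a search expression for Google Docs with the given visibility."""
    allowed = {"anyoneCanFind", "anyoneWithLink"}
    if visibility not in allowed:
        raise ValueError(f"unsupported visibility: {visibility!r}")
    return f"mimeType = '{GOOGLE_DOC_MIME}' and visibility = '{visibility}'"


def build_files_list_url(visibility: str = "anyoneWithLink", page_size: int = 50) -> str:
    """Build the files.list URL (hypothetical helper; sends no request)."""
    params = {
        "q": build_public_docs_query(visibility),
        # Ask only for basic file info to keep responses small.
        "fields": "nextPageToken, files(id, name, mimeType, webViewLink)",
        "pageSize": page_size,
    }
    return f"{DRIVE_FILES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_public_docs_query("anyoneCanFind"))
    print(build_files_list_url())
```

Note the straight single quotes inside the search expression; curly quotes pasted from a forum post or word processor will make the query invalid.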
tried this last month - total dead end. google locked everything down after people scraped too many docs without permission. even public docs need auth now, which sucks. only workaround i found was the embed url format, but you need the document id first. doesn’t help with finding new docs at all.
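in case it helps, a rough sketch of the document-id route. it assumes share links look like .../document/d/<id>/... and that the /preview and /export?format= paths work for docs shared publicly; those url shapes and the helper names are assumptions on my part, not a documented api, so treat it as a starting point.

```python
import re

# Assumed shape of a share link: .../document/d/<id>/...
# Published-to-web links (.../document/d/e/<other id>/pub) are skipped on purpose.
_DOC_ID_PATTERN = re.compile(r"/document/d/(?!e/)([A-Za-z0-9_-]+)")


def extract_document_id(share_url: str) -> str:
    """Pull the document ID out of a share link (hypothetical helper)."""
    match = _DOC_ID_PATTERN.search(share_url)
    if match is None:
        raise ValueError(f"no document ID found in {share_url!r}")
    return match.group(1)


def preview_url(document_id: str) -> str:
    """Build an embeddable preview URL (assumed format)."""
    return f"https://docs.google.com/document/d/{document_id}/preview"


def export_url(document_id: str, fmt: str = "txt") -> str:
    """Build a download URL for the given export format (assumed format)."""
    return f"https://docs.google.com/document/d/{document_id}/export?format={fmt}"


if __name__ == "__main__":
    # Placeholder link; the ID here is made up.
    doc_id = extract_document_id(
        "https://docs.google.com/document/d/EXAMPLE_ID-123/edit?usp=sharing"
    )
    print(preview_url(doc_id))
    print(export_url(doc_id, "pdf"))
```

still doesn't solve discovery, you need a link or id from somewhere before any of this is useful.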