Retrieving selected text details from Google Docs using a Chrome extension

I’m working on a Chrome extension and need to extract information about text that users select in Google Docs. I want to get both the actual selected text content and its position coordinates within the document.

I know that regular web pages let you use window.getSelection() to grab selected text, but this doesn’t seem to work with Google Docs since it uses a complex canvas-based editor. I’ve tried the standard selection methods but they return empty results.
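For reference, here is a minimal sketch of the standard approach as it works on ordinary pages, returning both the text and its viewport-relative rectangles. The `readSelection` helper and the `*Like` interfaces are names made up for illustration; they only describe the subset of the DOM `Selection` and `Range` objects that the helper touches.

```typescript
// Structural types covering only what the helper uses.
// A real DOM Selection and Range satisfy these shapes.
interface RectLike { left: number; top: number; width: number; height: number; }
interface RangeLike { getClientRects(): ArrayLike<RectLike>; }
interface SelectionLike {
  rangeCount: number;
  isCollapsed: boolean;
  toString(): string;
  getRangeAt(index: number): RangeLike;
}
interface SelectionInfo { text: string; rects: RectLike[]; }

// Returns the selected text and one rectangle per visual line fragment,
// or null when nothing is selected.
function readSelection(sel: SelectionLike | null): SelectionInfo | null {
  if (!sel || sel.rangeCount === 0 || sel.isCollapsed) return null;
  const rects: RectLike[] = [];
  for (let i = 0; i < sel.rangeCount; i++) {
    const list = sel.getRangeAt(i).getClientRects();
    for (let j = 0; j < list.length; j++) {
      const r = list[j];
      if (!r) continue;
      rects.push({ left: r.left, top: r.top, width: r.width, height: r.height });
    }
  }
  return { text: sel.toString(), rects };
}

// On an ordinary page: readSelection(window.getSelection())
```

On a regular page this yields the highlighted string plus its rectangles; in the Google Docs editor the same call is what comes back empty.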

Has anyone found a working solution for this? I need to capture what text is highlighted and where it’s located on the page when someone selects it in a Google Docs document.

Google Docs is brutal for text extraction because of their canvas setup. I dealt with this exact issue when building document processing tools for my team.

Everyone’s suggesting DOM manipulation and event listeners, but there’s a cleaner way. Don’t fight Google’s constantly changing internal structure - automate the text capture externally instead.

Connect to the Google Drive API and monitor document activity. When users make selections, pull the document content, identify what changed, and map the selection to a position in that content through API calls instead of parsing the browser DOM.
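The mapping step can be sketched on its own, assuming the document's plain text has already been fetched and the selected string is already known from some other source. Both of those are assumptions; the fetch is omitted here, and `findSpans` is a hypothetical helper, not part of any Google API. The offsets it returns index into the string you pass in and are not claimed to match any API's own index scheme.

```typescript
interface TextSpan { start: number; end: number; } // end is exclusive

// Finds every occurrence of `selected` in `docText` and returns
// its character offsets. Overlapping matches are reported too.
function findSpans(docText: string, selected: string): TextSpan[] {
  const spans: TextSpan[] = [];
  if (selected.length === 0) return spans;
  let from = 0;
  while (true) {
    const at = docText.indexOf(selected, from);
    if (at === -1) break;
    spans.push({ start: at, end: at + selected.length });
    from = at + 1; // step by one so overlapping matches are found
  }
  return spans;
}
```

A phrase that appears more than once produces several spans, so text alone cannot tell you which occurrence the user actually highlighted.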

This kills all the iframe headaches, canvas rendering issues, and Google’s shifting internal APIs that break your extension every few months. You get reliable text extraction and positioning data without wrestling with their editor structure.

I’ve used this for complex document workflows - selection tracking, collaborative editing monitoring, content extraction. Works consistently across browsers and survives Google’s updates.

I ran into this same problem building an annotation tool for research. Google Docs uses canvas rendering, so normal DOM selection methods just don’t work.

Here’s what actually worked: I built a content script that watches for selection events at the document level instead of trying to parse the canvas. Google Docs does fire custom events when you select text, but you’ve got to listen in the right spot. Inject your script into the main document frame and set up listeners for mouseup and keyup events on the editor container. When they fire, you can grab the selection data through Google’s own internal APIs - the same ones their toolbar uses.

For coordinates, Google keeps shadow DOM elements that match the visible text selections. They’ve got the coordinate data you need, just be careful traversing their internal structure.

This has stayed solid for eight months through multiple Google Docs updates. Work with their event system, not against their rendering.
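The listener wiring described in this answer might look roughly like the sketch below. It only covers attaching mouseup and keyup handlers and waiting briefly for the selection to settle. What to read inside the callback is left to the caller, because the internal APIs mentioned above are not named in the answer. `watchSelectionEvents`, `ListenerTarget`, and the 100 ms default are all assumptions for illustration, and whether the events reach your listener at all depends on which frame the script is injected into.

```typescript
// Subset of EventTarget used here; a DOM Element satisfies it.
interface ListenerTarget {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Calls onSettled once after mouseup/keyup activity pauses for delayMs.
// Returns a function that detaches the listeners and cancels any pending call.
function watchSelectionEvents(
  target: ListenerTarget,
  onSettled: () => void,
  delayMs = 100,
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const schedule = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      onSettled();
    }, delayMs);
  };
  target.addEventListener("mouseup", schedule);
  target.addEventListener("keyup", schedule);
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    target.removeEventListener("mouseup", schedule);
    target.removeEventListener("keyup", schedule);
  };
}

// Hypothetical wiring in a content script (the selector is an assumption):
// const editor = document.querySelector(".kix-appview-editor");
// if (editor) watchSelectionEvents(editor, () => { /* read selection here */ });
```

The short delay is there because a drag or a run of shift+arrow presses fires many events, and only the final state is usually of interest.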

Spent months debugging this exact problem for a citation manager extension. Google Docs uses a parallel DOM structure for accessibility - while you see the visual editor on canvas, there’s a hidden contenteditable div that mirrors everything and handles text selections.

You need to access the kix-appview-editor element and its nested contenteditable components. When users select text visually, it syncs to this hidden layer. Hook into it by attaching a MutationObserver to the editor container and checking aria-selected attributes.

For positioning, Google Docs stores cursor and selection coordinates in data attributes on wrapper elements. These map directly to canvas positions. Listen for ‘docs-texteventtarget-clipboard’ events - they give you the cleanest access to both content and position data.

This method has survived three major Google Docs updates without breaking. Way more reliable than trying to reverse-engineer the canvas rendering.
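However the per-line rectangles are obtained (which elements or attributes carry them is not spelled out above, so that part is left open), a selection spanning several lines usually has to be reduced to a single position. A minimal sketch of that reduction, with `Box` and `boundingBox` as illustrative names:

```typescript
// Edges in viewport coordinates; a DOMRect has all four fields.
interface Box { left: number; top: number; right: number; bottom: number; }

// Returns the smallest box enclosing every input rectangle,
// or null when the list is empty.
function boundingBox(rects: readonly Box[]): Box | null {
  let box: Box | null = null;
  for (const r of rects) {
    box = box === null
      ? { left: r.left, top: r.top, right: r.right, bottom: r.bottom }
      : {
          left: Math.min(box.left, r.left),
          top: Math.min(box.top, r.top),
          right: Math.max(box.right, r.right),
          bottom: Math.max(box.bottom, r.bottom),
        };
  }
  return box;
}
```

These are viewport coordinates, so if you need a position within the document, add the scroll offset of whichever container actually scrolls.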

Google Docs is a nightmare to work with - it’s stuck in an iframe with a proprietary editor and canvas rendering that breaks normal DOM selection methods.

I hit this same wall last year building a productivity tool. Instead of wrestling with Google’s messy DOM and trying to inject code into their iframe, I built a workflow that captures document interactions through API calls.

The key is intercepting data at a different level. Set up automation to monitor document changes, grab selections through Google’s API endpoints, and process everything you need. No more fighting with browser selection APIs or parsing their rendered canvas.

I use this to extract text selections, track document positions, and monitor real-time collaborative editing. The automation does the heavy lifting while your extension just gets clean, structured data about user selections.

This cuts out all the iframe permission headaches, canvas parsing nightmares, and Google’s constantly shifting internal structure. Works reliably across browsers and survives Google Docs updates too.

Honestly, this is exactly why I switched to clipboard events. Google Docs still puts the selected text on the clipboard when the user copies, so just intercept the copy/paste events. Way simpler than fighting their weird DOM structure, and you get the actual text content without coordinate headaches.
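A minimal sketch of the clipboard route, reading the plain-text payload off a copy or cut event. `textFromCopyEvent` and the `*Like` interfaces are illustrative names that describe only the part of `ClipboardEvent` used here. One assumption to flag loudly: during a copy event, `getData` only returns what an earlier handler has already placed on the event with `setData`. Whether Google Docs does that in a frame your content script can observe is not something this sketch verifies.

```typescript
// Subset of ClipboardEvent used here; a real ClipboardEvent satisfies it.
interface ClipboardDataLike { getData(format: string): string; }
interface CopyEventLike { clipboardData: ClipboardDataLike | null; }

// Returns the plain-text payload carried by a copy or cut event,
// or null when the event has no text on it.
function textFromCopyEvent(event: CopyEventLike): string | null {
  const text = event.clipboardData?.getData("text/plain") ?? "";
  return text.length > 0 ? text : null;
}

// Hypothetical wiring in a content script:
// document.addEventListener("copy", (e) => {
//   const text = textFromCopyEvent(e);
//   if (text !== null) console.log("copied:", text);
// });
```

This only fires when the user actually copies or cuts, so it captures text on demand and does not track the live selection.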