Accessing Google Docs DOM Rendering for Chrome Extension

Grace_31Dance · January 18, 2025, 1:36am

I am building an unpublished Chrome extension that highlights certain words in Google Docs while preserving the document’s original formatting. Since Google has moved Docs from DOM rendering to canvas rendering, I am having trouble locating text positions within the document accurately. The Google API endpoints that return document structure do not provide the positions of individual pieces of text, and calling them repeatedly for large documents whenever the content changes is cumbersome.

I tried to get access to DOM rendering and managed to do so temporarily for my Google account: the URL carried the ‘mode=html’ parameter and the developer tools showed DOM tags. However, this access appears to be limited to my personal account and does not extend to my extension. After being whitelisted by Google, I am expecting some documentation related to this.

My questions: how can I enable DOM rendering access for my Chrome extension rather than just for my Google account, and what is the appropriate way to manipulate the DOM from my extension without relying on the ‘mode=html’ parameter?

CreativePainter33 · January 28, 2025, 1:06pm

If you want to highlight text while preserving formatting, consider an overlay approach rather than direct DOM manipulation, given the constraints of Google Docs’ canvas-based rendering. You can place a transparent layer over the document and draw highlights on it based on text positions obtained from another source, such as a text analysis API. Keeping the highlights in sync with the underlying text may require complex calculations, but it avoids the recurring problems with direct DOM access and the manipulation limits imposed by Google’s architecture.
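To make the overlay idea a bit more concrete, here is a minimal sketch of the coordinate math only. It assumes you already have viewport-space rectangles for the words to highlight; how to obtain them from a canvas-rendered document is the open problem in this thread and is not solved here. The names `TextRect`, `HighlightStyle`, and `toHighlightStyles` are made up for illustration.

```typescript
// Hypothetical shape: where a matched word sits, in viewport coordinates.
interface TextRect { left: number; top: number; width: number; height: number; }

// CSS for one highlight box, positioned relative to the overlay's top-left corner.
interface HighlightStyle { left: string; top: string; width: string; height: string; }

// Convert viewport-space word rectangles into overlay-relative CSS boxes.
// `overlayOrigin` is the overlay's own viewport position
// (for example, taken from getBoundingClientRect() in a content script).
function toHighlightStyles(
  words: TextRect[],
  overlayOrigin: { left: number; top: number },
  padding = 1,
): HighlightStyle[] {
  return words
    .filter((w) => w.width > 0 && w.height > 0) // skip empty or collapsed rects
    .map((w) => ({
      left: `${w.left - overlayOrigin.left - padding}px`,
      top: `${w.top - overlayOrigin.top - padding}px`,
      width: `${w.width + 2 * padding}px`,
      height: `${w.height + 2 * padding}px`,
    }));
}
```

In a content script, each returned style could be applied to an absolutely positioned `div` inside the overlay, with `pointer-events: none` on the overlay so clicks still reach the document underneath.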

CreativeArtist88 · January 24, 2025, 4:04pm

Yeah, you def need a workaround w/o direct DOM manipulation since Docs doesn’t expose that level of detail. Maybe explore using a MutationObserver to catch text changes, then redraw the highlights via overlays whenever something changes. A bit hacky, but it can help avoid permission hurdles too.
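A rough sketch of the change-driven redraw, under some assumptions: that there is a DOM node in the editor whose mutations actually reflect text edits (canvas redraws alone do not produce DOM mutations, so this may not hold), and that a burst of mutations should trigger only one redraw. `makeCoalescedTrigger`, `editorRoot`, and `redrawHighlights` are hypothetical names, not part of any Google or Chrome API.

```typescript
// A scheduler runs a callback later; injectable so the logic is easy to test.
type Schedule = (fn: () => void) => void;

// Coalesce bursts of change notifications into one recompute per scheduled tick.
function makeCoalescedTrigger(recompute: () => void, schedule: Schedule): () => void {
  let pending = false;
  return () => {
    if (pending) return; // a recompute is already queued for this burst
    pending = true;
    schedule(() => {
      pending = false;
      recompute();
    });
  };
}

// Assumed wiring inside a content script (not executed here):
//   const trigger = makeCoalescedTrigger(redrawHighlights, (fn) => requestAnimationFrame(fn));
//   new MutationObserver(trigger).observe(editorRoot, {
//     childList: true,
//     subtree: true,
//     characterData: true,
//   });
```

Scheduling with `requestAnimationFrame` keeps the redraw to at most once per frame; a `setTimeout`-based scheduler would work the same way if a longer delay is preferred.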