I’m having issues with the Google Docs API when trying to add certain Unicode characters to a document. The API keeps throwing a 400 error saying ‘The insertion index cannot be within a grapheme cluster.’
Here’s what I’m trying to do:
- Insert a Thai phrase: ‘สวัสดีครับ ฉันกำลังทดสอบ API’
- Add a new line before this phrase
My Python code looks like this:
from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials
creds = Credentials.from_authorized_user_file('token.json', ['https://www.googleapis.com/auth/documents'])
service = build('docs', 'v1', credentials=creds)
requests = [
{'insertText': {'location': {'index': 1}, 'text': 'สวัสดีครับ ฉันกำลังทดสอบ API'}},
{'insertText': {'location': {'index': 1}, 'text': '\n'}}
]
service.documents().batchUpdate(documentId='my_doc_id', body={'requests': requests}).execute()
The docs mention that the API might adjust locations to avoid inserting within grapheme clusters. But how can I fix this? I thought using index 1 would be safe. Any ideas on how to properly handle Unicode text insertion?
Hey, maybe try inserting the newline first and then the Thai text. That simple reordering might clear the indexing error. Hope that works!
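For reference, here is a rough sketch of what that reordering could look like. The helper names `utf16_len` and `newline_then_text` are made up for illustration, and the sketch assumes the Thai text should land directly after the newline. Docs API indexes are counted in UTF-16 code units, so the second index is computed from the UTF-16 length of what was inserted first. Whether this actually avoids the 400 depends on what already sits at index 1 in the document, which the thread doesn't show.

```python
def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units (the unit Docs API indexes use)."""
    return len(text.encode("utf-16-le")) // 2


def newline_then_text(text: str, index: int = 1) -> list:
    """Build two insertText requests: a newline at `index`, then `text` right after it.

    Hypothetical helper, not part of the Google client library.
    """
    newline = "\n"
    return [
        {"insertText": {"location": {"index": index}, "text": newline}},
        # Requests in one batchUpdate run in order, so the newline already
        # occupies `index` by the time the second request is applied.
        {"insertText": {"location": {"index": index + utf16_len(newline)}, "text": text}},
    ]


requests = newline_then_text("สวัสดีครับ ฉันกำลังทดสอบ API")
```

The resulting list can be passed as `body={'requests': requests}` exactly like in the original post. A single request with `'\n' + text` would probably do the same job with less index bookkeeping.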
The grapheme cluster error you’re encountering is indeed tricky. In my experience working with non-Latin scripts in the Google Docs API, it is often more reliable to append content at the end of the document rather than inserting at a specific index. One possible approach is to first determine the current end index using the documents().get() method, then append your Thai text at that position, and finally insert a newline at the beginning of the text. This helps avoid conflicts within existing text segments. Also, check that your document’s locale settings match the language being inserted, and if the problem persists, consider inserting the Thai text in smaller segments.
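A minimal sketch of the end-index approach described above, under a few assumptions: the helper names `end_of_body_index` and `append_requests` are hypothetical, `document` is the dict returned by `service.documents().get(documentId=...).execute()`, and the newline and the Thai text are sent as one insert rather than two. The body always ends with a final newline, so the insertion point used here is one position before the last element's `endIndex`.

```python
def end_of_body_index(document: dict) -> int:
    """Return the index just before the body's final newline.

    `document` is assumed to be the dict returned by documents().get().
    """
    content = document["body"]["content"]
    return content[-1]["endIndex"] - 1


def append_requests(document: dict, text: str) -> list:
    """Build one insertText request that appends a newline plus `text` to the body."""
    index = end_of_body_index(document)
    return [{"insertText": {"location": {"index": index}, "text": "\n" + text}}]


# Usage with the real client (not executed here):
#   doc = service.documents().get(documentId='my_doc_id').execute()
#   body = {'requests': append_requests(doc, 'สวัสดีครับ ฉันกำลังทดสอบ API')}
#   service.documents().batchUpdate(documentId='my_doc_id', body=body).execute()
```

Since the position is read from the document's current state, fetch the document right before building the requests, otherwise the index may be stale.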
I’ve run into similar issues when working with Unicode characters in the Google Docs API. One workaround I found effective is to use ‘endOfSegmentLocation’ instead of a specific index. With an empty segmentId it targets the end of the document body, so you’re always appending to the end of the document and avoid potential conflicts with existing text.
Try modifying your requests like this:
requests = [
{'insertText': {'endOfSegmentLocation': {'segmentId': ''}, 'text': '\n'}},
{'insertText': {'endOfSegmentLocation': {'segmentId': ''}, 'text': 'สวัสดีครับ ฉันกำลังทดสอบ API'}}
]
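This approach has consistently worked for me when dealing with various scripts and Unicode characters. If you need the text at a specific location, you can relocate it afterwards with additional requests. As far as I know there is no single move request, so that means deleting the appended range and re-inserting the text at the target index.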