How can I programmatically add a rendered HTML template to Google Docs using Python?

I am facing a challenge in programmatically inserting an HTML template into a Google Docs document using Python. I recognize that Google Docs Editor and the Google Docs API do not offer direct features for this task. However, I’ve attempted some techniques to achieve my goal, focusing solely on successfully inserting the content without concern for the specific placement within the document.

My method involved the following steps:

  1. Uploading an HTML file to Google Drive under application/vnd.google-apps.document, since Google Docs converts HTML automatically (though it may not be perfect).
  2. Retrieving the file content using the Google Docs API get() method to obtain the document’s JSON data.
  3. Using Google Docs batchUpdate() to modify the target file with the new content.

    from io import BytesIO

    from googleapiclient.errors import HttpError
    from googleapiclient.http import MediaIoBaseUpload


    def add_html_template(target_doc_id, html_data):
        # drive_service, docs_service and DRIVE_FOLDER_ID are defined elsewhere
        media_file = MediaIoBaseUpload(BytesIO(html_data.encode('utf-8')), mimetype='text/html', resumable=True)
        document_structure = {
            'name': 'test_doc_from_html',
            'mimeType': 'application/vnd.google-apps.document',
            'parents': [DRIVE_FOLDER_ID]
        }
        created_doc_id = None

        try:
            # Generate a document from HTML as Google Docs handles HTML conversion
            created_doc = drive_service.files().create(body=document_structure, media_body=media_file).execute()
            created_doc_id = created_doc.get('id')

            # Fetch the document content from Google Docs post creation
            fetched_doc = docs_service.documents().get(documentId=created_doc_id, fields='body').execute()
            extract_content = fetched_doc.get('body').get('content')

            # Add the content from HTML to the specified document (this is the call that fails)
            update_result = docs_service.documents().batchUpdate(documentId=target_doc_id, body={'requests': extract_content}).execute()
            print(update_result)
            print('Content successfully added')
        except HttpError as error:
            print(f'Error occurred: {error}')
        finally:
            # Remove the temporary HTML document, whether or not an error occurred
            if created_doc_id:
                drive_service.files().delete(fileId=created_doc_id).execute()

Issue: The content retrieved in step 2 does not align with the requirements for batchUpdate() in step 3. I am attempting to convert the content but have not been successful so far.

Target Solution: I need to take a string containing HTML code and insert the rendered HTML into a specified document in Google Docs, appending it rather than overwriting existing content.

Does my approach seem logical? Are there alternative suggestions to achieve my goal?

One potential solution is to convert your HTML content into Markdown before insertion, for example with a library called html2markdown. Google Docs doesn’t render Markdown that is inserted through the API, so the converted content would go in as plain text: the Markdown syntax appears literally rather than as formatting, but headings, lists, and emphasis stay readable. You could then manually adjust any complex formatting directly in Google Docs. This approach avoids having to translate the HTML structure into batchUpdate requests.
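
As a rough illustration of the "insert as plain text" step, here is a minimal sketch that appends a string to the end of the document body. It assumes the Markdown has already been produced elsewhere and that `docs_service` is an authorized Docs API client like the one in the question. The helper names `build_append_text_requests` and `append_text` are made up for this example.

```python
def build_append_text_requests(text):
    """Build batchUpdate requests that append plain text to the document body."""
    if not text:
        # The API rejects empty insertions, so send nothing in that case
        return []
    return [
        {
            "insertText": {
                "text": text,
                # An empty segmentId refers to the main document body;
                # endOfSegmentLocation appends instead of overwriting
                "endOfSegmentLocation": {"segmentId": ""},
            }
        }
    ]


def append_text(docs_service, document_id, text):
    """Send the append requests; docs_service is assumed to be an authorized client."""
    requests = build_append_text_requests(text)
    if not requests:
        return None
    return docs_service.documents().batchUpdate(
        documentId=document_id, body={"requests": requests}
    ).execute()
```

Using `endOfSegmentLocation` means you don't need to look up the document's current end index, which fits the requirement of appending without caring about exact placement.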

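You might want to try Puppeteer to render the HTML into a PNG or PDF first, then insert it as an image into Google Docs (as far as I know, a PDF would have to be converted to an image format before it can be inserted this way). This method won’t keep the text editable, but it's good if you care more about appearance. Not perfect, but it can be a temporary fix.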
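
For the insertion step, a sketch of the request could look like the following. It assumes the rendered PNG is already hosted at a URL the Docs API can fetch (the API expects a publicly accessible image URI), and `build_insert_image_request` is a hypothetical helper name, not part of any library.

```python
def build_insert_image_request(image_uri, width_pt=None, height_pt=None):
    """Build a batchUpdate request that appends an inline image to the document body."""
    request = {
        "insertInlineImage": {
            "uri": image_uri,
            # Append at the end of the body rather than at a fixed index
            "endOfSegmentLocation": {"segmentId": ""},
        }
    }
    # Optional size in points; when omitted, Docs picks a size from the image
    size = {}
    if width_pt is not None:
        size["width"] = {"magnitude": width_pt, "unit": "PT"}
    if height_pt is not None:
        size["height"] = {"magnitude": height_pt, "unit": "PT"}
    if size:
        request["insertInlineImage"]["objectSize"] = size
    return request


# Usage, assuming docs_service is an authorized Docs API client:
# docs_service.documents().batchUpdate(
#     documentId=target_doc_id,
#     body={"requests": [build_insert_image_request("https://example.com/rendered.png")]},
# ).execute()
```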