Python method to embed HTML content into Google Documents programmatically

I need help with adding HTML templates to Google Docs using Python code. The Google Docs API doesn’t have a direct way to do this, so I’m trying to find a workaround.

My current method involves these steps:

  1. Upload the HTML to Google Drive with the Google Docs MIME type so it gets converted automatically
  2. Retrieve the converted content using the Docs API
  3. Add this content to my target document with batchUpdate

Here’s my code attempt:

from io import BytesIO

from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseUpload

# drive_service, docs_service and FOLDER_ID are defined elsewhere in my script

def add_html_content(doc_id, html_string):
    upload_media = MediaIoBaseUpload(BytesIO(html_string.encode('utf-8')), mimetype='text/html', resumable=True)
    file_metadata = {
        'name': 'temp_html_doc',
        'mimeType': 'application/vnd.google-apps.document',
        'parents': [FOLDER_ID]
    }
    temp_file_id = None  # stays None if the upload itself fails

    try:
        # Upload HTML and let Google convert it
        new_file = drive_service.files().create(body=file_metadata, media_body=upload_media).execute()
        temp_file_id = new_file.get('id')

        # Get the converted document structure
        document = docs_service.documents().get(documentId=temp_file_id, fields='body').execute()
        html_content = document.get('body').get('content')

        # Try to insert into target document (this is the step that fails)
        update_result = docs_service.documents().batchUpdate(documentId=doc_id, body={'requests': html_content}).execute()
        print("HTML content added successfully")
    except HttpError as err:
        print(f"Error occurred: {err}")
    finally:
        # Clean up temporary file, but only if it was actually created
        if temp_file_id:
            drive_service.files().delete(fileId=temp_file_id).execute()

The issue is that the content structure I get from step 2 doesn’t work with batchUpdate. I want to append the HTML to existing content, not replace it.

Is this approach reasonable or should I try something different?

your batchUpdate syntax is messed up. the content array you get from get() won’t work directly with batchUpdate - you’ve got to convert it to proper request objects first. try using insertText requests instead of dumping the raw content structure in there.
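
A minimal sketch of that conversion, assuming you only care about the plain text of the converted document. The helper names `content_to_text` and `append_text_request` are made up for illustration; the request shape (`insertText` with `endOfSegmentLocation`) is the one the Docs API uses for appending to the end of the body.

```python
def content_to_text(content):
    """Concatenate the text of every paragraph text run in body.content.

    `content` is the list returned under document['body']['content'].
    Tables, section breaks and other element types are skipped here.
    """
    pieces = []
    for element in content:
        paragraph = element.get("paragraph")
        if paragraph is None:
            continue  # not a paragraph (e.g. sectionBreak, table)
        for part in paragraph.get("elements", []):
            text_run = part.get("textRun")
            if text_run:
                pieces.append(text_run.get("content", ""))
    return "".join(pieces)


def append_text_request(text):
    """Build one insertText request that appends to the end of the body.

    Hypothetical helper: an empty segmentId refers to the document body.
    """
    return {
        "insertText": {
            "endOfSegmentLocation": {"segmentId": ""},
            "text": text,
        }
    }
```

With the asker's variables, the batchUpdate body would then be `{'requests': [append_text_request(content_to_text(html_content))]}`. Formatting is lost with this version; it only shows the request shape.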

Both answers above are way overcomplicating this. You don’t need all that manual parsing and conversion logic.

I hit this same issue last month when our marketing team needed automated reports. Instead of fighting Google’s API quirks, I used Latenode to build a workflow that converts HTML to Google Docs seamlessly.

Latenode has native Google Docs integration plus HTML processing nodes. Feed your HTML template into a conversion node, then push the formatted content straight to your document. No temporary files or cleanup mess.

The document manipulation features saved me tons of time. It automatically handles content structure conversion, so your batchUpdate calls won’t break. No manual element iteration or request rebuilding.

Took me 30 minutes to set up the whole pipeline. HTML goes in, properly formatted Google Doc comes out. Plus you get built-in error handling and retry logic.

Skip the Python headaches: https://latenode.com

You can’t pass document content elements directly as batchUpdate requests; that’s your main issue. When you pull content from the converted doc, you get structural elements such as paragraphs and text runs, not requests. I ran into the same thing building automated contract generation. Here’s what worked: find the insertion point first, then convert each content element into proper requests. For text, use insertText requests with a location index. For formatted text, follow each insertText with an updateTextStyle over the range you just inserted. The trick is to find your insertion index in the target doc first (the endIndex of the last body element, minus one, puts you at the end of the document), then build individual requests for each paragraph and text run from your temp document, advancing the index by the length of each inserted string. This way you control exactly where content goes and don’t mess up the existing structure. Your cleanup approach looks good though; just fix the content transformation part.
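
Roughly what that looks like in code. This is a sketch, not a complete converter: the function names are hypothetical, it only handles paragraphs and text runs, and it only copies bold, italic and underline. It sets those three explicitly on every run because inserted text otherwise tends to pick up the style of the text next to it.

```python
STYLE_KEYS = ("bold", "italic", "underline")  # assumption: only these are copied


def utf16_len(text):
    """Docs API indexes count UTF-16 code units, not Python code points."""
    return len(text.encode("utf-16-le")) // 2


def end_insertion_index(document):
    """Index just before the final newline of the target document's body."""
    return document["body"]["content"][-1]["endIndex"] - 1


def build_requests(source_content, insert_at):
    """Turn body.content of the temp doc into insertText/updateTextStyle pairs."""
    requests = []
    cursor = insert_at
    for element in source_content:
        paragraph = element.get("paragraph")
        if paragraph is None:
            continue  # tables, section breaks etc. are ignored in this sketch
        for part in paragraph.get("elements", []):
            run = part.get("textRun")
            if not run or not run.get("content"):
                continue
            text = run["content"]
            end = cursor + utf16_len(text)
            requests.append(
                {"insertText": {"location": {"index": cursor}, "text": text}}
            )
            source_style = run.get("textStyle", {})
            requests.append({
                "updateTextStyle": {
                    "range": {"startIndex": cursor, "endIndex": end},
                    "textStyle": {
                        key: bool(source_style.get(key, False))
                        for key in STYLE_KEYS
                    },
                    "fields": ",".join(STYLE_KEYS),
                }
            })
            cursor = end  # next run goes right after the one just inserted
    return requests
```

You would fetch the target document once, pass `end_insertion_index(target)` as `insert_at`, and send the returned list as the `requests` of a single batchUpdate. Because the requests are applied in order, each index already accounts for the text inserted before it.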

I’ve encountered this HTML embedding issue before. Your conversion approach has a major flaw: documents().get() returns content elements that describe the document’s structure, and batchUpdate cannot interpret them as requests. What worked for me was parsing the converted document’s content and manually rebuilding it as batchUpdate requests. After obtaining your temporary document, iterate through the content elements and create insertText or insertTable requests based on the element types. You will need to extract the actual text from the paragraphs’ text runs and apply formatting separately with updateTextStyle requests built from each run’s textStyle. Alternatively, you could opt for a more straightforward solution: convert the HTML to plain text first and then apply formatting programmatically. Libraries like html2text convert HTML to plain, Markdown-style text, and you can then recreate the basic formatting with the Docs API’s text styling. This avoids the tricky Drive conversion step and gives you much more control over the appearance.
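
A rough sketch of the second route, skipping the temporary Drive file entirely. To keep it dependency-free it uses the standard library’s `html.parser` in place of html2text, and it is deliberately limited: only `<b>`/`<strong>`, `<i>`/`<em>`, `<br>` and a few block tags are recognised, and whitespace-only text between tags is dropped. `RunCollector` and `html_to_requests` are hypothetical names.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Collect (text, bold, italic) runs from a small subset of HTML."""

    BOLD = {"b", "strong"}
    ITALIC = {"i", "em"}
    BLOCK = {"p", "div", "li", "h1", "h2", "h3"}  # closing one ends a line

    def __init__(self):
        super().__init__()
        self.runs = []
        self._bold = 0
        self._italic = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD:
            self._bold += 1
        elif tag in self.ITALIC:
            self._italic += 1
        elif tag == "br":
            self.runs.append(("\n", False, False))

    def handle_endtag(self, tag):
        if tag in self.BOLD:
            self._bold = max(0, self._bold - 1)
        elif tag in self.ITALIC:
            self._italic = max(0, self._italic - 1)
        elif tag in self.BLOCK:
            self.runs.append(("\n", False, False))

    def handle_data(self, data):
        if data.strip():  # simplification: ignore whitespace-only text
            self.runs.append((data, self._bold > 0, self._italic > 0))


def _units(text):
    # Docs API indexes are UTF-16 code units
    return len(text.encode("utf-16-le")) // 2


def html_to_requests(html, insert_at):
    """One insertText for all the text, then style requests per styled run."""
    parser = RunCollector()
    parser.feed(html)
    parser.close()
    text = "".join(run[0] for run in parser.runs)
    if not text:
        return []

    def style(start, end, bold, italic):
        return {"updateTextStyle": {
            "range": {"startIndex": start, "endIndex": end},
            "textStyle": {"bold": bold, "italic": italic},
            "fields": "bold,italic",
        }}

    requests = [
        {"insertText": {"location": {"index": insert_at}, "text": text}},
        # reset first, so nothing is inherited from the neighbouring text
        style(insert_at, insert_at + _units(text), False, False),
    ]
    cursor = insert_at
    for run_text, bold, italic in parser.runs:
        end = cursor + _units(run_text)
        if bold or italic:
            requests.append(style(cursor, end, bold, italic))
        cursor = end
    return requests
```

The returned list can go straight into the `requests` field of a batchUpdate on the target document, with `insert_at` set to wherever you want the content to start.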