Extracting email content from Gmail using Python and the Gmail API

Hey folks! I’m working on a Python project to fetch emails from Gmail using the API. I’ve got the sender and subject sorted, but I’m stuck on getting the email body. Here’s what I’ve done so far:

from googleapiclient.discovery import build

email_service = build('gmail', 'v1', credentials=auth_creds)
inbox_messages = email_service.users().messages().list(userId='me', labelIds=['INBOX'], q="from:[email protected], is:unread").execute()

emails = inbox_messages.get('messages', [])

if emails:
    for email in emails:
        email_content = email_service.users().messages().get(userId='me', id=email['id']).execute()
        email_headers = email_content['payload']['headers']
        
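        # Supply defaults so a missing header does not raise StopIteration.
        sender = next((header['value'] for header in email_headers if header['name'] == 'From'), '(unknown sender)')
        subject = next((header['value'] for header in email_headers if header['name'] == 'Subject'), '(no subject)')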
        
        print(f"From: {sender}")
        print(f"Subject: {subject}")
        
        # Need help with getting the body here!

else:
    print("No new messages found.")

I’ve looked at the Gmail API docs but can’t figure out how to get the email body in plain text. Any ideas? Thanks!

I’ve dealt with this issue before in one of my projects. The tricky part is that email bodies can be structured differently depending on the content type. Here’s a more robust approach that’s worked well for me:

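import base64

def get_email_body(email_content):
    # Accept either the full message or a single part, so recursion works.
    payload = email_content.get('payload', email_content)
    if 'parts' in payload:
        # Multipart: descend into the first part, which is itself a part dict.
        return get_email_body(payload['parts'][0])
    data = payload.get('body', {}).get('data')
    if data:
        # Gmail returns the body base64url-encoded.
        return base64.urlsafe_b64decode(data).decode('utf-8')
    return ''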

# Then in your main loop:
body = get_email_body(email_content)
print(f'Body: {body[:100]}...') # Print first 100 chars

This handles both multipart and single-part messages, though for multipart messages it only looks at the first part. It needs base64 imported. Also, the decode assumes UTF-8, so consider error handling for other encodings if you’re dealing with international emails.
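To make the encoding point concrete, here’s a rough sketch that reads the charset from a part’s Content-Type header and falls back to UTF-8 with replacement characters. The helper names part_charset and decode_part_text are made up for illustration, and it assumes the part dict has the usual 'headers' list and 'body' entry that the API returns:

```python
import base64
from email.message import Message

def part_charset(part, default="utf-8"):
    """Return the charset declared in the part's Content-Type header."""
    for header in part.get("headers", []):
        if header["name"].lower() == "content-type":
            # Let the standard library parse the header parameters.
            msg = Message()
            msg["Content-Type"] = header["value"]
            return msg.get_content_charset() or default
    return default

def decode_part_text(part):
    """Decode a part's base64url body using its declared charset."""
    data = part.get("body", {}).get("data", "")
    raw = base64.urlsafe_b64decode(data)
    try:
        return raw.decode(part_charset(part))
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: keep going rather than crash.
        return raw.decode("utf-8", errors="replace")
```

You’d call decode_part_text(part) wherever the snippets here call .decode('utf-8') directly.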

I’ve encountered similar challenges with the Gmail API. One approach that’s worked well for me is to recursively parse the payload structure. Here’s a snippet that might help:

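import base64

def get_body(message):
    payload = message['payload']
    # Single-part messages carry the data directly on the payload body.
    if payload.get('body', {}).get('data'):
        return base64.urlsafe_b64decode(payload['body']['data']).decode()
    return get_body_from_parts(payload.get('parts', []))

def get_body_from_parts(parts):
    for part in parts:
        data = part.get('body', {}).get('data')
        if part.get('mimeType') == 'text/plain' and data:
            return base64.urlsafe_b64decode(data).decode()
        if 'parts' in part:
            # Recurse into nested multipart containers, but keep scanning
            # the remaining siblings if nothing was found inside.
            body = get_body_from_parts(part['parts'])
            if body:
                return body
    return ''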

# In your main loop:
body = get_body(email_content)

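This handles various email structures, including nested multipart messages, and returns the first text/plain part it finds (or an empty string if there is none, as with HTML-only emails). Remember to import base64 and consider implementing error handling for different encodings.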
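Some emails only have an HTML part, so a plain-text search comes back empty. Here’s a small sketch of one way to fall back to text/html in that case. The names find_text and get_body_with_fallback are just illustrative, and it assumes parts without inline content (such as attachments) have no 'data' key in their body:

```python
import base64

def find_text(part, mime_type):
    """Depth-first search for the first part of the given MIME type."""
    data = part.get("body", {}).get("data")
    if part.get("mimeType") == mime_type and data:
        return base64.urlsafe_b64decode(data).decode("utf-8", errors="replace")
    for child in part.get("parts", []):
        text = find_text(child, mime_type)
        if text:
            return text
    return ""

def get_body_with_fallback(payload):
    # Prefer plain text; fall back to the HTML alternative if there is none.
    return find_text(payload, "text/plain") or find_text(payload, "text/html")
```

In the main loop that would be body = get_body_with_fallback(email_content['payload']). If you get HTML back you’ll still need to strip the tags yourself.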

hey tom, i’ve done this before. the body is usually in the ‘payload’ part. try this:

body = email_content['payload']['parts'][0]['body']['data']
decoded_body = base64.urlsafe_b64decode(body).decode('utf-8')

make sure to import base64 at the top. hope this helps!
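One caveat: indexing parts[0] only works when the message is multipart and its first part holds the text. If you’d rather not walk the payload yourself, another option is to request the message with format='raw' and hand it to Python’s built-in email package, which deals with the MIME structure and charsets for you. A minimal sketch (plain_text_from_raw is a made-up helper name, and the commented-out call reuses the variable names from the question):

```python
import base64
from email import message_from_bytes, policy

def plain_text_from_raw(raw_b64):
    """Parse a base64url-encoded RFC 2822 message and return its text body."""
    msg = message_from_bytes(base64.urlsafe_b64decode(raw_b64), policy=policy.default)
    # Prefer the text/plain body, falling back to text/html.
    body = msg.get_body(preferencelist=("plain", "html"))
    return body.get_content() if body is not None else ""

# In the main loop (needs the authenticated service from the question):
# raw_msg = email_service.users().messages().get(
#     userId='me', id=email['id'], format='raw').execute()
# body = plain_text_from_raw(raw_msg['raw'])
```

The trade-off is that a raw response does not include the parsed payload, so you’d read the headers from the parsed message too (for example msg['Subject']).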