Extracting email content from Gmail using Python and Gmail API?

Hey guys, I’m stuck trying to get the email body from Gmail using Python and the Gmail API. I’ve got a script that can fetch the sender and subject, but I’m scratching my head on how to grab the actual message content.

Here’s what I’ve got so far:

gmail_service = create_service('gmail', 'v1', creds)
inbox_messages = gmail_service.users().messages().list(userId='me', labelIds=['INBOX'], q='from:[email protected] is:unread').execute()

emails = inbox_messages.get('messages', [])

if not emails:
    print('No new messages.')
else:
    for email in emails:
        email_data = gmail_service.users().messages().get(userId='me', id=email['id']).execute()
        headers = email_data['payload']['headers']
        
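        # Header names are case-insensitive and may be missing, so match loosely and default to ''
        sender = next((h['value'] for h in headers if h['name'].lower() == 'from'), '')
        subject = next((h['value'] for h in headers if h['name'].lower() == 'subject'), '')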
        
        print(f'From: {sender}')
        print(f'Subject: {subject}')
        # How do I get the email body?

I’ve tried a few things, but no luck so far. Any ideas on how to extract the email body? Thanks in advance!

hey sparklinggem, i had the same problem! try this:

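import base64

def get_body(msg):
    # Multipart message: the content lives in 'parts'; this reads the first part only.
    if 'parts' in msg['payload']:
        return base64.urlsafe_b64decode(msg['payload']['parts'][0]['body']['data']).decode()
    # Single-part message: the base64url data sits directly on the payload body.
    return base64.urlsafe_b64decode(msg['payload']['body']['data']).decode()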

body = get_body(email_data)
print(f'Body: {body}')

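This worked for me. Note that it only reads the first part, so a nested multipart message (e.g. one with attachments) will raise a KeyError. Good luck!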

I’ve worked with the Gmail API before, and extracting the email body can be tricky due to the different message formats. Here’s what worked for me:

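After you get email_data, you need to dig into the payload structure. For multipart messages the body content is in the ‘parts’ list of the payload; for single-part messages it sits directly in payload['body']['data']. Either way the data is base64url-encoded, so it has to be decoded first. You might need to handle both plain text and HTML versions.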

Here’s a snippet that should help:

import base64

def get_body(email_data):
    payload = email_data['payload']
    # Multipart message: return the first text/plain part that carries inline data.
    for part in payload.get('parts', []):
        if part['mimeType'] == 'text/plain' and 'data' in part.get('body', {}):
            return base64.urlsafe_b64decode(part['body']['data']).decode()
    # Single-part message: the data sits directly on the payload body (it may be absent).
    data = payload.get('body', {}).get('data')
    return base64.urlsafe_b64decode(data).decode() if data else ''

# Then in your loop:
body = get_body(email_data)
print(f'Body: {body}')

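This should handle single-part messages and flat multipart messages. If the text part is nested inside another multipart container (common when there are attachments), it returns an empty string. Remember to import base64 at the top of your script. Hope this helps!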

I’ve encountered similar issues when working with the Gmail API. One approach that’s worked well for me is to recursively parse the payload structure. Gmail messages can be nested, especially with attachments or multipart content.

Here’s a function I’ve used successfully:

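import base64

def get_message_body(payload):
    # Leaf part with inline data: decode it (bytes that are not valid UTF-8 become U+FFFD).
    if 'body' in payload and 'data' in payload['body']:
        return base64.urlsafe_b64decode(payload['body']['data']).decode('utf-8', errors='replace')
    # Container part: search the children depth-first and return the first hit.
    elif 'parts' in payload:
        for part in payload['parts']:
            body = get_message_body(part)
            if body:
                return body
    return None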

# Usage in your loop:
body = get_message_body(email_data['payload'])
if body:
    print(f'Body: {body[:100]}...')  # Print first 100 chars

This handles nested message structures and returns the first part that carries inline data, which is typically the plain-text version. Remember to handle potential encoding issues, as some emails use character sets other than UTF-8.
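
On the character-set point, one option (a different technique from the function above, not a fix to it) is to request the message with format='raw' and let Python's standard email package do the MIME and charset decoding. A rough sketch, with some assumptions: body_from_raw is a name I made up, and the sample message is built locally as a stand-in for the API response so the snippet runs without credentials.

```python
import base64
from email import message_from_bytes, policy
from email.message import EmailMessage


def body_from_raw(raw_b64url):
    """Decode the base64url 'raw' field of a message and return its text body."""
    msg = message_from_bytes(base64.urlsafe_b64decode(raw_b64url), policy=policy.default)
    # get_body() walks the MIME tree; prefer plain text, then HTML.
    part = msg.get_body(preferencelist=('plain', 'html'))
    # get_content() decodes using the charset declared on the part.
    return part.get_content() if part is not None else ''


# Stand-in for the API response: a locally built message in a non-UTF-8 charset.
sample = EmailMessage()
sample['From'] = 'sender@example.com'
sample['Subject'] = 'Charset test'
sample.set_content('Grüße aus Köln', charset='iso-8859-1', cte='quoted-printable')
raw = base64.urlsafe_b64encode(sample.as_bytes()).decode('ascii')

print(body_from_raw(raw).strip())

# With the real API, the raw string would come from something like this
# (needs valid credentials, so it is left commented out):
# raw = gmail_service.users().messages().get(
#     userId='me', id=email['id'], format='raw').execute()['raw']
```

The trade-off is that format='raw' returns the whole message, attachments included, so it can be a much larger download than the default format.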