I’m having trouble with a Python script that imports EML files to Gmail. The script works fine but there’s a small issue. When I look at the emails in Gmail, they show the import time instead of the original timestamp from the EML file. This messes up the timeline in my inbox.
I thought I fixed this by setting the internalDate in the script, but it’s not working as expected. Here’s a simplified version of what I’m doing:
```python
import email
from datetime import datetime
import base64
from googleapiclient.discovery import build

def upload_email(service, eml_data, label_id):
    eml_message = email.message_from_bytes(eml_data)
    date_str = eml_message.get('Date')
    timestamp = int(datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S %z').timestamp())
    message = {
        'raw': base64.urlsafe_b64encode(eml_data).decode('utf-8'),
        'internalDate': str(timestamp),
        'labelIds': [label_id]
    }
    service.users().messages().insert(userId='me', body=message).execute()

# Main script logic here
```
Can anyone help me figure out why the original timestamp isn’t being used in Gmail? I want the imported emails to blend in with my existing messages correctly.
I ran into a similar issue on an email migration project. In my experience, adding the X-GM-THRID and X-GM-MSGID headers helped preserve both the original threading and the timestamps. What worked for me was extracting the Message-ID from the EML file and generating a unique thread ID, for example from a hash of the subject and date. You can then add both values to your message dictionary:
```python
message = {
    'raw': base64.urlsafe_b64encode(eml_data).decode('utf-8'),
    'internalDate': str(timestamp * 1000),
    'labelIds': [label_id],
    'threadId': thread_id,
    'id': message_id
}
```
This may help Gmail place the imported emails correctly in your timeline. Also watch for encoding issues when extracting headers; non-ASCII subjects, for instance, often arrive as RFC 2047 encoded-words.
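For the header-extraction step, one standard-library option is to parse with `email.policy.default`, which decodes RFC 2047 encoded-words into plain strings. This is a minimal sketch; the function name `extract_ids` and its returned keys are illustrative, not part of any API.

```python
import email
from email import policy


def extract_ids(eml_data: bytes) -> dict:
    """Pull threading-related headers out of a raw EML file.

    policy.default decodes RFC 2047 encoded-words (e.g. =?utf-8?b?...?=),
    so non-ASCII subjects come back as readable text.
    """
    msg = email.message_from_bytes(eml_data, policy=policy.default)

    def header(name):
        value = msg.get(name)
        # Header objects are str subclasses; normalize to plain str or None.
        return str(value) if value is not None else None

    return {
        "message_id": header("Message-ID"),
        "in_reply_to": header("In-Reply-To"),
        "references": header("References"),
        "subject": header("Subject"),
    }
```

One caveat from the Gmail API reference: to add a message to an existing thread, `threadId` must be the ID of that Gmail thread, the `References` and `In-Reply-To` headers must comply with RFC 2822, and the `Subject` must match.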
I’ve dealt with a similar issue when importing emails to Gmail. The problem is likely the units of the ‘internalDate’ parameter: Gmail expects milliseconds since the epoch, not seconds.
Try modifying your timestamp calculation like this:
```python
# .timestamp() returns seconds; multiply by 1000 for milliseconds
timestamp = int(datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S %z').timestamp() * 1000)
```
This converts the timestamp to milliseconds. Also, make sure your strptime format string matches the ‘Date’ headers in your EML files exactly. Some messages use a different layout, so you might need to adjust the format accordingly.
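Instead of maintaining a strptime format, the standard library's `email.utils.parsedate_to_datetime` parses RFC 2822 dates directly. Below is a sketch, with `date_header_to_ms` as a hypothetical helper name. It accepts some common variants that the fixed format above rejects, such as a missing weekday or a trailing `(UTC)` comment.

```python
from email.utils import parsedate_to_datetime


def date_header_to_ms(date_str: str) -> int:
    """Convert an RFC 2822 Date header value to epoch milliseconds."""
    # Note: a "-0000" zone yields a naive datetime, which .timestamp()
    # then interprets as local time.
    dt = parsedate_to_datetime(date_str)
    return int(dt.timestamp() * 1000)
```

All four of these inputs map to the same instant (2024-01-01 10:00 UTC): `"Mon, 1 Jan 2024 10:00:00 +0000"`, `"1 Jan 2024 10:00:00 +0000"`, `"Mon, 1 Jan 2024 10:00:00 +0000 (UTC)"` and `"Mon, 1 Jan 2024 12:00:00 +0200"`.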
If this doesn’t solve it, you could try using the ‘headers’ field instead of ‘raw’ in your message dictionary. This allows you to explicitly set the ‘Date’ header, which Gmail might prioritize over ‘internalDate’.
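Related to which timestamp Gmail prioritizes: according to the Gmail API reference, `users.messages.insert` also takes an `internalDateSource` query parameter. Its default, `receivedTime`, stamps the message with the time Gmail received it. `dateHeader` uses the Date header inside the raw message, when that header is valid. That may be why setting `internalDate` in the body appears to have no effect. Here is a sketch of the question's function with that parameter set. It keeps the original signature and assumes `service` is an authorized Gmail client built with `build('gmail', 'v1', ...)`.

```python
import base64


def upload_email(service, eml_data: bytes, label_id: str) -> dict:
    """Insert a raw RFC 822 message and have Gmail date it from its Date header."""
    body = {
        # The Date header travels inside the raw message itself.
        "raw": base64.urlsafe_b64encode(eml_data).decode("ascii"),
        "labelIds": [label_id],
    }
    # internalDateSource="dateHeader" asks Gmail to derive internalDate from
    # the Date header instead of the time of the call (the insert default).
    request = service.users().messages().insert(
        userId="me",
        body=body,
        internalDateSource="dateHeader",
    )
    return request.execute()
```

For a migration, `users.messages.import` may be the closer fit. Per the reference docs, it defaults `internalDateSource` to `dateHeader` and scans the message the way Gmail scans mail delivered over SMTP. The Python client exposes it as `messages().import_(...)` because `import` is a reserved word.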
Let me know if this helps or if you need further assistance!
hey mate, i had similar issues. try grabbing the last Received header (the topmost one, added by the final hop) instead of the Date header, ‘coz Date is sometimes off. do:
```python
import email.utils
timestamp = int(email.utils.parsedate_to_datetime(eml_message['Received'].split(';')[-1].strip()).timestamp() * 1000)
```
to get proper millisecond time. luck!
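Here is a slightly more defensive sketch of that one-liner. It reads the topmost Received header; `msg.get` returns the first occurrence, which is the hop added last. If no parseable date is found there, it falls back to Date, and it returns `None` if neither parses. The helper names `received_timestamp_ms` and `_to_ms` are made up for illustration.

```python
import email
from email.utils import parsedate_to_datetime
from typing import Optional


def _to_ms(value) -> Optional[int]:
    """Parse an RFC 2822 date string to epoch ms, or None if it won't parse."""
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError
        return None
    return int(dt.timestamp() * 1000)


def received_timestamp_ms(eml_data: bytes) -> Optional[int]:
    """Epoch ms from the topmost Received header, falling back to Date."""
    msg = email.message_from_bytes(eml_data)
    received = msg.get("Received")  # first occurrence = most recent hop
    if received and ";" in received:
        # The date follows the last semicolon in a Received header.
        ms = _to_ms(received.rsplit(";", 1)[1].strip())
        if ms is not None:
            return ms
    return _to_ms(msg.get("Date"))
```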