Python library to convert email messages to structured JSON format

I need a Python library that can take raw email messages and convert them into JSON or a similar structured format. Right now I have several personal projects that process incoming emails using Python’s built-in email module, but each project handles the parsing differently. I want one unified solution that transforms emails into JSON so all my projects can work with the same clean data structure. I know services like Mailgun offer this functionality, but I prefer running everything on my own server instead of relying on external APIs or webhooks. Has anyone found a good library for this kind of email-to-JSON conversion?

Try flanker by Mailgun - it’s open source, so you can run it locally. It handles unusual MIME types and the broken encodings that trip up other parsers. I’ve been using it for six months without problems. One caveat: flanker gives you parsed message objects rather than JSON, so you serialize the fields you need yourself. The parsed structure is clean and predictable, though, which makes downstream processing much easier.

Hit this same issue two years back with multiple email processing scripts. Tried a bunch of different approaches, but ended up writing a custom wrapper around Python’s email module to standardize everything. The trick was building a consistent schema that handles all the weird edge cases - missing headers, multipart messages, encoding problems, you name it. My wrapper pulls the usual stuff (sender, recipient, subject, timestamp, body) and spits out clean JSON. Also grabs attachment metadata and handles both plain text and HTML properly. Biggest pain was dealing with malformed emails and different client formatting, but solid error handling fixes most of that. Way better than external services since you control exactly how the JSON looks, especially when you need custom schemas.
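For anyone who wants a starting point, here is a rough sketch of what such a wrapper could look like using only the standard library. This is not my actual code: the function name `email_to_dict` and the output keys (`from`, `to`, `subject`, `timestamp`, `text`, `html`, `attachments`) are made up for illustration, so adjust the schema to whatever your projects need.

```python
import json
from email import message_from_bytes, policy
from email.utils import getaddresses, parsedate_to_datetime


def email_to_dict(raw: bytes) -> dict:
    """Convert a raw email into a JSON-serializable dict (illustrative schema)."""
    # policy.default returns an EmailMessage, which decodes headers for us.
    msg = message_from_bytes(raw, policy=policy.default)

    def addresses(header: str) -> list:
        # A missing header yields an empty list instead of an exception.
        values = [str(v) for v in msg.get_all(header, [])]
        return [addr for _name, addr in getaddresses(values) if addr]

    def body(subtype: str):
        part = msg.get_body(preferencelist=(subtype,))
        if part is None:
            return None
        try:
            return part.get_content()
        except (LookupError, UnicodeError):
            # Unknown or broken charset: fall back to a lossy UTF-8 decode.
            payload = part.get_payload(decode=True) or b""
            return payload.decode("utf-8", errors="replace")

    # Normalize the Date header to ISO 8601; tolerate missing or malformed values.
    date_header = msg.get("Date")
    try:
        timestamp = (
            parsedate_to_datetime(str(date_header)).isoformat()
            if date_header else None
        )
    except (TypeError, ValueError):
        timestamp = None

    senders = addresses("From")
    subject = msg.get("Subject")
    return {
        "from": senders[0] if senders else None,
        "to": addresses("To"),
        "subject": str(subject) if subject is not None else None,
        "timestamp": timestamp,
        "text": body("plain"),
        "html": body("html"),
        # Metadata only; attachment contents are left out of the JSON.
        "attachments": [
            {
                "filename": part.get_filename(),
                "content_type": part.get_content_type(),
                "size": len(part.get_payload(decode=True) or b""),
            }
            for part in msg.iter_attachments()
        ],
    }


if __name__ == "__main__":
    sample = (
        b"From: Alice <alice@example.com>\n"
        b"To: bob@example.com\n"
        b"Subject: Hello\n"
        b"\n"
        b"Hi there\n"
    )
    print(json.dumps(email_to_dict(sample), indent=2))
```

The main design choice is that every key is always present, with `None` or an empty list standing in for missing data, so downstream code never has to check whether a field exists.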

I’ve been using eml-parser for this kind of work - it’s rock solid. Just pip install it and you’re good to go. It takes raw email content and spits out clean JSON with all the standard fields parsed. What I love about it: keeps the original structure intact while giving you easy key-value access. Pulls out URLs, email addresses, and handles attachments with zero config. Way better than building your own parser because it already handles all the encoding headaches and broken headers you’ll definitely run into. Performance is solid too - I’m processing hundreds of emails per minute on basic hardware. Only complaint is the output gets pretty verbose, so you’ll probably want to filter out stuff you don’t need.
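To deal with the verbosity, something along these lines should work. Treat it as a sketch: `EmlParser().decode_email_bytes()` is the entry point shown in the eml-parser README, but `slim`, `json_default` and the `KEEP` whitelist are hypothetical helpers of mine, and the key names in `KEEP` are an assumption about the output layout, so check them against what your version actually returns.

```python
import datetime
import json


def parse_with_eml_parser(raw: bytes) -> dict:
    """Parse raw email bytes with eml-parser (third-party: pip install eml-parser)."""
    import eml_parser  # imported lazily so the helpers below work without it

    return eml_parser.EmlParser().decode_email_bytes(raw)


def slim(parsed: dict, keep: dict) -> dict:
    """Keep only whitelisted keys in each top-level section; skip absent ones."""
    result = {}
    for section, keys in keep.items():
        values = parsed.get(section)
        if isinstance(values, dict):
            result[section] = {k: values[k] for k in keys if k in values}
    return result


def json_default(obj):
    """Serialize the non-JSON types that can show up in parsed output."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    if isinstance(obj, bytes):
        return obj.decode("utf-8", errors="replace")
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


# Assumed layout: a "header" section holding these keys. Verify before relying on it.
KEEP = {"header": ("subject", "from", "to", "date")}

if __name__ == "__main__":
    with open("sample.eml", "rb") as handle:  # hypothetical file name
        parsed = parse_with_eml_parser(handle.read())
    print(json.dumps(slim(parsed, KEEP), default=json_default, indent=2))
```

A whitelist is safer than a blacklist here, because new fields added by a library update will not silently leak into your JSON.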
