I’m working with a collection of email addresses and need to remove the ones from common providers like Gmail, Yahoo, and Hotmail using regex patterns.
Currently I have this regex pattern to match Gmail addresses: ^[a-z0-9](\.?[a-z0-9]){5,}@gmail\.com$
However, I want to expand this to also detect Yahoo and Hotmail addresses in a single regular expression. What would be the best approach to modify my pattern so it can identify all three email providers at once? I’m looking for a way to combine these checks into one efficient regex that can catch emails from these popular domains.
Here’s a sample of what I’m trying to match:
[email protected]
[email protected]
[email protected]
Any suggestions on how to structure this regex pattern would be really helpful.
just use @(gmail|yahoo|hotmail)\.com$
at the end of your pattern. Way simpler than what you're doing. Don't overthink the username part tbh, focus on catching the domains. Also remember outlook.com is basically Hotmail now, so maybe add that too.
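A minimal Python sketch of that domain-only idea, in case it helps. The sample addresses and the function name `remove_common_providers` are made up for illustration, and `outlook` is added to the alternation as suggested above. Case-insensitive matching is an assumption here, on the basis that domains are not case-sensitive.

```python
import re

# Domain-only pattern from the reply above, with outlook added.
# re.IGNORECASE is an assumption: it lets "Yahoo.com" match as well.
COMMON_PROVIDER = re.compile(r"@(gmail|yahoo|hotmail|outlook)\.com$", re.IGNORECASE)

def remove_common_providers(addresses):
    """Return only the addresses that are NOT on one of the listed domains."""
    return [a for a in addresses if not COMMON_PROVIDER.search(a.strip())]

# Hypothetical sample data, for illustration only.
sample = [
    "alice@gmail.com",
    "bob@example.org",
    "Carol@Yahoo.com",
    "dave@hotmail.com",
    "erin@notgmail.com",  # kept: the "@" must sit directly before the provider name
]
kept = remove_common_providers(sample)
print(kept)
```

Because the pattern is anchored on the `@` and on `$`, a domain such as `notgmail.com` or `gmail.com.example.org` is not treated as Gmail.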
Instead of trying to cram everything into one complex regex, I’d suggest using the alternation operator with a simplified approach. Something like @(gmail|yahoo|hotmail)\.(com|co\.uk)$
works well for basic filtering. I’ve found that overcomplicating the username validation part often causes more problems than it solves. Your current Gmail pattern is quite restrictive - it might miss legitimate addresses that don’t fit that exact format. For domain filtering purposes, focusing on the domain portion is usually sufficient.

One thing to consider is that these providers have multiple domains. Yahoo uses yahoo.com, yahoo.co.uk, and others. Hotmail has hotmail.com, hotmail.co.uk, plus Outlook.com now. You might want to expand your pattern to include these variations depending on your needs.

Also worth noting that regex alone might not catch all edge cases. I’ve seen situations where a simple array lookup against known domains performs better and is easier to maintain than complex regex patterns.
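For what it's worth, the lookup approach could look roughly like this in Python. This is only a sketch: the domain list is limited to the domains mentioned in this thread and is certainly incomplete, and `is_common_provider` is a name invented for the example.

```python
# Known provider domains, limited to the ones mentioned in this thread.
# Extend this set as needed; it is not meant to be exhaustive.
COMMON_DOMAINS = frozenset({
    "gmail.com",
    "yahoo.com",
    "yahoo.co.uk",
    "hotmail.com",
    "hotmail.co.uk",
    "outlook.com",
})

def is_common_provider(address):
    """True if the part after the last '@' is one of the known domains."""
    _, sep, domain = address.strip().rpartition("@")
    # sep is empty when there is no '@' at all, so such input is never matched.
    return bool(sep) and domain.lower() in COMMON_DOMAINS

# Hypothetical sample data, for illustration only.
sample = ["alice@gmail.com", "bob@example.org", "carol@Yahoo.co.uk"]
kept = [a for a in sample if not is_common_provider(a)]
print(kept)
```

Adding a domain is then a one-line change to the set, with no regex escaping to get wrong.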
You could modify your existing pattern to something like ^[a-z0-9](\.?[a-z0-9]){5,}@(gmail|yahoo|hotmail)\.com$
but honestly the username validation part seems unnecessarily strict. Most filtering scenarios work fine with just @(gmail|yahoo|hotmail)\.com$
appended to a basic email pattern.

One gotcha I ran into recently was that Microsoft has folded Hotmail into Outlook.com, so you might want to include that domain as well. Also keep in mind that some of these providers have country-specific domains like yahoo.co.uk or yahoo.ca, and Gmail addresses can also show up as googlemail.com, depending on your user base.

The alternation approach with pipes is definitely the right direction though. I’ve used similar patterns in production and they handle the bulk of common email filtering needs without getting too complex. Performance-wise, this kind of domain matching is pretty efficient even with larger datasets.
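To illustrate why the strict username part can get in the way, here is a small Python comparison of the two patterns from this reply. The test addresses are invented for the example.

```python
import re

# The extended version of the original pattern (strict username rules).
STRICT = re.compile(r"^[a-z0-9](\.?[a-z0-9]){5,}@(gmail|yahoo|hotmail)\.com$")
# The domain-only version.
DOMAIN_ONLY = re.compile(r"@(gmail|yahoo|hotmail)\.com$")

# Hypothetical addresses, for illustration only.
candidates = [
    "john.smith@gmail.com",   # long lowercase username: both patterns match
    "bob@yahoo.com",          # fewer than six characters: STRICT misses it
    "first_last@yahoo.com",   # underscore in username: STRICT misses it
    "someone@example.org",    # other domain: neither pattern matches
]

for address in candidates:
    strict_hit = STRICT.search(address) is not None
    domain_hit = DOMAIN_ONLY.search(address) is not None
    print(f"{address}: strict={strict_hit} domain_only={domain_hit}")
```

The strict pattern needs at least six lowercase letters or digits before the `@`, so short or underscore-containing usernames on the same domains slip through the filter, while the domain-only pattern catches them.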