How to get base domain from URL using Google Sheets formulas

I need help getting the main domain from URLs in Google Sheets

I have several URLs and want to pull out only the main domain. I can strip ‘www’ off the beginning, but my current method fails with subdomains like ‘shop.example.com’ when it should give me just ‘example.com’.

What I am looking for: Extract the root domain that consists of letters or numbers, followed by a single dot, and then more letters or numbers.

This is the formula I’m currently using:

=REGEXREPLACE(REGEXREPLACE(B2045;"(http(s)?://)?(www\.)?";"";"/.*";"")

I believe there’s a way to simplify this.

Here are the examples I need to work with:

https://www.example.com/ => example.com
https://example.com/ => example.com  
http://www.example.nl/ => example.nl
http://example.de/ => example.de
http://www.example.co.uk/ => example.co.uk
http://example.co.au/ => example.co.au
store.example.org/ => example.org
store.example.org => example.org
example.com => example.com
http://www.example.nl?id=123 => example.nl
https://www.example.nl/page/?id=123 => example.nl
http2://shop.website.nl/page/?id=123 => website.nl

Any suggestions on how to make this work?

Your formula’s broken because you’re missing the closing parenthesis and the syntax is off on that second REGEXREPLACE. Here’s what actually works - I’ve used this on tons of mixed domain data: =REGEXEXTRACT(REGEXREPLACE(A1,"^https?://(www\.)?",""),"([^/]+\.[a-z]{2,}(?:\.[a-z]{2,3})?)") First it strips out the protocol and www stuff, then extracts the domain pattern. The regex grabs everything before a dot, then looks for 2+ letters for the TLD, plus handles country codes. Works great on messy datasets with different domain types. You might need to adjust the character limits based on what you’re working with.

I’ve hit this exact problem before. Multi-part TLDs like co.uk need a different approach. Try this regex: =REGEXEXTRACT(REGEXREPLACE(B2045,"^https?://(www\.)?",""),"([^/]+\.[a-z]{2,3}(?:\.[a-z]{2})?)") - it strips out the protocol and www first, then grabs the domain plus TLD structure. The (?:\.[a-z]{2})? part handles country code domains. I tested it on similar URLs and it works for both regular domains and those tricky co.uk/co.au cases. Just heads up - it might choke on newer TLDs longer than 3 characters, but covers most common stuff.

Try this regex: =REGEXEXTRACT(B2045,"([^.]+\.[^./]+)(?:/|$)") - just strip out the protocol and www stuff first. Works well with subdomains in my experience, but you’ll probably need to adjust it for those annoying co.uk domains.