Hello everyone! I have a Google Sheets document filled with real estate website links that I collected while house hunting. I want to create a map visualization from this data, but I need to extract the actual street addresses from these URLs first. The property addresses are embedded within the web links themselves. I’m looking for a way to automatically parse these addresses and populate them in an adjacent column using a formula. I’ve tried searching online but haven’t found a clear solution. As someone who’s still learning spreadsheet formulas, I would really appreciate any guidance or suggestions you might have!
I ran into the same issue when scraping property data. How you extract addresses depends on your URL format - real estate sites use different separators like dashes or underscores. I used MID, FIND, and SUBSTITUTE functions in Google Sheets to grab the address part. First, check your URLs for patterns. Zillow formats theirs pretty predictably, so you can usually build a formula that works consistently. Once you figure out the structure, create a formula that pulls the address and cleans up unwanted characters with SUBSTITUTE. Just heads up - you’ll probably need to tweak the formula for different sites since there’s no standard URL format.
Had this same issue at work extracting location data from property feeds. Manual formulas become a nightmare with multiple real estate sites.
I built an automation that scrapes the actual property pages and pulls addresses straight from the HTML. Way more reliable than parsing URLs - sites constantly change their URL structures.
The workflow grabs your Google Sheets URLs, visits each page, extracts the address from page content, and writes it back automatically. Takes 10 minutes to setup and handles different site formats without writing complex formulas for each one.
I use Latenode for data extraction since it connects directly to Google Sheets with built-in web scraping that grabs addresses from any real estate format. Much cleaner than fighting regex patterns that break every time sites update.
try the REGEXEXTRACT function - it’s super powerful for pulling address patterns from urls. like =REGEXEXTRACT(A1,“([0-9]+[^/]*)”) works if addresses start with numbers. just test it on a few urls first since every site formats them differently.