I’m working with Google Sheets and need help with the REGEXEXTRACT function. I want to extract only letters, numbers, and hyphens from my data while removing apostrophes.
Here are some sample entries I’m dealing with:
- '98 Red Hawks
- Green Eagles '99
- Purple Lions 99-00
I’ve tried several different formulas but none seem to work correctly:
=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"\d+\d+\d+")))
=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"[A-Za-z0-9\s-]+")))
=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"[^']+")))
When I test similar patterns online they work fine, but in Google Sheets the results are inconsistent. Some formulas remove the apostrophe at the beginning but leave ones in the middle. Others don’t remove anything at all.
What am I doing wrong with my regex pattern?
I hit the same issue before. REGEXEXTRACT only grabs the first match it finds, which is why your results are all over the place. Your patterns are trying to match pieces of text instead of cleaning the whole string. I switched to REGEXREPLACE and it worked way better. Try this: =ARRAYFORMULA(IF(C2:C="","",REGEXREPLACE(C2:C,"'","","g"))) That “g” flag makes it replace every apostrophe, not just the first one. Works perfectly for your examples - “'98 Red Hawks” becomes “98 Red Hawks” and “Green Eagles '99” becomes “Green Eagles 99”, but “Purple Lions 99-00” stays untouched. REGEXREPLACE beats REGEXEXTRACT for data cleaning since it handles the entire string instead of hunting for specific chunks.
The main issue you are encountering is that REGEXEXTRACT operates differently than typical online regex tools. While those can show all matches or replace globally, REGEXEXTRACT in Google Sheets only captures the first matching group, resulting in inconsistent patterns. For straightforward removal of characters, using SUBSTITUTE alongside REGEXEXTRACT is more effective: =ARRAYFORMULA(IF(C2:C="","",SUBSTITUTE(C2:C,"'",""))). This formula efficiently removes all apostrophes without the complexities of regex. For more complex data sanitization, REGEXREPLACE would be a better choice, as it processes the entire string and allows for global replacements, which REGEXEXTRACT does not.
yeah, regexextract gets messy for this. you’re trying to extract when you should replace - the other guy’s right. if you really want to stick with extract though, try =ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(REGEXREPLACE(C2:C,"'","","g"),".*"))) - it cleans first then extracts everything. pretty messy though. just using regexreplace alone is way cleaner.