Using REGEXEXTRACT in Google Sheets to filter characters

I need help with a Google Sheets formula that uses REGEXEXTRACT to pull out only letters, numbers, and dashes from text data. My goal is to get rid of apostrophes from the text strings.

Here are some sample data points I’m working with:

  • '05 Red Eagles
  • Red Eagles '05
  • Red Eagles 05-06

I’ve been trying different pattern approaches but none seem to work correctly:

=ARRAYFORMULA(IF(AB3:AB="","",REGEXEXTRACT(AB3:AB,"\w+\w+\w+")))
=ARRAYFORMULA(IF(AB3:AB="","",REGEXEXTRACT(AB3:AB,"w.+")))
=ARRAYFORMULA(IF(AB3:AB="","",REGEXEXTRACT(AB3:AB,"w+")))
=ARRAYFORMULA(IF(AB3:AB="","",REGEXEXTRACT(AB3:AB,"[a-zA-Z0-9].+")))
=ARRAYFORMULA(IF(AB3:AB="","",REGEXEXTRACT(AB3:AB,"[a-zA-Z0-9]+")))

The \w+ pattern works fine when I test it in online regex tools, but it doesn’t behave the same way in Google Sheets. Some formulas remove the leading apostrophe but leave the ones in the middle. Others don’t strip any apostrophes at all.

What am I doing wrong with my regex pattern?

yeah the problem is regexextract grabs only one match like others said. but honestly just use substitute function - way easier than regex for this. =substitute(AB3:AB,"'","") will strip all apostrophes without headache. regex is overkill here imo

The issue you’re encountering is that REGEXEXTRACT only captures the first match of your pattern, not all valid characters throughout the string. Instead of trying to extract what you want, you should use REGEXREPLACE to remove what you don’t want. Try this formula instead: =ARRAYFORMULA(IF(AB3:AB="","",REGEXREPLACE(AB3:AB,"[^a-zA-Z0-9\s-]",""))) This pattern [^a-zA-Z0-9\s-] matches any character that is NOT a letter, number, space, or dash. The caret ^ inside the brackets negates the character class. REGEXREPLACE will remove all apostrophes while preserving everything else you want to keep. I had similar issues when cleaning up sports team data last year and this approach worked much better than trying to extract specific patterns.

You’re facing a common challenge with REGEXEXTRACT’s function, which is to capture only a single matching segment. This means it’s not suitable for your requirement to retain letters, numbers, and dashes while eliminating apostrophes. From my experiences, substituting unwanted characters can be more effective. I recommend utilizing REGEXREPLACE instead, with a pattern that specifically targets unwanted symbols, or simply apply the SUBSTITUTE function as follows: =ARRAYFORMULA(IF(AB3:AB="","",SUBSTITUTE(AB3:AB,"'",""))). This efficiently removes all apostrophes while keeping the desired content.