Extract alphanumeric characters and hyphens using REGEXEXTRACT in Google Sheets

I need help with a Google Sheets formula that uses REGEXEXTRACT to pull out only letters, numbers, and hyphens from text data. My goal is to strip out apostrophes from various text strings.

Here are some sample text entries I’m working with:

'98 Red Hawks

Red Hawks '98

Red Hawks 98-99

I’ve tried several different regex patterns in my formulas but none are giving me the results I expect:

=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"\w+\w+\w+")))
=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"w.+")))
=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"w+")))
=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"[a-zA-Z0-9].+")))
=ARRAYFORMULA(IF(C2:C="","",REGEXEXTRACT(C2:C,"[a-zA-Z0-9]+")))

When I test the \w+ pattern in online regex tools it works perfectly, but in Google Sheets it behaves differently. Some of my attempts remove the leading apostrophe but leave the ones in the middle. Others don’t remove any apostrophes at all.

What am I doing wrong with my regex pattern?

Your problem is that REGEXEXTRACT only grabs the first match, not the whole cleaned string. You want REGEXREPLACE to strip out the unwanted characters instead. Try this:

=ARRAYFORMULA(IF(C2:C="","",REGEXREPLACE(C2:C,"[^a-zA-Z0-9\s-]","")))

The pattern [^a-zA-Z0-9\s-] finds any character that’s NOT alphanumeric, whitespace, or hyphen, then replaces it with nothing. The ^ inside brackets flips it to match everything except those characters.

So '98 Red Hawks becomes 98 Red Hawks, Red Hawks '98 becomes Red Hawks 98, and Red Hawks 98-99 stays the same. REGEXREPLACE modifies your whole string instead of trying to extract pieces like REGEXEXTRACT does.

You’re using REGEXEXTRACT when you need REGEXREPLACE. REGEXEXTRACT just pulls out one match, but REGEXREPLACE can clean up your entire string by removing unwanted characters. I made this same mistake when I started using regex in Sheets. The \w class includes underscores with letters and numbers, which probably isn’t what you want. Plus REGEXEXTRACT only grabs the first match it finds. Try this instead: =ARRAYFORMULA(IF(C2:C=“”,“”,REGEXREPLACE(C2:C,“[^A-Za-z0-9\s-]”,“”,“g”))) The “g” flag at the end is key - it replaces ALL unwanted characters in the string, not just the first one. This’ll strip out apostrophes and other special characters while keeping your letters, numbers, spaces, and hyphens.

yeah, i totally get it! you just need a simple SUBSTITUTE instead of regex: =SUBSTITUTE(C2:C,"'",") it will clean your strings without all the hassle. way easier and works like a charm!