How to perform Unicode case folding string comparison in JavaScript

Laura219 · June 3, 2025, 10:12pm

I’m trying to implement proper case-insensitive string comparison in JavaScript that handles Unicode characters correctly. I know that the recommended approach is to use case folding before comparing strings.

In Python this is straightforward with the casefold() method, but I can’t find a good JavaScript equivalent. I tried using Intl.Collator but it doesn’t give me the exact behavior I need.

Here’s what I’m testing with:

let str1 = "Ñ İ ß ꮳꮃꭹ";
let str2 = "ñ i̇ ss ᏣᎳᎩ";
let str3 = "n i̇ ß ᏣᎳᎩ";

With sensitivity: 'base':

new Intl.Collator('en', { sensitivity: 'base' }).compare(str1, str2) === 0 // returns true (correct)
new Intl.Collator('en', { sensitivity: 'base' }).compare(str1, str3) === 0 // returns true (wrong)

With sensitivity: 'accent':

new Intl.Collator('en', { sensitivity: 'accent' }).compare(str1, str2) === 0 // returns false (wrong)
new Intl.Collator('en', { sensitivity: 'accent' }).compare(str1, str3) === 0 // returns false (correct)

I need str1 and str2 to match but str1 and str3 to be different. Is there a JavaScript library or native method that can handle proper Unicode case folding?
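The sketch below shows what each sensitivity level ignores, using the usual a/b, a/á and a/A pairs. equalAt is a hypothetical helper name. It suggests that 'accent' is already the level that ignores case while keeping accents. Judging from the results above, what still keeps str1 and str2 apart at that level is ß versus ss, which the collator evidently does not treat as a plain case difference.

```javascript
// Hypothetical helper: true when a and b compare equal at the given sensitivity.
function equalAt(sensitivity, a, b) {
  return new Intl.Collator('en', { sensitivity }).compare(a, b) === 0;
}

// Columns: different base letter (a/b), accent (a/á), case (a/A).
for (const s of ['base', 'accent', 'case', 'variant']) {
  console.log(s, equalAt(s, 'a', 'b'), equalAt(s, 'a', '\u00E1'), equalAt(s, 'a', 'A'));
}
// base    false true  true
// accent  false false true
// case    false true  false
// variant false false false
```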

emmad · June 16, 2025, 12:03am

After dealing with similar Unicode comparison challenges in a multilingual application, I found that combining Intl.Collator with custom preprocessing gave better results than relying on a single method. The issue you’re encountering stems from the fact that each sensitivity level bundles a fixed treatment of accents and case, but you need more granular control over both. Here’s the approach that worked for me:

function compareUnicodeStrings(a, b) {
    // Compatibility normalization (e.g. ligatures expanded), then lowercasing
    const normalized1 = a.normalize('NFKC').toLowerCase();
    const normalized2 = b.normalize('NFKC').toLowerCase();
    // 'accent' sensitivity: case differences ignored, accent differences kept
    const collator = new Intl.Collator('en', {
        sensitivity: 'accent',
        numeric: true,
        ignorePunctuation: false
    });
    return collator.compare(normalized1, normalized2) === 0;
}

NFKC normalization also folds compatibility characters such as ligatures, which NFD leaves untouched, while the lowercase conversion addresses basic case folding. This method correctly distinguished between your test cases when I tried it. For production use, I also implemented a fallback that checks multiple locale-specific collators when dealing with mixed scripts, since single-locale collators sometimes miss edge cases with non-Latin characters.
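The sketch below isolates the preprocessing step from the function above so you can see what it does and does not fold. preprocess is a hypothetical name. NFKC expands a compatibility character such as the ﬁ ligature, which NFD leaves alone. However, neither NFKC nor toLowerCase() maps ß to "ss"; only uppercasing produces that expansion.

```javascript
// The preprocessing from compareUnicodeStrings, on its own.
function preprocess(s) {
  return s.normalize('NFKC').toLowerCase();
}

console.log(preprocess('\uFB01'));       // 'fi'  (NFKC expands the ﬁ ligature)
console.log('\uFB01'.normalize('NFD'));  // 'ﬁ'   (NFD applies canonical decompositions only)
console.log(preprocess('\u00DF'));       // 'ß'   (still not 'ss')
console.log('\u00DF'.toUpperCase());     // 'SS'
```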

Bob_Clever · June 15, 2025, 3:45pm

Honestly, this is kind of tricky, but I found a workaround using the unicode-case-fold npm package. It’s basically a port of the case folding algorithm from the Unicode spec. Just npm install unicode-case-fold, then use caseFold(str1) === caseFold(str2), and it should handle your test cases properly, unlike the built-in methods.
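A table-driven fold of this kind comes down to looking each code point up in Unicode’s CaseFolding.txt (the C and F entries). The sketch below is not that package’s API. FOLD_OVERRIDES and foldSketch are hypothetical names, and the table holds only a handful of real CaseFolding.txt entries. Every other code point falls back to toLowerCase(). That fallback is an approximation: for example, CaseFolding.txt folds Cherokee to uppercase while toLowerCase() goes the other way. That still works for equality checks, because both sides end up in the same case.

```javascript
// Hypothetical, deliberately tiny subset of CaseFolding.txt; a real port
// would generate the full table from the Unicode data file.
const FOLD_OVERRIDES = new Map([
  ['\u00DF', 'ss'],     // ß LATIN SMALL LETTER SHARP S (F)
  ['\u1E9E', 'ss'],     // ẞ LATIN CAPITAL LETTER SHARP S (F)
  ['\u03C2', '\u03C3'], // ς GREEK SMALL LETTER FINAL SIGMA -> σ (C)
  ['\uFB00', 'ff'],     // ﬀ LATIN SMALL LIGATURE FF (F)
  ['\uFB01', 'fi'],     // ﬁ LATIN SMALL LIGATURE FI (F)
]);

// Shaped like Unicode's canonical caseless match, NFD(fold(NFD(x))):
// table lookup per code point, toLowerCase() as the fallback.
function foldSketch(str) {
  let out = '';
  for (const ch of str.normalize('NFD')) {
    out += FOLD_OVERRIDES.get(ch) ?? ch.toLowerCase();
  }
  return out.normalize('NFD');
}

console.log(foldSketch('STRASSE') === foldSketch('stra\u00DFe')); // true
console.log(foldSketch('\u039F\u0394\u039F\u03A3') === foldSketch('\u03BF\u03B4\u03BF\u03C2')); // true ("ΟΔΟΣ" vs "οδος")
```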

FlyingStar · June 12, 2025, 11:46pm

I ran into this exact issue about six months ago when working on a search feature that needed to handle international text properly. The problem is that JavaScript doesn’t have a direct equivalent to Python’s casefold(), which is frustrating.

What ended up working for me was using the toLocaleLowerCase() method with specific locale parameters, combined with Unicode normalization. For your case, try this approach:

function unicodeCaseFold(str, locale = 'en-US') {
    return str.normalize('NFD').toLocaleLowerCase(locale);
}

let result1 = unicodeCaseFold(str1) === unicodeCaseFold(str2);
let result2 = unicodeCaseFold(str1) === unicodeCaseFold(str3);

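The key is using NFD normalization first. It decomposes characters into their base forms plus combining marks, so canonically equivalent spellings become identical before lowercasing: a precomposed ñ and n plus a combining tilde end up the same, as do İ and I plus a combining dot above. This handles cases like the dotted i properly. Cherokee characters don’t need a special locale such as ‘chr’. Their case mappings are part of the default Unicode data, and toLocaleLowerCase() only applies locale-specific lowercasing rules for a few languages (Turkish, Azerbaijani, Lithuanian).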
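Here is a quick check of the locale point. Cherokee capitals lowercase the same way under 'en-US' and 'chr'. Turkish, by contrast, changes how I and İ are lowercased.

```javascript
// ᏣᎳᎩ lowercases to ꮳꮃꭹ regardless of locale.
console.log('\u13E3\u13B3\u13A9'.toLocaleLowerCase('en-US')); // '\uABB3\uAB83\uAB79'
console.log('\u13E3\u13B3\u13A9'.toLocaleLowerCase('chr'));   // '\uABB3\uAB83\uAB79'

// Turkish has its own rules for I (dotless ı) and İ (plain i).
console.log('I\u0130'.toLocaleLowerCase('en-US')); // 'ii\u0307'
console.log('I\u0130'.toLocaleLowerCase('tr'));    // '\u0131i'
```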

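It’s not as complete as Python’s implementation, but it has handled most Unicode edge cases I’ve encountered in production. The main limitation is that lowercasing is not the same as full case folding. Folding-only mappings such as ß → ss aren’t applied, so str1 and str2 from your example still won’t compare equal with this function. For typical international text, though, it works reliably.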
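One common workaround for the ß gap is to uppercase before lowercasing. Uppercasing applies expansions like ß → SS, and lowercasing then brings everything back to one case. The sketch below adds that step; approxCaseFold is a hypothetical name. This is an approximation of CaseFolding.txt, not a port of it, and it isn’t guaranteed to agree with Python’s casefold() on every code point. It does, however, give the result the question asks for on the three test strings, written here with escapes so each code point is explicit.

```javascript
// Approximate full case folding: upper-then-lower picks up expansions
// such as ß -> SS -> ss; NFD on both ends handles canonical equivalence.
function approxCaseFold(str) {
  return str.normalize('NFD').toUpperCase().toLowerCase().normalize('NFD');
}

const str1 = '\u00D1 \u0130 \u00DF \uABB3\uAB83\uAB79'; // "Ñ İ ß ꮳꮃꭹ"
const str2 = '\u00F1 i\u0307 ss \u13E3\u13B3\u13A9';     // "ñ i̇ ss ᏣᎳᎩ"
const str3 = 'n i\u0307 \u00DF \u13E3\u13B3\u13A9';      // "n i̇ ß ᏣᎳᎩ"

console.log(approxCaseFold(str1) === approxCaseFold(str2)); // true
console.log(approxCaseFold(str1) === approxCaseFold(str3)); // false
```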