I’m trying to check if a pattern exists within a text string, but I’m running into issues with characters that look identical but have different Unicode values.
For example:
- Pattern: D (Unicode: 68)
- Text: Dog, where the first character is actually Unicode 1044
Both characters look like the letter “D”, but text.includes(pattern) returns false because they have different Unicode code points.
const pattern = 'D';
const text = '\u0414og'; // First char is Cyrillic De, U+0414 (1044), written as an escape so it is unambiguous
console.log(pattern.codePointAt(0)); // 68
console.log(text.codePointAt(0)); // 1044
console.log(text.includes(pattern)); // false
I believe these are called homoglyphs - characters that look the same but are encoded differently. Is there a native JavaScript method to handle this comparison, or do I need to implement a custom solution? I’d prefer not to maintain a large mapping table if possible.
JavaScript doesn’t have built-in homoglyph detection, but you can use the Intl.Collator API as a partial workaround. Set sensitivity to 'base' so it ignores case and accents; it will then treat some visually similar characters (for example “e” and “é”) as equal:
const collator = new Intl.Collator('en', { sensitivity: 'base' });
const pattern = 'D';
const text = '\u0414og'; // Cyrillic De, U+0414 (1044)
console.log(collator.compare(pattern, text[0]) === 0); // false: Latin D and Cyrillic De are different base letters
This won’t work for characters from different scripts (Cyrillic vs. Latin), because the collator treats them as different base letters. For proper homoglyph detection, you’ll need a dedicated library like confusable-homoglyphs or your own mapping table. The Unicode Consortium publishes a confusables list (confusables.txt, part of Unicode Technical Standard #39) that you can use to generate these mappings.
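To make the mapping-table idea concrete, here is a minimal sketch. The table is a tiny hand-picked subset, not the real confusables list, and the names CONFUSABLES, skeleton, and includesConfusable are made up for illustration. The pair from the question (1044 mapped to Latin D) is included as an assumption; a table generated from confusables.txt may or may not list it.

```javascript
// Hypothetical, hand-picked subset of a confusables table (not the full Unicode list).
// Keys are written as escapes so the script of each character is unambiguous.
const CONFUSABLES = new Map([
  ['\u0410', 'A'], // Cyrillic capital A
  ['\u0415', 'E'], // Cyrillic capital IE
  ['\u041E', 'O'], // Cyrillic capital O
  ['\u0430', 'a'], // Cyrillic small a
  ['\u0435', 'e'], // Cyrillic small ie
  ['\u043E', 'o'], // Cyrillic small o
  ['\u0414', 'D'], // Cyrillic capital DE (1044): the pair from the question, added by assumption
]);

// Replace every mapped character with its Latin look-alike; leave the rest untouched.
// Array.from iterates by code point, so characters outside the BMP are not split.
function skeleton(str) {
  return Array.from(str, (ch) => CONFUSABLES.get(ch) ?? ch).join('');
}

// Compare the skeletons instead of the raw strings.
function includesConfusable(text, pattern) {
  return skeleton(text).includes(skeleton(pattern));
}

const text = '\u0414og'; // first character is code point 1044
console.log(text.includes('D'));            // false
console.log(includesConfusable(text, 'D')); // true
```

Mapping both sides to a common skeleton means you only maintain one table, and the same function works for whole-string equality checks as well as substring searches.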
Ugh, this is such a pain. I dealt with something similar for user input validation. I used string normalization first: normalize('NFD') to decompose characters, then stripped the accents. But that won’t help with a Cyrillic/Latin mix, since those letters don’t decompose into Latin ones. Try converting both strings to lowercase ASCII equivalents before comparing? There are some npm libs for this, but I can’t remember the names right now.
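For reference, the normalize-then-strip step could look roughly like this. It is a sketch, not the poster's actual code: the helper name stripAccents and the regex for removing combining marks are assumptions.

```javascript
// Decompose to NFD, then drop nonspacing combining marks (Unicode category Mn),
// which is where accents end up after decomposition.
function stripAccents(str) {
  return str.normalize('NFD').replace(/\p{Mn}/gu, '');
}

console.log(stripAccents('D\u00E9j\u00E0 vu'));       // "Deja vu"
console.log(stripAccents('\u0414og').codePointAt(0)); // 1044: the Cyrillic letter is untouched
```

The second line shows the limitation: the Cyrillic letter has no decomposition, so normalization alone leaves it as a different character from Latin D.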