I’m trying to parse a string that has HTML markup mixed with regular text. I want to split it into an array in which each text character is its own element, while each HTML tag stays intact as a single element.
let text = 'hel<em class="highlight">lo</em> world';
console.log("parsing result: " + JSON.stringify(text.split(/(<[^>]*>)|/)));
This outputs:
parsing result: ["h",null,"e",null,"l","<em class=\"highlight\">","l",null,"o","</em>"," ",null,"w",null,"o",null,"r",null,"l",null,"d"]
After removing null values I get the desired output:
final: ["h","e","l","<em class=\"highlight\">","l","o","</em>"," ","w","o","r","l","d"]
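For reference, the null-removal step is roughly the following. This is only a sketch: the names parts and tokens are placeholders, and the exact filter condition is an assumption. The nulls in the JSON output are really undefined entries that split() emits whenever the capture group does not take part in a match.

```javascript
// Same input and pattern as above.
const text = 'hel<em class="highlight">lo</em> world';
const parts = text.split(/(<[^>]*>)|/);

// An unmatched capture group shows up as undefined in the array
// (JSON.stringify prints it as null). Drop those entries, plus any
// empty strings, while keeping the single-space token.
const tokens = parts.filter((part) => part !== undefined && part !== "");

console.log("final: " + JSON.stringify(tokens));
```

The empty-string check does nothing for this input, but split() also produces "" when the string starts with a tag or when two tags are directly adjacent.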
Is there a better regex pattern (or a different approach) that treats each HTML tag as a complete token and splits everything else character by character, without generating the null values in the first place?
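One possible direction, as a minimal sketch rather than a definitive answer: collect tokens with match() and the g flag instead of split(). The alternation tries a whole tag first and otherwise consumes exactly one character, so there is no capture group and nothing to filter afterwards. The variable name tokens is a placeholder.

```javascript
const text = 'hel<em class="highlight">lo</em> world';

// First alternative: a complete tag such as <em ...> or </em>.
// Second alternative: any single character, including newlines and
// a stray "<" that is never closed by ">".
// match() returns null when nothing matches (e.g. empty input),
// so fall back to an empty array.
const tokens = text.match(/<[^>]*>|[\s\S]/g) || [];

console.log("final: " + JSON.stringify(tokens));
```

This keeps the same simple tag pattern as the original, so it shares its limits: a ">" inside an attribute value ends the tag early, and without the u flag a character outside the Basic Multilingual Plane is split into two surrogate halves.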