I pulled XML content from a web source into my Google Sheets using a script. After applying some regex extraction, I ended up with these data rows:
+0",33.89,“4”],[“Mar 31 2023 09: +0”,34.49,“2”]]
+0",4.3,“10”]]
+0",6.003,“18”],[“Mar 31 2023 09: +0”,6.231,“25”]]
+0",3.628,“26”]]
+0",4.05,“1”],[“Mar 31 2023 09: +0”,4.63,“13”]]
I need to pull out the second-to-last numeric value from each row (the ones I highlighted). The challenge is that these rows have different lengths and formats because the CONCATENATE function handles the XML input differently depending on the source website. Can anyone help me create a REGEXEXTRACT formula that will grab these specific numbers regardless of the row structure?
Here’s a more robust approach for variable XML structures - use REGEXEXTRACT to target the penultimate numeric value directly. Try =REGEXEXTRACT(A1,"([0-9]+\.?[0-9]*)","[0-9]+""]"]$") - it grabs any decimal number followed by a comma, quoted number, then closing brackets. I’ve found this way more reliable than counting commas with inconsistent XML parsing. The main difference is the escaped quotes - they handle cases where your concatenation includes actual quote characters. This pattern’s saved me when scraping financial data that comes through messy after XML processing.
I ran into the same issue with messy XML outputs. You want the number that comes right before the final quoted number in each row. This REGEXEXTRACT formula should work: =REGEXEXTRACT(A1,"([0-9]+\.?[0-9]*),\"[0-9]+\"\]\]$"). It anchors to the end of each row with $ and works backwards to find your number, then captures just the value you need. I’ve tested similar patterns on concatenated data - anchoring to the end beats trying to count positions from the start, especially with variable-length rows like yours. The regex grabs both integers and decimals, so it’ll handle all your highlighted values.
Try this pattern: =REGEXEXTRACT(A1,",([0-9.]+),\"[0-9]+\"\]\]$) - looks for comma, your number, comma, then quoted digits at the end. Worked for me with similar XML scraping mess.