You could try using SUBSTITUTE and SPLIT functions to clean up the data. Since your JSON structure’s consistent, try =SUBSTITUTE(SUBSTITUTE(SPLIT(importxml(B2, "//span[@class='data-wrapper']//text()"), ",")[1], "createdAt:""", ""), """, "") but honestly, that gets messy with all the quotes. I found a cleaner solution when I hit similar issues - use IMPORTXML with a more specific xpath if you can. Sometimes webpages have individual elements for timestamps you can target directly like “//span[@class=‘timestamp’]” or similar. Check the page source to see if createdAt values are wrapped in their own tags. This skips parsing JSON entirely and gives you cleaner data extraction. If that’s not an option, definitely go with the regex solution mentioned above.
This regex finds the “createdAt” field and grabs whatever’s between the quotes after it. I’ve used this tons when scraping JSON data from web pages. The ([^"]+) part catches any non-quote characters - perfect for ISO datetime strings.
For multiple rows, just drag the formula down and it’ll extract the createdAt from each cell. Double-check your IMPORTXML is pulling data correctly first, then add the regex extraction.
Been there! Manual extraction gets old fast with JSON data from web scraping.
I hit this exact problem pulling API responses from web pages. Regex and string manipulation work but break easily when JSON structure changes.
I set up an automated workflow that handles parsing and extraction without Google Sheets formulas. Pull the raw IMPORTXML data, run it through JSON parsing, extract what you need, and push clean results back to your sheet.
My workflow handles thousands of extractions daily. Way more reliable than formula gymnastics and adapts automatically when websites change structure. Plus you get proper error handling when pages don’t load or JSON breaks.
Much cleaner than wrestling with nested SUBSTITUTE functions or hoping regex covers edge cases. Set it up once and forget it.
you can also use MID and SEARCH if regex seems too complicated. try =MID(importxml(B2, "//span[@class='data-wrapper']//text()"), SEARCH("createdAt":"", importxml(B2, "//span[@class='data-wrapper']//text()"))+12, 20) - finds where createdAt is located and grabs 20 characters from that spot. pretty reliable for timestamps.