Extracting specific data from IMPORTXML results in Google Sheets

amelial · August 9, 2025, 2:58pm

I’m working with Google Sheets to pull data from websites using the IMPORTXML function.

Currently I’m using this formula:

=importxml(B2, "//span[@class='data-wrapper']//text()")

This gives me output like:

Cell1: {"createdAt":"2020-03-15T14:30:45Z","modifiedAt":"null","deletedAt":"null"}

Cell2: {"createdAt":"2020-02-28T09:15:22Z","modifiedAt":"null","deletedAt":"null"}

Cell3: {"createdAt":"2020-02-20T16:45:11Z","modifiedAt":"null","deletedAt":"null"}

I need to extract just the “createdAt” timestamps from this JSON-like text. What I want is:

Cell1: 2020-03-15T14:30:45Z

Cell2: 2020-02-28T09:15:22Z

Cell3: 2020-02-20T16:45:11Z

How can I modify my approach to get only these date values instead of the full JSON string?

mikechen · August 16, 2025, 3:23pm

You could try using SUBSTITUTE and SPLIT functions to clean up the data. Since your JSON structure’s consistent, try =SUBSTITUTE(SUBSTITUTE(SPLIT(importxml(B2, "//span[@class='data-wrapper']//text()"), ",")[1], "createdAt:""", ""), """, "") but honestly, that gets messy with all the quotes. I found a cleaner solution when I hit similar issues - use IMPORTXML with a more specific xpath if you can. Sometimes webpages have individual elements for timestamps you can target directly like “//span[@class=‘timestamp’]” or similar. Check the page source to see if createdAt values are wrapped in their own tags. This skips parsing JSON entirely and gives you cleaner data extraction. If that’s not an option, definitely go with the regex solution mentioned above.

SwiftCoder15 · August 15, 2025, 3:38am

You can grab those timestamps with REGEXEXTRACT wrapped around your current IMPORTXML formula:

=REGEXEXTRACT(importxml(B2, "//span[@class='data-wrapper']//text()"), "\"createdAt\":\"([^\"]+)\"")

This regex finds the “createdAt” field and grabs whatever’s between the quotes after it. I’ve used this tons when scraping JSON data from web pages. The ([^"]+) part catches any non-quote characters - perfect for ISO datetime strings.

For multiple rows, just drag the formula down and it’ll extract the createdAt from each cell. Double-check your IMPORTXML is pulling data correctly first, then add the regex extraction.

opalEcho · August 14, 2025, 9:56pm

Been there! Manual extraction gets old fast with JSON data from web scraping.

I hit this exact problem pulling API responses from web pages. Regex and string manipulation work but break easily when JSON structure changes.

I set up an automated workflow that handles parsing and extraction without Google Sheets formulas. Pull the raw IMPORTXML data, run it through JSON parsing, extract what you need, and push clean results back to your sheet.

My workflow handles thousands of extractions daily. Way more reliable than formula gymnastics and adapts automatically when websites change structure. Plus you get proper error handling when pages don’t load or JSON breaks.

Much cleaner than wrestling with nested SUBSTITUTE functions or hoping regex covers edge cases. Set it up once and forget it.

More info at https://latenode.com.

Mia92 · August 12, 2025, 11:56am

you can also use MID and SEARCH if regex seems too complicated. try =MID(importxml(B2, "//span[@class='data-wrapper']//text()"), SEARCH("createdAt":"", importxml(B2, "//span[@class='data-wrapper']//text()"))+12, 20) - finds where createdAt is located and grabs 20 characters from that spot. pretty reliable for timestamps.