How to extract meta tag content from HTML response in API testing?

SwimmingShark · April 9, 2025, 6:17pm

Hey folks, I’m stuck trying to get a value from a meta tag in an HTML response. The API returns this structure:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta name="csrf-token" content="important-value-here">
</head>

I need to grab that csrf-token content. I’ve tried a few things:

Storing the response as bytes
Using karate.extract() with regex
Converting response to string first
Even tried using Karate UI

Nothing’s working so far. Here’s one of my attempts:

Given url dashboard
When method get
* def responseHtml = response
* def csrfToken = karate.extract(responseHtml, 'csrf-token.text=\"([^\"]+)', 1)
* print csrfToken

It just prints null. Any ideas on how to tackle this? I feel like I’m missing something obvious. Thanks!

sophiac · April 15, 2025, 12:29am

I’ve encountered this issue before, and I found that using a combination of karate.xmlPath() and a specific XPath query works well for extracting meta tag content. Here’s what worked for me:

* def csrfToken = karate.xmlPath(responseHtml, "//meta[@name='csrf-token']/@content")
* print csrfToken

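This approach treats the HTML as XML and uses XPath to target the content attribute of the csrf-token meta tag directly. When it applies, it is less brittle than regex because it does not depend on whitespace or attribute order. The catch is that the response has to parse as well-formed XML. HTML with unclosed void tags, such as the <meta charset="utf-8"> in your sample, is not well-formed, so this is only an option if the server emits XHTML-style markup with every tag closed. Make sure your response is stored as a string, not bytes. If you’re still having trouble, double-check that the API actually returns the expected HTML structure in your test environment.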
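
For comparison, the regex in the original attempt most likely prints null because of the pattern itself. In csrf-token.text=, the dot matches the closing quote of the name attribute, but the literal text= that follows never lines up with the space and content= in the actual markup. Below is a minimal JavaScript sketch of a corrected pattern, assuming name comes before content as in the sample. extractCsrfToken is a made-up helper name, not a Karate function.

```javascript
// Sample markup copied from the question.
const html = [
  '<!DOCTYPE html>',
  '<html lang="en">',
  '<head>',
  '    <meta charset="utf-8">',
  '    <meta name="csrf-token" content="important-value-here">',
  '</head>'
].join('\n');

// Pattern from the original attempt: "." consumes the closing quote of
// name="csrf-token", then the literal "text=" fails against ' content='.
const originalPattern = /csrf-token.text="([^"]+)/;

// Corrected pattern: closing quote, whitespace, then content="...".
// Assumes the name attribute appears before content, as in the sample.
const fixedPattern = /name="csrf-token"\s+content="([^"]*)"/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function extractCsrfToken(text, pattern) {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

const before = extractCsrfToken(html, originalPattern);
const after = extractCsrfToken(html, fixedPattern);
console.log(before, after);
```

The same regex should carry over to karate.extract() as a string argument, with the backslash in \s doubled inside the string literal. It still breaks if the server ever emits content before name, which is one reason to prefer a real parser.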

sarahj · April 13, 2025, 1:27pm

I’ve had success extracting meta tag content using a different approach. Instead of Karate’s built-in functions, I’ve found that incorporating a Java HTML parser like jsoup can be more robust. Here’s what worked for me:

First, add jsoup to your project dependencies. Then, in your Karate test:

* def jsoup = Java.type('org.jsoup.Jsoup')
* def doc = jsoup.parse(responseHtml)
* def csrfToken = doc.select('meta[name=csrf-token]').attr('content')
* print csrfToken

This method has been reliable across various HTML structures and doesn’t rely on specific formatting. It’s also more forgiving if the HTML isn’t perfectly valid. Just make sure you’re handling the response as a string. If you’re still having issues, it might be worth double-checking the actual response content to ensure it matches what you’re expecting.
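
If adding a dependency is not an option, roughly the same tolerance to attribute order can be approximated in plain JavaScript. To be clear, this is not jsoup and not a real HTML parser, just a regex-based sketch. It scans each <meta ...> tag, collects its quoted attributes, and returns content for the tag whose name matches. metaContent is a hypothetical helper name, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper, not part of Karate or jsoup.
// Scans <meta ...> tags and returns the content attribute of the first tag
// whose name attribute equals metaName; returns null if none is found.
// Limitations: only quoted attribute values are recognized, and a ">" inside
// an attribute value would cut the tag short.
function metaContent(html, metaName) {
  const tagPattern = /<meta\b([^>]*)>/gi;
  let tag;
  while ((tag = tagPattern.exec(html)) !== null) {
    const attrs = {};
    const attrPattern = /([^\s=\/"']+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let attr;
    while ((attr = attrPattern.exec(tag[1])) !== null) {
      // Group 2 holds a double-quoted value, group 3 a single-quoted one.
      attrs[attr[1].toLowerCase()] = attr[2] !== undefined ? attr[2] : attr[3];
    }
    if (attrs.name === metaName && attrs.content !== undefined) {
      return attrs.content;
    }
  }
  return null;
}

// Same tag in two attribute orders, with different quoting and casing.
const nameFirst = '<meta charset="utf-8"><meta name="csrf-token" content="abc123">';
const contentFirst = "<META content='abc123' name='csrf-token' />";

console.log(metaContent(nameFirst, 'csrf-token'));
console.log(metaContent(contentFirst, 'csrf-token'));
console.log(metaContent(nameFirst, 'viewport'));
```

Karate lets you define JavaScript functions with def, so a helper along these lines could in principle live in the feature file or a shared .js file. This exact code is untested against Karate’s JS engine, so treat it as a starting point.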

avamtz · April 12, 2025, 8:27am

Hey mate, have you tried using cheerio? It’s pretty neat for parsing HTML. Here’s a quick example:

* def cheerio = read('classpath:cheerio.min.js')
* def parse = function(html){ return JSON.parse(karate.invoke('cheerio.load', html)) }
* def $ = parse(responseHtml)
* def csrfToken = $('meta[name=\"csrf-token\"]').attr('content')

Works like a charm for me. Good luck!