I’m working with Node.js and trying to scrape text from a webpage using Puppeteer. My code runs fine but I’m having trouble getting the actual text value from the Promise. Here’s what I have so far:
const puppeteer = require('puppeteer');
async function scrapeData(targetUrl){
  const browser = await puppeteer.launch();
  const pageInstance = await browser.newPage();
  await pageInstance.goto(targetUrl);
  const [element] = await pageInstance.$x('//header//h1//a');
  const property = await element.getProperty('textContent');
  const content = (await property).jsonValue();
  console.log({content});
  await browser.close();
}
scrapeData('https://example.com');
When I execute this script, the output shows: { content: Promise { 'Some Text' } }
I need to extract just the string value from this Promise so I can use it in other parts of my code. How can I properly await the Promise to get the actual text content?
Yeah, this happens a lot with Puppeteer. Try using page.$eval()
instead, it’s much simpler. Note that it takes a CSS selector rather than an XPath, so something like const content = await pageInstance.$eval('header h1 a', el => el.textContent);
should work and avoids the whole Promise mess you’re dealing with.
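One caveat: $eval throws if nothing matches the selector. If the element might be missing, a rough alternative is page.$() plus a null check. This is only a sketch; getText is a made-up helper name, and page is whatever Puppeteer page you already have (pageInstance in the question).

```javascript
// Hypothetical helper (not part of Puppeteer): resolves to the trimmed text of
// the first element matching a CSS selector, or null when nothing matches.
async function getText(page, selector) {
  // page.$() resolves to an ElementHandle, or null if no element matches.
  const handle = await page.$(selector);
  if (handle === null) {
    return null;
  }
  // evaluate() runs the callback in the browser and resolves to its return
  // value, so awaiting it gives a plain string rather than a handle.
  return await handle.evaluate(el => el.textContent.trim());
}
```

Inside scrapeData you would call it as const content = await getText(pageInstance, 'header h1 a'); and check for null before using the result.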
You’re almost there; there’s just a small issue with the Promise handling. The problem is in this line: const content = (await property).jsonValue();
jsonValue() is asynchronous and returns a Promise, so its result needs to be awaited as well. Here’s the corrected version:
const content = await (await property).jsonValue();
Or, more cleanly, drop the redundant inner await:
const content = await property.jsonValue();
getProperty returns a Promise for a JSHandle, and you already awaited it on the line before, so property is the JSHandle itself and awaiting it again does nothing. The only call that still needs an await is jsonValue(). I ran into this exact issue when I started with Puppeteer; it’s easy to miss that extra await. Once you make this change, your console.log should show the plain string value instead of the Promise wrapper.
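If it helps to see the difference in isolation, here is a small sketch that runs without a browser. fakeHandle is a made-up stand-in for a JSHandle, not a Puppeteer object; the only thing it shares with the real one is that jsonValue() is async.

```javascript
// Made-up stand-in for a Puppeteer JSHandle so the demo runs without a browser.
const fakeHandle = {
  async jsonValue() {
    return 'Some Text';
  },
};

async function demo() {
  // Without await, the variable holds the pending Promise itself.
  const missingAwait = fakeHandle.jsonValue();
  // With await, the variable holds the resolved value.
  const withAwait = await fakeHandle.jsonValue();

  console.log(missingAwait instanceof Promise); // true
  console.log(typeof withAwait, withAwait);     // string Some Text
  return { missingAwait, withAwait };
}
```

Calling demo() shows the same pattern as the question: the first variable is what produced the Promise { 'Some Text' } output, and the second is the string you actually want.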
Another approach that works well is using page.evaluate()
to execute the XPath query directly in the browser context. This eliminates the need for handling JSHandles altogether:
const content = await pageInstance.evaluate(() => {
  const result = document.evaluate('//header//h1//a', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue ? result.singleNodeValue.textContent : null;
});
I’ve found this method particularly useful for complex XPath expressions, since the query runs in the browser’s native environment rather than through Puppeteer’s wrapper methods. Once you await the evaluate() call, the text content comes back as a plain string with no JSHandle to unwrap, which makes it easier to work with in subsequent operations.
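If you need this for more than one XPath, the expression can be passed in as an argument. The function given to evaluate() is serialized and run in the page, so it cannot see variables from your Node script; anything it needs has to go through the extra arguments. A minimal sketch, where textByXPath and getTextByXPath are names I made up. As far as I know, newer Puppeteer releases have also dropped page.$x, which this approach does not depend on.

```javascript
// Runs inside the page, so it may only use browser globals and its arguments.
function textByXPath(xpath) {
  const result = document.evaluate(
    xpath,
    document,
    null,
    XPathResult.FIRST_ORDERED_NODE_TYPE,
    null
  );
  const node = result.singleNodeValue;
  return node ? node.textContent : null;
}

// Hypothetical wrapper: resolves to the text of the first node matching the
// XPath, or null when there is no match.
async function getTextByXPath(page, xpath) {
  // Extra arguments to evaluate() are passed along to the page function.
  return await page.evaluate(textByXPath, xpath);
}
```

In the original script that would be const content = await getTextByXPath(pageInstance, '//header//h1//a'); and scrapeData can then return content so the caller can await it and use the string elsewhere.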