I’m working on scraping some data from a social media site using Puppeteer. I managed to navigate to the page I need, but I’m having trouble with element selection.
When I test this XPath in the browser console, it works perfectly:
$x(`//span[@data-id="postContent"]`)
I can even extract the text content successfully:
$x(`//span[@data-id="postContent"]`)[0].textContent
But when I try to use the same selector in my Puppeteer script with page.$$, it returns nothing:
const elements = await page.$$(`xpath/.//span[@data-id='postContent']`);
console.log(elements); // returns empty array
The XPath selector definitely works in the browser console, so I’m not sure why it fails in Puppeteer. Has anyone encountered this issue before? I’d really appreciate any suggestions on how to make this work. Getting the text content from these elements would be even better if possible.
I’ve double-checked that the page is fully loaded before running the selector, but still no luck.
Had the same issue a few months ago scraping product data. You’re mixing up the methods like emmat83 said - page.$$ only works with CSS selectors, not XPath. Switch to page.$x() and you’ll get an array of ElementHandle objects back. To grab the text from each element, use evaluate() on each handle:
const textContent = await elements[0].evaluate(el => el.textContent);
That worked for me. page.$x() already returns an array, so you can loop through it for multiple elements. Also, drop the xpath/ prefix you were using with page.$$ - page.$x() doesn’t need it.
Had this exact issue scraping e-commerce sites last year. You’re using the wrong Puppeteer method - page.$$ only works with CSS selectors, not XPath. That’s why your XPath fails even though it works fine in the browser console. For XPath in Puppeteer, use page.$x() instead:
const elements = await page.$x('//span[@data-id="postContent"]');
Watch out for timing issues too. Social media sites load content dynamically, so even if the page seems loaded, your target elements might appear later. I’d add waitForSelector or waitForXPath before selecting anything. page.$x() returns ElementHandle objects, so you can grab text with evaluate() on each element if you need it.
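To put the waiting and the text extraction together, here is a rough sketch rather than a definitive implementation. getTextsByXPath is a hypothetical helper name, not part of Puppeteer, and the 10 second timeout is an arbitrary choice. It assumes you already have a page object. Which branch runs depends on your Puppeteer version: as far as I know, older releases expose page.$x() and page.waitForXPath(), while newer releases drop them and expect the xpath/ prefix on the regular selector methods, so check what your installed version supports.

```javascript
// Hypothetical helper (not a Puppeteer API): wait for the XPath to match,
// then return the textContent of every matching element.
async function getTextsByXPath(page, xpath, timeout = 10000) {
  if (typeof page.$x === 'function') {
    // Assumed: an older Puppeteer release with dedicated XPath methods.
    await page.waitForXPath(xpath, { timeout });
    const handles = await page.$x(xpath);
    // evaluate() runs the callback in the browser against each element.
    return Promise.all(handles.map((h) => h.evaluate((el) => el.textContent)));
  }
  // Assumed: a newer release where XPath goes through the normal
  // selector methods using the "xpath/" prefix.
  const selector = `xpath/${xpath}`;
  await page.waitForSelector(selector, { timeout });
  return page.$$eval(selector, (els) => els.map((el) => el.textContent));
}

// Usage inside your script, after navigating to the page:
// const texts = await getTextsByXPath(page, '//span[@data-id="postContent"]');
// console.log(texts);
```

If the wait times out, the elements really aren’t in the DOM that Puppeteer sees (different markup when logged out, content inside an iframe, and so on), which is a separate problem from the choice of method.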
Try using page.$x() instead of page.$$() for XPath. The $$ method is for CSS selectors. Just do:
const elements = await page.$x('//span[@data-id="postContent"]');
and it should work.