Troubleshooting Puppeteer web scraping issues

Hey everyone, I’m working on fixing some old web scrapers at my internship. The original developer left, and the target websites have changed their layouts. I’m having trouble with the Puppeteer code.

I updated the selectors like this:

const PATH_VARIATIONS = [
  {
    URL_XPATH_CLASS: 'jobs',
    URL_XPATH_ATTRIBUTES: '/header/h2/a/@href',
    TITLE_XPATH_CLASS: 'job clicky',
    TITLE_XPATH_ATTRIBUTES: '/header/h2/a'
  }
];

But I’m getting this error when trying to use page.$x():

TypeError: page.$x is not a function

This happens in the tryPathVariationOnPage function:

let xPathTitleStr = `//*[contains(@class, "${titleClass}")]${titleAttributes}`;
let xpathTitleData = await page.$x(xPathTitleStr);

I’m not sure how to fix this. Any ideas what could be causing the $x function to be undefined? Thanks for any help!

hey, i had a similar problem. try using page.evaluate() instead. it lets u run javascript directly in the browser context. something like:

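const xpathResults = await page.evaluate((xpath) => {
  // Snapshot the matches, then return plain strings (DOM nodes don't survive serialization).
  const snap = document.evaluate(xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  return Array.from({ length: snap.snapshotLength }, (_, i) => snap.snapshotItem(i).textContent);
}, xPathTitleStr);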

this worked for me. good luck!

Have you considered waiting for the element before selecting it? One caveat first: page.$x() and page.waitForXPath() were removed in Puppeteer v22, which is the most likely cause of your TypeError. In current versions, XPath goes through the regular selector methods with an xpath/ prefix. Here’s a quick example of how you could modify your code:

const xPathTitleStr = `//*[contains(@class, "${titleClass}")]${titleAttributes}`;
// Wait for the element, then select it; the "xpath/" prefix marks the selector as XPath.
await page.waitForSelector(`xpath/${xPathTitleStr}`);
const [element] = await page.$$(`xpath/${xPathTitleStr}`);

This approach ensures the element exists before attempting to select it. Also, check which Puppeteer version is installed (npm ls puppeteer): on versions older than v22, the original page.waitForXPath() and page.$x() calls still work. If you’re still running into issues, you might want to explore using CSS selectors instead, as they are usually shorter and easier to maintain.
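
If the scrapers have to run against more than one Puppeteer version, a small fallback might help. This is only a sketch: queryXPath is a made-up helper name, not part of Puppeteer, and it assumes that page.$x is missing on newer installs (it was removed in v22, as far as I know) while page.$$ accepts the xpath/ prefix there.

```javascript
// Hypothetical helper (not part of Puppeteer): route an XPath query to
// whichever API the installed version provides.
function queryXPath(page, xpath) {
  if (typeof page.$x === 'function') {
    // Older releases: the legacy XPath method is still present.
    return page.$x(xpath);
  }
  // Newer releases: XPath goes through $$ with the "xpath/" selector prefix.
  return page.$$(`xpath/${xpath}`);
}
```

In tryPathVariationOnPage, the call would then look something like: const [titleEl] = await queryXPath(page, xPathTitleStr);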

I ran into the same Puppeteer issue recently. In older releases, $x was available on both Page and ElementHandle, but it was removed in Puppeteer v22, so on a current install page.$x is simply undefined. Check the installed version first: either pin a pre-v22 release where $x still exists, or switch to the newer selector syntax. I also found success by invoking page.evaluate to execute the XPath query in the page’s own context, using document.evaluate to collect the elements. In my experience, making sure the page is fully loaded before running such queries eliminates a lot of unpredictable behavior.
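
To make that last part concrete, here’s a rough sketch of loading the page, waiting for it to settle, and then running the XPath inside the page. The function names collectTextByXPath and scrapeTextByXPath are made up for illustration, and using waitUntil: 'networkidle2' as the definition of "fully loaded" is an assumption; some sites may need a different wait condition.

```javascript
// Runs inside the browser: collect the text of every node matching the XPath.
// It must not reference outer variables, because Puppeteer serializes it.
function collectTextByXPath(xpath) {
  const snapshot = document.evaluate(
    xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const texts = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    texts.push(snapshot.snapshotItem(i).textContent.trim());
  }
  return texts;
}

// Hypothetical wrapper: load the page, wait for the network to go quiet,
// then run the XPath query in the page's own context.
async function scrapeTextByXPath(page, url, xpath) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  return page.evaluate(collectTextByXPath, xpath);
}
```

Because this returns plain strings rather than element handles, it should also cover XPaths that end in an attribute such as /@href, since textContent on an attribute node gives the attribute’s value.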