Retrieving values from page.evaluate() in Puppeteer: Any tips?

Hey folks! I’m working on a YouTube scraper with Puppeteer and I’m stumped. I’m trying to get data from inside page.evaluate(), but I can’t seem to return the result. Here’s what I’ve got:

async function fetchVideoData() {
  const videoInfo = await page.evaluate(() => {
    return new Promise(resolve => {
      let scrollDistance = 0;
      let intervalId = setInterval(() => {
        let videoCards = document.querySelectorAll('.video-card');
        console.log(`Found ${videoCards.length} videos`);
        let pageHeight = document.body.scrollHeight;
        window.scrollBy(0, 200);
        scrollDistance += 200;
        if (scrollDistance >= pageHeight || videoCards.length >= 50) {
          clearInterval(intervalId);
          resolve(Array.from(videoCards));
        }
      }, 500);
    });
  });
  return videoInfo;
}

let videoList = await fetchVideoData();
console.log(videoList);

The console.log in the browser shows the array, but videoList is empty. What am I missing? Any help would be awesome!

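I’ve encountered this issue before when working with Puppeteer. The problem is that page.evaluate() has to serialize its return value to send it from the browser context back to Node.js, so only essentially JSON-serializable values (strings, numbers, booleans, null, and plain arrays/objects made of those) survive the trip. DOM elements aren’t serializable that way, so an array of them typically comes back as a list of empty objects rather than anything useful. Instead, extract the data you need from each video card before resolving. For example: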

const videoInfo = await page.evaluate(() => {
  return new Promise(resolve => {
    // ... your existing scrolling code ...
    // When you're ready to resolve, map each card to plain data:
    resolve(Array.from(videoCards).map(card => ({
      title: card.querySelector('.title')?.textContent,
      views: card.querySelector('.views')?.textContent,
      // Add other relevant data
    })));
  });
});

This approach should allow you to successfully retrieve the data from page.evaluate(). Remember to adjust the selectors based on your actual HTML structure.
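To see why returning the cards themselves comes back hollow, here’s a rough Node-only sketch. It assumes the transfer behaves like a JSON round-trip (a simplification of what Puppeteer actually does over the DevTools protocol) and uses a hypothetical `FakeCard` class whose `textContent` is a getter on the prototype, which is where real DOM elements keep that data too. Accessor-based data is dropped; plain own properties survive.

```javascript
// Rough model of a by-value transfer, approximated here as a JSON round-trip.
function transferByValue(value) {
  return JSON.parse(JSON.stringify(value));
}

// Hypothetical stand-in for a DOM element: like real DOM nodes, its useful
// data sits behind a prototype getter, not in own enumerable properties.
class FakeCard {
  #title;
  constructor(title) {
    this.#title = title;
  }
  get textContent() {
    return this.#title;
  }
}

const cards = [new FakeCard('First video'), new FakeCard('Second video')];

// Returning the "elements" themselves: the data is lost in transit.
const raw = transferByValue(cards);
console.log(raw); // [ {}, {} ]

// Mapping to plain objects first: the data survives.
const plain = transferByValue(cards.map(card => ({ title: card.textContent })));
console.log(plain);
```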

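hey emma, i had a similar issue. try making the callback async with page.evaluate(async () => {...}) and returning the result directly instead of wrapping everything in new Promise and calling resolve(...). also, make sure you’re returning that value from inside page.evaluate, since puppeteer waits for whatever promise the callback returns and hands you the resolved value. sometimes puppeteer can be tricky with async stuff. hope this helps! lmk if you need more details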
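Here’s a minimal sketch of that async approach, with the setInterval loop rewritten as an async function that returns plain data. The name `scrapeVideoCards`, its parameters, and the `.title`/`a` selectors are assumptions to adapt. The `sleep` helper is defined inside the function on purpose: Puppeteer sends the function’s source to the browser, so it can’t see anything defined on the Node side.

```javascript
// Page function meant to run in the browser, e.g.
//   const videoInfo = await page.evaluate(scrapeVideoCards, 50);
// It must be self-contained, because Puppeteer serializes its source text.
async function scrapeVideoCards(maxCards = 50, stepPx = 200, delayMs = 500) {
  const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

  let scrolled = 0;
  let cards = document.querySelectorAll('.video-card');

  // Keep scrolling until we pass the page height or have enough cards.
  while (scrolled < document.body.scrollHeight && cards.length < maxCards) {
    window.scrollBy(0, stepPx);
    scrolled += stepPx;
    await sleep(delayMs); // give lazily loaded cards time to render
    cards = document.querySelectorAll('.video-card');
  }

  // Return plain, serializable data instead of DOM elements.
  return Array.from(cards, card => ({
    title: card.querySelector('.title')?.textContent?.trim() ?? null,
    url: card.querySelector('a')?.href ?? null,
  }));
}
```

Extra arguments after the function (`page.evaluate(pageFunction, ...args)`) are passed into it in the browser, and they have to be serializable as well.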

I’ve been down this road before, and it can be frustrating. The issue you’re facing is likely due to serialization limitations in Puppeteer. DOM elements can’t be passed directly from the browser context to Node.js.

Here’s a workaround that’s worked well for me:

Instead of returning the entire videoCards array, extract the data you need into plain objects first. Something like this:

const videoInfo = await page.evaluate(() => {
  return new Promise(resolve => {
    // Your existing scrolling logic here
    // When ready to resolve:
    resolve(Array.from(videoCards).map(card => ({
      title: card.querySelector('.title')?.textContent?.trim(),
      url: card.querySelector('a')?.href,
      // Add other properties you need
    })));
  });
});

This approach should give you a nice, serializable array of video data. Just make sure to adjust the selectors to match your actual HTML structure.
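If you don’t need the scrolling loop inside the same call (say you’ve already scrolled in a separate step), Puppeteer’s `page.$$eval(selector, pageFunction)` runs the querySelectorAll for you and passes the matched elements to the page function as an array. A sketch, assuming the same hypothetical `.title`/`a` selectors; the mapping lives in a standalone, self-contained function (`cardsToData`, a name made up here) so it can be reused and checked with plain objects.

```javascript
// Page function for page.$$eval: receives the array of matched elements
// and returns plain, serializable data. Self-contained on purpose, because
// Puppeteer sends its source text to the browser.
function cardsToData(cards) {
  return cards.map(card => ({
    title: card.querySelector('.title')?.textContent?.trim() ?? null,
    url: card.querySelector('a')?.href ?? null,
  }));
}

// Usage inside an async function, with an existing Puppeteer `page`:
//   const videoInfo = await page.$$eval('.video-card', cardsToData);
```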

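Also, don’t forget to add error handling. page.evaluate() rejects if the page function throws (for example, reading a property of null when a selector matches nothing) or if the page navigates away while it’s running, so wrap the call in try/catch instead of letting one bad page crash the whole scraper.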
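A minimal sketch of that error handling, assuming you’d rather log and carry on with an empty result than stop the scraper. `evaluateSafely` is a hypothetical helper, not a Puppeteer API; it only relies on `page.evaluate` returning a promise that rejects when evaluation fails.

```javascript
// Hypothetical wrapper: runs a page function and falls back to an empty
// array if evaluation fails (page function threw, page navigated, etc.).
async function evaluateSafely(page, pageFunction, ...args) {
  try {
    return await page.evaluate(pageFunction, ...args);
  } catch (err) {
    console.error(`page.evaluate failed: ${err.message}`);
    return [];
  }
}

// Usage inside an async function, with an existing Puppeteer `page`:
//   const videoInfo = await evaluateSafely(page, () => {
//     // ... extract plain data here ...
//   });
```

Returning `[]` keeps the caller’s type consistent; if you need to tell “no videos” apart from “evaluation failed”, rethrow or return `null` instead.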