Extracting values from page.evaluate() in Puppeteer: What's the trick?

I’m working on a Puppeteer script to scrape YouTube. I’m having trouble getting data out of the page.evaluate() function. Here’s what I’ve tried:

async function fetchData() {
  const results = await page.evaluate(() => {
    return new Promise(resolve => {
      let videoCount = 0;
      const checkScroll = setInterval(() => {
        const videos = document.querySelectorAll('.video-item');
        videoCount = videos.length;
        window.scrollBy(0, 200);
        if (videoCount >= 50 || window.innerHeight + window.scrollY >= document.body.offsetHeight) {
          clearInterval(checkScroll);
          resolve(Array.from(videos));
        }
      }, 500);
    });
  });
  return results;
}

const videoList = await fetchData();
console.log(videoList);

The console.log inside page.evaluate() shows the array, but videoList is empty. What am I missing? How can I get the data out?

Hey mate, I think I know what’s going on. Puppeteer can’t send DOM elements back out of page.evaluate(), only data it can serialize. Pull out the fields you need and return those instead:

return JSON.stringify(Array.from(videos, v => ({
  title: v.querySelector('.title').textContent,
  url: v.querySelector('a').href
})));

Then parse it back outside. Should work!

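I’ve encountered this issue before when working with Puppeteer. The problem lies in how page.evaluate() serializes its return value on the way from the browser back to Node. Plain values (strings, numbers, arrays, plain objects) come through fine, but DOM elements can’t be serialized, so an array of them typically arrives as empty objects or not at all.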
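
To make that concrete, here is a rough analogy you can run in plain Node without a browser. FakeElement is a hypothetical stand-in I made up for a DOM node, and JSON.stringify() stands in for Puppeteer’s serialization, which actually goes through the DevTools protocol, so treat this as an illustration of the effect rather than the real mechanism:

```javascript
// Hypothetical stand-in for a DOM element: its data is reachable only
// through a prototype getter, not through own enumerable properties.
class FakeElement {
  #text;
  constructor(text) {
    this.#text = text;
  }
  get textContent() {
    return this.#text;
  }
}

const elements = [new FakeElement('First video'), new FakeElement('Second video')];

// Serializing the objects themselves keeps nothing useful.
const lost = JSON.parse(JSON.stringify(elements));

// Reading the values first produces plain objects that survive the trip.
const kept = JSON.parse(JSON.stringify(elements.map(el => ({ title: el.textContent }))));

console.log(lost); // [ {}, {} ]
console.log(kept); // the titles are still there
```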

To fix this, you need to extract the relevant data from the video elements before returning. Try modifying your code like this:

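const results = await page.evaluate(() => {
  return new Promise(resolve => {
    const checkScroll = setInterval(() => {
      const videos = document.querySelectorAll('.video-item');
      window.scrollBy(0, 200);
      const atBottom = window.innerHeight + window.scrollY >= document.body.offsetHeight;
      if (videos.length >= 50 || atBottom) {
        clearInterval(checkScroll);
        // Copy out plain strings; DOM nodes can't be serialized back to Node.
        const extractedData = Array.from(videos).map(video => ({
          // Optional chaining avoids a TypeError when a child element is missing.
          title: video.querySelector('.title')?.textContent.trim() ?? '',
          url: video.querySelector('a')?.href ?? ''
          // Add other properties you need
        }));
        resolve(extractedData);
      }
    }, 500);
  });
});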

This approach extracts the necessary information from each video element, creating a new array of plain JavaScript objects that can be serialized and returned from page.evaluate(). You should now see the data in videoList when you log it.
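
If you’d like to test the extraction step on its own, one option is to move it into a standalone function. This is only a sketch: extractVideos and fakeVideo are names I made up, and the selectors are the ones from the question. Since page.$$eval() hands its callback an array of the matched elements, the same function can be passed to it once the scrolling has finished.

```javascript
// Self-contained on purpose: Puppeteer sends the function's source text to
// the page, so it can't use variables from the surrounding Node scope.
function extractVideos(elements) {
  return Array.from(elements).map(el => {
    const titleEl = el.querySelector('.title');
    const linkEl = el.querySelector('a');
    return {
      title: titleEl ? titleEl.textContent.trim() : '',
      url: linkEl ? linkEl.href : ''
    };
  });
}

// In the script, after scrolling (selector taken from the question):
//   const videoList = await page.$$eval('.video-item', extractVideos);

// Quick check in plain Node with hypothetical stand-in elements.
const fakeVideo = (title, href) => ({
  querySelector: selector =>
    selector === '.title' ? { textContent: title } : selector === 'a' ? { href } : null
});

const sample = extractVideos([
  fakeVideo('  Intro to Puppeteer ', 'https://example.com/watch?v=1'),
  { querySelector: () => null } // element missing both children
]);

console.log(sample);
```

The second stand-in shows why the null checks matter: an item without a title or link yields empty strings instead of throwing inside the page.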

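I’ve dealt with similar issues in Puppeteer before, and there’s a neat trick to solve this. The problem is that page.evaluate() can only return serializable values, so DOM elements don’t make it back to your script. Puppeteer serializes plain objects and arrays on its own, so JSON.stringify() isn’t strictly required, but doing the serialization yourself makes the hand-off explicit.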

Here’s what I’d suggest:

  1. Inside your page.evaluate(), create a simple object or array with the data you want.
  2. Use JSON.stringify() to convert this object into a string.
  3. Outside page.evaluate(), use JSON.parse() to convert it back into a JavaScript object.

So your code might look something like this:

const results = await page.evaluate(() => {
  // Your existing code here
  const videos = document.querySelectorAll('.video-item');
  // Reduce each element to plain strings before serializing.
  const extractedData = Array.from(videos).map(video => ({
    title: video.querySelector('.title').textContent,
    url: video.querySelector('a').href
  }));
  return JSON.stringify(extractedData);
});

const videoList = JSON.parse(results);
console.log(videoList);

With this approach only a plain string crosses the boundary out of page.evaluate(), which works as long as the DOM elements have already been reduced to plain values. Give it a try and see if it resolves your issue!
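
One caveat if you go the JSON route: the round trip only preserves what JSON can represent. A small sketch you can run in plain Node (the record and its field names are made up for illustration):

```javascript
// Hypothetical record; the field names are for illustration only.
const record = {
  title: 'Some video',
  published: new Date(0),
  views: undefined
};

const roundTripped = JSON.parse(JSON.stringify(record));

console.log(typeof roundTripped.published); // string
console.log('views' in roundTripped);       // false
```

Dates come back as ISO strings and undefined properties are dropped, so sticking to strings and numbers for scraped fields keeps things predictable.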