I have a Puppeteer script that scrolls down a webpage automatically. I want to stop the scrolling when I press any key and get back the collected data. The scrolling part works fine but I can’t figure out how to return the values at the end. Here’s my code:
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false,
        userDataDir: "C:\\Users\\admin\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
    });
    const newPage = await browser.newPage();
    await newPage.setViewport({
        width: 1920,
        height: 1080,
        deviceScaleFactor: 1,
    });
    await newPage.goto('https://www.facebook.com/groups/123456/members', {waitUntil: 'networkidle0'});

    let memberData = await newPage.evaluate(() => {
        const delay = 3000;
        let shouldStop = false;
        document.addEventListener('keypress', e => shouldStop = true);
        let collectedData = [];
        let counter = 0;
        let currentHeight = 0;
        let timer = setTimeout(function scrollDown() {
            if ((shouldStop === false) && (document.body.scrollHeight > currentHeight)) {
                currentHeight = document.body.scrollHeight;
                document.scrollingElement.scrollTop = currentHeight;
                collectedData.concat(currentHeight); // this should contain the actual results
                timer = setTimeout(scrollDown, delay);
            }
            else {
                clearTimeout(timer);
                return collectedData; // this return doesn't work
            }
        }, delay);
    });
    console.log('FINISHED');
    //await browser.close();
})();
The problem is that the return statement inside the setTimeout callback doesn’t actually return anything to the main function. How can I properly return the collected results?
Promise solutions work, but you’re asking for maintenance hell. I’ve debugged enough Puppeteer scripts - they break constantly. Facebook changes DOM structure, scroll behaviors get weird, keypress events randomly fail.
Skip the timeout and callback wrestling. Build this as a proper automation workflow that handles edge cases for you.
You’ll get automatic retries when Facebook throws errors, smart stopping that works even when keypress dies, and data validation so you’re not collecting empty arrays.
Bonus: schedule regular runs, export to different formats, add monitoring so you catch breaks before wasting debug time.
I switched all my scraping over after too many nights fixing the same async garbage. Way more reliable than babysitting Puppeteer.
Latenode handles these automations perfectly. Set it once, escape callback hell forever.
Your setTimeout callback can’t return values directly - it’s async. Wrap everything in a Promise and call resolve() when you’re done scrolling. Also, fix that concat line: use collectedData.push(currentHeight) instead, since concat returns a new array and doesn’t modify the original.
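For illustration, a bare-bones sketch of that wrapper, reusing the names from the question and omitting the scroll logic itself (the full rewrite is in the answer further down):

let memberData = await newPage.evaluate(() => {
    return new Promise((resolve) => {
        let collectedData = [];
        // ... recursive setTimeout scroll logic goes here,
        // pushing into collectedData as it runs ...
        // When the stop condition is hit:
        resolve(collectedData); // this value becomes memberData in Node
    });
});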
Skip the Promise wrappers and setTimeout chains - just automate this whole thing. Puppeteer scripts get messy fast when you handle async operations manually.
I’ve built similar scrapers for social media data, and the manual approach always creates timing issues and unreliable captures. Set up an automated pipeline that handles scrolling, data collection, and stopping conditions without babysitting.
With automation, you get:
- Smart scroll detection that stops based on data patterns
- Retry logic for failed network requests
- Scheduled runs for fresh data
- Automatic storage in whatever format you want
BTW, your code has a bug - you’re using concat() instead of push() for adding items to your array. But honestly, rewriting this as an automated workflow will save way more time than debugging individual issues.
I use Latenode for these web scraping automations. It handles all the async complexity so you can focus on the data you need instead of fighting Puppeteer callbacks.
This is classic async behavior in JavaScript: page.evaluate doesn’t wait for your setTimeout callbacks to finish before returning. I hit this exact issue scraping dynamic content last year.

What fixed it for me was wrapping the whole scrolling logic in a Promise and using resolve() to return the data when done. But there’s a cleaner approach - use async/await with a while loop instead of recursive setTimeout calls. You get better control and easier debugging.

Also caught that you’re using concat(), which returns a new array instead of modifying the existing one. Switch to push() or the spread operator to actually add items to your collectedData array. Without this fix, you’ll always get an empty array no matter how you solve the async issue.
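A rough sketch of that while-loop version, reusing the delay and stop conditions from the question (keypress listener plus the scrollHeight check) - evaluate() accepts an async function and waits for the Promise it returns:

let memberData = await newPage.evaluate(async () => {
    const delay = 3000;
    let shouldStop = false;
    document.addEventListener('keypress', () => shouldStop = true);

    const collectedData = [];
    let currentHeight = 0;

    // Keep scrolling until a key is pressed or the page stops growing.
    while (!shouldStop && document.body.scrollHeight > currentHeight) {
        currentHeight = document.body.scrollHeight;
        document.scrollingElement.scrollTop = currentHeight;
        collectedData.push(currentHeight); // replace with whatever data you actually want to capture
        await new Promise(resolve => setTimeout(resolve, delay)); // give new content time to load
    }

    return collectedData; // evaluate() resolves with this value
});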
The problem is that page.evaluate resolves as soon as your function returns, which happens before any of your setTimeout callbacks fire. I hit the same issue scraping infinite scroll feeds. The Promise approach works, but add error handling since Facebook’s DOM is unpredictable. Also set a max scroll limit so you don’t get stuck in an infinite loop if the page never stops loading. What saved me was storing results as I went - Facebook pages crash or keypress events fail more often than you’d think. I’d log collectedData.length every few scrolls to make sure you’re actually collecting data.
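As one way to add the scroll cap and the periodic logging, here is a sketch built on the Promise version from the answer below; maxScrolls and the scrolls counter are made-up names for this example:

let memberData = await newPage.evaluate(() => {
    return new Promise((resolve) => {
        const delay = 3000;
        const maxScrolls = 200; // hypothetical safety cap so the loop can't run forever
        let scrolls = 0;
        let shouldStop = false;
        document.addEventListener('keypress', () => shouldStop = true);

        const collectedData = [];
        let currentHeight = 0;

        let timer = setTimeout(function scrollDown() {
            const stillGrowing = document.body.scrollHeight > currentHeight;
            if (!shouldStop && stillGrowing && scrolls < maxScrolls) {
                currentHeight = document.body.scrollHeight;
                document.scrollingElement.scrollTop = currentHeight;
                collectedData.push(currentHeight);
                scrolls++;
                if (scrolls % 5 === 0) {
                    // Logs to the browser DevTools console, not the Node terminal
                    console.log('collected so far:', collectedData.length);
                }
                timer = setTimeout(scrollDown, delay);
            } else {
                clearTimeout(timer);
                resolve(collectedData);
            }
        }, delay);
    });
});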
To retrieve the collected data from the scrolling function in Puppeteer, you’ll need to wrap the logic in a Promise: page.evaluate() resolves as soon as the passed function returns, which happens before the setTimeout callback fires, but if that function returns a Promise, evaluate() waits for it to resolve. Below is a revised version of your code:
let memberData = await newPage.evaluate(() => {
    return new Promise((resolve) => {
        const delay = 3000;
        let shouldStop = false;
        document.addEventListener('keypress', e => shouldStop = true);
        let collectedData = [];
        let currentHeight = 0;
        let timer = setTimeout(function scrollDown() {
            if ((shouldStop === false) && (document.body.scrollHeight > currentHeight)) {
                currentHeight = document.body.scrollHeight;
                document.scrollingElement.scrollTop = currentHeight;
                collectedData.push(currentHeight); // using push to update the array
                timer = setTimeout(scrollDown, delay);
            }
            else {
                clearTimeout(timer);
                resolve(collectedData); // resolve the Promise to return data
            }
        }, delay);
    });
});
With this adjustment, page.evaluate() waits for resolve() to be called, and memberData receives the collected results.