I’m working on a web scraping project and running into issues with scrolling behavior
I need to extract data from a webpage that has a specific container with automatic content loading as you scroll down. The target element is a div with id main_scroll_area that holds several link elements. When you scroll to the bottom of this div, more content gets loaded automatically.
My current approach isn’t working and I keep getting this error: Evaluation failed: ReferenceError: container is not defined
Here’s what I’ve been trying:
let targetDiv = '#main_scroll_area';
// Getting the element reference
let container = await page.evaluate((targetDiv) => {
return document.querySelector(targetDiv);
})
// Trying to scroll within the container
let currentHeight = await page.evaluate("container.scrollHeight");
await page.evaluate("container.scrollTop = container.scrollHeight");
await page.waitForFunction(`container.scrollHeight > ${currentHeight}`);
What’s the correct way to handle scrolling inside a specific div element rather than the whole page? I think I’m missing something about how to properly reference the element across different evaluate calls.
I ran into this exact problem a few months back when scraping an e-commerce site with infinite scroll. What finally worked for me was a loop that re-selects the element on every iteration. Instead of trying to keep a reference alive across evaluate calls, I query the element fresh each time and compare the height before and after the wait. Something like this worked well:
for (let i = 0; i < maxScrolls; i++) {
  // Scroll to the bottom and record the height at that moment
  const prevHeight = await page.evaluate(() => {
    const el = document.querySelector('#main_scroll_area');
    el.scrollTop = el.scrollHeight;
    return el.scrollHeight;
  });
  // Give the lazy-loaded content time to arrive
  await page.waitForTimeout(2000);
  const newHeight = await page.evaluate(() => document.querySelector('#main_scroll_area').scrollHeight);
  // No growth means the end of the list has been reached
  if (newHeight === prevHeight) break;
}
The timeout gives the content time to load between scrolls, and breaking when the height stops changing ends the loop as soon as you reach the end instead of running all maxScrolls iterations. maxScrolls is a cap you define yourself. Note that page.waitForTimeout has been removed from recent Puppeteer releases; there, await new Promise(r => setTimeout(r, 2000)) does the same job.
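Your issue is that container only exists in your Node script, not in the page. A string passed to page.evaluate() runs inside the browser, where no container variable was ever defined, hence the ReferenceError. On top of that, page.evaluate() can only return serializable values, so returning a DOM element from it doesn't give you a usable reference anyway. Better to use page.evaluateHandle() to keep the element reference, or combine everything in one evaluate block like await page.evaluate(() => { const cont = document.querySelector('#main_scroll_area'); cont.scrollTop = cont.scrollHeight; }). That should work better!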
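To make the evaluateHandle option concrete, here is a minimal sketch. It assumes page is an already-open Puppeteer Page; scrollViaHandle is just a name I made up for illustration, not part of Puppeteer.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page and
// `scrollViaHandle` is a hypothetical helper name.
async function scrollViaHandle(page, selector) {
  // evaluateHandle returns a JSHandle that keeps pointing at the
  // in-page element instead of trying to serialize it.
  const handle = await page.evaluateHandle(
    (sel) => document.querySelector(sel),
    selector
  );
  try {
    // Passing the handle as an argument lets Puppeteer resolve it
    // back to the real DOM element inside the page.
    return await page.evaluate((el) => {
      if (!el) return null; // selector matched nothing
      const before = el.scrollHeight;
      el.scrollTop = el.scrollHeight;
      return before; // a plain number, so it serializes fine
    }, handle);
  } finally {
    // Release the handle so the page can garbage-collect the element.
    await handle.dispose();
  }
}
```

You would call it as const before = await scrollViaHandle(page, '#main_scroll_area'); and get back the height before the scroll, or null if the selector matched nothing.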
The problem stems from how Puppeteer separates your Node script from the page. Code passed to page.evaluate() runs in the browser, so a Node-side variable such as container is not visible there, and anything declared locally inside one evaluate call is gone by the next. I’ve dealt with similar issues before and found that wrapping the entire scroll logic into a single evaluate function, with the element queried inside it, works reliably.
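A minimal sketch of the single-evaluate idea, assuming page is an open Puppeteer Page. scrollContainerOnce is a hypothetical helper name, and returning the pre-scroll height is my own choice so the caller has something to compare against later.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page and
// `scrollContainerOnce` is a hypothetical helper name.
async function scrollContainerOnce(page, selector) {
  // The selector is passed as an argument, so nothing from the
  // Node side is referenced inside the page function.
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    if (!el) throw new Error(`No element matches ${sel}`);
    const previousHeight = el.scrollHeight;
    // Scroll the div itself, not the window.
    el.scrollTop = el.scrollHeight;
    return previousHeight;
  }, selector);
}
```

Usage would look like const previousHeight = await scrollContainerOnce(page, '#main_scroll_area');.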
For waiting on new content to load, you’ll need to handle the timing separately using waitForFunction with a fresh query of the element. The key is avoiding cross-context variable references entirely.
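As a rough sketch of that waiting step, assuming page is an open Puppeteer Page and previousHeight is the container's scrollHeight you recorded before scrolling: waitForContainerGrowth and the 5000 ms default are my own placeholders, and treating a timeout as "no more content" is an assumption that may not suit every site.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page;
// `waitForContainerGrowth` and the 5000 ms default are hypothetical.
async function waitForContainerGrowth(page, selector, previousHeight, timeoutMs = 5000) {
  try {
    await page.waitForFunction(
      // Runs in the page and re-queries the element on every poll,
      // so no reference has to survive between calls.
      (sel, prev) => {
        const el = document.querySelector(sel);
        return el !== null && el.scrollHeight > prev;
      },
      { timeout: timeoutMs },
      selector,
      previousHeight
    );
    return true; // the container grew, so new content was loaded
  } catch (err) {
    // Assumption: a timeout means nothing more will load.
    if (err && err.name === 'TimeoutError') return false;
    throw err;
  }
}
```

Calling this after each scroll and stopping at the first false gives you a loop that ends at the bottom of the list without a fixed sleep.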