I’m working on a project where I need to load all the content on a webpage that uses infinite scrolling. The tricky part is that new stuff shows up as you scroll down and it has a special class name. I want to keep scrolling until everything’s loaded but I’m running into some issues.
Here’s what I’ve tried so far:
async function scrollPage(page) {
  await page.evaluate(() => {
    document.documentElement.scrollTop = document.documentElement.scrollHeight;
  });
  await page.waitForSelector('.dynamic-content');
}
The problem is that this keeps going even after all the content is loaded. It ends up timing out because it’s still looking for new stuff that isn’t there anymore.
Does anyone know a smart way to detect when we’ve hit the bottom and there’s no more new content coming in? I’m stumped and could really use some help figuring this out. Thanks!
hey noah, try checking if new content loads before scrolling further. for example, keep track of page height:
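let last = 0;
while (true) {
  await scrollPage(page);
  // waitForSelector returns immediately once any matching element exists,
  // so pause briefly to let the next batch render before measuring.
  await new Promise((resolve) => setTimeout(resolve, 1000));
  const h = await page.evaluate(() => document.body.scrollHeight);
  if (h === last) break; // height unchanged: nothing new loaded
  last = h;
}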
hope it helps!
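Since the new items carry their own class, another option is to count them instead of measuring height. Here is a rough sketch of that idea, not a definitive implementation: the function name `scrollUntilCountStable` and the `pauseMs` and `maxRounds` parameters are made up for illustration, and the one-second pause is a guess you would tune per site.

```javascript
// Sketch: stop once the number of matching elements stops growing.
// `pauseMs` and `maxRounds` are illustrative parameters, not library options.
async function scrollUntilCountStable(
  page,
  selector = '.dynamic-content',
  pauseMs = 1000,
  maxRounds = 50
) {
  let lastCount = await page.$$eval(selector, (els) => els.length);
  for (let round = 0; round < maxRounds; round++) {
    await page.evaluate(() =>
      window.scrollTo(0, document.documentElement.scrollHeight)
    );
    // Fixed pause so the next batch has time to arrive.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
    const count = await page.$$eval(selector, (els) => els.length);
    if (count === lastCount) break; // nothing new appeared after this scroll
    lastCount = count;
  }
  return lastCount;
}
```

The `maxRounds` cap is only a safety net for feeds that never run out; the function returns the final element count so you can log how much was loaded.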
I’ve encountered similar challenges with infinite scrolling pages. One effective approach is to use a combination of scrolling and a mutation observer. Below is a snippet that might help:
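// idleMs is an assumed quiet period; tune it for the site being scraped.
async function scrollUntilNoNewContent(page, idleMs = 2000) {
  await page.evaluate((idleMs) => {
    return new Promise((resolve) => {
      let lastHeight = document.body.scrollHeight;
      let idleTimer;
      // Finish once the page has not grown for idleMs milliseconds.
      const finish = () => {
        observer.disconnect();
        resolve();
      };
      const observer = new MutationObserver(() => {
        const newHeight = document.body.scrollHeight;
        if (newHeight > lastHeight) {
          lastHeight = newHeight;
          // New content arrived: restart the quiet period and scroll again.
          clearTimeout(idleTimer);
          idleTimer = setTimeout(finish, idleMs);
          window.scrollTo(0, newHeight);
        }
      });
      observer.observe(document.body, { childList: true, subtree: true });
      window.scrollTo(0, lastHeight);
      idleTimer = setTimeout(finish, idleMs);
    });
  }, idleMs);
}
This method uses a MutationObserver to detect when new content is added to the page. It scrolls again whenever the page grows and resolves once the height has stopped increasing for idleMs milliseconds. That way an unrelated DOM change (a loading spinner, for example) does not end the loop early, and a page with no further content does not leave the promise pending forever. I’ve found it reliable for multiple infinite scroll implementations.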
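One caveat: the observer lives inside the page, so a feed that never stops loading would keep the call pending. A minimal sketch of capping it from the Node side follows; `withDeadline` is a hypothetical helper written for this post, not part of any library, and the 60-second figure in the usage comment is arbitrary.

```javascript
// Hypothetical helper: resolves to 'done' if `promise` settles first,
// or to 'timeout' once `ms` milliseconds have passed.
function withDeadline(promise, ms) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  return Promise.race([promise.then(() => 'done'), deadline]).finally(() =>
    clearTimeout(timer)
  );
}

// Usage sketch:
// const outcome = await withDeadline(scrollUntilNoNewContent(page), 60000);
// if (outcome === 'timeout') console.warn('Stopped scrolling at the deadline');
```

Note that this only stops waiting; the in-page observer keeps running until the page is closed or navigated away.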
I’ve dealt with this exact issue before, and it can be tricky. One approach that worked well for me was combining a scroll check with a timeout. Here’s a rough idea:
async function scrollUntilNoNewContent(page, timeout = 30000) {
  let lastHeight = await page.evaluate('document.body.scrollHeight');
  const start = Date.now();
  while (true) {
    await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
    // Give the content time to load. A plain timer is used here because
    // page.waitForTimeout is deprecated or removed in recent Puppeteer versions.
    await new Promise((resolve) => setTimeout(resolve, 2000));
    const newHeight = await page.evaluate('document.body.scrollHeight');
    // Stop when the height is unchanged or the overall time budget is spent.
    if (newHeight === lastHeight || Date.now() - start > timeout) {
      break;
    }
    lastHeight = newHeight;
  }
}
This function scrolls and waits, then checks if the page height has changed. It also includes a timeout to prevent infinite loops. You might need to tweak the timeout values based on the specific site you’re scraping. Hope this helps!
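If a single unchanged check turns out to be too trigger-happy on slow connections, one variation is to require several unchanged checks in a row before stopping. This is only a sketch under that assumption; `scrollUntilStable` and the `pauseMs`, `stableRounds`, and `timeoutMs` options are names invented for this example.

```javascript
// Sketch: stop only after `stableRounds` consecutive checks with no height
// change, or when `timeoutMs` runs out. All option names are illustrative.
async function scrollUntilStable(
  page,
  { pauseMs = 1000, stableRounds = 3, timeoutMs = 30000 } = {}
) {
  const start = Date.now();
  let lastHeight = await page.evaluate('document.body.scrollHeight');
  let unchanged = 0;
  while (unchanged < stableRounds && Date.now() - start < timeoutMs) {
    await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
    const newHeight = await page.evaluate('document.body.scrollHeight');
    // Reset the counter whenever the page grows.
    unchanged = newHeight === lastHeight ? unchanged + 1 : 0;
    lastHeight = newHeight;
  }
  return lastHeight;
}
```

The trade-off is that the run always spends roughly `stableRounds * pauseMs` extra at the very end, in exchange for not giving up during one slow request.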