I’m trying to scrape info from a bunch of websites using Puppeteer. I’ve got a list of web addresses stored in an array. Here’s what my code looks like right now:
const sites = ['site1', 'site2', 'site3'];
sites.map(async (site) => {
  await browser.newPage(site);
  await browser.waitForPageLoad({ timeout: 30000 });
});
But it’s not working like I hoped. The browser seems to rush through all the sites without waiting for each page to load properly. I even tried using browser.pause(), but no luck.
Am I doing something wrong here? Or is this not the right way to handle multiple pages with Puppeteer? Any tips would be super helpful!
Your approach is close, but there’s a better way to handle multiple pages with Puppeteer. The issue stems from map() not waiting for each async operation to complete: an async callback returns a promise immediately, and map() just collects those promises without awaiting them, so every iteration races ahead at once. (As a side note, newPage() doesn’t take a URL, and browser.waitForPageLoad() and browser.pause() aren’t Puppeteer methods.) Here’s a more effective method:
for (const site of sites) {
  const page = await browser.newPage();
  await page.goto(site, { waitUntil: 'networkidle0' });
  // Perform your scraping operations here
  await page.close();
}
This ensures each page finishes loading (networkidle0 waits until there have been no network connections for at least 500 ms) before moving on to the next one. It’s also important to close each page after use to manage resources. Consider adding error handling with try/catch blocks for robustness. If you’re scraping many sites, a small delay between requests can help you avoid IP blocks: page.waitForTimeout(1000) works on older Puppeteer versions, and newer versions (which removed waitForTimeout) can use a Promise-wrapped setTimeout instead. A sketch combining all of that is below.
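If it’s useful, here’s a rough, self-contained sketch of how that might look end to end. The example.com URLs and the page.title() call are just placeholders, not part of your setup:

const puppeteer = require('puppeteer');

const sites = ['https://example.com', 'https://example.org'];

(async () => {
  const browser = await puppeteer.launch();

  for (const site of sites) {
    const page = await browser.newPage();
    try {
      // networkidle0: no network connections for at least 500 ms
      await page.goto(site, { waitUntil: 'networkidle0', timeout: 30000 });
      const title = await page.title(); // stand-in for your real scraping logic
      console.log(site, title);
    } catch (err) {
      // Log and keep going so one bad site doesn't abort the whole run
      console.error(`Failed to scrape ${site}: ${err.message}`);
    } finally {
      await page.close();
    }
    // Small pause between requests; works on any Puppeteer version
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }

  await browser.close();
})();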
i had a similar issue. using a for…of loop awaits each page load. one gotcha: waitForLoadState() is a Playwright method, not Puppeteer, so pass waitUntil to goto() instead. check this:
for (const site of sites) {
  const page = await browser.newPage();
  await page.goto(site, { waitUntil: 'networkidle0' });
  await page.close();
}
hope that helps!
Hey there, I’ve dealt with similar challenges when scraping multiple sites with Puppeteer. Your issue is likely due to how map() handles async operations. Instead of using map(), I’d recommend a for…of loop for better control. Here’s what worked for me:
for (const site of sites) {
  const page = await browser.newPage();
  // Pass waitUntil to goto() directly; calling waitForNavigation() after goto()
  // has already resolved would just hang until it times out.
  await page.goto(site, { waitUntil: 'networkidle0' });
  // Do your scraping here
  await page.close();
}
This approach ensures each page loads fully before moving to the next. Also, don’t forget to wrap everything in a try/catch block to handle any errors that might pop up during scraping. It’s saved me a ton of headaches!
One more tip: if you’re scraping a lot of sites, consider adding a small delay between requests to avoid getting blocked. Something like await page.waitForTimeout(1000) can help on older Puppeteer versions; newer versions removed waitForTimeout, so a plain Promise-wrapped setTimeout does the same job. Good luck with your project!
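One footnote on the original map() code: it can actually be made to work if you keep the promises it returns and await them all with Promise.all. Just be aware that this opens every site at the same time, so it’s usually only sensible for a short list. A rough sketch, assuming the same browser and sites variables as in the snippets above and running inside an async function (page.title() is again just a placeholder):

// map() returns an array of promises; Promise.all waits for all of them,
// but the pages load concurrently rather than one after another.
const results = await Promise.all(
  sites.map(async (site) => {
    const page = await browser.newPage();
    try {
      await page.goto(site, { waitUntil: 'networkidle0' });
      return await page.title(); // placeholder for real scraping logic
    } finally {
      await page.close();
    }
  })
);
console.log(results);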