I’m working on a web scraping project where I need to extract information from several different websites. I have a list of target URLs stored in an array like this:
const websites = ['site1.com', 'site2.com', 'site3.com']
Currently I’m trying to process each URL using this approach:
websites.map(async (site) => {
await browser.goto(site);
await browser.waitForNavigation({ waitUntil: 'networkidle' });
})
The problem is that my script doesn’t seem to wait properly for each page to fully load before moving to the next one. It jumps between URLs very quickly instead of processing them one by one. I also tried using browser.waitFor but got similar results.
Is there something wrong with my implementation? Should I be using a different method to handle multiple URLs in sequence? Any advice would be helpful.
map is not the right choice here because it starts all the callbacks at once, so the pages load in parallel. Try a for loop like for (const site of websites) and put the await inside that loop. That way it waits for each site to load before going to the next one.
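To make the difference concrete, here is a small sketch that needs no browser. The names fakeLoad, withMap and withForOf are made up for illustration, and fakeLoad is only a stand-in for page.goto() that pauses for a few milliseconds.

```javascript
// Resolves after `ms` milliseconds.
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for page.goto(): records when a "load" starts and ends.
async function fakeLoad(site, log) {
  log.push(`start ${site}`);
  await pause(10);
  log.push(`end ${site}`);
}

// map() calls every callback right away; Promise.all only waits for them to finish.
async function withMap(sites) {
  const log = [];
  await Promise.all(sites.map((site) => fakeLoad(site, log)));
  return log;
}

// for...of with await starts the next load only after the previous one resolves.
async function withForOf(sites) {
  const log = [];
  for (const site of sites) {
    await fakeLoad(site, log);
  }
  return log;
}
```

With two sites, withMap logs both starts before either end, while withForOf logs start/end pairs one site at a time.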
The problem is that map() doesn’t wait for async functions - it fires off all the promises at once instead of running them one by one. I hit this same issue scraping product data from e-commerce sites. A for loop fixes it, but you’ll also want error handling and delays between requests. Sites will ban you if you hammer them too fast. I throw in await new Promise(resolve => setTimeout(resolve, 1000)) between iterations of the loop. Also, ditch waitForNavigation if you’re using goto() - goto() already waits for the page to load by default, so there is no second navigation left to wait for. Just use await page.goto(site, { waitUntil: 'networkidle2' }) instead. That works much better for dynamic content.
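Putting those pieces together, a rough sketch could look like the following. It assumes page is a Puppeteer Page (anything with an async goto(url, options) method would do); visitInSequence, sleep and the 1000 ms default are illustrative choices, not part of any library.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit each URL in order, pausing between requests.
// `page` is assumed to be a Puppeteer Page; `delayMs` is a hypothetical knob.
async function visitInSequence(page, urls, delayMs = 1000) {
  const results = [];
  for (const url of urls) {
    try {
      await page.goto(url, { waitUntil: 'networkidle2' });
      // scraping logic for this page would go here
      results.push({ url, ok: true });
    } catch (error) {
      // One bad site should not stop the whole run.
      console.log(`Failed to load ${url}:`, error.message);
      results.push({ url, ok: false, error: error.message });
    }
    await sleep(delayMs); // wait before the next request
  }
  return results;
}
```

Called as await visitInSequence(page, websites), it returns one entry per URL, so failed sites can be inspected or retried afterwards.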
Your problem is map() - it fires off all promises at once instead of running them one by one. I hit this same issue scraping financial data across exchanges. Fixed it with a for...of loop and proper error handling. Try this:
for (const site of websites) {
  try {
    await page.goto(site, { waitUntil: 'networkidle2' });
    // your scraping logic here
  } catch (error) {
    console.log(`Failed to scrape ${site}:`, error.message);
  }
}
BTW, you don’t need waitForNavigation() after goto() - it already waits by default. I’d also add delays between requests and retry logic for failures to make it more reliable.
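For the retry part, one possible shape is sketched below. gotoWithRetry, retries and backoffMs are hypothetical names, and the linearly growing wait is just one reasonable choice. page is again assumed to be a Puppeteer Page.

```javascript
// Try page.goto() up to `retries` times, waiting longer after each failure.
// `page` is assumed to be a Puppeteer Page; the option names are illustrative.
async function gotoWithRetry(page, url, { retries = 3, backoffMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await page.goto(url, { waitUntil: 'networkidle2' });
    } catch (error) {
      lastError = error;
      console.log(`Attempt ${attempt}/${retries} failed for ${url}: ${error.message}`);
      if (attempt < retries) {
        // Linear backoff: 1x, 2x, 3x ... the base delay.
        await new Promise((resolve) => setTimeout(resolve, backoffMs * attempt));
      }
    }
  }
  throw lastError; // every attempt failed; let the caller decide what to do
}
```

Inside the for...of loop, await gotoWithRetry(page, site) can replace the plain page.goto() call, and the surrounding try/catch still handles a site that fails every attempt.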