I'm working on a web scraping project where I need to visit multiple websites at once using Puppeteer with headless Chrome. The frustrating part is that the timeouts happen randomly.
Sometimes everything works perfectly and all my crawlers finish without issues. Other times, most of them hang and time out after 30 seconds. It's not consistent at all, which makes debugging really hard.
The random timeouts scream network issues, not code problems. I've hit this exact thing scraping at scale - the inconsistency usually comes from DNS resolution delays or your OS running out of connection slots. You're opening every connection at once, which exhausts your available sockets. Use a semaphore to limit concurrency - maybe 3-5 pages running while the rest wait in a queue (rough sketch below). Also check whether you're maxing out Chrome's per-host connection limits; adding --max-connections-per-host=6 to my launch args fixed it for me. It could also be your ulimit for open file descriptors (check ulimit -n): each page eats several file handles, and when you hit that wall you get exactly these random failures.
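To make the semaphore idea concrete, here's a minimal untested sketch, assuming you have puppeteer installed. The `crawlAll` name, the limit of 4, and the `networkidle2` wait are all placeholders to adapt:

```js
const puppeteer = require('puppeteer');

async function crawlAll(urls, limit = 4) {
  const browser = await puppeteer.launch({ headless: true });
  const queue = [...urls];
  const results = [];

  // Each worker pulls the next URL off the shared queue, so at most
  // `limit` pages (and their sockets) are open at any moment.
  async function worker() {
    while (queue.length > 0) {
      const url = queue.shift();
      const page = await browser.newPage();
      try {
        await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
        results.push({ url, title: await page.title() });
      } catch (err) {
        results.push({ url, error: err.message });
      } finally {
        await page.close(); // frees the file handles this page held
      }
    }
  }

  await Promise.all(Array.from({ length: limit }, () => worker()));
  await browser.close();
  return results;
}
```

The point is that only `limit` pages ever hold sockets and file handles at once; everything else waits its turn instead of piling onto the network stack simultaneously.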
You're also missing proper error handling. Many big sites detect and block headless browsers, and your code doesn't handle navigation timeouts gracefully. Add { timeout: 60000 } to your goto() call and rotate user agents between requests. Also, use browser contexts instead of plain tabs - each context gets its own cookies and cache, so pages won't interfere with each other.
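Here's roughly what I mean, as an untested sketch. The user agent strings are made-up placeholders (use real current ones), and note that newer Puppeteer versions (v22+) renamed createIncognitoBrowserContext() to createBrowserContext():

```js
const puppeteer = require('puppeteer');

const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36',
];

async function fetchInContext(browser, url) {
  // Each context has its own cookies and cache, so sessions can't
  // bleed into each other the way tabs sharing one context can.
  // (Renamed to browser.createBrowserContext() in Puppeteer v22+.)
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.setUserAgent(
    USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)]
  );
  try {
    // Give slow sites a full minute instead of the 30s default.
    await page.goto(url, { timeout: 60000, waitUntil: 'domcontentloaded' });
    return await page.content();
  } finally {
    await context.close(); // closing the context also closes its pages
  }
}
```

Closing the context tears down its pages too, so every request starts with a clean session.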
Those random timeouts are probably your system running out of resources. Each headless Chrome page is roughly a full renderer process, so managing ten browser tabs at once can overwhelm your CPU and memory, which produces exactly this kind of inconsistency. I faced a similar issue scraping e-commerce sites and resolved it with a connection pool that capped concurrency. It's better to limit the number of open tabs, ideally to 3-4 at a time.
Moreover, the websites you're targeting might have strict rate limiting and bot detection. Sites like Facebook and Amazon are particularly harsh toward automated requests, which could explain the random failures. Add explicit timeouts to your goto() calls and retry logic for failures, as in the sketch below. Random delays between requests also help mimic human pacing and reduce the likelihood of being blocked.
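For the retry-plus-random-delay part, here's a hedged sketch of the shape I'd use. The gotoWithRetry helper is just illustrative, and the attempt count and delay range are arbitrary numbers to tune for your targets:

```js
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function gotoWithRetry(page, url, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      // Explicit timeout so a hung navigation fails at a known point
      // instead of relying on whatever default the page has.
      return await page.goto(url, { timeout: 45000, waitUntil: 'networkidle2' });
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Pause a random 2-5 seconds before retrying so the traffic
      // doesn't look machine-regular to rate limiters.
      await sleep(2000 + Math.random() * 3000);
    }
  }
}
```

A failed navigation gets a couple more chances with jittered spacing, and only the final failure propagates up to your crawler.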