Managing memory crashes in Puppeteer when dealing with resource-intensive web pages

I’m working with Puppeteer to capture screenshots of various web pages, but I’m running into serious memory issues with certain pages that consume excessive resources.

Here’s my current implementation:

const browser = await puppeteer.launch();
for (const pageId of pageIds) {
    try {
        const newTab = await browser.newPage();
        await newTab.goto(targetUrl + pageId);
        await newTab.waitForTimeout(5000);
        await newTab.screenshot({
            path: `./screenshots/${pageId}.png`,
            clip: { x: 15, y: 80, width: 800, height: 500 }
        });
        await newTab.close();
    } catch (err) {
        console.error('Error processing:', pageId, err.message);
    }
}
await browser.close();

The issue occurs when certain pages attempt to load enormous amounts of data (sometimes over 1GB). This causes the entire Chrome process to crash and my server becomes unresponsive. Even though I’m catching the timeout error, the memory consumption at the OS level seems to freeze everything.

What’s the best approach to prevent server crashes when handling memory-heavy pages? Are there specific Chrome flags or Puppeteer settings that can limit resource usage and fail gracefully instead of hanging the entire system?

I’ve encountered similar memory crashes in production. Try launching Chrome with --js-flags=--max-old-space-size=512 (max-old-space-size is a V8 flag, so it has to be passed through Chrome’s --js-flags switch) together with --memory-pressure-off. Instead of a fixed waitForTimeout, call page.setDefaultTimeout(10000) before navigating so slow pages fail fast rather than hang. If you don’t need images or stylesheets in the screenshots, block them with request interception. Processing pages in smaller batches and restarting the browser every 50-100 pages also prevents gradual memory buildup, and a simple memory monitor can kill the process once RAM usage crosses a threshold so you can restart with the next batch.
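Putting those pieces together, here is a rough sketch. The flag values, the 10-second timeout, and the list of blocked resource types are assumptions to tune for your workload, and puppeteer is required lazily so the helpers load without it:

```javascript
// Resource types that are usually safe to skip when you only need a layout screenshot.
const blockedResourceTypes = new Set(['image', 'stylesheet', 'font', 'media']);

function shouldBlock(resourceType) {
    return blockedResourceTypes.has(resourceType);
}

async function captureAll(pageIds, targetUrl) {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch({
        args: [
            '--js-flags=--max-old-space-size=512', // cap V8's heap inside Chrome
            '--memory-pressure-off',
        ],
    });
    try {
        for (const pageId of pageIds) {
            const page = await browser.newPage();
            page.setDefaultTimeout(10000); // fail fast instead of hanging on slow pages
            await page.setRequestInterception(true);
            page.on('request', (req) =>
                shouldBlock(req.resourceType()) ? req.abort() : req.continue()
            );
            try {
                await page.goto(targetUrl + pageId);
                await page.screenshot({ path: `./screenshots/${pageId}.png` });
            } catch (err) {
                console.error('Error processing:', pageId, err.message);
            } finally {
                await page.close(); // always release the tab, even after a failed goto
            }
        }
    } finally {
        await browser.close();
    }
}
```

Note the finally blocks: in the original loop, a failed goto skips newTab.close() and leaks the tab, which compounds the memory problem.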

You need resource isolation for this workload. Set up a worker pool where each page runs in its own child process, using the Node.js cluster module or worker threads. When a memory-heavy page crashes, it will only kill that worker instead of your whole app. Use cgroups (or a similar OS facility) to set resource limits so individual processes can’t eat all your system memory. For Chrome, add the --disable-background-timer-throttling and --disable-renderer-backgrounding flags for more predictable behavior. Check memory usage before loading each page and skip any that would push you over your threshold. Better to fail fast than crash everything.

Try the --js-flags=--max-old-space-size=1024 and --disable-dev-shm-usage flags when launching Chrome (max-old-space-size is a V8 flag, so it needs to go through --js-flags). Also spawn separate browser instances for heavy pages instead of reusing the same one: if one crashes, it won’t kill your entire process. Puppeteer memory leaks are brutal.
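A sketch of the throwaway-browser approach, assuming the flag values above (the 1024 MB heap cap and 15-second timeout are illustrative, not recommendations):

```javascript
// Build the launch args suggested above; --js-flags passes V8 options
// like max-old-space-size through to Chrome's JS engine.
function buildLaunchArgs(heapMB) {
    return [
        `--js-flags=--max-old-space-size=${heapMB}`,
        '--disable-dev-shm-usage', // write to /tmp instead of the small /dev/shm in containers
    ];
}

// One throwaway browser per heavy page: if Chrome dies mid-load,
// only this instance is lost, not the main process.
async function screenshotInFreshBrowser(url, outPath) {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch({ args: buildLaunchArgs(1024) });
    try {
        const page = await browser.newPage();
        await page.goto(url, { timeout: 15000 });
        await page.screenshot({ path: outPath });
    } finally {
        await browser.close(); // always tear the instance down, crash or not
    }
}
```

The per-page launch is slower than reusing one browser, so reserve it for the pages you already know are heavy.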
