Hardware specs needed for Puppeteer automation

Server crashed when running multiple Puppeteer instances

I deployed my web scraping script today and it completely overloaded my server. The whole API went down when I tried to run multiple browser instances at once.

I need advice on what kind of server specifications would work best for handling around 1000 automated requests per hour. Right now I’m using a basic DigitalOcean droplet but it clearly can’t handle the load.

What happened:
Tried running 5 concurrent browser sessions on a cheap cloud server and everything crashed.

Current setup:

  • Latest Puppeteer version
  • Ubuntu 16.04 server
  • Node.js version 10

Sample code that’s causing issues:

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function scrapeWebsite(targetUrl) {
  console.log("Starting browser session")
  const chromeBrowser = await puppeteer.launch({ 
    headless: true, 
    args: [`--window-size=${1920},${1080}`, '--no-sandbox', '--disable-dev-shm-usage'] 
  });
  const newPage = await chromeBrowser.newPage();
  await newPage.goto(targetUrl, {waitUntil: 'networkidle2', timeout: 30000});

  var counter = 0;
  let intervalTimer = setInterval(() => {
    console.log(`Waiting... ${counter++}`);
    if (counter > 8) clearInterval(intervalTimer);
  }, 1500);

  await newPage.waitForTimeout(8000);
  return await newPage.content();
}

async function parseResults(htmlContent, sourceUrl) {
  console.log("Processing scraped data");
  const $ = cheerio.load(htmlContent);

  let propertyList = [];
  let totalPages;
  let foundItems = $(".listing-count .total-results").text().trim();

  if(foundItems < 5){
    return{
      success: false,
      error: "Not enough sample data found",
      itemsFound: foundItems,
      sourceLink: sourceUrl
    };
  }

  $(".property-listings .listing-item").each(function () {
    if(propertyList.length < foundItems){
      let priceText = $(this).find(".price-container .listing-price").text();
      let itemUrl = $(this).find(".listing-header a").attr("href");
      priceText = priceText.replace("$", "").trim().replace(/,/g, "");
      if(priceText.trim() * 1){
        propertyList.push({
          price: parseInt(priceText.trim()),
          url: "https://example-realestate.com" + itemUrl
        });
      }
    }
  });

  return {
    success: true,
    properties: propertyList,
    totalFound: propertyList.length,
    pages: totalPages,
    expectedCount: foundItems * 1,
    sourceUrl: sourceUrl
  };
}

Anyone know what kind of RAM and CPU I should be looking at for this type of workload?

Your server specs are way too low for this workload. I’ve run similar automation setups - you need at least 16GB RAM and 6-8 CPU cores to handle that volume reliably. Each Chrome instance eats 800MB-1.2GB of memory, so with 5 concurrent sessions you’re already using 4-6GB just for browsers.

There’s also a big issue in your code: you never close the browser instances. Always call await chromeBrowser.close() in a finally block to stop the memory leak. Also try a browser pool pattern - launch one browser and create multiple pages instead of a separate browser for each task. That cut my memory usage by 60-70%.

For 1000 requests/hour, you can probably get away with 2-3 concurrent browser instances if you optimize right. Start with 16GB/8-core and monitor your resources before scaling up.
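
Rough sketch of what I mean - one shared browser, a fresh page per task, and close() calls in finally blocks. The helper names (scrapeWithPage, runScrapeTasks) are just placeholders, adapt them to your own code:

const puppeteer = require('puppeteer');

async function scrapeWithPage(browser, targetUrl) {
  const page = await browser.newPage();
  try {
    await page.goto(targetUrl, { waitUntil: 'networkidle2', timeout: 30000 });
    return await page.content();
  } finally {
    // Release the page even if goto() or content() throws
    await page.close();
  }
}

async function runScrapeTasks(urls) {
  // One shared browser instead of a separate browser per task
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-dev-shm-usage']
  });
  try {
    const results = [];
    for (const url of urls) {
      results.push(await scrapeWithPage(browser, url));
    }
    return results;
  } finally {
    // Close the browser no matter what, so Chrome memory gets reclaimed
    await browser.close();
  }
}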

your biggest bottleneck is probably ubuntu 16.04 - that’s ancient now. node 10 is also super outdated and has memory management issues. upgrade to ubuntu 20+ and node 16+ before adding more hardware. i bet you’ll handle way more instances just by fixing those versions.

I encountered similar issues with browser automation recently. The primary concern with your current setup is that each Puppeteer instance consumes a significant amount of resources, typically around 500MB to 1GB of RAM and a considerable CPU load. To handle 1000 requests per hour efficiently, I recommend a minimum of 8GB of RAM and at least 4 CPU cores. Additionally, consider reusing browser instances rather than opening new ones for each task, which can cut memory usage dramatically. Implementing a queue to manage your requests may also improve performance, ideally limiting concurrent sessions to 3 or 4 on an 8GB server. Lastly, it’s essential to upgrade from Node.js version 10 as it’s no longer maintained and may contribute to your system’s instability.
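
As a rough sketch of the queue idea above (not part of your original setup), here is a simple concurrency limiter in plain Node.js; the runWithLimit helper and the limit of 3 are illustrative values you would tune to your server:

async function runWithLimit(tasks, limit = 3) {
  const results = [];
  let next = 0;

  async function worker() {
    // Each worker pulls the next task index off the shared counter
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Start `limit` workers that drain the queue in parallel
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

// Usage: wrap each URL in an async task, e.g. with your existing function
// (or a version of it that reuses a single shared browser):
// const tasks = urls.map(url => () => scrapeWebsite(url));
// const pages = await runWithLimit(tasks, 3);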