How to manage multiple tabs per browser instance with Puppeteer Cluster?

I’m working on a project that needs to handle screenshots for many users at once. Right now, I’m using Puppeteer to do this one at a time, but I want to scale it up. I’ve been looking into Puppeteer Cluster to handle more users simultaneously.

My main concerns are:

  1. I don’t want to set maxConcurrency to 100 because that would use too much memory.
  2. I’d rather have 10 browser instances, each with 10 tabs.

Is there a way to control how many tabs each browser can use in Puppeteer Cluster? Here’s a basic setup I’m considering:

const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_CONTEXT,
  maxConcurrency: 10,
  puppeteerOptions: {
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  },
  timeout: 25000,
  retryLimit: 2,
})

Any ideas on how to manage multiple tabs per browser efficiently?

hey there JackWolf69, have you tried the workerCreationDelay option? it spaces out worker start-up, which helps spread the load when the cluster launches. also, you might wanna look into perBrowserOptions to customize each browser instance. neither one caps tabs per browser on its own, but i’ve found them helpful for managing resources better. good luck with your project!
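For reference, here’s roughly where those two options would go. Treat it as a sketch: buildClusterOptions is a made-up helper, the 500 ms delay is an arbitrary value, and I’m assuming perBrowserOptions wants one entry per browser (so its length matches maxConcurrency), which is worth checking against the README for your version.

```javascript
// Hypothetical helper: builds the options object you would pass to Cluster.launch().
// `concurrencyMode` is whichever Cluster.CONCURRENCY_* constant you use.
function buildClusterOptions(concurrencyMode, browserCount) {
  const sharedArgs = ['--no-sandbox', '--disable-setuid-sandbox'];
  return {
    concurrency: concurrencyMode,
    maxConcurrency: browserCount,
    workerCreationDelay: 500, // ms between starting workers (arbitrary value)
    // Assumption: one puppeteer.launch() options object per browser.
    perBrowserOptions: Array.from({ length: browserCount }, () => ({
      headless: true,
      args: [...sharedArgs], // copy so each browser gets its own array
    })),
    timeout: 25000,
    retryLimit: 2,
  };
}

// Usage (needs puppeteer-cluster installed, inside an async function):
// const { Cluster } = require('puppeteer-cluster');
// const cluster = await Cluster.launch(
//   buildClusterOptions(Cluster.CONCURRENCY_BROWSER, 10)
// );
```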

I’ve been down this road before, and I can tell you it’s tricky to balance performance and resource usage. One thing that worked well for me was switching to the CONCURRENCY_BROWSER mode (your snippet uses CONCURRENCY_CONTEXT), but with a twist: I implemented a custom queue system that tracks the number of tabs per browser.

Here’s the gist of what I did:

I set up a Map to keep count of open tabs for each browser. When a new task comes in, I check whether there’s a browser with fewer than 10 tabs. If not, and fewer than 10 browsers are running, I launch a new browser instance; otherwise the task waits for a free slot. This way, you’re capped at 10 browsers with 10 tabs each.

The tricky part was handling tab closures properly to keep the count accurate. Make sure you decrement the tab count whenever a task finishes, including when it throws, otherwise you’ll end up with phantom tabs that count against the limit but no longer exist.
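A minimal sketch of that bookkeeping, not my exact code. TabPool, reserve, release, withTab and launchBrowser are names made up for this example; launchBrowser stands for whatever starts a browser for you (for instance a function that calls puppeteer.launch). It reserves the slot before the browser has finished launching so that concurrent tasks can’t overshoot the limit, and it doesn’t deal with crashed browsers.

```javascript
// Caps open tabs per browser; launches another browser only when all are full.
// `launchBrowser` is an injected async factory (assumption for this sketch).
class TabPool {
  constructor(launchBrowser, { maxBrowsers = 10, tabsPerBrowser = 10 } = {}) {
    this.launchBrowser = launchBrowser;
    this.maxBrowsers = maxBrowsers;
    this.tabsPerBrowser = tabsPerBrowser;
    this.slots = new Map(); // browser id -> { browserPromise, tabs }
    this.waiters = [];      // callers parked until a tab slot frees up
  }

  // Synchronously reserves a tab slot, or returns null if everything is full.
  reserve() {
    for (const slot of this.slots.values()) {
      if (slot.tabs < this.tabsPerBrowser) {
        slot.tabs++;
        return slot;
      }
    }
    if (this.slots.size < this.maxBrowsers) {
      const slot = { browserPromise: this.launchBrowser(), tabs: 1 };
      this.slots.set(this.slots.size, slot);
      return slot;
    }
    return null;
  }

  // Gives a slot back and wakes one parked caller, if any.
  release(slot) {
    slot.tabs--;
    const wake = this.waiters.shift();
    if (wake) wake();
  }

  // Runs fn(page) in a fresh tab and always decrements the count afterwards.
  async withTab(fn) {
    let slot;
    while (!(slot = this.reserve())) {
      await new Promise((resolve) => this.waiters.push(resolve));
    }
    let page;
    try {
      const browser = await slot.browserPromise;
      page = await browser.newPage();
      return await fn(page);
    } finally {
      try {
        if (page) await page.close();
      } finally {
        this.release(slot); // runs even if fn or close() throws
      }
    }
  }
}
```

With something like this, each screenshot job becomes a call to pool.withTab that receives a page, does its work, and returns; the pool takes care of the counting.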

Also, don’t forget to monitor your system resources. Even with this setup, you might need to adjust based on your specific hardware constraints. It’s a balancing act, but once you get it right, it’s pretty smooth sailing.
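As a starting point for that tuning, you could derive the browser count from the memory that is free when the process starts. This is only a rough sketch: suggestBrowserCount is a made-up helper, and the 150 MB per tab and 1 GB reserve are placeholder guesses that you’d want to replace with numbers measured on your own pages. os.freemem() is the standard Node.js call.

```javascript
const os = require('os');

// Picks how many browsers to run from the memory that is free right now.
function suggestBrowserCount({
  freeBytes = os.freemem(),
  tabsPerBrowser = 10,
  memPerTabBytes = 150 * 1024 * 1024, // assumption: ~150 MB per tab
  reserveBytes = 1024 * 1024 * 1024,  // assumption: keep 1 GB for everything else
  maxBrowsers = 10,
} = {}) {
  const usable = Math.max(0, freeBytes - reserveBytes);
  const affordable = Math.floor(usable / (tabsPerBrowser * memPerTabBytes));
  // Always run at least one browser, never more than the configured cap.
  return Math.min(maxBrowsers, Math.max(1, affordable));
}

// Usage: pass the result as maxConcurrency when launching the cluster.
// const browserCount = suggestBrowserCount();
```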

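I’ve faced similar challenges with resource management in Puppeteer Cluster. One approach that worked well for me was using the browser concurrency mode instead of context. In that mode each worker owns its own browser, so maxConcurrency becomes the number of browsers. You can then cap the tabs per browser by queueing batches of URLs and opening one tab per URL inside the task. Here’s a rough idea:

const { Cluster } = require('puppeteer-cluster');

const tabsPerBrowser = 10;

const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_BROWSER, // one browser per worker
  maxConcurrency: 10, // at most 10 browsers
  puppeteerOptions: { ... },
});

// Each job is an array of at most tabsPerBrowser URLs, so a browser never has
// more than tabsPerBrowser extra tabs open at once.
cluster.task(async ({ page, data: urls }) => {
  // page.browser() returns the browser that owns this worker's page
  const browser = page.browser();

  await Promise.all(urls.map(async (url) => {
    const newPage = await browser.newPage();
    try {
      // Perform your tasks here
    } finally {
      // Close the tab even if the task throws, so no tabs are leaked
      await newPage.close();
    }
  }));
});

// Queue jobs as arrays of at most tabsPerBrowser URLs: cluster.queue([url1, url2, ...]);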

This approach gives you more granular control over browser and tab management while leveraging Puppeteer Cluster’s efficiency.
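If you feed each worker a batch of URLs and let it open one tab per URL, the batch size becomes the tabs-per-browser cap and no shared counter is needed. A small helper for the batching could look like this; chunk and allUrls are names I made up for illustration.

```javascript
// Splits a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage with an existing cluster (allUrls is your full list of URLs):
// for (const batch of chunk(allUrls, 10)) cluster.queue(batch);
```

One thing to keep in mind with batches: retryLimit then applies to the whole batch, so a single failing URL re-runs its batch unless you catch errors per URL inside the task.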