How to adapt a Puppeteer script for the Puppeteer Cluster framework?

OwenNebula55 · June 24, 2025, 2:20am

I’m looking to convert my existing Puppeteer script to run on Puppeteer Cluster to improve its performance. The current implementation works, but I want to process pages in parallel.

This is my current code:

const puppeteer = require('puppeteer');

const currentURL = process.argv[2];
puppeteer.launch({ 
    headless: true, 
    args: ['--no-sandbox', '--disable-setuid-sandbox'] 
}).then(async browserInstance => {
    const newTab = await browserInstance.newPage();
    await newTab.setViewport({width: 320, height: 568});
    await newTab.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1');
    await newTab.goto(currentURL);
    console.log('page loaded');
    await newTab.waitForSelector('body.content');
    await browserInstance.close();
    process.exit(0);
}).catch(function(err) {
    console.log('completed');
    process.exit(0);
});

What are the best practices for refactoring this to utilize Puppeteer Cluster while keeping all functionalities intact?

DancingBird · July 1, 2025, 9:21am

Converting to Puppeteer Cluster means switching to a task-based approach. You’ll define a task function with your page logic and pass URLs as data to the cluster. Here’s the basic structure:

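const { Cluster } = require('puppeteer-cluster');

// require() implies CommonJS, where top-level await is not available,
// so the cluster setup runs inside an async function.
(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 4,
        puppeteerOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    });

    // The task runs once per queued item; `data` is whatever was passed to queue().
    await cluster.task(async ({ page, data: url }) => {
        await page.setViewport({width: 320, height: 568});
        await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X)...');
        await page.goto(url);
        // Logged right after navigation, as in the original script.
        console.log('page loaded');
        await page.waitForSelector('body.content');
    });

    await cluster.queue(process.argv[2]);
    await cluster.idle();   // wait until the queue is empty
    await cluster.close();  // shut down the browser(s)
})().catch(err => {
    console.error(err);
    process.exit(1);
});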

Puppeteer Cluster handles browser instances and pages for you, so you just define what each task does instead of managing the browser lifecycle yourself.
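To actually get parallelism you need to queue more than one URL. Here is a rough sketch, assuming the URLs are passed as command-line arguments. `parseUrls` and `run` are hypothetical helper names made up for this example, not part of puppeteer-cluster. It also listens for the `taskerror` event, which is how errors from queued tasks are reported, as a stand-in for the `.catch` in the original script.

```javascript
// Sketch only: parseUrls and run are hypothetical helpers for illustration.

// Keep well-formed http(s) URLs and drop duplicates.
function parseUrls(args) {
    const seen = new Set();
    for (const arg of args) {
        try {
            const parsed = new URL(arg);
            if (parsed.protocol === 'http:' || parsed.protocol === 'https:') {
                seen.add(parsed.href);
            }
        } catch (err) {
            console.error(`skipping invalid URL: ${arg}`);
        }
    }
    return [...seen];
}

async function run(urls) {
    // Loaded here so parseUrls can be used without the package installed.
    const { Cluster } = require('puppeteer-cluster');

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 4, // assumed value, tune for your machine
        puppeteerOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    });

    // Errors thrown inside queued tasks are reported through this event.
    cluster.on('taskerror', (err, data) => {
        console.log(`error for ${data}: ${err.message}`);
    });

    await cluster.task(async ({ page, data: url }) => {
        await page.setViewport({ width: 320, height: 568 });
        await page.goto(url);
        console.log(`page loaded: ${url}`);
        await page.waitForSelector('body.content');
    });

    for (const url of urls) {
        await cluster.queue(url);
    }
    await cluster.idle();
    await cluster.close();
}

// Only start the cluster when at least one valid URL was given.
const urls = parseUrls(process.argv.slice(2));
if (urls.length > 0) {
    run(urls).catch(err => {
        console.error(err);
        process.exitCode = 1;
    });
}
```

You would run it as something like `node script.js https://example.com/a https://example.com/b` (the file name is a placeholder). Up to `maxConcurrency` tasks run at the same time and the rest wait in the queue.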

Laura219 · June 28, 2025, 6:15am

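Wrap your logic in the task function and let the cluster manage the browser. Change newTab to page, since the cluster passes a page into each task automatically. Also, try CONCURRENCY_PAGE instead of CONCURRENCY_CONTEXT if you don’t need isolated contexts. It’s lighter for basic scraping because every task gets a page in the same shared browser context instead of its own incognito context, but that also means cookies and storage are shared between tasks.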
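If you want to switch between the two modes without editing the launch call each time, something like this could work. It is only a sketch: `pickClusterOptions` is a hypothetical helper, and the `maxConcurrency` value is an assumption copied from the answer above.

```javascript
// Hypothetical helper: Cluster is passed in, so the function itself
// does not depend on how puppeteer-cluster is loaded.
function pickClusterOptions(Cluster, needsIsolation) {
    return {
        // CONCURRENCY_CONTEXT: each task gets its own incognito context.
        // CONCURRENCY_PAGE: each task gets a page in one shared context.
        concurrency: needsIsolation
            ? Cluster.CONCURRENCY_CONTEXT
            : Cluster.CONCURRENCY_PAGE,
        maxConcurrency: 4, // assumed value
        puppeteerOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    };
}

// Usage, assuming puppeteer-cluster is installed and this runs inside an async function:
// const { Cluster } = require('puppeteer-cluster');
// const cluster = await Cluster.launch(pickClusterOptions(Cluster, false));
```

Pick the isolated mode if the pages you load set cookies or log in, since otherwise one task can see state left behind by another.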