How to adapt a Puppeteer script for the Puppeteer Cluster framework?

I’m looking to convert my existing Puppeteer script into one that works with the Puppeteer Cluster to enhance its performance. The current implementation is operational, but I want to leverage parallel processing capabilities with it.

This is my current code:

const puppeteer = require('puppeteer');

const currentURL = process.argv[2];
puppeteer.launch({ 
    headless: true, 
    args: ['--no-sandbox', '--disable-setuid-sandbox'] 
}).then(async browserInstance => {
    const newTab = await browserInstance.newPage();
    await newTab.setViewport({width: 320, height: 568});
    await newTab.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1');
    await newTab.goto(currentURL);
    console.log('page loaded');
    await newTab.waitForSelector('body.content');
    await browserInstance.close();
    process.exit(0);
}).catch(function(err) {
    console.log('completed');
    process.exit(0);
});

What are the best practices for refactoring this to utilize Puppeteer Cluster while keeping all functionalities intact?

Converting to Puppeteer Cluster means switching to a task-based approach. You’ll define a task function with your page logic and pass URLs as data to the cluster. Here’s the basic structure:

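const { Cluster } = require('puppeteer-cluster');

// Wrapped in an async function because top-level await is not available in CommonJS modules.
(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 4,
        puppeteerOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    });

    // Errors thrown inside a queued task are caught by the cluster and emitted here,
    // which takes the place of the .catch() in the original script.
    cluster.on('taskerror', (err, data) => {
        console.log(`error crawling ${data}: ${err.message}`);
    });

    // The task receives a fresh page plus whatever was passed to queue().
    await cluster.task(async ({ page, data: url }) => {
        await page.setViewport({width: 320, height: 568});
        // User agent shortened here; use the full string from the question.
        await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X)...');
        await page.goto(url);
        console.log('page loaded');
        await page.waitForSelector('body.content');
    });

    await cluster.queue(process.argv[2]);
    await cluster.idle();   // resolves once the queue is empty
    await cluster.close();
})();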

Puppeteer Cluster handles browser instances and pages for you, so you just define what each task does instead of managing the browser lifecycle yourself.
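
With a single URL the cluster has nothing to run in parallel, so the gain only shows up once several URLs are queued. Here is a rough sketch of that, assuming the URLs are passed as extra command-line arguments. The urlsFromArgv and run helpers are hypothetical names made up for this example, not part of puppeteer-cluster, and the package is loaded lazily inside run so the argument parsing can be tried on its own.

```javascript
// User agent string copied from the question.
const MOBILE_UA = 'Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1';

// Hypothetical helper: keep only the http(s) URLs that follow the script name.
function urlsFromArgv(argv) {
    return argv.slice(2).filter(arg => /^https?:\/\//.test(arg));
}

// Hypothetical helper: run the same page logic for every URL in parallel.
async function run(urls) {
    // Required here (not at the top) so urlsFromArgv works without the package installed.
    const { Cluster } = require('puppeteer-cluster');

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 4,
        puppeteerOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    });

    // One failing URL is reported here and does not stop the others.
    cluster.on('taskerror', (err, data) => {
        console.log(`error crawling ${data}: ${err.message}`);
    });

    await cluster.task(async ({ page, data: url }) => {
        await page.setViewport({ width: 320, height: 568 });
        await page.setUserAgent(MOBILE_UA);
        await page.goto(url);
        console.log(`page loaded: ${url}`);
        await page.waitForSelector('body.content');
    });

    for (const url of urls) {
        cluster.queue(url);
    }

    await cluster.idle();
    await cluster.close();
}

// Only start the cluster when at least one URL was given.
const urls = urlsFromArgv(process.argv);
if (urls.length > 0) {
    run(urls).catch(err => {
        console.error(err);
        process.exit(1);
    });
}
```

Invoked as something like node script.js https://example.com/a https://example.com/b (script name assumed), up to four pages are processed at a time because of maxConcurrency: 4.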

Wrap your logic in the task function and let the cluster manage the browser. Change newTab to page, since the cluster passes that in automatically. Also, try CONCURRENCY_PAGE instead of CONCURRENCY_CONTEXT if you don’t need isolated contexts. Pages then share a single browser context (cookies included), which is lighter for basic scraping.
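
To make switching between the two modes a one-flag change, something like the following could work. It is only a sketch: buildClusterOptions and its isolate flag are hypothetical names for this example, and Cluster is passed in as an argument so the function does not depend on the package being loaded.

```javascript
// Hypothetical helper: build the options object for Cluster.launch().
// `Cluster` is the class exported by puppeteer-cluster, passed in by the caller.
function buildClusterOptions(Cluster, { isolate = false, maxConcurrency = 4 } = {}) {
    return {
        // CONCURRENCY_CONTEXT: each worker gets its own incognito context (nothing shared).
        // CONCURRENCY_PAGE: each worker gets a page in one shared context (cookies shared).
        concurrency: isolate ? Cluster.CONCURRENCY_CONTEXT : Cluster.CONCURRENCY_PAGE,
        maxConcurrency,
        puppeteerOptions: {
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }
    };
}

// Usage inside an async function, assuming puppeteer-cluster is installed:
// const { Cluster } = require('puppeteer-cluster');
// const cluster = await Cluster.launch(buildClusterOptions(Cluster, { isolate: false }));
```

If the pages you scrape depend on login state or cookies that must not leak between URLs, keep isolate set to true.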