Optimizing Puppeteer Scraping Performance

I’m scraping a webpage with Puppeteer, but the response takes 12-15 seconds. How can I reduce this delay? See below:

router.get("/dashboard", async (req, res) => {
  let browserInstance;
  try {
    // A new browser is launched on every request.
    browserInstance = await puppeteer.launch();
    const pageView = await browserInstance.newPage();
    const currentPage = req.query.page || 1;
    await pageView.goto(`https://sampledomain.com/page=${currentPage}`);
    // Collect the title text and image URL of every card.
    const itemTitles = await pageView.$$eval('.card h2', items => items.map(item => item.textContent));
    const itemImages = await pageView.$$eval('.card img', items => items.map(item => item.src));
    res.json({ itemTitles, itemImages });
  } catch (err) {
    res.status(500).json({ error: "Data retrieval failed." });
  } finally {
    // Close the browser even when scraping fails, so Chromium processes do not pile up.
    if (browserInstance) await browserInstance.close();
  }
});

Hey, try blocking unneeded requests with setRequestInterception and running the browser in headless mode. Also block CSS and images if possible. That sped things up for me, so it might work for you too.
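For reference, here is a rough sketch of what blocking by resource type could look like. The helper names (shouldBlock, blockUnneededRequests) and the list of blocked types are my own assumptions, not something from the question, so adjust them to the site. It expects a Puppeteer page object created elsewhere.

```javascript
// Resource types assumed to be unnecessary when only text and src attributes are read.
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

// Hypothetical helper: decides whether a request of this type should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Hypothetical helper: call once per page, before page.goto().
async function blockUnneededRequests(page) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (shouldBlock(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

In the route from the question this would be called as `await blockUnneededRequests(pageView);` right after `newPage()`. Reading `img.src` should still work with image requests aborted, since it comes from the DOM attribute, but pages that fill in `src` lazily from a script may behave differently.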

I adjusted my approach by reusing a single browser instance for multiple requests instead of launching a new one each time. This significantly reduced the overhead associated with browser startup. I also implemented request interception to block resources like fonts and unnecessary scripts that were slowing down page rendering. Additionally, I replaced the default wait times by explicitly waiting for specific selectors to appear, ensuring that the script only waits as long as needed. These changes, combined with tuning device emulation settings, brought down the response time and improved overall performance.
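A minimal sketch of the browser reuse and selector wait described above, under a few assumptions: createBrowserProvider and scrapeCards are hypothetical names, the launch function is passed in by the caller, and the 10 second timeout is an arbitrary choice. The selectors are the ones from the question.

```javascript
// Hypothetical helper: returns a getBrowser() that launches at most one browser
// and hands the same instance to every caller.
function createBrowserProvider(launch) {
  let browserPromise = null;
  return function getBrowser() {
    if (!browserPromise) {
      browserPromise = launch().then((browser) => {
        // Forget the instance if it crashes or is closed, so the next call relaunches.
        browser.on("disconnected", () => { browserPromise = null; });
        return browser;
      });
      // A failed launch should not be cached either.
      browserPromise.catch(() => { browserPromise = null; });
    }
    return browserPromise;
  };
}

// Hypothetical helper: opens a fresh page in the shared browser, waits only for
// the card titles, extracts the data, and always closes the page afterwards.
async function scrapeCards(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: "domcontentloaded" });
    await page.waitForSelector(".card h2", { timeout: 10000 });
    const itemTitles = await page.$$eval(".card h2", (els) => els.map((el) => el.textContent));
    const itemImages = await page.$$eval(".card img", (els) => els.map((el) => el.src));
    return { itemTitles, itemImages };
  } finally {
    await page.close();
  }
}
```

Usage would be something like `const getBrowser = createBrowserProvider(() => puppeteer.launch({ headless: true }));` at module level, then `scrapeCards(await getBrowser(), url)` inside the route handler. Only the page is closed per request, so the startup cost is paid once.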

I encountered similar challenges when scraping complex webpages using Puppeteer. I found that launching the browser in a minimal configuration and avoiding unnecessary features improved performance a lot. My approach involved setting up the page to abort requests that weren’t essential for rendering, such as ads or tracking scripts, and explicitly waiting only for key selectors instead of arbitrary delays. I also experimented with simplifying rendering by disabling animations and extra CSS processing, which helped reduce the time needed to load dynamic content. Logging the network events was crucial for fine-tuning these settings and getting a faster time-to-content.
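For the network logging part, a small sketch of how per-request timings could be collected. logRequestTimings is a hypothetical helper name and the log format is made up for illustration. It relies on the page's request, requestfinished and requestfailed events, and assumes the same request object is passed to the start and end events.

```javascript
// Hypothetical helper: logs how long each request took, with its resource type and URL.
// `log` defaults to console.log and can be swapped for any function taking a string.
function logRequestTimings(page, log = console.log) {
  const startedAt = new Map();

  page.on("request", (request) => {
    startedAt.set(request, Date.now());
  });

  // Builds a handler that reports the elapsed time once for a given request.
  const report = (status) => (request) => {
    const start = startedAt.get(request);
    if (start === undefined) return; // never seen, or already reported
    startedAt.delete(request);
    const elapsed = Date.now() - start;
    log(`${status} ${elapsed}ms ${request.resourceType()} ${request.url()}`);
  };

  page.on("requestfinished", report("ok"));
  page.on("requestfailed", report("failed"));
}
```

Calling `logRequestTimings(page)` before `page.goto()` and sorting the output by duration is one way to spot which resource types are worth aborting. Requests aborted through interception should show up as failed.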

I tried enabling the cache with page.setCacheEnabled(true) and setting a tiny viewport. This helped reduce unnecessary load on static pages. It might not work everywhere, but it has saved me a lot of time on similar sites.
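In code, that setup could look roughly like this. configureLightweightPage is a hypothetical name and 320x240 is just an example size. As far as I know, caching is already on by default in Puppeteer, and a freshly launched browser starts with an empty cache, so this mostly pays off when the same browser stays open across requests.

```javascript
// Hypothetical helper: keeps the HTTP cache on and shrinks the viewport
// so there is less to lay out and paint.
async function configureLightweightPage(page) {
  await page.setCacheEnabled(true);
  await page.setViewport({ width: 320, height: 240 });
}
```

Call it right after `browser.newPage()` and before `page.goto()`. Some sites serve a different layout at small widths, so check that the selectors still match.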