How to improve performance when creating large PDF files with Puppeteer

I’m working on a web application that creates massive PDF files, sometimes reaching 100+ pages. The current process is really slow and uses too much memory.

My current workflow looks like this:

  1. Build HTML content with nunjucks
  2. Launch puppeteer browser instance
  3. Generate cover page as PDF
  4. Generate remaining content pages
  5. Combine everything into final document

Here’s my current implementation:

import puppeteer from 'puppeteer';
import nunjucks from 'nunjucks';
import { PDFDocument } from 'pdf-lib';

const htmlContent = nunjucks.render(...); // render() is synchronous unless given a callback

const browser = await puppeteer.launch({
  args: [
    '--disable-dev-shm-usage',
    '--no-first-run', 
    '--no-sandbox',
    '--no-zygote',
    '--single-process'
  ],
  headless: true
});

const newPage = await browser.newPage();

await newPage.setContent(htmlContent, { waitUntil: 'networkidle0' });

const coverPdf: Buffer = await newPage.pdf({
  ...configOptions,
  pageRanges: '1'
});

const contentPdf: Buffer = await newPage.pdf({
  ...configOptions,
  pageRanges: '2-',
  footerTemplate: ...
});

const finalDoc = await PDFDocument.create();
const titleDoc = await PDFDocument.load(coverPdf);
const [titlePage] = await finalDoc.copyPages(titleDoc, [0]);
finalDoc.addPage(titlePage);

const bodyDoc = await PDFDocument.load(contentPdf);
// copyPages accepts the whole index list at once, so no per-page await loop
const bodyPages = await finalDoc.copyPages(bodyDoc, bodyDoc.getPageIndices());
bodyPages.forEach((page) => finalDoc.addPage(page));

const result = Buffer.from(await finalDoc.save());
// process result buffer

For really large documents this process becomes extremely slow, consumes far too much memory, and blocks my API completely until everything finishes. What are some ways to optimize this workflow, or alternative approaches that would keep the API from blocking?

You’re creating unnecessary overhead by making two PDFs and merging them. Skip the pageRanges splitting - generate everything in one pass and handle footers with CSS page rules instead. I’ve had better luck using page-break properties in CSS; that way there’s no performance hit from multiple PDF operations.

Also, try paginating at the HTML level rather than letting Puppeteer churn through one massive page. For memory problems, shrink your viewport and disable images/fonts you don’t need with page.setRequestInterception().

That single-process flag actually makes large documents worse - ditch it and let Chromium manage its own processes. Most important: clean up properly. Put page.close() and browser.close() in finally blocks, or you’ll get memory leaks that stack up across requests.
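For the HTML-level pagination mentioned above, a minimal CSS sketch (the `.report-section` class name is hypothetical; Chromium honors the `break-after` family in print media):

```css
/* Hypothetical section class: force each section onto its own PDF page */
.report-section {
  break-after: page;        /* modern syntax */
  page-break-after: always; /* fallback for older Chromium builds */
}

/* Keep headings attached to the content that follows them */
h2 {
  break-after: avoid;
}
```

Splitting pages this way lets a single pdf() call produce well-formed page boundaries without pageRanges tricks.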

Had the same memory problems with large PDF generation in production. You’re holding the entire document in memory while processing - that’s your issue.

Don’t generate the full HTML upfront. Break it into smaller chunks and process them one by one: render sections separately and merge as you go instead of building one massive HTML string. I also set up a queue system - offload PDF generation to a background worker and return a job ID right away so your API doesn’t get blocked.

For memory management, set explicit limits on the browser instance, use streams instead of buffers where you can, and make sure you’re disposing of pages and closing browser instances between operations. Chunking alone cut my processing time by 60% for docs over 50 pages.
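A minimal sketch of the chunking idea, assuming the document is already rendered as an array of per-section HTML strings (`chunkSections` and the chunk size are illustrative; each chunk would then go through its own setContent/pdf pass and be merged with pdf-lib as you go):

```typescript
// Hypothetical helper: split rendered HTML sections into fixed-size chunks
// so each Puppeteer pass only handles a small document.
function chunkSections<T>(sections: T[], chunkSize: number): T[][] {
  if (chunkSize < 1) throw new RangeError('chunkSize must be >= 1');
  const chunks: T[][] = [];
  for (let i = 0; i < sections.length; i += chunkSize) {
    chunks.push(sections.slice(i, i + chunkSize));
  }
  return chunks;
}
```

With 50 sections and a chunk size of 10 you get five small render passes instead of one huge page, which is where the memory savings come from.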

Switch to wkhtmltopdf for large docs - it’s way faster and uses less memory than Chromium-based tools. You’re making two separate pdf() calls when you could generate once without pageRanges and split after. Try streaming the output directly instead of loading everything into PDFDocument. You don’t always need pdf-lib for simple concatenation.
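If you do generate once and split afterwards, the pdf-lib side is just index arithmetic. A sketch, assuming the cover is page 1 (`bodyPageIndices` is a hypothetical helper that maps Puppeteer's `pageRanges: '2-'` onto the zero-based indices `copyPages` expects):

```typescript
// Hypothetical helper: zero-based indices for every page after the cover,
// i.e. the equivalent of pageRanges: '2-' for pdf-lib's copyPages()
function bodyPageIndices(totalPages: number): number[] {
  return Array.from({ length: Math.max(0, totalPages - 1) }, (_, i) => i + 1);
}

// Usage sketch (requires pdf-lib):
//   const src = await PDFDocument.load(fullPdf);
//   const body = await finalDoc.copyPages(src, bodyPageIndices(src.getPageCount()));
```

One pdf() call plus this split avoids rendering the same HTML through Chromium twice.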