Improving Puppeteer Performance for PDF Generation

I have a web application that produces large PDF files, sometimes exceeding 100 pages. The process I follow is:

  1. Generate HTML with nunjucks templates.
  2. Launch a Puppeteer instance.
  3. Create the PDF cover page (see code snippet below).
  4. Generate the remaining PDF pages.
  5. Combine the pages into a single document and create a byte buffer.
```javascript
import puppeteer from 'puppeteer';
import nunjucks from 'nunjucks';
import { PDFDocument } from 'pdf-lib';

const generatedHtml = await nunjucks.render(...);

const browserInstance = await puppeteer.launch({
  args: [
    '--disable-dev-shm-usage',
    '--no-first-run',
    '--no-sandbox',
    '--no-zygote',
    '--single-process',
  ],
  headless: true,
});

try {
  const newPage = await browserInstance.newPage();

  await newPage.setContent(generatedHtml, { waitUntil: 'networkidle0' });

  // Render the cover (page 1) separately from the content pages.
  const coverBuffer = await newPage.pdf({
    ...someOptions,
    pageRanges: '1',
  });

  // Render the remaining pages with the footer template.
  const contentBuffer = await newPage.pdf({
    ...someOptions,
    pageRanges: '2-',
    footerTemplate: ...,
  });

  const finalDoc = await PDFDocument.create();
  const coverDocument = await PDFDocument.load(coverBuffer);
  const [cover] = await finalDoc.copyPages(coverDocument, [0]);
  finalDoc.addPage(cover);

  // Copy all content pages in one call instead of one call per page.
  const contentDocument = await PDFDocument.load(contentBuffer);
  const contentPages = await finalDoc.copyPages(contentDocument, contentDocument.getPageIndices());
  contentPages.forEach((page) => finalDoc.addPage(page));

  const finalPdfBytes = Buffer.from(await finalDoc.save());
  // Handle the bytes as needed
} finally {
  // Always close the browser; otherwise every call leaves a Chromium process running.
  await browserInstance.close();
}
```

As the PDF size increases, the processing time and memory consumption also rise, causing delays in the API. What strategies can I implement to optimize this process, or are there alternative tools available to prevent API stalls?

To optimize Puppeteer PDF generation for large documents, consider parallelizing the work where feasible. If your architecture supports scaling, distribute PDF generation across multiple instances or containers: each one renders a portion of the document (for example, a set of page ranges), and the parts are then merged into a single PDF, the same way your code already merges the cover and the content. Because each instance handles fewer pages, this can shorten the overall processing time. It may also be worth trying an alternative headless browser library such as Playwright, whose `page.pdf()` (Chromium only) works much like Puppeteer's and in some cases might offer better performance and resource management.
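One way to define the "portions" is by page range, since the code in the question already uses Puppeteer's `pageRanges` option. The following is a minimal sketch under stated assumptions, not a tested production setup: `pageRangeChunks` and `runWithConcurrency` are hypothetical helpers, the chunk size and concurrency limit are arbitrary, and it assumes you already know the total page count (for example from `getPageCount()` on an earlier render loaded with pdf-lib). Each worker would still load the full HTML so that page breaks stay identical, so whether this saves time depends on how much of the cost is layout versus PDF output; measure before adopting it.

```javascript
// Sketch: split a page span into Puppeteer-style `pageRanges` strings and
// process the chunks with a bounded number of concurrent workers.

/**
 * Split pages firstPage..lastPage (1-based, inclusive) into range strings
 * such as "2-26", "27-51", suitable for the `pageRanges` option of page.pdf().
 */
function pageRangeChunks(firstPage, lastPage, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges = [];
  for (let start = firstPage; start <= lastPage; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, lastPage);
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}

/**
 * Run `worker` over `items` with at most `limit` calls in flight at once.
 * Results keep the input order, so PDF chunks can be merged in sequence.
 */
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Hypothetical usage (needs a running browser, so shown as comments only;
// `browser`, `totalPages`, `generatedHtml` and `someOptions` are assumed):
// const ranges = pageRangeChunks(2, totalPages, 25);
// const buffers = await runWithConcurrency(ranges, 4, async (range) => {
//   const page = await browser.newPage();
//   try {
//     await page.setContent(generatedHtml, { waitUntil: 'networkidle0' });
//     return await page.pdf({ ...someOptions, pageRanges: range });
//   } finally {
//     await page.close();
//   }
// });
```

The results come back in input order, so the buffers can be merged with the same `copyPages`/`addPage` approach used for the cover and content. Capping concurrency keeps memory bounded, which matters here because every open page holds its own rendered copy of the document; the worker callback could use one shared browser or separate instances/containers, as suggested above.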