Extracting and Using Page Count Variables in Puppeteer PDF Generation

ethant · July 30, 2025, 6:54am

I’m working on a project where I create several PDFs from different URLs using Puppeteer. After generating these PDFs, I merge them together using easy-pdf-merge.

My goal: I want to update the page numbering across all PDFs so they continue sequentially. For example, if the first PDF has 5 pages, the second PDF should start numbering from page 6.

The problem: I need to extract the total page count from each generated PDF and use that information to set the starting page number for the next PDF in the sequence.

const puppeteer = require('puppeteer');

class DocumentGenerator {
    // Renders one URL to a PDF file at outputPath and returns the PDF bytes.
    static async createPDF(websiteUrl, outputPath) {
        const browser = await puppeteer.launch({ headless: true });
        try {
            const webpage = await browser.newPage();

            await webpage.goto(websiteUrl);
            const documentSettings = {
                path: outputPath,
                printBackground: true,
                margin: {
                    top: '3cm',
                    bottom: '3cm',
                    left: '2cm',
                    right: '2cm'
                },
                width: 1920,   // plain numbers are treated as pixels
                height: 2480
            };

            // emulateMedia() was renamed to emulateMediaType() in newer Puppeteer releases.
            await webpage.emulateMediaType('screen');
            return await webpage.pdf(documentSettings);
        } finally {
            // Close the browser even if navigation or rendering fails.
            await browser.close();
        }
    }
}

(async () => {
    const websites = ['https://example.com/', 'https://test.com/'];

    if (websites.length > 1) {
        for (let j = 0; j < websites.length; j++) {
            // One file per URL so later PDFs do not overwrite earlier ones.
            await DocumentGenerator.createPDF(websites[j], `output-${j + 1}.pdf`);
        }
    }
})();

I found that Puppeteer uses Chrome’s printing engine which has pageNumber and totalPages variables available. But I can’t figure out how to access these values programmatically to use them for sequential numbering across multiple PDFs.

Is there a way to retrieve the page count from a generated PDF and use it to set custom page numbering for subsequent PDFs?

Mandy_45Photography · August 10, 2025, 10:59pm

Use pdf-lib instead - it reads page count straight from the buffer without extra libraries. Just const pdfDoc = await PDFDocument.load(document) then pdfDoc.getPageCount(). Much cleaner than parsing with separate tools.
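
A minimal sketch of that suggestion, assuming pdf-lib is installed (npm install pdf-lib) and that the value returned by webpage.pdf() is passed in as pdfBytes. The startingPageNumbers helper is a hypothetical name, added here only to show how the counts turn into starting numbers.

```javascript
// Counts the pages of a PDF held in memory (Buffer or Uint8Array).
async function countPages(pdfBytes) {
    // Required lazily so the pure helper below can be used without pdf-lib installed.
    const { PDFDocument } = require('pdf-lib');
    const pdfDoc = await PDFDocument.load(pdfBytes);
    return pdfDoc.getPageCount();
}

// Hypothetical helper: given the page count of each PDF in merge order,
// returns the page number each PDF should start at.
function startingPageNumbers(pageCounts) {
    const starts = [];
    let next = 1;
    for (const count of pageCounts) {
        starts.push(next);
        next += count;
    }
    return starts;
}
```

For page counts of 5, 3 and 4, startingPageNumbers returns 1, 6 and 9, which matches the example in the question (the second PDF starts at page 6).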

ameliat · August 10, 2025, 5:45pm

Use pdf2pic or pdf-poppler to grab the page count after generating each PDF. I did something similar by tweaking the DocumentGenerator to return metadata with the PDF buffer. The trick is reading the PDF metadata right after you create it - don’t try accessing Chrome’s internal stuff.

In your loop, just add up the page counts and pass that total as a parameter for custom headers/footers with the right starting page number.

For the actual numbering, CSS @page rules with custom counters work great. Set counter-reset dynamically by injecting CSS into each page before generation. Something like counter-reset: page ${startingPageNumber} does the job. You’ll need to read each generated PDF to get its page count, then use that to calculate where the next PDF in your sequence should start.
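
A rough sketch of the CSS injection step, using Puppeteer's page.addStyleTag(). The helper names are hypothetical. Two things here are assumptions to verify against real output: that the installed Chromium applies counter-reset to the built-in page counter and supports page-margin boxes such as @bottom-center, and whether the first printed page shows counterValue or counterValue + 1.

```javascript
// Hypothetical helper: builds the CSS described above.
// counterValue is the value the built-in "page" counter is reset to.
function buildPageCounterCss(counterValue) {
    if (!Number.isInteger(counterValue) || counterValue < 0) {
        throw new RangeError('counterValue must be a non-negative integer');
    }
    return [
        `html { counter-reset: page ${counterValue}; }`,
        // Page-margin boxes need a recent Chromium; older builds ignore this rule.
        '@page { @bottom-center { content: counter(page); } }'
    ].join('\n');
}

// Injects the CSS into an already-loaded Puppeteer page.
// Call this after goto() and before pdf().
async function injectPageCounter(page, counterValue) {
    await page.addStyleTag({ content: buildPageCounterCss(counterValue) });
}
```

Generating a two-document test run and checking the printed numbers by eye is the quickest way to settle the off-by-one question before wiring this into the full loop.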

olivias · August 10, 2025, 11:00am

You don’t need external parsing libraries - handle this right in your PDF generation loop. Just modify your document settings to include displayHeaderFooter and add the page numbering logic before generating each PDF. When you call webpage.pdf(), use headerTemplate and footerTemplate with custom HTML that includes page numbering variables.

For sequential numbering, inject JavaScript into each page with webpage.evaluate() to set a global page offset variable before PDF generation. Try await webpage.evaluate((offset) => { window.pageOffset = offset; }, currentPageTotal) then reference it in your footer template as <span class="pageNumber"></span> + window.pageOffset. Keep the running total in your main loop and pass it to each webpage context.

PDF generation will automatically apply correct sequential numbering - no parsing generated PDFs or extra dependencies needed. Works consistently across different webpage structures.
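
A small sketch of the displayHeaderFooter part, with buildPdfOptions as a hypothetical helper name. One caveat worth checking before relying on the offset idea: the Puppeteer documentation for headerTemplate and footerTemplate says script tags inside templates are not evaluated and page styles are not visible there, so a window.pageOffset value set through webpage.evaluate() should not be expected to reach the footer. What the templates do provide is the per-document pageNumber and totalPages values that Chrome fills in.

```javascript
// Hypothetical helper: page.pdf() options with a "current / total" footer.
function buildPdfOptions(outputPath) {
    return {
        path: outputPath,
        printBackground: true,
        displayHeaderFooter: true,
        // Empty header so that only the footer is printed.
        headerTemplate: '<span></span>',
        // Chrome replaces the content of elements with these class names.
        // An explicit font size is set because the template default is very small.
        footerTemplate:
            '<div style="font-size:10px; width:100%; text-align:center;">' +
            '<span class="pageNumber"></span> / <span class="totalPages"></span>' +
            '</div>',
        // The footer is drawn inside the bottom margin, so keep it large enough.
        margin: { top: '3cm', bottom: '3cm', left: '2cm', right: '2cm' }
    };
}
```

Passing buildPdfOptions('output-1.pdf') to webpage.pdf() numbers each document from 1, so this covers the footer layout but not the cross-document offset on its own.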

omarR_85 · August 9, 2025, 9:25pm

I dealt with this exact workflow last year building automated report generators. The real problem isn’t just getting page counts - it’s making the whole sequence actually work reliably.

Don’t patch Puppeteer together with PDF parsing libraries. I moved everything to Latenode and it’s way better for this:

You build a workflow that generates each PDF, pulls the page count automatically, saves it as a variable, then feeds that count to the next PDF step. No manual counter tracking or CSS injection mess.

Basically: URL input → generate PDF → extract page count → store variable → generate next PDF (with updated starting page) → merge → done.

Latenode handles all the state stuff between steps, so you don’t maintain running totals in your code. Plus it’s got built-in PDF operations that make page counting super easy.

What used to be a fragile script with tons of dependencies becomes a solid automated workflow. I’ve been running similar setups for months with zero issues.

Check it out: https://latenode.com

Harry47 · August 9, 2025, 6:38am

I’ve faced this exact problem before. Puppeteer doesn’t provide the page count from the PDF buffer it generates, so you’ll have to find a workaround. What I did was use pdf-parse to count pages after each PDF is created. Update your createPDF method to return both the buffer and page count. Generate your first PDF, run it through pdf-parse to obtain the page count, then inject custom CSS into the next webpage before conversion. This CSS trick involves using counter-reset and counter-increment properties, setting counter-reset dynamically based on the current page total. You’ll need to include a small CSS snippet for displaying the page numbers, but this way, you can fully control the sequential numbering across all your merged PDFs. In essence, maintain a running total of pages and use that to set counter-reset for each new PDF generation.
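
The running-total loop described above could be sketched like this. generateSequence, renderPdf and countPages are hypothetical names: renderPdf(url, pagesSoFar) stands for whatever renders one URL (for example a createPDF variant that injects the counter CSS), and countPages(pdfBytes) stands for whichever library reads the page count. Both are passed in so the loop does not depend on a specific parser.

```javascript
// Generates the PDFs in order, tracking how many pages came before each one.
async function generateSequence(urls, renderPdf, countPages) {
    const results = [];
    let pagesSoFar = 0; // total pages in all previously generated PDFs

    for (const url of urls) {
        // The renderer receives the running total so it can offset its numbering.
        const pdfBytes = await renderPdf(url, pagesSoFar);
        const pageCount = await countPages(pdfBytes);

        results.push({ url, pdfBytes, pageCount, firstPageNumber: pagesSoFar + 1 });
        pagesSoFar += pageCount;
    }
    return results;
}
```

With pdf-parse 1.x, countPages could be something like async (bytes) => (await pdfParse(bytes)).numpages, assuming that version's promise-based API. The PDFs have to be generated one after another, since each one needs the count from the previous ones.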

SwimmingShark · August 6, 2025, 6:17am

Don’t extract page counts after generating the PDF - calculate them beforehand instead. Use webpage.evaluate() to measure content height before calling webpage.pdf(). Try something like const contentHeight = await webpage.evaluate(() => document.body.scrollHeight) then divide by the printable page height (your PDF height minus the top and bottom margins) and round up to estimate pages. Once you’ve got the count, inject CSS variables for page numbering with counter-reset using your calculated starting number. I’ve found this way more reliable than parsing generated PDFs since you control the numbering logic from start to finish. The page estimation might be slightly off, but it’s accurate enough for sequential numbering and you won’t need extra PDF parsing dependencies.
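
A sketch of that estimate, with hypothetical helper names. It assumes all heights are in CSS pixels (96 per inch) and ignores forced page breaks, print scaling and content that cannot be split across pages, so the result is an approximation.

```javascript
// Converts centimetres to CSS pixels (1in = 2.54cm = 96px).
function cmToPx(cm) {
    return (cm / 2.54) * 96;
}

// Estimates how many PDF pages a given content height will occupy.
function estimatePageCount(contentHeightPx, pageHeightPx, marginTopPx, marginBottomPx) {
    const printableHeight = pageHeightPx - marginTopPx - marginBottomPx;
    if (printableHeight <= 0) {
        throw new RangeError('margins leave no printable height');
    }
    // Even an empty document produces one page.
    return Math.max(1, Math.ceil(contentHeightPx / printableHeight));
}

// Measures the rendered content height of an already-loaded Puppeteer page.
async function measureContentHeight(page) {
    return page.evaluate(() => document.body.scrollHeight);
}
```

With the settings from the question (height 2480, top and bottom margins of 3cm), the printable height comes to about 2253 px, so the call would be estimatePageCount(contentHeight, 2480, cmToPx(3), cmToPx(3)). If the estimate is wrong by a page, every later PDF in the sequence is numbered wrong too, which is the trade-off against counting pages in the generated file.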