How to handle non-serializable parameters with Puppeteer's exposeFunction method

I’m working with a third-party library that does DOM manipulation and parsing. The library has a function that needs the document object as input:

// Example library function
const domUtils = require('./dom-utils.js');
// domUtils.extract = (documentObj) => { return documentObj.title; }

When I try to expose this function in Puppeteer and call it from the browser context, I run into serialization issues:

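const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const pageInstance = await browser.newPage();
    await pageInstance.goto('https://example.com', { waitUntil: ['domcontentloaded'] });

    // Runs in Node.js; Puppeteer has to carry the arguments over from the page
    await pageInstance.exposeFunction('extractData', (documentObj) => {
        return domUtils.extract(documentObj);
    });

    // Runs in the browser; passing window.document here is what fails
    const result = await pageInstance.evaluate(() => {
        return window.extractData(window.document);
    });

    await browser.close();
})();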

This throws an error about a circular structure during JSON conversion. The Puppeteer docs only show examples with simple serializable data like strings or numbers. How can I work around this limitation when I need to pass complex objects like the document or window to my exposed Node.js functions?

You could also run the extraction logic directly in the browser with page.evaluate() instead of exposing functions. Just copy your domUtils code into the evaluate block or bundle it for browser use. That cuts out all the serialization hassle.
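A minimal sketch of that idea, assuming domUtils.extract is as simple as the commented-out version in the question; extractInBrowser and extractFromPage are hypothetical names. page.evaluate sends the function's source text to the page, so the copied code cannot use require() or any variable from the surrounding Node.js scope.

```javascript
// Hypothetical browser-safe copy of domUtils.extract. page.evaluate sends
// this function's source text to the page, so it must not use require() or
// any variable from the surrounding Node.js scope.
function extractInBrowser(documentObj = document) {
  // When Puppeteer calls this with no arguments, the default picks up the
  // page's real document object, which never leaves the browser.
  return documentObj.title;
}

// Runs the extraction inside the page; only the returned string (a
// JSON-friendly value) travels back to Node.js.
async function extractFromPage(page) {
  return page.evaluate(extractInBrowser);
}
```

With the question's setup this would be called as `const title = await extractFromPage(pageInstance);`, and no exposeFunction call is needed at all.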

The issue arises because Puppeteer’s exposeFunction is a bridge between the browser and Node.js contexts: arguments passed across it are serialized to JSON, and DOM objects such as window and document are live, browser-side objects with circular references (window.window === window, for example) that JSON serialization cannot handle. Instead of passing the entire document object, extract the details you need in the browser context first.
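To see the failure in isolation, here is a plain Node.js sketch (no Puppeteer involved) that mimics window's self-reference with an ordinary object. makeWindowLike and trySerialize are hypothetical helpers for illustration only.

```javascript
// Plain Node.js illustration (no Puppeteer): JSON.stringify, the kind of
// serialization described for the exposeFunction bridge, rejects any
// object graph that points back at itself.

// Hypothetical stand-in for window: in a browser, window.window === window.
function makeWindowLike() {
  const win = { title: 'Example Domain' };
  win.window = win; // self-reference, like window.window
  return win;
}

// Hypothetical helper: reports whether a value survives JSON serialization.
function trySerialize(value) {
  try {
    return { ok: true, json: JSON.stringify(value) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Passing the whole object fails...
const circular = trySerialize(makeWindowLike());
// ...while a snapshot of plain fields serializes fine.
const snapshot = trySerialize({ title: makeWindowLike().title });

console.log(circular.ok);   // false
console.log(snapshot.json); // {"title":"Example Domain"}
```

The snapshot approach below applies the same idea to the real document: copy out plain fields, then send only those.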

Consider using this method:

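const result = await pageInstance.evaluate(() => {
    // Copy only plain, JSON-friendly values out of the live document
    const serializedDoc = {
        title: document.title,
        url: document.URL,
        // Markup inside <html> (head and body), not the <html> tag itself
        innerHTML: document.documentElement.innerHTML
    };
    // This plain object crosses the bridge without circular references
    return window.extractData(serializedDoc);
});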

Then change your exposed function to accept this serialized data. If your library needs a full document-like object, you can rebuild one in Node.js with jsdom from the serialized HTML sent by the browser. That keeps your third-party library working while staying within Puppeteer’s serialization constraints.

Instead of trying to serialize the document object, consider doing the extraction work within the browser context. You can inject your DOM manipulation logic directly into the page with addScriptTag (which adds a script to the currently loaded page) or evaluateOnNewDocument (which runs your code on every subsequent navigation, before the page’s own scripts). For third-party libraries, bundling them for browser use or rewriting the necessary pieces as browser-compatible code avoids the serialization problem entirely.

If you must use the Node.js library, a workaround is to extract the page’s HTML in the browser and then rebuild a proper DOM in Node.js with jsdom, so your library works with an actual document object and nothing non-serializable ever crosses the bridge.
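A minimal sketch of the injection approach with addScriptTag, assuming dom-utils.js can be bundled or rewritten so it attaches itself to window instead of using module.exports. DOM_UTILS_BROWSER_SOURCE and extractWithInjectedLibrary are hypothetical names, and the inlined source is a placeholder for your real browser build.

```javascript
// Hypothetical browser build of the library: assumes dom-utils.js can be
// bundled or rewritten to attach itself to window instead of module.exports.
const DOM_UTILS_BROWSER_SOURCE = `
  window.domUtils = {
    extract: (documentObj) => documentObj.title,
  };
`;

async function extractWithInjectedLibrary(page) {
  // addScriptTag injects into the current document only; after a new
  // navigation the library has to be injected again.
  await page.addScriptTag({ content: DOM_UTILS_BROWSER_SOURCE });
  // The library now runs next to the real document, so nothing
  // non-serializable crosses the bridge; only the string result returns.
  return page.evaluate(() => window.domUtils.extract(document));
}
```

If the library should already be present before the page's own scripts run, calling `pageInstance.evaluateOnNewDocument(DOM_UTILS_BROWSER_SOURCE)` before `goto` is the alternative, since that method also accepts a source string.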