Using Puppeteer to save complete webpage with all assets for offline viewing

I’m trying to figure out how to capture a full website using Puppeteer including all the CSS files, JavaScript code, images and other resources so I can view it offline later.

I’ve been working with Puppeteer for web scraping tasks and it works great for those. But now I need to save complete pages with everything intact.

Right now I’m only able to get the HTML content using this approach:

const pageHTML = await browser_page.content();

This method only gives me the basic HTML structure but leaves out stylesheets, scripts, and media files. The saved page looks broken when I try to open it offline.

Does anyone know if Puppeteer has built-in functionality to download entire webpages with all their dependencies? I want something similar to the “Save Page As” feature in browsers that creates a complete offline copy.

Try the CDP (Chrome DevTools Protocol) approach. Open a CDP session from the page with page.createCDPSession() and send the Page.captureSnapshot command with format set to 'mhtml'. It returns an MHTML document with the page’s resources bundled together, so you get a single file with everything embedded, much like the browser’s own save feature. I’ve used this for archiving dynamic content and it grabs most assets automatically without manually intercepting each request. The downside is that you get MHTML instead of a traditional HTML file plus asset folder, but Chromium-based browsers such as Chrome and Edge open these files directly for offline viewing. It works well for pages with complex JavaScript that changes content after loading, since the snapshot is taken from the page as it is currently rendered.
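For reference, a minimal sketch of the MHTML approach. The helper name saveAsMhtml, the output file name and the example URL are placeholders, and it assumes a Puppeteer version that has page.createCDPSession() (older releases expose the same thing as page.target().createCDPSession()).

```javascript
const fs = require('fs/promises');

// Capture the page's current state as one MHTML document via CDP
// and write it to outFile. `page` is an already-loaded Puppeteer Page.
async function saveAsMhtml(page, outFile) {
  const client = await page.createCDPSession();
  try {
    // Page.captureSnapshot returns { data: "<mhtml text>" }
    const { data } = await client.send('Page.captureSnapshot', { format: 'mhtml' });
    await fs.writeFile(outFile, data);
    return outFile;
  } finally {
    await client.detach();
  }
}

// Example driver. Puppeteer is required lazily so the helper above
// can be reused (or tested) without launching a browser.
async function main(url = 'https://example.com', outFile = 'page.mhtml') {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network goes quiet so late-loading assets are included.
    await page.goto(url, { waitUntil: 'networkidle0' });
    await saveAsMhtml(page, outFile);
  } finally {
    await browser.close();
  }
}

// To run it: main().catch(console.error);
```

Waiting for networkidle0 is a judgment call here; pages that lazy-load on scroll may still need extra scrolling or waiting before the snapshot is taken.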

I’ve hit this same issue. Intercepting network traffic works best: register a page.on('response', ...) handler before navigating so you can grab every resource as it loads. Save each asset locally, then update the HTML to point to your saved files instead of the original URLs. The tricky bit is handling relative paths and keeping the directory structure intact. I wrote a script that builds a local mirror with proper folders for CSS, JS, and images. It’s more work than just using content(), but you get an offline copy that looks like the original and keeps working wherever the scripts don’t depend on a live server.
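A rough sketch of the response-listener idea, not the script mentioned above. The names localPathFor and mirrorPage and the host/path folder layout are assumptions made for illustration; query strings are ignored and rewriting the HTML links to the local files is left out.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Map a resource URL to a file under outDir, mirroring host and path,
// e.g. https://example.com/css/site.css -> outDir/example.com/css/site.css
// Query strings are dropped, so URLs differing only by query will collide.
function localPathFor(resourceUrl, outDir) {
  const u = new URL(resourceUrl);
  let pathname = u.pathname;
  if (pathname.endsWith('/')) pathname += 'index.html';
  return path.join(outDir, u.hostname, ...pathname.split('/'));
}

// Load `url` in `page`, saving every successful HTTP(S) response body
// under outDir, then write the rendered DOM to outDir/index.html.
async function mirrorPage(page, url, outDir) {
  await fs.mkdir(outDir, { recursive: true });
  const pending = [];

  page.on('response', (response) => {
    const resUrl = response.url();
    // Skip data:/blob: URLs, redirects and error responses.
    if (!/^https?:/.test(resUrl) || !response.ok()) return;
    pending.push((async () => {
      try {
        const body = await response.buffer();
        const file = localPathFor(resUrl, outDir);
        await fs.mkdir(path.dirname(file), { recursive: true });
        await fs.writeFile(file, body);
      } catch (err) {
        // Some bodies are unavailable or paths clash; skip those assets.
      }
    })());
  });

  await page.goto(url, { waitUntil: 'networkidle0' });
  await Promise.all(pending); // make sure every write has finished
  await fs.writeFile(path.join(outDir, 'index.html'), await page.content());
}
```

With a real browser this would be called as mirrorPage(await browser.newPage(), 'https://example.com/', 'mirror'). The saved index.html still references the original URLs, so a link-rewriting pass over it is the next step for true offline viewing.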

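puppeteer’s kinda limited for this, there’s no one-call “save complete page”. your best bet is listening for network responses to snag all the assets. if you’re archiving lots of pages, puppeteer-cluster can help run them in parallel, but it won’t save assets for you. it’s a bit of a hassle but can give ya a proper offline look.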