Extracting complete HTML content after full page load in Puppeteer

Need help grabbing all HTML after scripts finish loading

I’m new to Node.js and just got Puppeteer working, but I’ve run into a problem: when I fetch the page, I only get the basic template without the info I need.

The page I’m trying to scrape loads content dynamically after a few seconds. I want to grab the data inside specific tags once everything is fully loaded.

I’d prefer a solution using vanilla JavaScript if possible since I’m not familiar with jQuery.

Here’s what I’ve tried so far:

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function scrapePage() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // goto resolves on the load event, before the scripts populate the DOM
  await page.goto('https://example.com/map?lat=123&lon=456');

  // snapshot of the HTML at this moment
  const content = await page.content();

  const $ = cheerio.load(content);
  $('strong').each((i, el) => {
    console.log($(el).text());
  });

  await browser.close();
}

scrapePage().catch(console.error);

This only gives me the default page elements in strong tags. There should be way more data, like 50+ items instead of the 10 I’m getting.

Any ideas on how to wait for everything to load before grabbing the HTML? Thanks!

Hey, I had a similar issue. Try adding a wait before grabbing the content:

await page.waitForSelector('.dynamic-content');
const content = await page.content();

Replace ‘.dynamic-content’ with a selector that only shows up once everything’s loaded. This gives Puppeteer time for all the JS to run and populate the page.
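In the context of your script it would look something like this (‘.dynamic-content’ is just a placeholder, swap in a real selector from your page):

await page.goto('https://example.com/map?lat=123&lon=456');

// waits up to 30 seconds (the default timeout) for the element to appear
await page.waitForSelector('.dynamic-content');

// by now the dynamic content is in the DOM
const content = await page.content();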

I’ve encountered this issue before when scraping dynamic sites. One effective approach is to use Puppeteer’s page.evaluate() method to run JavaScript directly in the browser context after the page loads. This allows you to wait for specific elements or conditions before extracting content.

Here’s an example that might help:

const content = await page.evaluate(() => {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    // poll every 100 ms until the dynamic items have rendered
    const checkReady = setInterval(() => {
      if (document.querySelectorAll('strong').length > 10) {
        clearInterval(checkReady);
        resolve(document.documentElement.outerHTML);
      } else if (Date.now() - started > 10000) {
        // bail out instead of polling forever if the content never shows up
        clearInterval(checkReady);
        reject(new Error('Timed out waiting for dynamic content'));
      }
    }, 100);
  });
});

This polls every 100 ms until there are more than 10 ‘strong’ elements before returning the full HTML, and gives up after 10 seconds so the script can’t hang forever. Adjust the condition as needed for your specific case. Hope this helps!
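Since the callback already runs inside the page, you could also skip cheerio entirely and pull the text out directly once the elements are there, something like:

const texts = await page.evaluate(() =>
  Array.from(document.querySelectorAll('strong'), el => el.textContent.trim())
);
console.log(texts);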

I’ve dealt with this exact problem when scraping dynamic sites. One trick that worked well for me was using Puppeteer’s waitForFunction() method. It lets you define a custom condition to wait for before proceeding.

Here’s a snippet that might help:

await page.waitForFunction(() => {
  return document.querySelectorAll('strong').length > 40;
}, {timeout: 10000});

const content = await page.content();

This waits up to 10 seconds for the page to have more than 40 ‘strong’ elements before grabbing the content. You can adjust the selector and count to match what you’re expecting on the fully loaded page.
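One caveat: if the condition is never met, waitForFunction throws a TimeoutError. You may want to catch it and decide whether to scrape whatever did render; a quick sketch (the count of 40 is just a guess at what “fully loaded” means for your page):

try {
  await page.waitForFunction(
    () => document.querySelectorAll('strong').length > 40,
    { timeout: 10000 }
  );
} catch (err) {
  // the condition never became true; scrape whatever rendered so far
  console.warn('Timed out waiting for content:', err.message);
}

const content = await page.content();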

Also, some sites detect headless browsers and serve different content, so make sure that isn’t what’s happening here. If nothing else works, try launching with headless mode off by passing {headless: false} to puppeteer.launch().
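For example:

const browser = await puppeteer.launch({ headless: false });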