How to Retrieve Dynamic Content in a Headless Browser with Node.js?

I am working on a Node.js application whose homepage loads its content dynamically through an API. To improve performance, I want to avoid fetching that data on every visit. My plan is to render the page once in a headless browser, extract the resulting HTML, and save it as a static file (e.g. File123.html) in a specific folder for future requests. Below are my attempts to capture the dynamic HTML content:

Attempt 1:

const http = require('http');

http.get('http://localhost:3001/File123', function(response) {
    response.setEncoding('utf8');
    response.on('data', function(data) {
        console.log(data);
    });
});

Attempt 2:

const puppeteer = require('puppeteer');
(async () => {
    try {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto('http://localhost:3001/File123');
        await page.waitForSelector('html', { timeout: 3000 });
        const content = await page.evaluate(() => {
            return document.body.innerHTML;
        });
        console.log(content);
        await browser.close();
    } catch (err) {
        console.error(err);
    }
})();

Unfortunately, both methods only yield the initial static HTML, not the rendered dynamic content. Here is the structure of the HTML I typically receive:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
    <meta name="author" content="">
    ....
</head>
<body>
    <div class='wrapper'>
        <div class="container">
            Loading...
            <!-- DYNAMIC CONTENT -->
        </div>
    </div>
</body>
</html>

Could you provide guidance on how to improve my method?

Your primary challenge is making sure the dynamic content has fully loaded before you extract the HTML. Attempt 1 cannot work at all: http.get() only downloads the raw HTML the server sends and never executes JavaScript, so the API-driven content is never rendered. Puppeteer (Attempt 2) can do it, but waitForSelector('html') matches as soon as the initial document is parsed, long before the API response arrives. A better approach is to wait for a specific element or condition that confirms the page has finished rendering.

Let's refine your Puppeteer attempt so it waits for a concrete signal that the asynchronous data fetching has concluded, with a timeout as a safety net:

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
    try {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto('http://localhost:3001/File123');

        // Wait until the "Loading..." placeholder inside .container has been
        // replaced, which indicates the dynamic content is ready
        await page.waitForFunction(() => {
            const container = document.querySelector('.container');
            return container && container.innerText.trim() !== 'Loading...';
        }, { timeout: 10000 }); // time in ms

        // page.content() returns the full serialized HTML, including <head>
        const content = await page.content();

        // Save the rendered markup to a local file
        fs.writeFileSync('File123.html', content, { encoding: 'utf8' });

        await browser.close();
    } catch (err) {
        console.error('Error capturing content:', err);
    }
})();

In this example, page.waitForFunction() polls until the "Loading..." text inside the .container element has been replaced, signalling that the dynamic content has rendered. Adjust the predicate or timeout to match your application's behavior; if there is no reliable marker element, page.goto(url, { waitUntil: 'networkidle0' }) is an alternative that waits until the network has gone idle. Finally, fs.writeFileSync() saves the HTML synchronously, so the script does not continue (or close the browser) until the file is on disk.