I’m trying to work out how to use a headless browser to generate static HTML snapshots of a JavaScript-based site that delivers its content via AJAX (routing is handled by sammy.js). I’ve been following Google’s guidelines for making AJAX applications crawlable, which are mostly clear, particularly the handling of ?_escaped_fragment_= URLs.

Since my server-side templating is in PHP, I considered writing a PHP script that replicates the regex routing logic in the sammy.js application. However, that would mean duplicating the JavaScript functionality in PHP and maintaining the same logic in two languages.

I’ve read that a headless browser can render the page, execute the JavaScript, and return the resulting DOM as HTML for Googlebot, but I can’t find any concrete guidance on driving a headless browser from PHP. Is setting this up a significant effort, and is it ultimately worthwhile? Any resources or pointers would be much appreciated. Thank you!
Hey HappyDancer99,
To generate HTML snapshots of an AJAX application from PHP, you can use a headless browser such as Puppeteer. Puppeteer is a Node.js library, but you can invoke a Node script from PHP with a shell command:
<?php
// Call the Node/Puppeteer script and capture the rendered HTML it prints to stdout.
$url = 'your_url'; // the page you want a snapshot of
$command = 'node generateSnapshot.js ' . escapeshellarg($url);
$output = shell_exec($command);
echo $output;
?>
Write a generateSnapshot.js script with Puppeteer that opens the page and prints the rendered DOM as HTML:
const puppeteer = require('puppeteer');

(async () => {
  // Launch a headless Chromium instance
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Open the URL passed on the command line and wait until network
  // activity has settled, so AJAX-loaded content is in the DOM
  await page.goto(process.argv[2], { waitUntil: 'networkidle0' });

  // Print the fully rendered HTML to stdout for the PHP wrapper to capture
  const content = await page.content();
  console.log(content);

  await browser.close();
})();
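To connect this to Google’s crawlable-AJAX scheme: when the crawler encounters a #! URL, it requests the page with ?_escaped_fragment_=… instead, so the PHP side can watch for that parameter, rebuild the hash-bang URL, and return the snapshot. Below is a rough sketch along those lines; the cache directory, cache lifetime, and base URL are placeholder assumptions, and caching matters because launching a browser on every crawler hit is slow:

<?php
// Hedged sketch: serve a cached snapshot when Google requests ?_escaped_fragment_=...
// The snapshots/ directory, one-hour lifetime, and example.com base URL are assumptions.
if (isset($_GET['_escaped_fragment_'])) {
    $fragment = $_GET['_escaped_fragment_'];

    // Rebuild the hash-bang URL the crawler originally meant to index
    $url = 'http://www.example.com/#!' . $fragment;

    // Cache snapshots on disk so the headless browser isn't launched on every hit
    $cacheFile = __DIR__ . '/snapshots/' . md5($url) . '.html';
    if (!is_file($cacheFile) || filemtime($cacheFile) < time() - 3600) {
        $html = shell_exec('node generateSnapshot.js ' . escapeshellarg($url));
        file_put_contents($cacheFile, $html);
    }

    echo file_get_contents($cacheFile);
    exit;
}
// Otherwise fall through to the normal JavaScript application.
?>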
This setup keeps the routing logic in JavaScript alone, so nothing has to be duplicated in PHP, while still producing static HTML snapshots for crawlers.