I’ve been running Puppeteer scripts for web scraping for about a year now, and I keep running into the same wall. Sites that load content dynamically—especially ones that use infinite scroll or lazy loading—just destroy my scripts. I’ll write something that works perfectly in testing, deploy it, and then two weeks later it’s completely broken because the site changed their DOM structure or added some new JavaScript framework.
I’ve tried adding waits, using better selectors, even waiting for specific network requests, but it feels like I’m always playing catch-up. The fragility is killing me. Every time a client’s website updates, I’m scrambling to fix the automation.
I’ve heard there are ways to generate Puppeteer workflows just by describing what you want in plain language, which sounds wild but honestly might be worth exploring if it means less manual tweaking. Has anyone actually gotten that to work reliably, or does it just spit out code that has the same brittleness problems?
What’s your approach to handling dynamic content? Are you just constantly maintaining scripts, or have you found something that actually sticks?
Dynamic pages are brutal because they move the goalposts constantly. You end up in this cycle where you’re always fixing selectors and timeouts.
I dealt with this same frustration until I started using Latenode’s AI Copilot to generate workflows from plain descriptions. Instead of hand-coding Puppeteer scripts and then debugging them every time a website changes, I describe the task—like “extract product names and prices from a page that loads content as you scroll”—and the copilot generates a ready-to-run workflow that handles dynamic content way better than my manual scripts ever did.
The key difference is the AI generates workflows that are more resilient to DOM changes because they understand context, not just selectors. Plus, when something does break, I just update the description and regenerate instead of diving into code.
It’s not magic, but it cuts down the maintenance burden significantly. Worth checking out: https://latenode.com
I’ve been there. The pattern I found that works is building in some flexibility upfront. Instead of relying on a single selector, I try to identify elements using multiple approaches—text content matching, ARIA labels, nth-child fallbacks. It’s more work initially but saves so much maintenance.
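A minimal sketch of that fallback idea, assuming a hypothetical findFirst helper; the selectors shown are illustrative, not from any real site:

```javascript
// Hypothetical helper: try several selectors in order, return the first match.
// All selector strings below are illustrative placeholders.
async function findFirst(page, selectors) {
  for (const sel of selectors) {
    const el = await page.$(sel);
    if (el) return el;
  }
  return null;
}

// Usage sketch:
// const title = await findFirst(page, [
//   '[data-testid="product-title"]', // stable hook if the site provides one
//   '[aria-label="Product title"]',  // ARIA fallback
//   '.product-card h2',              // loose structural fallback
// ]);
```

Ordering matters: put the most semantically stable selectors first and the structural ones last, so a redesign degrades gracefully instead of failing outright.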
Also, network request listening has been a game changer for me. Instead of waiting for DOM elements to appear, I wait for the actual API calls that populate the page. That way even if they redesign the UI, my script still works because the underlying data endpoint is usually more stable.
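A sketch of that network-first approach; the '/api/products' URL fragment is a hypothetical example you would replace with whatever endpoint the site actually calls:

```javascript
// Wait for the JSON response that actually carries the data, instead of
// polling the DOM. The '/api/products' fragment is a placeholder.
async function fetchListData(page) {
  const response = await page.waitForResponse(
    (res) => res.url().includes('/api/products') && res.status() === 200,
    { timeout: 15000 }
  );
  return response.json();
}

// Usage sketch: start navigation, then await the data.
// await page.goto(url);
// const products = await fetchListData(page);
```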
You could also consider building observability into your scripts—log what selectors actually matched, what the page structure looked like, send that data somewhere you can review. Helps you catch breaks earlier.
Dynamic pages need a different mindset than static ones. I started treating my Puppeteer scripts like they’re on borrowed time and built them assuming failure. That means adding retry logic with exponential backoff, multiple selector fallbacks, and explicit checks for whether the data actually loaded before trying to extract it.
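The retry-with-backoff part can be a small generic wrapper; the attempt counts and delays below are illustrative defaults, not tuned values:

```javascript
// Generic retry with exponential backoff. Defaults are illustrative.
async function withRetry(fn, { attempts = 4, baseMs = 500 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        // delays double each attempt: 500, 1000, 2000 ms...
        await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}

// Usage sketch:
// const items = await withRetry(() => scrapeItems(page), { attempts: 3 });
```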
One thing that helped was moving away from overly specific CSS selectors. Instead of targeting .product-item-container .title-text h2, I started using XPath or searching for text content directly. Less brittle, more adaptable when markup changes.
For infinite scroll specifically, I use page.evaluate() to repeatedly scroll and wait for new items to load, checking the DOM count each time. It’s cleaner than fighting with scroll events.
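Roughly what that loop looks like; the item selector and the settle delay are placeholders to adjust per site:

```javascript
// Scroll to the bottom repeatedly until the item count stops growing.
// itemSelector and settleMs are placeholders, tuned per site.
async function scrollUntilDone(page, itemSelector, { maxRounds = 20, settleMs = 1000 } = {}) {
  let prev = -1;
  let count = await page.$$eval(itemSelector, (els) => els.length);
  for (let round = 0; count !== prev && round < maxRounds; round++) {
    prev = count;
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise((r) => setTimeout(r, settleMs)); // let lazy-load fire
    count = await page.$$eval(itemSelector, (els) => els.length);
  }
  return count;
}
```

The maxRounds cap keeps a truly endless feed from running forever; checking the count rather than scroll position means a redesign of the scroll mechanics doesn’t break the exit condition.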
The brittleness problem stems from over-coupling your automation to the current DOM structure. What you need is abstraction between your selectors and your logic. I typically create a layer that defines multiple ways to find the same element, and the script tries them in order until one succeeds.
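One way to sketch that abstraction layer; the logical names and selectors are hypothetical:

```javascript
// Hypothetical locator map: logical names mapped to ordered fallbacks,
// kept separate from scraping logic so only this table needs updating.
const LOCATORS = {
  title: ['[data-testid="title"]', '[itemprop="name"]', 'h1'],
  price: ['[data-testid="price"]', '[itemprop="price"]', '.price'],
};

async function locate(page, name) {
  for (const sel of LOCATORS[name] ?? []) {
    const el = await page.$(sel);
    if (el) return el;
  }
  throw new Error(`No selector matched for "${name}"`);
}
```

When a site changes, the fix is one line in the table rather than a hunt through the script.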
For dynamic content specifically, mutation observers or page.on('framenavigated') events let you respond to page changes in real time rather than guessing when things are loaded. You could also use the PerformanceObserver API to wait until specific metrics are hit, which is more reliable than hardcoded timeouts.
The reality is that any Puppeteer script needs to be treated as maintenance-intensive. The question is whether you build the maintenance into the architecture upfront or deal with it reactively.
Use waitForNavigation or waitForFunction instead of static waits. Network-stable approach works better—wait for API calls rather than DOM elements. Combine multiple selector types as fallbacks. Also consider headless browser recorder tools to capture interaction patterns, not just selectors.
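A sketch of the waitForFunction approach; the selector and item threshold are placeholders:

```javascript
// Wait until the page reports at least `min` rendered items, instead of
// sleeping a fixed amount. The selector is a placeholder.
async function waitForItems(page, selector, min = 1, timeoutMs = 10000) {
  await page.waitForFunction(
    (sel, n) => document.querySelectorAll(sel).length >= n,
    { timeout: timeoutMs },
    selector,
    min
  );
}

// Usage sketch:
// await waitForItems(page, '.product-card', 20);
```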