I’m working on a Puppeteer automation script and need help with text detection functionality. My goal is to scan the currently displayed webpage in an open Chrome browser tab for specific keywords or phrases.
Here’s what I want to accomplish:
Search for multiple target words within the active page content
If any of the search terms are found, log a success message to a results.txt file
If none of the keywords exist on the page, write “no matching terms detected” to an errors.txt file
Run this check repeatedly in a loop
I’m not trying to navigate to new URLs, just analyze whatever content is already loaded in the browser tab. What’s the best approach to implement this text searching feature with Puppeteer?
For text searching, use page.evaluate() (or page.$eval() if you’re targeting a single element) to extract page content and search through it. I’ve built similar content monitoring before: grab all visible text with document.body.innerText inside an evaluate callback, then use JavaScript’s includes() to check for keywords.

For file ops, Node’s fs.writeFileSync() works fine for basic logging.

Watch out for dynamic content - it’ll trip you up if you check too fast. I throw in a small delay before extracting text so everything loads, and page.waitForSelector() helps if you’re waiting on specific elements.

For the loop, it depends on the timing you need. setInterval() is fine for regular checks, but setTimeout() with recursion gives you way more control over execution order and error handling.
I’ve done similar monitoring tasks and honestly, Puppeteer scripts get messy fast when you’re dealing with loops, file operations, and error handling.
You’d grab content with page.content() (raw HTML source) or page.evaluate() (rendered text), then search it for your keywords. But managing all that code plus file logging is a pain.
I ended up automating the whole workflow instead. Rather than writing custom Puppeteer code, I built a monitoring system that handles browser automation, text detection, and file logging automatically.
You just set up search terms, define your output files, and let it run. No managing Puppeteer complexity, better error handling, and you can change search criteria without touching code.
I use this for monitoring competitor pages and content changes. Way more reliable than maintaining custom scripts.
The trick is proper content extraction with page.$$eval('*', ...), which runs document.querySelectorAll('*') for you and hands every matched element to your callback; from there, filter down to visibly rendered text nodes. This catches text that a single document.body.innerText call can miss in dynamically loaded sections. (One caveat: querySelectorAll() doesn’t descend into shadow roots, so shadow DOM content needs explicit shadowRoot traversal either way.)
For your case, wrap the search logic in try-catch, since page content can change mid-extraction. To separate truly visible elements from hidden ones, I check element.offsetParent: it’s null when the element or an ancestor is display: none (though it’s also null for position: fixed elements, so don’t rely on it alone if the page uses those).
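A minimal sketch of that visibility filter, with the DOM access abstracted so the filtering logic reads as plain JavaScript. Inside Puppeteer it would live in a page.evaluate() callback, shown in the comment; visibleText is a hypothetical helper name:

```javascript
// Hypothetical helper: keep only elements the browser actually lays out
// (offsetParent is null for display:none elements) and join their text.
function visibleText(elements) {
  return elements
    .filter((el) => el.offsetParent !== null)
    .map((el) => el.innerText || '')
    .join(' ');
}

// Wired into Puppeteer it would look roughly like:
//   const text = await page.evaluate(() => {
//     const els = Array.from(document.querySelectorAll('*'));
//     return els
//       .filter((el) => el.offsetParent !== null)
//       .map((el) => el.innerText || '')
//       .join(' ');
//   });
```

One thing to keep in mind with the '*' selector: parents repeat their children’s innerText, so a keyword can appear multiple times in the joined string - harmless for a yes/no match, but worth knowing if you ever count occurrences.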
For file operations, use fs.appendFileSync() instead of fs.writeFileSync() if you want to keep previous results. For the loop, while (true) with await new Promise((resolve) => setTimeout(resolve, interval)) gives better control than setInterval(): with async Puppeteer operations that take varying amounts of time, setInterval() can fire a new check before the previous one finishes, while the await pattern guarantees each iteration completes before the next delay starts.
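Here’s a minimal sketch of that while-loop pattern. checkPage is a hypothetical callback standing in for the Puppeteer extract-and-log step, and the maxRuns option is only there so the loop can terminate when you want a bounded run:

```javascript
// Sleep helper so the loop can await between checks.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run checkPage repeatedly, waiting intervalMs AFTER each check completes,
// so slow checks never overlap the way they can with setInterval().
async function monitor(checkPage, { intervalMs = 5000, maxRuns = Infinity } = {}) {
  let runs = 0;
  while (runs < maxRuns) {
    try {
      await checkPage(); // e.g. extract text, search keywords, append to log
    } catch (err) {
      // Page content can change mid-extraction; log and keep looping.
      console.error('check failed:', err.message);
    }
    runs += 1;
    await sleep(intervalMs);
  }
}
```

Usage would be something like `monitor(runMyPuppeteerCheck, { intervalMs: 10000 })` - the errors stay contained inside the loop body, so one bad iteration doesn’t kill the whole monitor.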