I’m working with Puppeteer for web scraping and need help extracting meta tag information. I can successfully grab page titles and paragraph text, but I’m struggling to get the content from meta tags.
Specifically, I want to extract the content attribute from meta tags like this one: <meta name="description" content="Some website description here">
Here’s my current working code that gets titles and text successfully:
You can grab the meta content with await webpage.evaluate(() => document.querySelector('meta[name="description"]').getAttribute('content')). Just add this after your existing code and you’re good to go. Make sure the meta tag exists first, though: if querySelector returns null, the getAttribute call throws and the evaluate promise rejects.
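One way to build in that existence check, as a rough sketch: it assumes webpage is your Puppeteer Page object, and getDescription is a made-up helper name, not part of Puppeteer. Optional chaining yields null instead of throwing when the tag is missing.

```javascript
// Hypothetical helper; `webpage` is assumed to be a Puppeteer Page object.
async function getDescription(webpage) {
  return webpage.evaluate(() => {
    // querySelector returns null when no matching tag exists,
    // so ?. short-circuits instead of throwing.
    const content = document
      .querySelector('meta[name="description"]')
      ?.getAttribute('content');
    return content ?? null; // normalize a missing tag or attribute to null
  });
}
```

Call it as const description = await getDescription(webpage); and treat null as "no description tag on this page".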
Building on what others said - I hit the same problem scraping multiple meta tags. Way more efficient to grab them all in one evaluate call instead of separate DOM queries. Here’s what works for me:
This cuts down on round trips between Node and the browser, which speeds things up when you’re scraping lots of pages. Just watch out: some sites use property instead of name for certain meta tags (Open Graph tags like og:title, for example), so a selector on name alone will miss them.
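For illustration, a single-evaluate version could look roughly like the sketch below. It assumes page is a Puppeteer Page object; getAllMeta is a hypothetical name. It keys each tag by name, falling back to property, and if a key appears more than once the last tag wins.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function getAllMeta(page) {
  return page.evaluate(() => {
    const result = {};
    for (const el of document.querySelectorAll('meta')) {
      // Standard tags use name; Open Graph tags use property.
      const key = el.getAttribute('name') || el.getAttribute('property');
      const content = el.getAttribute('content');
      // Skip tags without a key or content (e.g. <meta charset="utf-8">).
      if (key && content !== null) result[key] = content;
    }
    return result; // plain object, so it serializes back to Node cleanly
  });
}
```

After const meta = await getAllMeta(page); you can read meta.description or meta['og:title'] without further browser calls.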
You need getAttribute('content') instead of textContent for meta tags since the content lives in an attribute, not as text. Here’s what to add to your code:
The null check prevents errors when the meta tag doesn’t exist. You can grab other meta tags by switching the selector: meta[property="og:title"] for Open Graph titles or meta[name="keywords"] for keywords. The same pattern works for any meta tag you can match with an attribute selector.
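If you end up switching selectors a lot, a small wrapper can help. This is only a sketch: getMetaContent is a hypothetical name and page is assumed to be a Puppeteer Page object. The selector is passed as an extra argument to evaluate because the callback runs in the browser and cannot see Node variables.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page object.
async function getMetaContent(page, selector) {
  // Extra arguments to evaluate() are serialized and handed to the callback.
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    // Null check: return null rather than throwing when the tag is absent.
    return el ? el.getAttribute('content') : null;
  }, selector);
}
```

For example, await getMetaContent(page, 'meta[property="og:title"]') or await getMetaContent(page, 'meta[name="keywords"]').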