Extracting metadata using Puppeteer and Node.js

I’m trying to get metadata from websites using Puppeteer and Node.js. My code works fine for getting the title tag and text from paragraphs, but I’m stuck on how to extract the content from meta tags. Specifically, I want to get the text from the description meta tag. Here’s what I’ve got so far:

const puppeteer = require('puppeteer');

async function scrapeWebsite() {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  await page.goto('https://example.com', {waitUntil: 'networkidle0'});

  const siteTitle = await page.evaluate(() => document.title);
  const firstParagraph = await page.evaluate(() => document.querySelector('p')?.textContent ?? null);

  console.log('Title:', siteTitle);
  console.log('First paragraph:', firstParagraph);

  await browser.close();
}

scrapeWebsite().catch(console.error);

Can someone show me how to grab the content from the description meta tag? I’ve been searching for hours and can’t figure it out. Thanks for any help!

I’ve found that using the page.$$eval() method can be quite efficient for extracting metadata. Here’s a snippet that might help:

const description = await page.$$eval('meta[name="description"]', (metas) => metas.length ? metas[0].content : null);

This selects every meta tag with name="description" and returns the content of the first one, or null if there is none. It’s concise, and it doesn’t throw when the tag is missing, because $$eval simply hands the callback an empty array in that case.
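Building on the same $$eval idea, here’s a minimal sketch that tries several selectors in turn. The helper name getFirstMetaContent and the DESCRIPTION_SELECTORS list are made up for illustration, and the Open Graph and Twitter fallbacks are an assumption about what the sites you scrape actually provide. It expects the page object from the question’s code.

```javascript
// Hypothetical helper: returns the content of the first meta tag matched by
// any of the given selectors, or null if none of them is present.
async function getFirstMetaContent(page, selectors) {
  for (const selector of selectors) {
    // A selector with no matches yields an empty array here, not an error.
    const content = await page.$$eval(selector, (metas) =>
      metas.length ? metas[0].getAttribute('content') : null
    );
    if (content) return content;
  }
  return null;
}

// Selectors are tried in order. The fallbacks after the first entry are
// assumptions, not something every site defines.
const DESCRIPTION_SELECTORS = [
  'meta[name="description"]',
  'meta[property="og:description"]',
  'meta[name="twitter:description"]',
];
```

Inside scrapeWebsite() you would call it as: const description = await getFirstMetaContent(page, DESCRIPTION_SELECTORS);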

For more robust metadata extraction, you could also look into the ‘metascraper’ library. It’s designed specifically for this task and can handle a wide variety of metadata formats across different websites.

I’ve encountered similar issues with scraping metadata using Puppeteer. In my experience, sometimes document.getElementsByTagName offers a more reliable way to access meta tags than document.querySelector. Here’s an alternative approach that worked for me:

const description = await page.evaluate(() => {
  const metaTags = document.getElementsByTagName('meta');
  for (let i = 0; i < metaTags.length; i++) {
    if (metaTags[i].getAttribute('name') === 'description') {
      return metaTags[i].getAttribute('content');
    }
  }
  return null;
});

This loops through all meta tags and returns the content of the first one whose name attribute is exactly 'description' (the comparison is case-sensitive), or null if none matches. It has worked for me across a variety of site structures, so hopefully it helps here too.
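If you end up needing more than the description, the same loop-over-everything idea can collect all meta tags in one pass. This is only a sketch: collectMetaTags is a hypothetical name, and lowercasing the keys and keeping the first duplicate are my own choices, not anything Puppeteer requires.

```javascript
// Hypothetical helper: gathers the page's meta tags into a plain object keyed
// by name (or by property, which Open Graph style tags use instead).
async function collectMetaTags(page) {
  // Runs in the browser; only strings and nulls are sent back to Node.
  const pairs = await page.$$eval('meta', (metas) =>
    metas.map((meta) => [
      meta.getAttribute('name') || meta.getAttribute('property'),
      meta.getAttribute('content'),
    ])
  );

  const result = {};
  for (const [rawKey, content] of pairs) {
    // Skip tags with no name/property or no content, e.g. <meta charset="utf-8">.
    if (!rawKey || content === null) continue;
    const key = rawKey.toLowerCase();
    // Keep the first value when the same key appears more than once.
    if (!Object.prototype.hasOwnProperty.call(result, key)) {
      result[key] = content;
    }
  }
  return result;
}
```

With the page from the question, that would be: const meta = await collectMetaTags(page); and then meta.description holds the description, or is undefined if the site has none.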

hey dave, i’ve had success using page.$eval() for this. try something like:

const description = await page.$eval('meta[name="description"]', el => el.content);

it’s quick n easy. just keep in mind that page.$eval() throws if nothing matches the selector, so wrap it in a try/catch or check that the tag exists first. good luck with ur scraping project!
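One way to cover the missing-tag case for a single-element lookup, as a rough sketch: look the element up with page.$() first, which resolves to null when nothing matches, and only evaluate if it’s there. evalOrNull is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: behaves like page.$eval, but resolves to null instead
// of throwing when the selector matches nothing.
async function evalOrNull(page, selector, pageFunction) {
  const handle = await page.$(selector); // null when no element matches
  if (!handle) return null;
  try {
    // The matched element is passed to pageFunction as its first argument.
    return await handle.evaluate(pageFunction);
  } finally {
    await handle.dispose(); // release the element handle once we are done
  }
}
```

Usage would look like: const description = await evalOrNull(page, 'meta[name="description"]', (el) => el.content);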