How to extract table data with Puppeteer by looping through rows and accessing cell content?

Extracting Table Cell Data with Puppeteer

I’m working on a web scraping project and have managed to set up Puppeteer successfully. I can currently grab all table rows from my target table like this:

const tableRows = await page.$$eval('#dataTable tbody tr', elements => elements);

The next step I need help with is accessing individual cells within each row. For every row I retrieve, I want to extract the text content from each <td> element.

What I’m trying to achieve is something equivalent to this vanilla JavaScript approach:

const cells = currentRow.querySelectorAll('td');

But I need to do this within the Puppeteer context where currentRow represents each table row element. How can I iterate through the rows and then access the cell data from each td element to get their text content?

For complex tables, try using page.$$ to grab element handles first, then process each row individually. This gives you more debugging power, since you can inspect each handle before extracting data:

const rowHandles = await page.$$('#dataTable tbody tr');
const tableData = [];

for (const row of rowHandles) {
  const cellData = await row.$$eval('td', cells => 
    cells.map(cell => cell.textContent.trim())
  );
  tableData.push(cellData);
}

// Release the element handles once you're done with them.
await Promise.all(rowHandles.map(handle => handle.dispose()));

This works well with dynamic content or when you need per-row error handling. It's slower than single-evaluation methods because each row is a separate round trip to the browser, but the extra control is often worth it.
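The per-row error handling mentioned above can be sketched as a small helper that tolerates failures on individual rows instead of aborting the whole scrape. Note that extractRowsSafely and the shape of its results are assumptions for illustration, not Puppeteer APIs:

```javascript
// Hypothetical helper: run an async extractor over each row handle,
// recording per-row errors instead of letting one bad row kill the loop.
async function extractRowsSafely(rowHandles, extractCells) {
  const results = [];
  for (const [index, row] of rowHandles.entries()) {
    try {
      results.push({ index, cells: await extractCells(row) });
    } catch (err) {
      // Keep going with the remaining rows; note which one failed.
      results.push({ index, cells: null, error: err.message });
    }
  }
  return results;
}
```

With Puppeteer you would pass an extractor like row => row.$$eval('td', cells => cells.map(c => c.textContent.trim())), so a detached or malformed row shows up as an error entry rather than a crashed script.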

Just modify your approach to grab both rows and cells in one go. Note that $$eval serializes its callback's return value, so your current snippet (elements => elements) won't actually hand back usable DOM elements; extract the text inside the callback instead:

const tableData = await page.$$eval('#dataTable tbody tr', rows => {
  return rows.map(row => {
    const cells = row.querySelectorAll('td');
    return Array.from(cells, cell => cell.textContent.trim());
  });
});

You'll get a 2D array where each sub-array holds the text of that row's cells, with .trim() cleaning up stray whitespace. I've used this a lot when scraping messy tables; it's more efficient than separate queries since everything runs in a single browser evaluation.
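If the table has a header row, you can zip it with that 2D array to get keyed objects instead of positional arrays. This is a plain-JS sketch; the headers array is assumed to come from a separate $$eval on something like '#dataTable thead th':

```javascript
// Turn a header array plus a 2D array of cell text into an array of objects,
// so each row can be accessed by column name instead of index.
function rowsToObjects(headers, rows) {
  return rows.map(cells =>
    Object.fromEntries(headers.map((key, i) => [key, cells[i]]))
  );
}

rowsToObjects(['Name', 'Age'], [['Alice', '30'], ['Bob', '25']]);
// → [{ Name: 'Alice', Age: '30' }, { Name: 'Bob', Age: '25' }]
```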

If you want finer control, try page.evaluate with a for loop:

const data = await page.evaluate(() => {
  const rows = document.querySelectorAll('#dataTable tbody tr');
  let result = [];
  for(let row of rows) {
    // innerText gives rendered text; use textContent if you want raw node text.
    result.push([...row.cells].map(cell => cell.innerText));
  }
  return result;
});

Works just as well, but gives you more flexibility for custom logic.
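For example, the loop body is a natural place for custom logic such as skipping empty rows or coercing numeric cells. That kind of cleanup can be sketched in plain JavaScript (the sample data and the numeric-parsing rule are assumptions for illustration):

```javascript
// Hypothetical cleanup: drop rows with no text at all, and convert
// numeric-looking cells to numbers while leaving other text alone.
function cleanRows(rawRows) {
  return rawRows
    .filter(cells => cells.some(text => text.trim() !== ''))
    .map(cells =>
      cells.map(text => {
        const trimmed = text.trim();
        const num = Number(trimmed);
        return trimmed !== '' && !Number.isNaN(num) ? num : trimmed;
      })
    );
}

cleanRows([['Widget', ' 42 '], ['', '  ']]);
// → [['Widget', 42]]
```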
