I’m working on a web scraping project and have managed to set up Puppeteer successfully. I can currently grab all table rows from my target table like this:
const tableRows = await page.$$eval('#dataTable tbody tr', elements => elements);
The next step I need help with is accessing individual cells within each row. For every row I retrieve, I want to extract the text content from each <td> element.
What I’m trying to achieve is something equivalent to this vanilla JavaScript approach:
const cells = currentRow.querySelectorAll('td');
But I need to do this within the Puppeteer context where currentRow represents each table row element. How can I iterate through the rows and then access the cell data from each td element to get their text content?
For complex tables, try using page.$$ to grab an element handle for each row first, then process the rows one at a time. That makes debugging a lot easier, since you can inspect each element before extracting data from it:
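Something along these lines should work as a minimal sketch. It assumes the #dataTable selector from the question and a page that has already navigated to the target; extractRowsWithHandles is just a hypothetical helper name, not part of Puppeteer:

```javascript
// Hypothetical helper: walks each row handle and reads its cells one row at a time.
// `page` is assumed to be a Puppeteer Page that is already on the target URL.
async function extractRowsWithHandles(page, rowSelector = '#dataTable tbody tr') {
  const rowHandles = await page.$$(rowSelector);
  const table = [];

  for (const [index, rowHandle] of rowHandles.entries()) {
    try {
      // $$eval on an element handle runs querySelectorAll scoped to that row.
      const cells = await rowHandle.$$eval('td', tds =>
        tds.map(td => td.textContent.trim())
      );
      table.push(cells);
    } catch (err) {
      // Per-row error handling: log the failure and carry on with the other rows.
      console.warn(`Row ${index} could not be read: ${err.message}`);
    } finally {
      // Release the handle so it does not keep the element alive in the browser.
      await rowHandle.dispose();
    }
  }
  return table;
}
```

Calling `await extractRowsWithHandles(page)` should give you one array of strings per row, with any row that throws skipped instead of aborting the whole run.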
This works great with dynamic content or when you need per-row error handling. Yeah, it’s slower than a single evaluation, because every row costs a separate round trip to the browser, but the extra control is usually worth it.
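Just modify your approach to grab both rows and cells in one go. Skip getting the row elements separately: DOM elements can’t be returned from $$eval anyway, because the result has to be serializable. Instead, use $$eval to dive into each row and pull the cell text out directly: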
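Roughly like this, as a sketch that assumes the #dataTable selector from the question (extractTableText is a hypothetical wrapper name, so adjust it to fit your code):

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page already on the target URL.
async function extractTableText(page, rowSelector = '#dataTable tbody tr') {
  // The callback runs inside the browser, so it has to return serializable data
  // (arrays of strings here), not the DOM elements themselves.
  return page.$$eval(rowSelector, rows =>
    rows.map(row =>
      Array.from(row.querySelectorAll('td'), td => td.textContent.trim())
    )
  );
}
```

You would call it as `const tableData = await extractTableText(page);` and then index it as `tableData[rowIndex][cellIndex]`.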
You’ll get a 2D array where each sub-array holds the text from that row’s cells. The .trim() strips leading and trailing whitespace. I’ve used this a lot when scraping messy tables, and it’s way more efficient than separate queries since it all runs in one browser evaluation.