But when I try to loop through all rows, I run into issues:
for (let i = 1; i <= 10; i++) {
  let link = await page.evaluate(() => {
    return document.querySelector(`table tr:nth-child(${i}) td.description a`).href
  })
  console.log(link)
}
This throws an error saying ‘i is not defined’. What’s the right way to do this? I’m new to Puppeteer and could use some help figuring out how to properly iterate over table rows and extract links. Thanks!
Having worked extensively with Puppeteer, I can share a trick that’s served me well. Instead of looping through rows individually, you can leverage the power of Array.from() and querySelectorAll() to grab all links in one go:
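A minimal sketch of that idea, assuming the same table markup as in the question (`getAllLinks` is just a name picked for illustration, and `page` is an already-opened Puppeteer page):

```javascript
// Hypothetical helper: collects every description link in a single call.
// The selector mirrors the one used in the question.
async function getAllLinks(page) {
  return page.evaluate(() => {
    const anchors = document.querySelectorAll('table tr td.description a');
    // Convert the NodeList to an array of plain strings, since only
    // serializable values can be returned from the page to Node.js.
    return Array.from(anchors, (a) => a.href);
  });
}
```

You would call it after navigating, e.g. `const links = await getAllLinks(page);`.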
This approach is not only more concise but also significantly faster, because it makes a single page.evaluate() call instead of one per row, and each call is a round trip between Node.js and the browser. It’s been a game-changer for me when dealing with large tables.
Remember to add error handling and consider pagination if the table spans multiple pages. Happy scraping!
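The issue you’re encountering is due to the scope of the variable ‘i’ in your loop. The function you pass to page.evaluate() is serialized and run inside the browser page, so it cannot see variables from your Node.js script; any value it needs has to be passed in as an extra argument. Here’s a more reliable approach: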
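Roughly, and assuming the table structure from the question (the helper name `getRowLinks` is made up for this example):

```javascript
// Hypothetical helper: returns the href of the description link in each row.
// `page` is assumed to be a Puppeteer page that has already loaded the table.
async function getRowLinks(page) {
  return page.evaluate(() => {
    const rows = Array.from(document.querySelectorAll('table tr'));
    return rows
      .map((row) => {
        // Header rows or rows without a link yield null here.
        const anchor = row.querySelector('td.description a');
        return anchor ? anchor.href : null;
      })
      .filter((href) => href !== null);
  });
}
```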
This code selects all table rows, maps them to their respective links (if they exist), filters out any null values, and returns the array of links. It’s more efficient as it performs all operations in a single page.evaluate() call, reducing the number of context switches between Node.js and the browser.
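If you would rather keep your original loop, a sketch of the fix is to hand the counter to page.evaluate() as an extra argument. `getLinksByRow` and `rowCount` are names invented for this example, and the null check is an added assumption so that rows without a link do not throw:

```javascript
// Hypothetical helper: one page.evaluate() call per row, as in the question.
async function getLinksByRow(page, rowCount) {
  const links = [];
  for (let i = 1; i <= rowCount; i++) {
    // Extra arguments to page.evaluate() are serialized and passed to the
    // callback, so `n` inside the page holds the value of `i` from Node.js.
    const link = await page.evaluate((n) => {
      const anchor = document.querySelector(
        `table tr:nth-child(${n}) td.description a`
      );
      return anchor ? anchor.href : null;
    }, i);
    links.push(link);
  }
  return links;
}
```

This works, but it still makes one round trip per row, so the single-call version is usually the better choice for large tables.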