Hey everyone! I’m trying to figure out how to make Puppeteer visit a series of URLs one after another. Here’s what I’ve got so far:
const puppeteer = require('puppeteer');

(async () => {
  try {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    // goto needs a full URL, including the protocol, or it will throw
    await page.goto('https://example.com/page-0');
    await page.waitForSelector('.content-box');
    const element = await page.$('.content-box');
    const text = await page.evaluate(el => el.textContent, element);
    console.log(text + ' - section');
    await browser.close();
  } catch (err) {
    console.error('Oops, something went wrong:', err);
  }
})();
I want to loop through URLs like:
example.com/page-0
example.com/page-1
example.com/page-2
and so on…
Any ideas on how to make this work? Thanks in advance for your help!
I’ve tackled this issue before in my automation projects. One approach that’s worked well for me is using a generator function to yield URLs. It’s especially handy when you’re dealing with a large number of pages or don’t know the exact count upfront. Here’s a snippet that might help:
function* urlGenerator() {
  let i = 0;
  while (true) {
    // Include the protocol so page.goto accepts the URL
    yield `https://example.com/page-${i}`;
    i++;
  }
}
(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  const gen = urlGenerator();
  for (let i = 0; i < 10; i++) { // Adjust the number as needed
    const url = gen.next().value;
    try {
      await page.goto(url);
      // Your existing code for scraping
      console.log(`Processed ${url}`);
    } catch (err) {
      console.error(`Error processing ${url}:`, err);
      break; // Stop if we hit an error, assuming no more pages
    }
  }
  await browser.close();
})();
This method gives you flexibility to keep going until you hit an error or a specific condition. It’s been quite reliable in my experience.
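If you'd rather see what the generator produces before wiring it into Puppeteer, here's a small browser-free sketch. The `takeUrls` helper is something I made up for illustration; it just pulls the first `count` values out of the generator into an array:

```javascript
// Same generator as above, yielding page URLs forever
function* urlGenerator() {
  let i = 0;
  while (true) {
    yield `https://example.com/page-${i}`;
    i++;
  }
}

// Hypothetical helper: collect the first `count` URLs from a generator
function takeUrls(gen, count) {
  const urls = [];
  for (let i = 0; i < count; i++) {
    urls.push(gen.next().value);
  }
  return urls;
}

console.log(takeUrls(urlGenerator(), 3));
// → [ 'https://example.com/page-0', 'https://example.com/page-1', 'https://example.com/page-2' ]
```

That array can then be fed into the loop above, or you can keep calling `gen.next()` inside the loop as shown.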
hey mike, i’ve done something similar before. you could use a for loop to iterate through the urls. something like:
for (let i = 0; i < 3; i++) {
  await page.goto(`https://example.com/page-${i}`); // don't forget the protocol
  // rest of your code here
}
hope that helps! let me know if u need more info
I’ve encountered a similar challenge in my projects. One efficient approach is to use an array of URLs and map through them. Here’s a modification to your code that should work:
const urls = [
  'https://example.com/page-0',
  'https://example.com/page-1',
  'https://example.com/page-2',
];

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  for (const url of urls) {
    try {
      await page.goto(url);
      await page.waitForSelector('.content-box');
      const element = await page.$('.content-box');
      const text = await page.evaluate(el => el.textContent, element);
      console.log(`${text} - section (${url})`);
    } catch (err) {
      console.error(`Error processing ${url}:`, err);
    }
  }
  await browser.close();
})();
This method allows for easy URL management and error handling per page. You can expand the urls array as needed.
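If the list gets long, you don't have to type each URL by hand. A quick sketch using `Array.from` to build the array, where `pageCount` is a placeholder for however many pages you actually have:

```javascript
// Build the urls array programmatically instead of listing every entry.
// `pageCount` is an assumption; replace it with your real page count.
const pageCount = 3;
const urls = Array.from(
  { length: pageCount },
  (_, i) => `https://example.com/page-${i}`
);

console.log(urls);
// → [ 'https://example.com/page-0', 'https://example.com/page-1', 'https://example.com/page-2' ]
```

The rest of the loop stays exactly the same; only the way the array is produced changes.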