I am building a web scraping tool for a personal project using Puppeteer. It retrieves the data correctly, but it is slow: my API endpoint takes approximately 12 to 15 seconds to respond, which is unacceptable. Here is the endpoint:
const express = require("express");
const puppeteer = require("puppeteer");

const app = express();

app.get("/data", async (req, res) => {
  let browser;
  try {
    // A new browser is launched for every request.
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    const requestedPage = req.query.page || 1;
    await page.goto(`https://example-site.com/page=${requestedPage}`);

    // Each $$eval call is a separate round trip into the page.
    const titles = await page.$$eval(".item-title", (elements) =>
      elements.map((element) => element.textContent)
    );
    const episodeNumbers = await page.$$eval(".item-episode", (elements) =>
      elements.map((element) => element.textContent)
    );
    const images = await page.$$eval(".item-image", (elements) =>
      elements.map((element) => element.src)
    );
    // Links are de-duplicated, so this array can be shorter than the others
    // and links[index] may not line up with titles[index].
    const links = await page.$$eval(".item-link", (elements) => {
      const uniqueLinks = new Set();
      elements.forEach((element) => {
        const link = element.getAttribute("href");
        if (link) {
          uniqueLinks.add(link);
        }
      });
      return Array.from(uniqueLinks);
    });

    // Combine the parallel arrays into one object per item, matched by index.
    const resultData = titles.map((title, index) => ({
      title,
      episodes: episodeNumbers[index],
      image: images[index],
      link: links[index],
    }));

    res.json(resultData);
  } catch (error) {
    console.error(error);
    res.status(500).json({ error: "An error occurred while retrieving data." });
  } finally {
    // Close the browser even when scraping fails, so browser processes do not pile up.
    if (browser) {
      await browser.close();
    }
  }
});
I first tried Cheerio, but it did not work for the dynamically loaded content, so I switched to Puppeteer. The data now comes back correctly, but the delay makes the endpoint frustrating to use. What can I do to reduce the response time?
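For reference, here is a minimal sketch of three commonly suggested directions for an endpoint like the one above: launching the browser once and reusing it across requests, caching results for a short time, and aborting requests for resources the scraper never reads. It is a sketch under assumptions, not a measured fix. The names `createScraper`, `launchBrowser`, `scrapePage`, `ttlMs`, `now`, and `blockHeavyResources` are hypothetical, the 60-second cache lifetime is arbitrary, and it is assumed that the target site still renders the needed elements when images, stylesheets, fonts, and media are blocked. The Puppeteer calls used are `page.setRequestInterception`, `request.resourceType`, `request.abort`, `request.continue`, and `page.close`.

```javascript
// Hypothetical helper: wraps scraping with a shared browser and a small TTL cache.
// launchBrowser and scrapePage are injected so the helper does not depend on
// how Puppeteer is configured in the real project.
function createScraper({ launchBrowser, scrapePage, ttlMs = 60000, now = Date.now }) {
  let browserPromise = null; // one shared browser, launched lazily on first use
  const cache = new Map();   // page number -> { expires, data }

  function getBrowser() {
    if (!browserPromise) {
      browserPromise = launchBrowser().catch((error) => {
        browserPromise = null; // allow a retry on the next request
        throw error;
      });
    }
    return browserPromise;
  }

  return async function getData(pageNumber) {
    const key = String(pageNumber);
    const hit = cache.get(key);
    if (hit && hit.expires > now()) {
      return hit.data; // served from cache, no browser work at all
    }

    const browser = await getBrowser();
    const page = await browser.newPage();
    try {
      const data = await scrapePage(page, pageNumber);
      cache.set(key, { expires: now() + ttlMs, data });
      return data;
    } finally {
      await page.close(); // close only the tab; the browser stays running
    }
  };
}

// Hypothetical helper: abort requests for resource types the scraper never reads.
// An <img> element's src is still available in the DOM even if the image
// itself is never downloaded.
async function blockHeavyResources(page, blocked = ["image", "stylesheet", "font", "media"]) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (blocked.includes(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

In the route, `createScraper` would be called once at startup with `launchBrowser: () => puppeteer.launch()` and a `scrapePage` function that calls `blockHeavyResources(page)`, navigates with `page.goto`, and runs the existing `$$eval` logic. The handler would then reduce to `res.json(await getData(req.query.page || 1))`. Passing `{ waitUntil: "domcontentloaded" }` to `page.goto` may also shorten the wait, provided the items are already in the DOM at that point, which would need to be checked against the real site.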