I’m trying to retrieve data from a page that loads its content dynamically, so I’m using Puppeteer to drive a headless browser.
In the code below, puppeteer is passed in as headlessBrowserClient.
The main difficulty is making sure the browser is closed once the required data has been collected, but not earlier: if I close the browser before evaluateCustomFunction has finished executing, I lose whatever that function was computing.
evaluateCustomFunction runs in the page context, much like code typed into the console of Chrome’s Developer Tools.
To manage the network request and the asynchronous flow of the Puppeteer API, I wrapped all of the relevant logic in an async generator.
I suspect this design is poor, but I’m struggling to find a better alternative.
Any suggestions?
module.exports = function createClient(headlessBrowserClient) {
  const fetchPageData = async (url, evaluateCustomFunction) => {
    const request = initiateRequest(url);
    const { value: page } = await request.next();
    if (page) {
      const content = await page.evaluate(evaluateCustomFunction);
      request.next();
      return content;
    }
  };

  async function* initiateRequest(url) {
    const browserInstance = await headlessBrowserClient.launch();
    const pageInstance = await browserInstance.newPage();
    const requestDetails = { req: { url } };
    try {
      await pageInstance.goto(url);
      yield pageInstance;
    } catch (error) {
      throw new APIError(error, requestDetails);
    } finally {
      yield browserInstance.close();
    }
  }

  return {
    fetchPageData,
  };
}
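
For comparison, here is a minimal sketch of the same flow without the generator, using a plain try/finally so that the browser is closed only after page.evaluate has settled. The name createSimpleClient is hypothetical, and wrapping errors in APIError is left out to keep the sketch self-contained. One difference worth noting: in the generator version, if page.evaluate throws, nothing resumes the generator, so its finally block never runs and the browser stays open.

```javascript
// Hypothetical alternative to the async generator (not the original code).
// try/finally keeps the browser open until page.evaluate has settled,
// then closes it whether the evaluation succeeded or failed.
function createSimpleClient(headlessBrowserClient) {
  const fetchPageData = async (url, evaluateCustomFunction) => {
    const browser = await headlessBrowserClient.launch();
    try {
      const page = await browser.newPage();
      await page.goto(url);
      // Awaiting inside the try block means finally cannot run early.
      return await page.evaluate(evaluateCustomFunction);
    } finally {
      // Runs on both success and failure, so no browser is left open.
      await browser.close();
    }
  };

  return { fetchPageData };
}
```

With Puppeteer passed in, a call would look something like createSimpleClient(puppeteer).fetchPageData(url, () => document.title), where the callback is only an example of a function evaluated in the page.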