Puppeteer: Execution context destroyed, likely due to navigation

I’m encountering an issue with Puppeteer while iterating through a for loop. When I navigate to a new page to collect data and then return to the previous page, I receive the following error:

Error: Execution context was destroyed, most likely because of a navigation.

My goal is to scrape a directory page that lists 15 companies per page and gather information for each company. Here’s an example of my code:

try {
    const browser = await puppeteer.launch({
        headless: false,
        devtools: true,
        defaultViewport: { width: 1100, height: 1000 }
    });

    const page = await browser.newPage();
    await page.goto('YourLink');

    await page.waitForSelector('.company-list');

    for (let i = 0; i < 10; i++) {
        const companyList = await page.$$('.company-list > div.company');

        for (const company of companyList) {
            const companyName = await company.$eval('.details > h3 > a', el => el.innerText);
            const companyURL = await company.$eval('.details > h3 > a', el => el.href);

            await Promise.all([
                page.waitForNavigation(),
                page.goto(companyURL),
                page.waitForSelector('.company-info'),
            ]);

            const companyInfo = await page.$eval('#company-information', el => el.innerText);

            const companyData = [{
                name: companyName,
                info: companyInfo,
            }];

            await page.goBack();
        }
        await Promise.all([
            page.waitForNavigation(),
            page.click('span.pagination > a[rel="next"]')
        ]);
    }
} catch (error) {
    console.log('An error occurred', error);
}

Currently, I am only able to retrieve data for the first company.

Hey Alice45, the issue comes from race conditions between your navigation calls: page.goto() already waits for the navigation it triggers, so pairing it with page.waitForNavigation() in the same Promise.all can resolve at the wrong moment and leave you reading from a destroyed context. Use waitUntil: 'networkidle2' for more reliable navigation and drop the redundant waitForNavigation. Adjust your await Promise.all like this:

await Promise.all([
    page.goto(companyURL, { waitUntil: 'networkidle2' }),
    page.waitForSelector('.company-info')
]);

// After goBack, ensure the page is fully loaded.
await page.goBack({ waitUntil: 'networkidle2' });

This should help by making sure navigations complete before moving on.
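
Another option worth trying: since the error comes from the listing page’s context being destroyed when you navigate away, you can sidestep it by opening each company in a second tab, so the listing page never navigates at all. A minimal sketch, reusing the selectors and the companyList handles from your question (assumed to match the real site):

// Visit detail pages in a separate tab; the listing page stays put, so the
// handles in companyList remain attached to a live execution context.
const detailPage = await browser.newPage();

for (const company of companyList) {
    const companyName = await company.$eval('.details > h3 > a', el => el.innerText);
    const companyURL = await company.$eval('.details > h3 > a', el => el.href);

    await detailPage.goto(companyURL, { waitUntil: 'networkidle2' });
    await detailPage.waitForSelector('#company-information');

    const companyInfo = await detailPage.$eval('#company-information', el => el.innerText);
    // ... store { name: companyName, info: companyInfo } ...
}

await detailPage.close();

This avoids page.goBack() entirely, so there is no back-navigation to race against.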

Alice45, besides Bob_Clever’s advice on synchronization, note that the element handles you collect with page.$$ are destroyed the moment the page navigates away, so it helps to extract the names and URLs as plain strings before visiting any company page. You can also introduce a slight delay after each page.goBack() so the page is fully settled before you interact with it again. Here’s how you can modify your approach:

const waitForTimeout = ms => new Promise(resolve => setTimeout(resolve, ms));

try {
    const browser = await puppeteer.launch({
        headless: false,
        devtools: true,
        defaultViewport: { width: 1100, height: 1000 }
    });

    const page = await browser.newPage();
    await page.goto('YourLink', { waitUntil: 'networkidle2' });
    await page.waitForSelector('.company-list');

    for (let i = 0; i < 10; i++) {
        // Extract plain strings up front: the ElementHandles returned by
        // page.$$ belong to the current execution context and are destroyed
        // as soon as the page navigates away.
        const companies = await page.$$eval(
            '.company-list > div.company .details > h3 > a',
            links => links.map(a => ({ name: a.innerText, url: a.href }))
        );

        for (const { name, url } of companies) {
            await page.goto(url, { waitUntil: 'networkidle2' });
            await page.waitForSelector('.company-info');

            const companyInfo = await page.$eval('#company-information', el => el.innerText);

            const companyData = [{
                name,
                info: companyInfo,
            }];

            // Log the company data or process it as needed

            await page.goBack({ waitUntil: 'networkidle2' });

            // Short delay so the listing page settles before the next iteration
            await waitForTimeout(500); // 500 milliseconds
        }

        await Promise.all([
            page.waitForNavigation({ waitUntil: 'networkidle2' }),
            page.click('span.pagination > a[rel="next"]')
        ]);
    }
} catch (error) {
    console.error('An error occurred:', error);
}

This version extracts plain name/URL pairs with page.$$eval before any navigation, so no element handle has to survive a page load, and the waitForTimeout helper adds a brief pause after each goBack() so the script doesn’t race ahead of the page. Also keep waitUntil: 'networkidle2' on pagination clicks and back navigations, so your interactions only proceed once the network is idle.
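
If you’d rather not rely on a fixed delay, you can wait for something concrete instead, for example the listing selector reappearing after going back (a sketch, using the '.company-list' selector from the question):

await page.goBack({ waitUntil: 'networkidle2' });
// Wait for the listing itself rather than an arbitrary 500 ms
await page.waitForSelector('.company-list');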

Hey Alice45, it looks like you are facing race condition issues with your current implementation. Here are some steps you can take to address the navigation timing problems:

1. Use waitUntil: 'networkidle2' for navigation. This makes your script wait until the page is almost fully loaded, reducing race conditions:

await Promise.all([
    page.goto(companyURL, { waitUntil: 'networkidle2' }),
    page.waitForSelector('.company-info'),
]);

// Navigate back using waitUntil
await page.goBack({ waitUntil: 'networkidle2' });

2. Add small delays. Introducing a delay can stabilize actions, especially after page.goBack(). You can define a timeout helper:

const waitForTimeout = ms => new Promise(resolve => setTimeout(resolve, ms));

3. Ensure consistent waiting. Always incorporate waiting strategies such as waitForSelector, and wait for navigation after clicking pagination links (see the sketch below).

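For step 3, a minimal sketch of the pagination wait, reusing the selectors from Alice45’s question (assumed to match the real markup):

// Click "next", wait for the resulting navigation to finish, then make sure
// the new listing is present before touching it.
await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click('span.pagination > a[rel="next"]')
]);
await page.waitForSelector('.company-list');
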
Applying these changes should improve your script’s reliability and let you gather data from every company without the navigation error interrupting the loop.