Puppeteer: Managing tab lifecycle for sequential form submissions and data extraction

I’m scraping a .NET web application with Puppeteer. My workflow involves a repetitive process where I need to:

  1. Fill out a form and submit it
  2. Navigate to results page
  3. Parse data from a table on that page
  4. Go back to the original form and repeat with new parameters

The tricky part is that the results page uses the same URL each time, so I need to make sure each cycle completes fully before starting the next one. I can handle opening new pages with browser.on('targetcreated') and extracting the data, but I’m struggling with the synchronization part.

How can I make my code wait for a tab to close completely before moving on to submit the form again with different values? This seems like it might be a broader JavaScript async/await question.

Here’s my current implementation that handles the form submission and checks whether data appears immediately or requires opening a new page:

async function processDataAnalysis(currentPage, recordId, xValue, yValue) {
    const WAIT_TIMEOUT = 90000; // 90 seconds max wait

    const xInput = await currentPage.$(SELECTORS.analysis_x_field);
    await xInput.type(xValue[1]);
    const yInput = await currentPage.$(SELECTORS.analysis_y_field);
    await yInput.type(yValue[1]);

    await currentPage.click(SELECTORS.calculate_button);
    // wait for the loading spinner to disappear before checking for results
    await currentPage.waitForSelector(SELECTORS.loading_spinner, { timeout: WAIT_TIMEOUT, hidden: true });

    // check if results appear inline
    let gridSelector = null;

    if (await currentPage.$(SELECTORS.inline_results_grid) !== null) {
        console.log("Results appeared inline");
        // no colon in the filename, so it stays valid on Windows too
        await currentPage.screenshot({ path: './screenshots/Results ' + xValue[1] + ' VS ' + yValue[1] + '.png' });
        gridSelector = SELECTORS.inline_results_grid;
    } else {
        console.log("Results require new page");
        await currentPage.click(SELECTORS.open_results_link);
        console.log("Link clicked");
        return; // this is where I need to wait for the new tab to finish and close
    }

    const extractedData = await utils.parseTableToJson(currentPage, gridSelector);
    // assumes a promise-based driver (e.g. mysql2/promise), so no callback needed
    await db.query(
        'INSERT INTO analysis_db.results_table ( record_id, x_param, y_param, data_json ) VALUES (?,?,?,?)',
        [recordId, xValue[1], yValue[1], JSON.stringify(extractedData)]
    );
    console.log("Data saved successfully");
}

Try page.waitForNavigation() with waitUntil: 'networkidle0' after clicking the results link. It resolves once there has been no network activity for 500 ms, which usually means the page is fully loaded. Then extract your data and close the tab manually. I've found this works much better than event listeners, since it actually waits for the content to be ready instead of just detecting tab open/close events.
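A minimal sketch of what that flow could look like. `targetcreated`, `waitForNavigation`, and `close()` are standard Puppeteer API; the selectors and the `parseTableToJson` parameter are passed in here just to keep the sketch self-contained, not something from your code:

```javascript
// Click a link that opens a new tab, wait for that tab to settle,
// extract the table, then close the tab before returning.
async function extractFromResultsTab(browser, currentPage, linkSelector, gridSelector, parseTableToJson) {
    // Register the listener *before* clicking, so the new tab can't be missed.
    const targetPromise = new Promise(resolve =>
        browser.once('targetcreated', resolve));

    await currentPage.click(linkSelector);
    const newPage = await (await targetPromise).page();

    // 'networkidle0': no network connections for at least 500 ms,
    // which is usually a good proxy for "content is in place".
    await newPage.waitForNavigation({ waitUntil: 'networkidle0' });

    const data = await parseTableToJson(newPage, gridSelector);
    await newPage.close(); // awaiting close() guarantees the tab is gone
    return data;
}
```

Because everything is awaited in sequence, the function only returns once the tab is closed, so your main loop can safely submit the next form right after it.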

Had the same issue scraping a financial site that kept reusing URLs. Here’s what fixed it for me: treat the new tab like a promise that resolves when it closes, rather than polling for completion. I wrapped the whole new-tab workflow in a promise using the browser’s targetdestroyed event. When your code detects that results need a new page, create a promise that listens for target destruction, then await it before hitting the next form. The pattern’s pretty simple: click the results link, create the promise, store the target reference, and resolve when that target gets destroyed. Your main loop waits for complete tab closure before moving on. Way more reliable than setTimeout polling or trying to detect page changes.
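The promise wrapper is only a few lines. This is a sketch of the pattern, relying only on the 'targetdestroyed' event that Puppeteer's Browser object emits; the function and variable names are mine:

```javascript
// Turn "this specific tab has closed" into an awaitable promise.
function waitForTargetClosed(browser, target) {
    return new Promise(resolve => {
        const onDestroyed = destroyed => {
            if (destroyed === target) {     // ignore other tabs closing
                browser.off('targetdestroyed', onDestroyed);
                resolve();
            }
        };
        browser.on('targetdestroyed', onDestroyed);
    });
}
```

In the main loop: grab the new target from your existing 'targetcreated' handler, run the extraction in that tab (ending with page.close()), and `await waitForTargetClosed(browser, target)` before submitting the form again. Comparing against the stored target reference matters, otherwise any tab closing would resolve the promise.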

You’ve got a race condition problem. Skip the event listeners for tab creation - they’re unreliable. Instead, use page.evaluate() to check when the results page actually loads its content. Click your results link, grab the new page reference with browser.pages(), then use page.waitForFunction() to watch for the specific DOM changes that show processing is done. Once you’ve extracted what you need, call page.close() and await it. This way you control the tab lifecycle directly instead of hoping the browser handles it. I’ve used this approach before with similar form workflows and it’s way more predictable than event-based solutions.
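Roughly like this. browser.pages() and page.waitForFunction() are real Puppeteer API; the DOM condition inside waitForFunction is a placeholder, so swap in whatever change actually signals that processing is done on your results page:

```javascript
// Find the newly opened tab, wait for its content, extract, close.
async function processResultsTab(browser, gridSelector, parseTableToJson) {
    // The tab opened by the click is the last entry in browser.pages().
    const pages = await browser.pages();
    const resultsPage = pages[pages.length - 1];

    // Poll the page until the grid exists and has at least one row.
    await resultsPage.waitForFunction(
        sel => {
            const grid = document.querySelector(sel);
            return grid !== null && grid.querySelectorAll('tr').length > 0;
        },
        { timeout: 90000 },
        gridSelector
    );

    const data = await parseTableToJson(resultsPage, gridSelector);
    await resultsPage.close(); // awaited: the tab is fully closed after this line
    return data;
}
```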

This topic was automatically closed 4 days after the last reply. New replies are no longer allowed.