Submitting a Form Using a Headless Browser in Node.js

I am dealing with an outdated tech stack where form submission returns HTML. The critical information I need is populated by a portlet on my client's site, but it doesn't appear when I test the HTML locally, so I can't just send data directly to the form endpoint. Instead, I need to use a headless browser to submit the form and then extract information from the resulting success or failure page. I plan to create an API endpoint in my Node.js app that accepts the form data, submits it via the headless browser, and returns the scraped data. Are there any suitable frameworks for this purpose? I've explored options like Nightwatch and WebDriver, but they seem more focused on automated testing than on my specific needs.

Hey there! Sounds like an interesting challenge. For submitting forms and scraping data with a headless browser in Node.js, I'd recommend using Puppeteer. It's a great tool for this purpose as it provides a high-level API over the Chrome DevTools Protocol and can run a Chrome instance headless (without GUI).

Here's a quick setup to get you started:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('your_form_url');

  // Fill the form
  await page.type('#inputSelector', 'yourData');

  // Click submit and wait for the navigation together; if you click first
  // and call waitForNavigation afterwards, you can miss a fast navigation
  await Promise.all([
    page.waitForNavigation(),
    page.click('#submitSelector'),
  ]);

  // Scrape the required data from the result page
  const result = await page.evaluate(() => {
    return document.querySelector('#targetElement').innerText;
  });

  console.log(result);
  await browser.close();
})();

This should help you submit forms and extract data from pages that render content dynamically.

Hi DancingBird,

Based on your requirements, an effective solution would be utilizing Puppeteer for interacting with the forms and extracting data. Puppeteer is particularly well-suited for handling scenarios where content is loaded dynamically via JavaScript, similar to what you’re describing with your portlet.

Here's how you can implement this in your Node.js app:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true }); // Run in headless mode
  const page = await browser.newPage();
  await page.goto('your_form_url');

  // Fill in the form
  await page.type('#inputSelector', 'yourData');

  // Click submit and wait for the page to reach the expected state;
  // doing both in one Promise.all avoids a race with fast navigations
  await Promise.all([
    page.waitForNavigation(),
    page.click('#submitSelector'),
  ]);

  // Extract the desired information
  const scrapedData = await page.evaluate(() => {
    return document.querySelector('#targetElement').innerText;
  });

  console.log(scrapedData);
  await browser.close();
})();

This code uses Puppeteer to automate the form submission and data extraction. Running headless keeps resource usage down, and the high-level API makes driving the Chrome instance straightforward. You can wrap this logic in a route handler to build the API endpoint you described.
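One refinement worth considering if this runs behind an endpoint: launching a fresh browser per request is expensive, so a common pattern is to launch once and open a new page (tab) per request. Here is a minimal sketch of that idea; `getBrowser` and `withPage` are hypothetical helper names, not Puppeteer APIs.

```javascript
// Launch the browser once and share it; each request gets its own tab.
let browserPromise = null;

function getBrowser() {
  if (!browserPromise) {
    // Lazy require/launch: the first caller pays the startup cost once.
    const puppeteer = require('puppeteer');
    browserPromise = puppeteer.launch({ headless: true });
  }
  return browserPromise;
}

// Run `fn` with a fresh page, always closing the tab afterwards
// while keeping the shared browser alive for later requests.
async function withPage(fn) {
  const browser = await getBrowser();
  const page = await browser.newPage();
  try {
    return await fn(page);
  } finally {
    await page.close();
  }
}
```

A handler then becomes `withPage(async (page) => { /* goto, type, click, evaluate */ })`; just remember to close the shared browser on shutdown.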