Automating form submission with headless browsing

Hey everyone, I’m stuck with a tricky situation. I need to register users through a headless browser because the platform doesn’t have an API. It’s a bit of a hassle.

I’ve got it working with Symfony, Selenium, and PHPUnit, but keeping Selenium running all the time doesn’t seem like the best solution. I’ve tried using Symfony’s Process class to start Selenium on demand, but it’s not firing up when I need it.

Here’s a snippet of what I’m trying to do:

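use Symfony\Component\Process\Process;

public function autoFillForm($data, $targetUrl, $formClass)
{
  // Array form avoids shell quoting issues and works on current Symfony versions
  $seleniumProcess = new Process(['xvfb-run', '-a', 'selenium-standalone', 'start']);
  $seleniumProcess->start();

  try {
    // Wait for Selenium to start
    sleep(3);

    if (!$seleniumProcess->isRunning()) {
      throw new \RuntimeException('Failed to start Selenium: ' . $seleniumProcess->getErrorOutput());
    }

    // Rest of the form filling logic here
    // ...
  } finally {
    // Clean up, even if the form filling throws
    $seleniumProcess->stop();
  }
}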

Does anyone have ideas on how to make this work better? Or maybe there’s a smarter way to approach this whole thing? I’m all ears for suggestions!

Hey Liam, have you tried PhantomJS? It's pretty good for headless browsing and doesn't need Selenium. You can run it directly from PHP using Symfony's Process. Here's a quick example:

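use Symfony\Component\Process\Process;

$phantomjs = new Process(['phantomjs', 'script.js']); // array form works on current Symfony versions
$phantomjs->mustRun(); // throws if phantomjs exits with a non-zero code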

Just put your form-filling logic in script.js. Way simpler than dealing with Selenium, IMO.

Having worked on similar projects, I’d suggest considering a different approach altogether. Instead of running a headless browser for each form submission, you might want to explore using a library like Guzzle for HTTP requests. It’s much lighter and faster than browser automation.

You could simulate the form submission by reverse-engineering the request the form makes. Use browser dev tools to inspect the network traffic when submitting the form manually. Then replicate that request in your PHP code.

This method is more efficient and usually less brittle than driving a browser, as long as the request doesn't depend on values generated by client-side JavaScript. It also eliminates the need for maintaining browser instances. However, be aware that some sites have anti-bot measures in place. In such cases, you may need to send browser-like headers or handle CAPTCHAs.

Remember to respect the site’s terms of service and rate limits when implementing this solution.
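To make the idea above a bit more concrete, here's a minimal sketch. It assumes Guzzle is installed via Composer, and it assumes the form carries hidden inputs (a CSRF token, for example) that have to be sent back with the POST; that part is my guess about your form, not something I know. The URLs, the field names, and the helper names extractHiddenFields and submitRegistration are all made up for illustration.

```php
<?php
use GuzzleHttp\Client;

/**
 * Collect name => value pairs from all hidden inputs in an HTML page.
 * Pure PHP (DOM extension), so it can be tried without Guzzle.
 */
function extractHiddenFields(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input on PHP 8
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; suppress parser warnings.
    @$doc->loadHTML($html);

    $fields = [];
    $inputs = (new DOMXPath($doc))->query('//input[@type="hidden"][@name]');
    foreach ($inputs as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }

    return $fields;
}

/**
 * Fetch the form page, then replay the POST the browser would send.
 * Returns the HTTP status code of the submission.
 */
function submitRegistration(Client $client, string $formUrl, string $actionUrl, array $data): int
{
    $html = (string) $client->get($formUrl)->getBody();

    $response = $client->post($actionUrl, [
        // Hidden fields first, so the user-supplied data wins on a name clash.
        'form_params' => array_merge(extractHiddenFields($html), $data),
    ]);

    return $response->getStatusCode();
}

// Usage (hypothetical URLs and field names; requires vendor/autoload.php):
// $client = new Client([
//     'cookies' => true, // keep the session cookie between the GET and the POST
//     'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible; registration-bot)'],
// ]);
// $status = submitRegistration(
//     $client,
//     'https://example.com/register',
//     'https://example.com/register',
//     ['email' => 'user@example.com', 'password' => 'secret']
// );
```

Check the real field names and the action URL in the network tab first. If the site builds the request in JavaScript, this won't be enough on its own.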

I've tackled similar challenges before, and I can tell you that running Selenium on-demand can be quite finicky. Instead of using Selenium, have you considered a browser automation library like Puppeteer or Playwright? They drive a headless browser directly and are much easier to manage programmatically.

In my experience, driving Playwright from PHP has been a game-changer for automating form submissions. It's more lightweight, faster, and doesn't require a separate server like Selenium. One caveat: Playwright officially ships for Node.js, Python, Java, and .NET, so from PHP you'd go through a community wrapper or shell out to a small Node script. Either way, you can start and stop browser instances on the fly, which sounds like exactly what you need.

Here’s a rough idea of how you could structure it:

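// Placeholder import: Playwright has no official PHP client, so swap in the
// namespace (and method names) of whichever community wrapper you install.
use Microsoft\Playwright\Playwright;

public function autoFillForm($data, $targetUrl, $formClass)
{
    $playwright = Playwright::create();
    $browser = $playwright->chromium()->launch();

    try {
        $page = $browser->newPage();
        $page->goto($targetUrl);

        // Fill form logic here
        // ...
    } finally {
        // Always close the browser, even if the form filling throws
        $browser->close();
    }
}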

This approach has been much more reliable for me in production environments. It might be worth giving it a shot if you’re open to exploring alternatives.
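If you'd rather stick with Selenium for now, one likely weak spot in the original snippet is the fixed sleep(3): isRunning() only tells you the process is alive, not that the server is accepting connections yet. Here's a small sketch that polls the port instead. I'm assuming Selenium listens on its default port 4444 on localhost, and waitForPort is just a helper name I made up.

```php
<?php
/**
 * Poll a TCP port until something accepts connections or the timeout expires.
 * Returns true as soon as a connection succeeds, false on timeout.
 */
function waitForPort(string $host, int $port, float $timeoutSeconds = 15.0): bool
{
    $deadline = microtime(true) + $timeoutSeconds;

    do {
        // Short per-attempt timeout so a dead host does not block the loop.
        $socket = @fsockopen($host, $port, $errno, $errstr, 0.5);
        if ($socket !== false) {
            fclose($socket);
            return true;
        }
        usleep(200000); // wait 200 ms before the next attempt
    } while (microtime(true) < $deadline);

    return false;
}

// Usage inside autoFillForm(), replacing sleep(3) (port 4444 is an assumption):
// $seleniumProcess->start();
// if (!waitForPort('127.0.0.1', 4444, 20.0)) {
//     $seleniumProcess->stop();
//     throw new \RuntimeException('Selenium did not come up in time');
// }
```

This way you wait only as long as Selenium actually needs, and you get a clear failure when it never comes up.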