I’ve been using Selenium for a while but now I need a headless browser that can handle lots of instances at once. PhantomJS works okay but it’s too memory-hungry for what I want to do next.
I’m looking for something that:
- Supports JavaScript, Ajax, and HTML5
- Works with proxies
- Uses little memory and CPU so I can run 100+ instances at the same time
- Runs on Windows

It would be great if it also:
- Has a C# .NET wrapper
- Doesn’t need installation
- Has good docs
- Is WebKit-based
I’ve heard about Zombie.js and HtmlUnit but I’m not sure if they’re right for this. Has anyone tried something similar? What worked best for you?
I really need help picking the right tool for this job. Any advice would be super helpful!
hey, i’ve used playwright recently and it’s pretty solid. handles javascript and ajax well, works with proxies, and doesn’t eat up too much memory. you can run a bunch of instances without killing your machine. there’s a .net wrapper too, so it should work for your c# stuff. might be worth checking out
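Have you considered Playwright? It’s a solid choice for your requirements. It supports JavaScript, Ajax, and HTML5 out of the box and handles proxies well. It’s also fairly light on memory, especially if you share one browser process across many isolated browser contexts instead of launching a separate browser per task. Whether 100+ run comfortably will still depend on your RAM and how heavy the pages are.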
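The big plus is that it has an official .NET library, so it’ll integrate nicely with your existing C# setup. You add the Microsoft.Playwright NuGet package and then run its one-time browser download step; there’s no separate installer beyond that. Documentation is comprehensive, and it can drive Chromium, Firefox, and WebKit, so your WebKit preference is covered.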
I’ve used it for similar high-volume scraping tasks and it’s performed admirably. Much more reliable than PhantomJS in my experience. The API is intuitive and you can get up and running pretty quickly.
Only potential downside is it’s relatively new compared to some alternatives, but the Microsoft backing gives it a solid foundation. Definitely worth a look for your use case.
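To make the "100+ instances" and proxy points a bit more concrete, here’s a rough sketch of the pattern I’d start from: cap how many sessions run at once and hand out proxies round-robin. It’s plain Python with no browser library, just to show the shape; the same idea carries over to async/await in C#. `render`, `run_all`, and the proxy addresses are placeholders I made up, and `render` only simulates a page load, so swap in your real browser call there.

```python
import asyncio
from itertools import cycle

# Hypothetical proxy addresses, for illustration only.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]


async def render(url: str, proxy: str) -> str:
    """Stand-in for one browser session; a real version would drive the browser here."""
    await asyncio.sleep(0.01)  # simulates page load time
    return f"{url} via {proxy}"


async def run_all(urls, proxies, max_concurrent=10):
    """Render every URL, never running more than max_concurrent sessions at once."""
    gate = asyncio.Semaphore(max_concurrent)
    active = 0  # sessions currently inside the gate
    peak = 0    # highest value 'active' ever reached

    async def one(url, proxy):
        nonlocal active, peak
        async with gate:
            active += 1
            peak = max(peak, active)
            try:
                return await render(url, proxy)
            finally:
                active -= 1

    # zip stops at the end of urls, so cycle() simply repeats the proxies in order.
    tasks = [one(u, p) for u, p in zip(urls, cycle(proxies))]
    results = await asyncio.gather(*tasks)  # results come back in input order
    return results, peak


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    results, peak = asyncio.run(run_all(urls, PROXIES, max_concurrent=10))
    print(len(results), "pages rendered; peak concurrency", peak)
```

The cap is the knob to tune: queue all 100+ jobs, but only let as many run at once as your memory allows.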
I’ve been down this road before and can share some insights. After trying various options, I found Puppeteer to be the most reliable for high-volume, multithreaded scraping tasks. It handles JavaScript and Ajax like a champ, works seamlessly with proxies, and is surprisingly light on resources.
For your 100+ instance requirement, Puppeteer really shines. I’ve run 150+ instances on a decent machine without breaking a sweat. The memory usage is way better than PhantomJS.
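If you want a quick sanity check before committing to a machine size, a back-of-the-envelope calculation like this helps. The figures below are assumptions for illustration, not measurements; measure the real per-instance footprint on your own pages and plug that in. `max_instances` is just a helper name I picked.

```python
def max_instances(total_ram_mb: int, reserved_mb: int, per_instance_mb: int) -> int:
    """Estimate how many browser instances fit in RAM after reserving some for the OS."""
    if per_instance_mb <= 0:
        raise ValueError("per_instance_mb must be positive")
    usable = max(total_ram_mb - reserved_mb, 0)
    return usable // per_instance_mb


# Assumed numbers: 16 GB machine, 2 GB kept for the OS, 120 MB per instance.
print(max_instances(16_384, 2_048, 120))  # prints 119
```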
There’s no official C# wrapper, but the community-maintained PuppeteerSharp port covers .NET, or you can bridge to Node from your .NET setup using Edge.js or similar tools. The documentation is top-notch, which made the learning curve much smoother for me.
One potential drawback is that it’s Chromium-based, not WebKit, but in practice, I haven’t found this to be an issue. The performance and stability more than make up for it.
If you’re open to exploring beyond your initial criteria, Puppeteer could be a game-changer for your project. It’s been a reliable workhorse for me in similar scenarios.