I’ve been using Selenium for a while, but now I need something that can handle hundreds of instances at once. PhantomJS works okay, but it’s too memory-hungry for what I want to do.
I’m looking for a headless browser that:
- Supports JavaScript, Ajax, and HTML5
- Works with proxies
- Uses little memory and CPU (I want to run 100+ instances at the same time)
- Runs on Windows
It would be great if it also:
- Has a C# .NET wrapper
- Doesn’t need installation
- Has good docs
- Is WebKit-based
I’ve heard about Zombie.js and HtmlUnit, but I’m not sure if they’re the best options. Has anyone tackled a similar project? What would you recommend for this kind of large-scale, multithreaded setup?
Thanks for any advice!
I’ve been in a similar situation and found that Puppeteer-Sharp might be a good fit for your requirements. It’s a .NET port of Puppeteer, so you drive headless Chromium directly from C# rather than through a separate wrapper. It covers most of your list: because it runs a real browser engine, JavaScript, Ajax, and HTML5 all work; it works well with proxies; and it runs fine on Windows. In my experience it was also memory-efficient enough to run multiple instances side by side, and the extensive documentation was particularly valuable. The trade-offs are that it needs some setup (a Chromium build has to be downloaded, which the library can do for you) and that it’s Chromium-based rather than WebKit, but for me those drawbacks were outweighed by its overall performance. If you are comfortable with C#, Puppeteer-Sharp could be an excellent solution for your project.
I’ve had success using Playwright for similar large-scale scraping projects. It’s cross-browser, supports JavaScript and modern web features, and its async API is built for running many sessions concurrently. Memory usage is quite efficient compared to Selenium. The C# bindings are solid and well documented. One caveat is that it does require installation (the browser binaries are downloaded separately), but the setup process is straightforward. In my experience, it handled 100+ concurrent instances without issues on a decent machine. Being able to switch between browser engines (Chromium, Firefox, and WebKit) is also handy for avoiding detection, and the WebKit option covers your WebKit preference. Might be worth evaluating for your use case.
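To make the concurrency point concrete, here is a minimal, dependency-free TypeScript sketch of capping how many jobs run at once: a fixed number of workers pull URLs from a shared list until it is empty. This is my own illustration, not Playwright code; `runWithLimit`, `scrapeOne`, and the limit of 3 are hypothetical. In a real script, `scrapeOne` would open a Playwright browser context and load the page. Playwright lets one browser host many isolated `browser.newContext()` sessions, which is typically lighter than launching a separate browser per job.

```typescript
// Run `worker` over every item with at most `limit` calls in flight at once.
// Results come back in input order. (Hypothetical helper, not a Playwright API.)
async function runWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  // Each loop claims the next item, awaits it, and repeats until none are left.
  // Node runs this on one thread, so `next++` cannot hand out an index twice.
  async function workerLoop(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => workerLoop()));
  return results;
}

// Hypothetical stand-in for "open a browser context, load the URL, return data".
// In a Playwright script this is where browser.newContext() and page.goto()
// would go; here it only simulates async work and tracks concurrency.
let inFlight = 0;
let peak = 0;
async function scrapeOne(url: string): Promise<string> {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await Promise.resolve(); // yield to other workers, as a network call would
  await Promise.resolve();
  inFlight--;
  return `done: ${url}`;
}

const urls = Array.from({ length: 10 }, (_, i) => `https://example.com/page/${i}`);
const done = runWithLimit(urls, 3, scrapeOne).then((out) => {
  console.log(`${out.length} pages, peak concurrency ${peak}`);
  return out;
});
```

The worker-pool shape, rather than firing all 100+ jobs with a single `Promise.all`, keeps memory bounded by the limit instead of by the number of URLs, so you can tune the limit to what your machine handles.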
Hey, have you tried Puppeteer? It’s pretty lightweight and can handle lots of instances. Works great with proxies too. I use it for similar stuff and it’s been solid. The only downside is there’s no official C# wrapper (Puppeteer-Sharp is a community port), but the Node.js API is easy to work with. Might be worth checking out!
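For the proxy part, here is a minimal TypeScript sketch, assuming you launch one Chromium instance per job and want to spread instances over a pool of proxies round-robin. It only builds the launch options; the actual `puppeteer.launch()` and `page.authenticate()` calls are shown in comments so the snippet runs without Puppeteer installed. `ProxyEntry`, `proxyFor`, `launchOptionsFor`, and the proxy addresses are hypothetical placeholders; `--proxy-server` is Chromium's standard proxy flag.

```typescript
// One proxy endpoint (placeholder values). Credentials are kept separate because
// Chromium's --proxy-server flag takes only the server address; in Puppeteer,
// page.authenticate() is the usual way to supply them.
interface ProxyEntry {
  server: string; // e.g. "http://proxy-a.example:8080"
  username?: string;
  password?: string;
}

// The two launch options this sketch fills in; both exist on Puppeteer's
// launch options, but this interface is a local stand-in, not Puppeteer's type.
interface LaunchOptionsSketch {
  headless: boolean;
  args: string[];
}

// Round-robin: instance 0 gets proxy 0, instance 1 gets proxy 1, and so on,
// wrapping around, so 100 instances over 10 proxies put 10 on each proxy.
function proxyFor(index: number, proxies: readonly ProxyEntry[]): ProxyEntry {
  if (proxies.length === 0) throw new Error("need at least one proxy");
  if (!Number.isInteger(index) || index < 0) throw new RangeError("bad instance index");
  return proxies[index % proxies.length];
}

// Options to pass to puppeteer.launch() for browser instance `index`.
function launchOptionsFor(index: number, proxies: readonly ProxyEntry[]): LaunchOptionsSketch {
  const proxy = proxyFor(index, proxies);
  return { headless: true, args: [`--proxy-server=${proxy.server}`] };
}

const proxies: ProxyEntry[] = [
  { server: "http://proxy-a.example:8080" },
  { server: "http://proxy-b.example:8080", username: "user", password: "secret" },
];

for (let i = 0; i < 4; i++) {
  console.log(i, launchOptionsFor(i, proxies).args[0]);
  // With Puppeteer installed, each instance would then be started like:
  //   const browser = await puppeteer.launch(launchOptionsFor(i, proxies));
  //   const page = await browser.newPage();
  //   const p = proxyFor(i, proxies);
  //   if (p.username) await page.authenticate({ username: p.username, password: p.password ?? "" });
}
```

Spreading instances over the pool this way keeps any single proxy from carrying all 100+ sessions.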