Seeking lightweight browser for web scraping on Armv7 Linux

I’m trying to do some web scraping on my Armv7 Linux system. I’ve hit a wall with the usual suspects. PhantomJS threw a fit about needing a GUI. Firefox with geckodriver got lost in the woods with path issues and connection problems. Chrome headless with chromedriver? Same story as Firefox.

I’m at my wits’ end here. Does anyone know of a browser that plays nice with Armv7 for headless scraping? Or maybe there’s a totally different approach I should be taking?

I’d love to hear some ideas from folks who’ve tackled this before. Any tips or tricks would be a lifesaver. Thanks in advance for any help!

have u tried using puppeteer-core with firefox? it worked well for me on armv7. just make sure to install firefox-esr and Xvfb first. Xvfb gives firefox a virtual display, so it runs fine with no real screen attached. might need to fiddle with some env variables (DISPLAY mostly) but it's not too bad. lmk if u need more details!
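
For what it's worth, here's a rough Python sketch of the Xvfb and env-variable part of that setup. It assumes Xvfb and firefox-esr are installed from your distro's packages. The display number `:99`, the screen size, and the helper names are placeholders I made up, so adjust to taste. `MOZ_HEADLESS=1` is Firefox's own headless switch, and with it set you may find you don't need Xvfb at all.

```python
import os
import subprocess
import time


def xvfb_command(display=":99", screen="1280x1024x24"):
    """Build the command line for a virtual X server.

    The display number and screen geometry are assumed values.
    """
    return ["Xvfb", display, "-screen", "0", screen]


def browser_env(display=":99", headless=True):
    """Copy the current environment and point it at the virtual display."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    if headless:
        # Firefox runs without opening a window when this is set.
        env["MOZ_HEADLESS"] = "1"
    return env


def screenshot_once(url, out_path="page.png", display=":99"):
    """Start Xvfb, take one screenshot with firefox-esr, then stop Xvfb."""
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(1)  # crude wait for the X server to come up
        subprocess.run(
            ["firefox-esr", "--screenshot", out_path, url],
            env=browser_env(display),
            check=True,
        )
    finally:
        xvfb.terminate()
        xvfb.wait()
```

Calling `screenshot_once("https://example.com")` is a quick way to check that the browser starts at all before wiring in puppeteer-core or Selenium.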

I’ve been in your shoes before, and I feel your pain. Have you considered giving HtmlUnit a shot? It’s a Java-based ‘browser without a GUI’ that’s worked wonders for me on Armv7 systems. It only needs a JVM, doesn’t need a display server, and copes with a fair amount of JavaScript, though pages built on heavy modern front-end frameworks can trip it up.

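Another option that’s flown under the radar is QtWebEngine. It’s a bit more involved to set up, and since it’s built on Chromium it isn’t exactly small, but it renders modern pages properly and has run fine for me on Armv7. You may need to compile it from source if your distro doesn’t package it, which takes a long time on this class of hardware, but once it’s up and running, it’s a beast for web scraping.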

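If you’re willing to give PhantomJS another go, you might want to look into driving it through Selenium. Yes, I know you mentioned issues with PhantomJS, but combining it with Selenium can sometimes work around those GUI problems. Bear in mind that PhantomJS is no longer maintained and newer Selenium releases have dropped support for it, so you’d need an older Selenium version. It’s worth a try if you’re out of options.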

Remember, web scraping on Armv7 can be a bit of a wild west. Don’t be afraid to get creative and mix and match tools until you find a combo that works for your specific use case.

Consider using Splash. It’s a JavaScript rendering service designed specifically for web scraping, and it’s lighter than driving a full desktop browser, though you should check that you can get it built or packaged for Armv7 first. Splash is built on top of WebKit and can handle JavaScript rendering. You can interact with it via a simple HTTP API, which makes integration straightforward.
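
To give an idea of what that HTTP API looks like, here’s a minimal sketch using only the Python standard library. It assumes a Splash instance is already listening on `localhost:8050` (the default port), and the function names are my own. `render.html` with the `url` and `wait` parameters is the endpoint that returns the page’s HTML after JavaScript has run.

```python
import urllib.parse
import urllib.request


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=0.5):
    """Build a render.html request URL for a Splash instance.

    splash_base is an assumption: change it to wherever Splash is running.
    wait is the number of seconds Splash lets the page settle.
    """
    query = urllib.parse.urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


def fetch_rendered_html(page_url, **kwargs):
    """Ask Splash to load the page and return the rendered HTML as text."""
    request_url = splash_render_url(page_url, **kwargs)
    with urllib.request.urlopen(request_url, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Since Splash is just an HTTP service, it doesn’t have to run on the Armv7 box itself. Pointing `splash_base` at another machine on your network would work the same way.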

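Alternatively, you might want to explore the requests-html library. For pages where the data is already in the HTML, it fetches and parses without any browser at all. It builds on Requests and PyQuery and works seamlessly with Python, making it a good choice for scraping tasks. One caveat: its JavaScript rendering (the `render()` call) works by downloading and driving a Chromium build behind the scenes, and that download may not be available for Armv7, so don’t count on that part working out of the box.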

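If you’re open to non-browser solutions, you could look into using specialized scraping libraries like Scrapy. It’s highly efficient and can handle most scraping tasks without needing a full browser environment, as long as the data you want is in the HTML itself or in an API response the page loads. It doesn’t execute JavaScript on its own.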
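
To show the no-browser idea in its simplest form, here’s a small sketch that fetches a page and pulls out its links using only the Python standard library. This is a stand-in for illustration, not Scrapy’s API: Scrapy adds scheduling, retries, selectors, and pipelines on top of the same basic fetch-and-parse loop. The class name, function names, and User-Agent string are all made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Parse an HTML string and return its links in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, user_agent="armv7-scraper-sketch/0.1"):
    """Download a page as text (the User-Agent value is a placeholder)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Something like `extract_links(fetch(url), url)` uses a few megabytes of RAM at most, which is why it’s worth checking whether the site really needs JavaScript before reaching for a browser.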