I’m looking for a way to do web scraping on Android without showing the page. I need something that can:
- Get the full page content including stuff loaded by JavaScript
- Let me use XPath or CSS selectors to find things on the page
- Maybe click buttons or follow links later on
I tried a bunch of things like Jsoup and HtmlUnit but they didn’t work out. Jsoup can’t handle JavaScript and HtmlUnit was a pain to set up on Android.
Does anyone know a good library or method for this? I’m using Android Studio but can switch to Eclipse if needed.
Thanks for any ideas!
I’ve been down this road before, and I found that using a combination of OkHttp and Rhino worked wonders for headless scraping on Android. OkHttp handles the network requests efficiently, while Rhino lets you execute JavaScript within your app.
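Here’s the kicker: Rhino can run a page’s JavaScript without a visible browser, but it is only a script engine and has no DOM of its own. Scripts that touch the document or window objects will fail unless you supply stand-ins for them, so this works best for pages whose data sits in inline scripts or JSON endpoints rather than pages that build their markup through heavy DOM manipulation. Once you have the resulting HTML, you can parse it with a library like jsoup.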
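For interacting with the page, I wrote custom JavaScript functions that I executed through Rhino. It takes some setup, but it’s flexible once you get it running. Rhino is pure Java, so you bundle it with the app like any other dependency; on Android it has to run in interpreted mode (optimization level -1), because the JVM bytecode it would otherwise generate can’t be loaded by the Android runtime.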
Just be prepared for a bit of a learning curve, especially when dealing with more complex sites. And always respect robots.txt and website terms of service when scraping!
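On the robots.txt point, a rough sketch of what a pre-request check could look like in plain Java. It is deliberately simplified: it only reads groups addressed to User-agent: *, treats each Disallow value as a plain path prefix, and ignores Allow lines and wildcard patterns. RobotsRules and its methods are hypothetical names for illustration, not part of OkHttp, Rhino, or jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: a simplified robots.txt check (prefix matching only).
final class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Collects Disallow prefixes from every group that names "User-agent: *". */
    static RobotsRules parse(String robotsTxt) {
        RobotsRules rules = new RobotsRules();
        boolean inWildcardGroup = false;
        boolean lastWasAgent = false;
        for (String line : robotsTxt.split("\\r?\\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                if (!lastWasAgent) inWildcardGroup = false;
                if (value.equals("*")) inWildcardGroup = true;
                lastWasAgent = true;
            } else {
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
                lastWasAgent = false;
            }
        }
        return rules;
    }

    /** Returns false if the path starts with any collected Disallow prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RobotsRules rules = parse("User-agent: *\nDisallow: /private/\n");
        System.out.println(rules.isAllowed("/private/report.html"));
        System.out.println(rules.isAllowed("/index.html"));
    }
}
```

You would fetch /robots.txt once per host, parse it, and call isAllowed with the path of each URL before requesting it. For anything serious, a parser that follows the full robots.txt rules (Allow lines, wildcards, longest-match precedence) is the safer choice.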
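Have you considered Selenium with ChromeDriver? It handles JavaScript-rendered content and supports XPath/CSS selectors. One caveat: ChromeDriver runs on a host machine and drives Chrome, or a WebView with debugging enabled, on the device over adb, and the old Selenium Android driver is no longer maintained. That makes this setup a better fit for test-style automation from a computer than for scraping from inside your own app.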
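Another option worth exploring is Puppeteer. It’s a Node.js library that controls Chrome over the DevTools protocol, and there is no official Android version. Java ports exist, but they normally launch a desktop Chromium, so check whether one can actually run in your Android project before committing to it. Where it does run, it excels at headless browsing and can handle complex JavaScript interactions.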
Both solutions require some setup, but they’re robust for advanced scraping needs. Just be mindful of performance on mobile devices and consider running heavier operations server-side if possible.
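hey, have you looked into WebView with JavaScript enabled? it’s built into Android and handles JS-loaded content, and you can usually run it without adding it to your layout, so nothing shows on screen. once the page has loaded, pull the HTML out with evaluateJavascript and hand it to a parsing lib like jsoup for the CSS selector stuff (newer jsoup versions also have selectXpath for XPath). might need to fiddle with it a bit, but it could work for what you’re after without too much hassle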
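One detail that tends to trip people up on the WebView route: the string handed to the evaluateJavascript callback is a JSON value, so the result of document.documentElement.outerHTML arrives wrapped in double quotes, with escapes such as \u003C standing in for <. Below is a minimal sketch in plain Java of undoing that before passing the HTML to a parser. The names JsResultDecoder and decode are made up for illustration, and on Android you could use the bundled org.json classes instead of hand-rolling this.

```java
// Hypothetical helper: turns the JSON string literal delivered to the
// evaluateJavascript callback back into the plain text it represents.
final class JsResultDecoder {
    private JsResultDecoder() {}

    /** Returns the decoded text, or null if the value is not a JSON string. */
    static String decode(String raw) {
        if (raw == null) return null;
        String s = raw.trim();
        if (s.length() < 2 || s.charAt(0) != '"' || s.charAt(s.length() - 1) != '"') {
            return null; // e.g. the literal null, a number, or an object
        }
        int end = s.length() - 1; // index of the closing quote
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 1; i < end; i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= end) throw new IllegalArgumentException("Dangling backslash");
            char e = s.charAt(i);
            switch (e) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // Four hex digits follow the letter u.
                    if (i + 4 >= end) throw new IllegalArgumentException("Short unicode escape");
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    out.append(e); // escaped quote, backslash, or slash
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument mimics what the callback receives for a tiny page.
        System.out.println(decode("\"\\u003Cp>hi\\u003C/p>\""));
    }
}
```

In the callback you would call JsResultDecoder.decode(value) and pass the result to jsoup (or whatever parser you pick) for the selector work.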