Hey everyone, I’m stuck trying to find a good library for web scraping on Android. I need something that can:
Get the full webpage content without showing it on screen
Handle pages with AJAX requests
Let me use XPath or CSS selectors to grab elements
Maybe navigate pages and click buttons in the future
I’ve tried a bunch of options like Jsoup and HtmlUnit, but nothing seems to work right. Jsoup can’t handle JavaScript, and HtmlUnit is giving me headaches on Android.
Anyone know a good library that can do all this? I’m using Android Studio, but I can switch to Eclipse if needed.
I’ve been down this road before, and I can tell you from experience that headless web scraping on Android can be a real pain. After countless hours of trial and error, I found that using a combination of OkHttp for network requests and Selenium WebDriver with ChromeDriver worked wonders for me.
OkHttp handles the initial page load efficiently, while Selenium WebDriver takes care of JavaScript execution and AJAX requests. For parsing, I coupled this with Jsoup, which, despite its limitations with JavaScript, works great for querying the DOM once Selenium has loaded everything.
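To make the parsing step concrete, here is a minimal sketch of pulling elements out of already-rendered markup with XPath. It is not the setup described above: it swaps Jsoup for the JDK's built-in javax.xml.xpath so it has no dependencies, and it assumes the markup is well-formed (real-world HTML often is not, which is where a lenient parser like Jsoup helps). The class name RenderedPageXPath, the method selectText, and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: queries markup that has already been rendered elsewhere.
final class RenderedPageXPath {

    /** Returns the trimmed text of every node matching the XPath expression. */
    static List<String> selectText(String wellFormedMarkup, String expression) {
        try {
            // Assumption: the input parses as XML (no unclosed tags, no DOCTYPE to fetch).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query markup", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the page source after JavaScript has run.
        String html = "<html><body><ul>"
                + "<li class=\"price\">10</li>"
                + "<li class=\"price\">20</li>"
                + "<li class=\"name\">x</li>"
                + "</ul></body></html>";
        System.out.println(selectText(html, "//li[@class='price']")); // prints [10, 20]
    }
}
```

In the pipeline described here, the string passed to selectText would be the page source taken after the browser has finished running its scripts.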
The tricky part was setting up ChromeDriver on Android, but there are some solid guides out there. It’s not the most lightweight solution, but it’s robust and handles pretty much everything you’re looking for.
Just a heads up though, this approach can be battery-intensive, so you might want to consider offloading heavy scraping tasks to a server if possible. Good luck with your project!
Have you considered using WebView in Android for your scraping needs? It’s a built-in component that can handle JavaScript and AJAX requests seamlessly. You can load pages invisibly and execute JavaScript to interact with the DOM.
For parsing, you can run JavaScript directly on the loaded page with WebView’s evaluateJavascript method. (A standalone engine like Rhino runs outside the WebView and has no access to the page’s DOM.) This lets you use document.querySelector or similar methods to extract data.
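A rough sketch of that idea, assuming the script is run through WebView's evaluateJavascript(script, callback) after JavaScript has been enabled with getSettings().setJavaScriptEnabled(true). Only the script-building part is shown as plain Java so it can be tried off-device; the class WebViewScripts and its methods are hypothetical helpers, and the WebView call itself appears only in a comment.

```java
// Hypothetical helper: builds the JavaScript that a WebView would evaluate.
final class WebViewScripts {

    /** Builds a JS expression returning the text of the first element matching a CSS selector, or null. */
    static String textOfFirstMatch(String cssSelector) {
        return "(function(){"
                + "var el=document.querySelector(" + jsStringLiteral(cssSelector) + ");"
                + "return el?el.textContent:null;"
                + "})()";
    }

    /** Quotes a Java string as a double-quoted JavaScript string literal. */
    static String jsStringLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String script = textOfFirstMatch("h1.title");
        System.out.println(script);
        // On Android (not compiled here; webView is an assumed, already-loaded WebView):
        //   webView.evaluateJavascript(script, value -> {
        //       // value arrives JSON-encoded: a quoted string, or the literal null
        //   });
    }
}
```

Escaping the selector matters because selectors such as a[href="x"] contain quotes that would otherwise break the generated script.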
The main advantage is that WebView is native to Android, so you avoid compatibility issues. It’s also more lightweight than full browser automation solutions.
One caveat: be mindful of memory usage, as WebView can be resource-intensive. You might need to implement some cleanup routines to manage this effectively.
Hey man, have you tried Puppeteer? It’s pretty sweet for headless scraping. It handles JavaScript like a champ and you can use CSS selectors. It’s a Node.js library, so it might be a bit tricky to set up on Android, but it’s worth a shot. I used it for a project and it worked great. Good luck, bro!