Need JavaScript-capable HTML parser for Android app

I’m stuck trying to find an HTML parsing solution for my Android project. I need to scrape a website that requires login and uses lots of JavaScript and AJAX to load content dynamically.

The main challenge is that the site doesn’t show the actual data until all the JavaScript runs and the AJAX calls finish. I tried JSoup first, but it only fetches the initial HTML and doesn’t execute scripts, so the dynamic content never shows up.

Then I tested HtmlUnit, which worked great on regular desktop Java but won’t build for Android because of Dalvik (dex) conversion errors and JAR conflicts.

I’m looking for suggestions on either:

  • A different HTML parser that handles JavaScript execution on Android
  • Ways to make JSoup work with dynamic content
  • Ways to get HtmlUnit working on Android

This has been driving me crazy for days now. Any help would be amazing!

A WebView is probably your most effective option. I dealt with a similar problem recently, where I needed to extract information from a JavaScript-heavy site. Instead of fighting with traditional parsers, I embedded a WebView in my app and used it like a headless browser: load the full page, let the JavaScript run, then inject a small script of your own to pull out the data you need. Use addJavascriptInterface to pass the results from the page back to your Android code, and enable JavaScript with setJavaScriptEnabled(true) first, since it is off by default. It isn’t the most intuitive approach, but it is reliable because the WebView is a real browser engine (Chromium-based on Android 4.4 and later), so the page behaves as it would in a normal browser. You can also keep performance reasonable by turning off image loading and other content you don’t need in the WebView settings.
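Here is a rough sketch of the "inject a script and hand the HTML back" part, not a complete implementation. The class name WebViewScrapeScript, the bridge name HtmlBridge, the #content selector, and the example.com URL are all made up for illustration. The helper itself is plain Java that builds the JavaScript; the Android wiring is shown only in comments because it needs the Android SDK, and it assumes API 19+ for evaluateJavascript.

```java
/** Builds the JavaScript that the WebView runs once the page has loaded. */
final class WebViewScrapeScript {

    /**
     * Returns JS that polls every 250 ms until cssSelector matches an element
     * (taken here as the sign that the AJAX content has arrived) or maxTries
     * polls have passed, then hands the rendered HTML to the Java object
     * registered under bridgeName.
     */
    static String build(String bridgeName, String cssSelector, int maxTries) {
        // Escape backslashes and single quotes so the selector is safe
        // inside a single-quoted JavaScript string literal.
        String selector = cssSelector.replace("\\", "\\\\").replace("'", "\\'");
        return "(function(){"
             + "var tries=0;"
             + "var timer=setInterval(function(){"
             + "var ready=document.querySelector('" + selector + "');"
             + "if(ready||++tries>=" + maxTries + "){"
             + "clearInterval(timer);"
             + "window." + bridgeName + ".onHtml(document.documentElement.outerHTML);"
             + "}"
             + "},250);"
             + "})();";
    }
}

// Android side (hypothetical wiring, shown as comments):
//
//   class HtmlBridge {
//       @JavascriptInterface               // required on API 17+
//       public void onHtml(String html) {  // called on a background thread
//           // e.g. Document doc = Jsoup.parse(html);
//       }
//   }
//
//   webView.getSettings().setJavaScriptEnabled(true);
//   webView.getSettings().setBlockNetworkImage(true);  // skip image downloads
//   webView.addJavascriptInterface(new HtmlBridge(), "HtmlBridge");
//   webView.setWebViewClient(new WebViewClient() {
//       @Override public void onPageFinished(WebView view, String url) {
//           view.evaluateJavascript(
//               WebViewScrapeScript.build("HtmlBridge", "#content", 40), null);
//       }
//   });
//   webView.loadUrl("https://example.com/");
```

Once the rendered HTML arrives in onHtml you can feed it to JSoup as usual, which also covers the "make JSoup work with dynamic content" part of the question. Only register the bridge for pages you trust, since any script on the page can call it.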

I ran into something similar about six months ago and ended up using a hybrid approach that worked surprisingly well. Since you mentioned the site requires login, you might want to look into OkHttp with cookie persistence to handle the authenticated session, then combine it with a lightweight headless browser solution. There’s a library called Android-WebDriver that, as I understand it, wraps Chrome’s debugging protocol; it’s much lighter than full Selenium but still executes JavaScript properly. The key thing I discovered, though, was that many sites have API endpoints that return JSON instead of rendered HTML, so before going the complex route, watch the network requests in Chrome DevTools on your desktop to see whether you can hit those APIs directly. In my case this eliminated the need to execute JavaScript at all and made the whole solution faster and much more stable.
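To make the "log in, keep the cookies, then call the JSON endpoint" idea concrete, here is a minimal sketch. It swaps OkHttp for the java.net classes that ship with Android (HttpURLConnection plus CookieManager) so it needs no extra dependency; with OkHttp, a CookieJar would play the same role. The class name SessionClient, the form field names, and any URLs you pass in are assumptions, so take the real ones from what you see in the DevTools network tab.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;

/** Keeps a login session alive across requests via a shared cookie jar. */
final class SessionClient {

    SessionClient() {
        // Process-wide cookie jar: cookies set by the login response are
        // sent automatically on every later HttpURLConnection request.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    /** Encodes form fields as application/x-www-form-urlencoded. */
    static String formEncode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    /** POSTs the login form; any session cookie lands in the cookie jar. */
    String postForm(String url, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formEncode(fields).getBytes("UTF-8"));
        }
        return readBody(conn);
    }

    /** GETs a JSON endpoint with the headers AJAX requests commonly carry. */
    String getJson(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        return readBody(conn);
    }

    /** Reads the whole response body as UTF-8 text, then disconnects. */
    private static String readBody(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return buf.toString("UTF-8");
        } finally {
            conn.disconnect();
        }
    }
}
```

You would call postForm with the login URL and credentials once, then getJson for each data endpoint, all from a background thread since Android blocks network I/O on the main thread. If the site uses CSRF tokens or extra headers, you will need to copy those from the captured requests as well.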

Honestly, a WebView might be overkill for this. Have you considered using Selenium with ChromeDriver? I know it sounds heavy, but there are Android ports that work pretty well. Alternatively, you could try making the AJAX calls directly if you can figure out the endpoints from the browser dev tools. That’s often easier than dealing with JS execution.