I’m trying to find a headless web browser that works on Android devices, similar to what PhantomJS offers on desktop platforms. However, I haven’t been able to find anything comparable for mobile development.
My main goal is to extract data from websites, but the page content is generated dynamically by JavaScript. That means traditional HTML parsers like Jsoup won’t work, since they don’t execute JavaScript.
I’ve come across some suggestions about implementing a WebView component without any UI elements, but this approach seems quite complex and requires dealing with Android context management, which I’d prefer to avoid.
Another important requirement is that the solution needs to work synchronously so I can integrate it smoothly with my RxJava workflow. Does anyone know of a reliable library or approach that could handle this scenario?
Have you tried using the Chrome DevTools Protocol (CDP) with remote debugging? I built something like this for a client who needed to extract dynamic content on Android. By launching Chrome and sending it CDP commands, you can navigate to a page, wait for its JavaScript to finish running, and then retrieve the final DOM. It’s a lighter alternative to full WebView integration while still giving you scriptable, headless-style control. Just make sure remote debugging is enabled in Chrome, and it should handle JavaScript-heavy sites well. Throughput is generally good for batch jobs, although each page’s load time depends on how much JavaScript it runs.
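For reference, here is a minimal sketch of the CDP messages that flow involves. It assumes you already have a WebSocket connection to a tab: on Android, Chrome’s DevTools socket is typically reached from a development machine with `adb forward tcp:9222 localabstract:chrome_devtools_remote`, after which `http://localhost:9222/json` lists each tab’s `webSocketDebuggerUrl`. The class name `CdpCommands` is hypothetical and the WebSocket transport is left out; `Page.enable`, `Page.navigate`, `Runtime.evaluate` and `returnByValue` are standard CDP names.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Builds Chrome DevTools Protocol command messages as JSON strings.
 * Hypothetical helper: sending them over the tab's WebSocket is not shown.
 */
class CdpCommands {
    // Every CDP command carries a unique integer id so replies can be matched to requests.
    private final AtomicInteger nextId = new AtomicInteger(1);

    /** Minimal JSON string encoder: quotes, backslashes and control characters are escaped. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Page.enable: after this, Chrome emits events such as Page.loadEventFired. */
    String enablePageEvents() {
        return "{\"id\":" + nextId.getAndIncrement() + ",\"method\":\"Page.enable\"}";
    }

    /** Page.navigate: load a URL in the attached tab. */
    String navigate(String url) {
        return "{\"id\":" + nextId.getAndIncrement()
                + ",\"method\":\"Page.navigate\",\"params\":{\"url\":" + jsonString(url) + "}}";
    }

    /** Runtime.evaluate: run a JS expression in the page and return the result by value. */
    String evaluate(String expression) {
        return "{\"id\":" + nextId.getAndIncrement()
                + ",\"method\":\"Runtime.evaluate\",\"params\":{\"expression\":"
                + jsonString(expression) + ",\"returnByValue\":true}}";
    }

    /** Reads back the rendered DOM; send this after the page's scripts have run. */
    String getRenderedHtml() {
        return evaluate("document.documentElement.outerHTML");
    }
}
```

A typical sequence would be `enablePageEvents()`, `navigate(url)`, wait for the `Page.loadEventFired` event, then `getRenderedHtml()`. Pages that keep fetching data after the load event may need you to poll with `evaluate(...)` until the content you want appears.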
I hit the same problem last year while building a data extraction tool for Android. I tried a bunch of different approaches before settling on OkHttp plus an embedded JavaScript engine (Rhino; Nashorn is part of the desktop JDK and isn’t available on Android) to run the dynamic content. Not perfect, but it works well enough for most sites. The trick is to grab the initial HTML first, then find and run whatever JavaScript actually loads the content you’re after. For the synchronous RxJava side, I just wrapped everything in a blocking Observable, which fixed the timing headaches. It takes more setup than an off-the-shelf solution, but you get way better control and don’t have to deal with the WebView mess.
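Here is a minimal sketch of the first half of that trick. It assumes the data is produced by an inline `<script>` in the initial HTML (scripts loaded via `src=` would need a second OkHttp request), and it keeps only inline script bodies containing a marker string you choose, such as a variable name you spotted in the page source. `InlineScriptFinder` and the marker are hypothetical. A regex is only a heuristic for HTML; Jsoup can do the same extraction more robustly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds inline script bodies in raw HTML that mention a marker,
 * so they can be handed to a JavaScript engine afterwards.
 */
class InlineScriptFinder {
    // <script ...>body</script>, case-insensitive, with the body allowed to span lines.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b([^>]*)>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> findScripts(String html, String marker) {
        List<String> hits = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            String attrs = m.group(1);
            String body = m.group(2).trim();
            // External scripts (src=...) have no inline body worth running here.
            if (attrs.toLowerCase().contains("src=") || body.isEmpty()) {
                continue;
            }
            if (body.contains(marker)) {
                hits.add(body);
            }
        }
        return hits;
    }
}
```

The selected body would then be evaluated in Rhino (`Context.enter()`, `initStandardObjects()`, `evaluateString(...)`), run in interpreted mode on Android because Rhino’s compiled mode generates JVM bytecode. Keep in mind Rhino is a bare JavaScript engine with no `window` or `document`, so scripts that touch the DOM need stubs or won’t run as-is.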
Honestly mate, you’re probably stuck with the headless WebView approach, even though it’s complex. I’ve been down this road: there’s no magic bullet for Android like PhantomJS was. You could try Selenium Grid with Appium, but that’s way more overhead than you want.
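If you do go the WebView route, the synchronous requirement is the awkward part, because the relevant Android APIs (`WebViewClient.onPageFinished`, `WebView.evaluateJavascript` with a `ValueCallback`) are callback-based. Below is a minimal sketch of a latch-based bridge that turns one callback into a blocking call with a timeout. `BlockingBridge` and `AsyncSource` are hypothetical names, and the actual WebView wiring is not shown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical helper: blocks the calling thread until a one-shot callback fires. */
class BlockingBridge {
    /** Hypothetical stand-in for an async API that eventually invokes the callback once. */
    interface AsyncSource<T> {
        void start(Consumer<T> callback);
    }

    static <T> T await(AsyncSource<T> source, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<T> result = new AtomicReference<>();
        source.start(value -> {
            result.set(value);
            done.countDown(); // release the waiting thread
        });
        // Must not run on the thread that delivers the callback (the main thread
        // for WebView), otherwise the callback can never run and this times out.
        if (!done.await(timeout, unit)) {
            throw new TimeoutException("callback not invoked within " + timeout + " " + unit);
        }
        return result.get();
    }
}
```

Since WebView methods must be called on the main thread and its callbacks arrive there too, `start` would post its work to the main looper while `await` runs on a background thread. With RxJava (2 or 3) that could look like `Single.fromCallable(() -> BlockingBridge.await(source, 30, TimeUnit.SECONDS)).subscribeOn(Schedulers.io())`.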