Android Headless Browser Options

I am looking for a headless browser for Android, similar to PhantomJS. My goal is to scrape pages that load their content via JavaScript, which Jsoup cannot handle because it does not execute scripts. I have seen suggestions to use a WebView without a layout, but that seems complicated, and I would prefer not to depend on a Context. The solution should also work synchronously so that it integrates cleanly with RxJava. Is there a good option for this?

Hey Hazel,

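Another option you might explore is the Crosswalk WebView. It bundles its own Chromium runtime, so JavaScript execution is more consistent than with the native WebView on older Android versions, and you might find it easier to set up. Keep in mind that the Crosswalk project was discontinued in 2017, so check whether that is acceptable for your use case.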

However, if you're looking for a more flexible alternative, consider moving the rendering off the device: run Headless Chrome with Puppeteer on a server of your own. Puppeteer runs under Node.js, so the Android app only needs to send a blocking HTTP request to that backend, and a blocking call is easy to wrap in RxJava.

Set up a Node.js server with:

npm install puppeteer

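Then expose an HTTP endpoint that loads the requested page in Puppeteer and returns the rendered HTML, and call that endpoint from your Android app.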
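On the Android side, the call could look roughly like the sketch below. It is only an illustration under a few assumptions: the class name `RenderClient`, the `/render?url=...` route, and the timeout values are all hypothetical, and your server would need to expose a matching endpoint that returns the rendered HTML as the response body. It uses `HttpURLConnection` because that is available on Android without extra dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical client for a self-hosted Puppeteer rendering server.
class RenderClient {
    private final String baseUrl; // e.g. "http://localhost:3000" (assumption)

    RenderClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Builds the request URL; the "/render?url=" route is an assumed convention.
    String buildRequestUrl(String pageUrl) {
        try {
            return baseUrl + "/render?url=" + URLEncoder.encode(pageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Blocks until the server responds. Must not be called on the main thread.
    String fetchRenderedHtml(String pageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(buildRequestUrl(pageUrl)).openConnection();
        conn.setConnectTimeout(10_000); // rendering can be slow, so the read
        conn.setReadTimeout(30_000);    // timeout is deliberately generous
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toString("UTF-8");
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Because `fetchRenderedHtml` simply blocks and returns a String, it should fit into an RxJava chain through something like `Single.fromCallable(() -> client.fetchRenderedHtml(url))`, subscribed on a background scheduler.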

This way, you avoid complex WebView configurations and manage JavaScript-heavy pages effectively.

Hi Hazel,

For scraping JavaScript-loaded content from an Android app, consider using Puppeteer. It is a Node.js library and does not run on Android itself, so you would run it on a server and have the app call it through an HTTP endpoint.

Here’s how you might set it up:

  • Set up a simple Node.js server with Puppeteer to handle web scraping requests.
  • Ensure the server operates behind a secure API.
  • Call the server from your Android app with a blocking HTTP request wrapped in RxJava (for example via Single.fromCallable), so it composes with the rest of your chain.
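To illustrate the last step, here is a minimal sketch of how a blocking scrape call can be shaped as a `Callable`, which is the form RxJava's `Single.fromCallable` accepts. To keep the example free of dependencies, a plain `ExecutorService` stands in for an RxJava scheduler, and the fetch itself is passed in as a function. The names `BlockingScrape`, `scrapeTask`, and `runWithTimeout` are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper showing the "blocking call as a Callable" shape.
class BlockingScrape {

    // Wraps any blocking fetch (for example, an HTTP call to the scraping
    // server) in a Callable. The same Callable could be handed to RxJava.
    static Callable<String> scrapeTask(Function<String, String> blockingFetch,
                                       String pageUrl) {
        return () -> blockingFetch.apply(pageUrl);
    }

    // Runs the task on a background thread and waits for the result,
    // failing with a TimeoutException if it takes longer than allowed.
    static String runWithTimeout(Callable<String> task, long timeoutSeconds)
            throws Exception {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = io.submit(task);
            return result.get(timeoutSeconds, TimeUnit.SECONDS);
        } finally {
            io.shutdownNow(); // interrupt the worker if it is still running
        }
    }
}
```

With RxJava on the classpath, the equivalent would presumably be `Single.fromCallable(task).subscribeOn(Schedulers.io())`, with `blockingGet()` only where a synchronous result is truly needed and never on the main thread.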

With this approach, you retain flexibility and avoid complex WebView configurations. It also allows you to script and automate various browser actions easily.

If you need the page rendered by a real browser engine, Headless Chrome driven by Puppeteer, as described above, is worth evaluating, since it handles JavaScript-heavy pages well.

Best regards,
David