Seeking Python-based Headless Browser Compatible with Google App Engine

Hey everyone,

I’m working on a project that combines AngularJS on the client side with webapp2 on Google App Engine. I’ve run into a problem with SEO, though: the pages are built by JavaScript in the browser, so crawlers that don’t execute it only see the empty templates. My plan is to run a headless browser on the server to execute the JavaScript and then serve the finished HTML to web crawlers.

The thing is, I’m not sure what options are out there for Python-based headless browsers that work with GAE. Does anyone know of a good solution? I’ve looked around but haven’t found anything that fits the bill yet.

If you’ve dealt with a similar setup or know of a tool that might work, I’d really appreciate your input. Thanks in advance for any help!

I’ve faced a similar challenge with SEO on GAE. Unfortunately, running a full headless browser on App Engine is tricky due to its restrictions. Have you considered using a third-party service like Prerender.io? It can handle the JavaScript rendering externally and serve pre-rendered content to crawlers. This approach worked well for my Angular project on GAE.
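To make that concrete, here is a rough sketch of the routing decision, not my exact code. The crawler list is a short illustrative sample, the helper names are made up, and the service base URL is an assumption you should check against your provider’s docs. It uses only the Python 3 standard library.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption: a short, non-exhaustive sample of crawler User-Agent substrings.
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|yandex|baiduspider|duckduckbot", re.IGNORECASE
)

# Assumption: base URL of the external rendering service.
PRERENDER_BASE = "https://service.prerender.io/"


def is_crawler_request(user_agent, url):
    """Return True if the request looks like it comes from a search crawler."""
    # Crawlers following the old AJAX crawling scheme add this query parameter.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "_escaped_fragment_" in query:
        return True
    return bool(user_agent and CRAWLER_PATTERN.search(user_agent))


def prerender_url(url):
    """Build the URL to request from the rendering service for a page URL."""
    return PRERENDER_BASE + url
```

In a webapp2 handler you would call is_crawler_request with the request’s User-Agent header and full URL, fetch prerender_url(...) for crawlers, and serve the normal Angular shell to everyone else.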

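Alternatively, you could look into server-side rendering with Angular Universal. Keep in mind that it targets Angular 2 and later and renders on Node.js, so with an AngularJS front end and a Python backend it means migrating the client code and adding a Node.js rendering tier. It’s more work upfront, but it can solve the SEO issue without a separate headless browser and might be a more robust long-term solution.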

If you really need a Python-based solution on GAE, you might want to explore lightweight libraries like html5lib or lxml for parsing. They’re parsers, not browsers: they can handle basic HTML manipulation but won’t execute JavaScript, so on their own they can’t render your Angular templates.

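I’ve been down this road before, and let me tell you, it’s not an easy one. GAE’s limitations can be a real pain when it comes to headless browsers. One approach that worked for me was using a stripped-down setup of Selenium WebDriver with ChromeDriver. It needs an environment where you can launch a Chrome binary, which in practice means the flexible environment or a separate machine rather than the standard sandbox. It’s not perfect, but it gets the job done for basic JavaScript rendering.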

Another option to consider is using a cloud function or a separate microservice for the headless browsing part. This way, you keep your GAE instance lean and offload the heavy lifting elsewhere. It adds some complexity, but it’s more scalable in the long run.
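Here’s a minimal sketch of what the App Engine side of that could look like. The render service, its URL, and its "url" and "timeout" query parameters are all hypothetical; adapt them to whatever service you actually deploy. Python 3 standard library only.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint of a separately hosted rendering service.
RENDER_SERVICE = "https://render.example.com/render"


def build_render_request(page_url, timeout_ms=10000):
    """Build the HTTP request asking the render service for a page's HTML."""
    # The parameter names are assumptions about the hypothetical service.
    query = urlencode({"url": page_url, "timeout": timeout_ms})
    return Request(RENDER_SERVICE + "?" + query, headers={"Accept": "text/html"})


def fetch_rendered_html(page_url):
    """Call the render service and return the rendered HTML as text."""
    with urlopen(build_render_request(page_url), timeout=15) as response:
        return response.read().decode("utf-8")
```

Caching the returned HTML (for example in memcache) would keep crawler traffic from hitting the render service on every request.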

If you’re open to changing your stack, you might want to look into Next.js or Nuxt.js. They’re built on React and Vue respectively, offer server-side rendering out of the box, and can run on App Engine’s Node.js runtime. That means replacing Angular rather than extending it, so it’s a real learning curve, but the SEO benefits can be worth it.

Whatever route you choose, make sure to test your solution thoroughly. SEO can be finicky, and you want to confirm that crawlers actually receive the rendered HTML rather than the raw templates.
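One cheap check I’d suggest, sketched below: fetch a page the way a crawler would and make sure the HTML contains the text you expect and no leftover Angular bindings. The helper names are made up for illustration, and the regex only catches the default {{ ... }} interpolation syntax.

```python
import re

# Matches leftover Angular interpolation such as {{ page.title }}.
UNRENDERED_BINDING = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def find_unrendered_bindings(html):
    """Return every {{ ... }} expression still present in the HTML."""
    return UNRENDERED_BINDING.findall(html)


def looks_rendered(html, expected_text):
    """True if the expected text is present and no bindings are left over."""
    return expected_text in html and not find_unrendered_bindings(html)
```

For example, looks_rendered(html, "Pricing") should be true for the HTML served to a crawler on your pricing page and false for the raw template.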

Have you considered using PhantomJS? It’s a headless WebKit browser that you can drive from Python, for example through Selenium’s WebDriver bindings. It might be tricky to set up on GAE, since it’s a separate binary, but it could be worth a shot. Another option is a microservice architecture: run the headless browser on a separate instance and have GAE communicate with it. Just brainstorming here!