Headless Browser Options for Python on Google App Engine

I have a web application built with webapp2 running on Google App Engine, and I’m using AngularJS for the frontend. The problem I’m facing is that search engine crawlers can’t properly index my site because the content is generated dynamically with JavaScript.

My plan is to implement a headless browser solution that can execute JavaScript on the server side and return the fully rendered HTML to search engine bots. This would help with SEO since crawlers would see the complete page content instead of just the initial template.

Does anyone know if there are Python headless browser libraries that work with Google App Engine’s environment? I need something that can handle JavaScript execution and DOM manipulation server-side.

One option for JavaScript rendering on Google App Engine is an external service like Prerender.io. Traditional headless browsers struggle under the platform’s restrictions, but with Prerender.io your app only has to detect crawler requests and forward them to the service, which returns the fully rendered page. It’s not a Python library that lives inside your app, but the integration is fairly straightforward, and you avoid managing headless browser dependencies within GAE.
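To give a rough idea of the forwarding step, here is a minimal sketch in plain Python. It assumes the hosted service accepts the target URL appended to https://service.prerender.io/ and authenticates with an X-Prerender-Token header; please check both against the current Prerender.io documentation. The function name prerender_target and the token value are made up for illustration.

```python
# Assumed hosted endpoint; verify against the Prerender.io docs for your account.
PRERENDER_SERVICE = "https://service.prerender.io/"


def prerender_target(page_url, token):
    """Return (url, headers) for fetching the rendered version of page_url.

    Hypothetical helper: it only builds the request details, so the actual
    fetch can use whatever HTTP client your runtime provides.
    """
    # The original page URL is appended unchanged to the service URL.
    url = PRERENDER_SERVICE + page_url
    # Assumed auth header name; the token comes from your service account.
    headers = {"X-Prerender-Token": token}
    return url, headers
```

Your handler would call this only for crawler requests, fetch the returned URL with those headers, and write the response body back to the bot. Regular visitors never touch this path.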

I ran into this exact issue about two years ago with my Flask app on GAE. The sandbox limitations make it nearly impossible to run traditional headless browsers directly. What ended up working for me was a hybrid approach using Google Cloud Functions alongside my GAE app. I created a separate Cloud Function that runs Puppeteer and handles the JavaScript rendering, and my GAE app calls this function when it detects bot traffic. The setup requires a bit more infrastructure, but it gives you full control over the rendering process. You can detect crawler user agents in your webapp2 handlers and route those requests through the Cloud Function while serving regular users normally. Performance-wise, it adds some latency for bot requests, but the SEO benefits made it worthwhile for my use case.
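The detect-and-route part looked roughly like the sketch below. This is not my production code: the marker list is a small illustrative sample, and is_crawler, respond, render_remote, and serve_app_shell are hypothetical names. The two callables stand in for "call the Cloud Function" and "serve the normal Angular page", so the logic stays independent of any framework.

```python
# Illustrative sample of crawler user-agent substrings; extend as needed.
BOT_MARKERS = ("googlebot", "bingbot", "yandex", "baiduspider", "duckduckbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    ua = (user_agent or "").lower()  # the header may be missing entirely
    return any(marker in ua for marker in BOT_MARKERS)


def respond(user_agent, url, render_remote, serve_app_shell):
    """Send crawlers through the remote renderer, everyone else to the app."""
    if is_crawler(user_agent):
        # render_remote(url) is expected to return fully rendered HTML,
        # for example by calling the rendering Cloud Function over HTTP.
        return render_remote(url)
    # Regular visitors get the usual JavaScript-driven page.
    return serve_app_shell()
```

In a webapp2 handler you would pass in self.request.headers.get('User-Agent') and self.request.url, then write the returned HTML to the response. Substring matching on the user agent is easy to spoof, so treat it as a routing hint and not as a security check.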

Honestly, GAE is pretty restrictive when it comes to headless browsers like Selenium or Puppeteer. Most don’t work because of the sandboxed environment. You could try requests-html, but I’m not sure if it still works on current GAE versions. Another approach is server-side rendering with something like Angular Universal instead of trying to retrofit headless browsing, though keep in mind that Universal targets Angular 2+, so it would mean migrating off AngularJS first.