My headless browser operates independently but fails when integrated as a FastAPI endpoint

I developed a class to implement a headless browser within my FastAPI application. While the class functions correctly when called directly, it encounters issues when invoked through a FastAPI endpoint.

Here is the class implementation:

from playwright.async_api import async_playwright
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class HeadlessBrowser:
    instance = None  # Class-level attribute for Playwright browser instance
    playwright_instance = None  # Class-level attribute for Playwright process

    @classmethod
    async def start_browser(cls):
        """
        Asynchronously starts the Playwright browser instance.
        """
        try:
            if cls.instance is None:
                if cls.playwright_instance is None:
                    cls.playwright_instance = await async_playwright().__aenter__()
                cls.instance = await cls.playwright_instance.chromium.launch(headless=True)
            logger.info('Browser initialized successfully.')
        except Exception as error:
            logger.error(f'Error initializing browser: {error}')

    async def fetch_page_source(self, page_url):
        """Fetches content from the specified URL.
        
        Args:
            page_url (str): URL to request.

        Returns:
            str: HTML content of the page.
        """
        if self.instance is None:
            raise RuntimeError("Browser not initialized. Use start_browser first.")

        context = await self.instance.new_context()
        page = await context.new_page()
        await page.goto(page_url)
        html_content = await page.content()
        await context.close()
        return html_content

    @classmethod
    async def initiate_and_fetch(cls, page_url):
        """Starts the browser and retrieves the HTML content from the given URL.
        
        Args:
            page_url (str): URL to browse.

        Returns:
            str: HTML content as a string.
        """
        await cls.start_browser()  # Start the browser
        browser_instance = cls()  # Create an instance
        return await browser_instance.fetch_page_source(page_url)  # Retrieve and return content

It executes successfully like this:

import asyncio

async def run_app():
    html = await HeadlessBrowser.initiate_and_fetch("https://example.com")
    print(html)

asyncio.run(run_app())

However, I encounter a failure when trying to use it within a FastAPI route:

from fastapi import FastAPI

app = FastAPI()

@app.get("/retrieve/")
async def fetch_item():
    html = await HeadlessBrowser.initiate_and_fetch('https://example.com')
    return html

I receive the following error:

raise RuntimeError("Browser not initialized. Use start_browser first.")
RuntimeError: Browser not initialized. Use start_browser first.

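The RuntimeError comes from the check in fetch_page_source, which means HeadlessBrowser.instance was still None after start_browser returned. Note that start_browser catches every exception and only logs it, so if the launch fails inside the server process the real cause is hidden; look for an "Error initializing browser" line in your log. The underlying problem is lifecycle management: the class keeps the Playwright browser in class-level attributes, and that shared state should be created and torn down by the FastAPI application rather than lazily inside a request.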
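The original start_browser wraps the launch in a try/except that only logs the error. The sketch below reproduces that pattern without Playwright to show how a launch failure can end up reported as "Browser not initialized". FakeBrowserHolder, its methods, and the simulated NotImplementedError are all hypothetical stand-ins, not part of the original class or of Playwright.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeBrowserHolder:
    """Hypothetical stand-in for HeadlessBrowser; no Playwright required."""

    instance = None

    @classmethod
    async def _launch(cls):
        # Simulates a browser launch that fails inside the server process.
        raise NotImplementedError("simulated launch failure")

    @classmethod
    async def start_swallowing(cls):
        # Same shape as the original start_browser: log the error and carry on.
        try:
            cls.instance = await cls._launch()
        except Exception as error:
            logger.error("Error initializing browser: %s", error)

    @classmethod
    async def start_propagating(cls):
        # Log with a traceback, then re-raise so the caller sees the real cause.
        try:
            cls.instance = await cls._launch()
        except Exception:
            logger.exception("Error initializing browser")
            raise

    @classmethod
    async def fetch(cls):
        if cls.instance is None:
            raise RuntimeError("Browser not initialized. Use start_browser first.")
        return "<html></html>"


async def demo():
    results = {}
    # Swallowed failure: the caller only sees the later, misleading error.
    await FakeBrowserHolder.start_swallowing()
    try:
        await FakeBrowserHolder.fetch()
    except RuntimeError as error:
        results["swallowing"] = type(error).__name__
    # Propagated failure: the caller sees the original exception type.
    try:
        await FakeBrowserHolder.start_propagating()
    except NotImplementedError as error:
        results["propagating"] = type(error).__name__
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Re-raising (or dropping the try/except entirely) in start_browser is one way to make the actual launch error visible in the FastAPI traceback.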

Here’s how you can address the problem:

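  1. Ensure Browser Initialization: The browser has to be launched before any request handler touches it. In a standalone script, asyncio.run creates the event loop and your coroutine is the only thing running on it. Under FastAPI the ASGI server owns the loop, so the browser must be started on that loop, and a loop without subprocess support (for example asyncio's SelectorEventLoop on Windows) cannot spawn the Playwright driver at all.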

  2. Asynchronous Lifecycle Management: FastAPI has startup and shutdown events that can be used to manage resources like a browser instance. Utilizing these events allows for the initialization and proper cleanup of the browser instance, ensuring it’s available when your API endpoints are called.

Here’s an updated approach using FastAPI’s startup and shutdown events:

from fastapi import FastAPI
from playwright.async_api import async_playwright

app = FastAPI()

class HeadlessBrowser:
    instance = None
    playwright_instance = None

    @classmethod
    async def start_browser(cls):
        if cls.instance is None:
            if cls.playwright_instance is None:
                # start() is the supported way to use Playwright without an "async with" block
                cls.playwright_instance = await async_playwright().start()
            cls.instance = await cls.playwright_instance.chromium.launch(headless=True)

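    @classmethod
    async def stop_browser(cls):
        # Close the browser, then stop the Playwright driver, and reset both
        # attributes so a later start_browser() call launches a fresh browser.
        if cls.instance is not None:
            await cls.instance.close()
            cls.instance = None
        if cls.playwright_instance is not None:
            await cls.playwright_instance.stop()
            cls.playwright_instance = None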

    async def fetch_page_source(self, page_url):
        if self.instance is None:
            raise RuntimeError("Browser not initialized. Use start_browser first.")

        context = await self.instance.new_context()
        try:
            page = await context.new_page()
            await page.goto(page_url)
            return await page.content()
        finally:
            # Close the context even if navigation fails, so it is not leaked
            await context.close()

    @classmethod
    async def initiate_and_fetch(cls, page_url):
        await cls.start_browser()
        browser_instance = cls()
        return await browser_instance.fetch_page_source(page_url)

@app.on_event("startup")
async def on_startup():
    await HeadlessBrowser.start_browser()

@app.on_event("shutdown")
async def on_shutdown():
    await HeadlessBrowser.stop_browser()

@app.get("/retrieve/")
async def fetch_item():
    html = await HeadlessBrowser.initiate_and_fetch('https://example.com')
    return html

Explanation:

  • Startup Event: The on_startup event initializes the browser when the FastAPI app starts.
  • Shutdown Event: The on_shutdown event closes the browser, ensuring resources are released when the app stops.
  • Singleton Browser Session: A single browser is launched at startup and shared by all requests. Each request only opens and closes its own browser context, which is far cheaper than launching a new browser.
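Recent FastAPI releases mark on_event as deprecated in favour of a single lifespan context manager, so the same startup and shutdown logic can also be written in that style. The sketch below uses only the standard library so it runs without FastAPI or Playwright installed; StubBrowser is a hypothetical stand-in for HeadlessBrowser, and in a real application the function would be passed as FastAPI(lifespan=lifespan).

```python
import asyncio
from contextlib import asynccontextmanager


class StubBrowser:
    """Hypothetical stand-in for HeadlessBrowser so the sketch runs anywhere."""

    started = False

    @classmethod
    async def start_browser(cls):
        cls.started = True

    @classmethod
    async def stop_browser(cls):
        cls.started = False


@asynccontextmanager
async def lifespan(app):
    # Code before the yield runs once, before the first request is served.
    await StubBrowser.start_browser()
    try:
        yield
    finally:
        # Code after the yield runs once, when the server shuts down.
        await StubBrowser.stop_browser()


# In a real application (assumption: FastAPI is installed):
#     app = FastAPI(lifespan=lifespan)


async def demo():
    # Drive the context manager by hand to show the order of events.
    states = []
    async with lifespan(app=None):
        states.append(StubBrowser.started)  # inside: browser is running
    states.append(StubBrowser.started)      # after exit: browser is stopped
    return states


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping setup and teardown in one function makes it harder to forget the cleanup half, which is the main reason to prefer this form over two separate event handlers.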

With the browser tied to the application's lifecycle, it is already running when the first request arrives, and a launch failure stops the server at startup instead of surfacing later inside a request handler.