Headless browser functionality breaks when integrated into FastAPI endpoint

I’ve got a weird issue with my headless browser setup. It’s working fine on its own, but it’s acting up when I try to use it in a FastAPI endpoint. Here’s what’s going on:

I created a new class called WebScraper that uses Playwright for headless browsing. When I run it directly, everything goes smoothly:

import asyncio

async def test_run():
    data = await WebScraper.fetch_and_parse('https://google.com')
    print(data)

asyncio.run(test_run())

However, when I embed it within a FastAPI endpoint like this:

@app.get('/scrape/')
async def scrape_page():
    data = await WebScraper.fetch_and_parse('https://google.com')
    return data

I encounter the following error:

RuntimeError: Browser not initialized. Call init_browser first.

I am puzzled why it works perfectly in isolation but fails in the FastAPI environment. Does anyone have insights on resolving this asynchronous issue?

hm, seems like ur browser isn’t initializing properly in fastapi. try moving the browser setup to a startup event handler:

@app.on_event("startup")
async def startup_event():
    await WebScraper.init_browser()

this should ensure the browser’s ready b4 any requests come in. lmk if that helps!
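since you didn’t show init_browser, here’s roughly what i’m assuming it looks like — a shared Playwright + browser stored on the class, plus a shutdown hook to tear it down (close_browser is just my name for it, adjust to whatever you’ve actually got):

from playwright.async_api import async_playwright

class WebScraper:
    _playwright = None
    _browser = None

    @classmethod
    async def init_browser(cls):
        # start Playwright once and keep one shared headless browser around
        cls._playwright = await async_playwright().start()
        cls._browser = await cls._playwright.chromium.launch()

    @classmethod
    async def close_browser(cls):
        # tear down in reverse order
        if cls._browser:
            await cls._browser.close()
        if cls._playwright:
            await cls._playwright.stop()

@app.on_event("shutdown")
async def shutdown_event():
    await WebScraper.close_browser()

that way the browser lives for the whole app lifetime instead of dying with whatever loop created it.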

I’ve faced this issue before when working with headless browsers in FastAPI. The problem likely stems from the browser’s lifecycle not being tied to FastAPI’s application lifecycle: nothing initializes it on the event loop that is actually serving your requests. A solution that worked for me was implementing a pool of browser instances.

Here’s a rough idea of how you could modify your WebScraper class:

from playwright.async_api import async_playwright
import asyncio

class WebScraper:
    _playwright = None
    _pool = None
    _semaphore = None

    @classmethod
    async def get_browser(cls):
        if cls._pool is None:
            cls._pool = []
            cls._semaphore = asyncio.Semaphore(5)  # Adjust pool size as needed
            # Start Playwright once and keep it alive; `async with async_playwright()`
            # would shut it (and any launched browser) down as soon as the block exits
            cls._playwright = await async_playwright().start()
        # Limit how many browsers can be checked out at once
        await cls._semaphore.acquire()
        if cls._pool:
            return cls._pool.pop()
        return await cls._playwright.chromium.launch()

    @classmethod
    async def release_browser(cls, browser):
        cls._pool.append(browser)
        cls._semaphore.release()

    @classmethod
    async def fetch_and_parse(cls, url):
        browser = await cls.get_browser()
        try:
            page = await browser.new_page()
            try:
                await page.goto(url)
                data = await page.content()  # Your scraping logic here
                return data
            finally:
                await page.close()
        finally:
            await cls.release_browser(browser)

This approach maintains a pool of browser instances, which should resolve the initialization issues while being more efficient than creating a new browser for each request.
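If you go this route, you’ll also want to close the pooled browsers and stop Playwright when the app shuts down. A rough sketch, assuming you add a close_all helper to the class above and wire it to a shutdown event (note it only catches browsers sitting idle in the pool at shutdown):

    @classmethod
    async def close_all(cls):
        # Close any idle pooled browsers and stop the shared Playwright instance
        while cls._pool:
            await cls._pool.pop().close()
        if cls._playwright:
            await cls._playwright.stop()

# In your FastAPI app:
@app.on_event("shutdown")
async def shutdown_event():
    await WebScraper.close_all()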

I’ve encountered a similar issue before, and it’s likely related to how FastAPI handles the event loop. The problem might be that the browser initialization is tied to a specific event loop, which isn’t carried over to the FastAPI context.

One approach that worked for me was to initialize the browser for each request, rather than trying to maintain a single instance. You could modify your WebScraper class to create a new browser instance for each fetch_and_parse call:

from playwright.async_api import async_playwright

class WebScraper:
    @staticmethod
    async def fetch_and_parse(url):
        # Playwright is started and stopped inside this call, so it always
        # runs on whatever event loop is handling the current request
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            page = await browser.new_page()
            await page.goto(url)
            data = await page.content()  # Your scraping logic here
            await browser.close()
            return data

This way, you’re creating a fresh browser instance for each request, which should resolve the initialization issue. It’s slower, since every request pays the cost of launching a browser, but it keeps each request fully isolated.
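If the per-request launch turns out to be too slow, a middle ground (not what your original class does, just an option) is to launch the browser once and open a fresh Playwright browser context per request: contexts are cheap to create and still give you isolated cookies and storage. A rough sketch, assuming init_browser is called from a startup handler as in the first answer:

from playwright.async_api import async_playwright

class WebScraper:
    _playwright = None
    _browser = None

    @classmethod
    async def init_browser(cls):
        # Launch one shared browser, e.g. from a FastAPI startup handler
        cls._playwright = await async_playwright().start()
        cls._browser = await cls._playwright.chromium.launch()

    @classmethod
    async def fetch_and_parse(cls, url):
        # New context per request: isolated cookies/cache, no browser launch cost
        context = await cls._browser.new_context()
        try:
            page = await context.new_page()
            await page.goto(url)
            return await page.content()  # Your scraping logic here
        finally:
            await context.close()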