Puppeteer: Issues with Chromium Cache using userDataDir

I’m trying to enable caching in a headless Chromium session with Puppeteer to speed up page loads. I’ve pointed userDataDir at a persistent profile directory, but I’m seeing some odd behavior when verifying whether the cache is actually being used.

Here’s my current code:

const puppeteer = require('puppeteer');

async function initiateTest() {
    // Reuse a persistent profile so the HTTP cache survives between runs
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox'],
        userDataDir: "C:\\Users\\user\\AppData\\Local\\Chromium\\User Data"
    });

    const page = await browser.newPage();
    const response = await page.goto('https://example.com');

    // fromCache() reports whether the main document was served from the browser cache
    console.log('Cache status:', response.fromCache());

    await browser.close();
}
initiateTest();

Upon running this for the first time, the log shows a cache miss (false), which is expected since nothing has been cached yet. The confusing part is that for some URLs it keeps showing false on every subsequent run, even though the content loads almost instantly, which suggests caching is happening somewhere.

For example, with google.com the log reports a cache miss both times despite the fast load, while example.com (the URL in the snippet above) correctly reports true on the second run.

Has anyone faced this peculiar behavior before? Is my implementation off, or is this a common issue with certain web pages?

Yeah, this is annoying but pretty normal. Chromium’s cache detection gets wonky with userDataDir sometimes - I’ve noticed it depends on how the previous browser session ended (an unclean shutdown seems to leave the profile’s cache in a bad state). Try adding --disable-features=VizDisplayCompositor to your args; it helped me with similar cache issues. Also, sites like Google lean heavily on service workers for caching, and service-worker responses won’t show up in fromCache() anyway - Puppeteer reports those separately via response.fromServiceWorker().
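In case it helps, here’s a minimal sketch of what that looks like on top of your existing launch call - the flag just goes into args, and the extra fromServiceWorker() check is only an illustration for telling the two kinds of hits apart (function name and URL are placeholders):

const puppeteer = require('puppeteer');

async function testWithFlag() {
    const browser = await puppeteer.launch({
        headless: true,
        // same args as before, plus the suggested feature flag
        args: ['--no-sandbox', '--disable-features=VizDisplayCompositor'],
        userDataDir: "C:\\Users\\user\\AppData\\Local\\Chromium\\User Data"
    });

    const page = await browser.newPage();
    const response = await page.goto('https://example.com');

    // Separate plain HTTP cache hits from service-worker responses
    console.log('From HTTP cache:    ', response.fromCache());
    console.log('From service worker:', response.fromServiceWorker());

    await browser.close();
}

testWithFlag();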

Had the same cache issues with Puppeteer - super frustrating. response.fromCache() can be misleading and doesn’t always show what’s actually happening in the browser. Some sites deliberately set cache headers that prevent caching, which might explain the Google weirdness you’re seeing. What worked for me: skip flags like --aggressive-cache-discard and watch the network requests directly instead, e.g. via page.setRequestInterception(true). Also check your userDataDir permissions - bad permissions can block cache writes while reads from temporary storage still work. Try clearing that directory and running the test again. If it’s still broken, the problem is probably server-side cache controls, not your Puppeteer setup.
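Rough sketch of the network monitoring, if you want something to start from. One caveat: Puppeteer disables the page cache while request interception is enabled, so for checking cache hits a plain response listener is safer - that’s what this uses (function name and URL are placeholders):

const puppeteer = require('puppeteer');

async function watchRequests() {
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox'],
        userDataDir: "C:\\Users\\user\\AppData\\Local\\Chromium\\User Data"
    });

    const page = await browser.newPage();

    // Log cache status for every response, not just the main document
    page.on('response', (response) => {
        console.log(
            response.fromCache() ? 'CACHE  ' : 'NETWORK',
            response.status(),
            response.url()
        );
    });

    await page.goto('https://example.com', { waitUntil: 'networkidle0' });
    await browser.close();
}

watchRequests();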

This happens because fromCache() only catches HTTP cache hits, not all the caching that’s actually going on. When pages load fast but the method returns false, you’re seeing DNS caching, connection reuse, or prefetching - not traditional HTTP caching. Google and similar sites use cache-control headers that block HTTP caching while still getting performance boosts from other optimizations. I’ve seen this a lot with sites serving dynamic content or ads. Want to test this? Add --disk-cache-size=0 to your launch args temporarily. If loading gets noticeably slower, caching was working despite the false readings. You can also watch the actual cache directory size before and after page loads - the files should grow if real caching happens, no matter what the API says.
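If you want to automate that comparison, something like this rough sketch would do it - the same navigation timed with and without the flag (helper name, URL, and the crude Date.now() timing are just for illustration):

const puppeteer = require('puppeteer');

// Time a single navigation, optionally with the disk cache turned off
async function timeLoad(disableCache) {
    const args = ['--no-sandbox'];
    if (disableCache) {
        args.push('--disk-cache-size=0'); // the flag suggested above
    }

    const browser = await puppeteer.launch({
        headless: true,
        args,
        userDataDir: "C:\\Users\\user\\AppData\\Local\\Chromium\\User Data"
    });

    const page = await browser.newPage();
    const start = Date.now();
    await page.goto('https://example.com', { waitUntil: 'networkidle0' });
    const elapsed = Date.now() - start;

    await browser.close();
    return elapsed;
}

(async () => {
    console.log('With cache:    ', await timeLoad(false), 'ms');
    console.log('Cache disabled:', await timeLoad(true), 'ms');
})();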
