How to capture streaming XHR data in real time with Puppeteer before the response finishes

I’m working on a Node.js project where I use Puppeteer to extract live data from a website. Rather than parsing DOM elements, I monitor network requests and grab the JSON responses since this gives me cleaner structured data.

Most things work fine, but I’m stuck on Firebase/Firestore requests. Their responses arrive in chunks over time, but I can only access the body after the whole response has finished.

When I check the Network tab in Chrome DevTools, I see the response arrives as separate numbered chunks:

22
[[456,["noop"]]]
22  
[[457,["noop"]]]
89
[[458,[{
  "targetChange": {
    "resumeToken": "CgkIabcd123efg=",
    "readTime": "2025-02-15T10:15:22.445123Z"
  }
}]]]
22
[[459,["noop"]]]

Each chunk appears every few seconds. My application needs to process these chunks immediately as they arrive, not wait for the entire request to complete.
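A minimal sketch of consuming that chunk layout incrementally in Node.js. It assumes, without confirmation from any Firebase documentation, that each number is the character count of the JSON payload that follows it, and it tolerates a newline after the payload whether or not that newline is included in the count. ChunkParser is a hypothetical helper name. Check the prefixes against a real capture before relying on it: if they count bytes rather than characters, non-ASCII payloads would need byte-level slicing instead.

```javascript
// Hypothetical helper. Assumption: each chunk is "<length>\n<payload>", where
// <length> counts the characters of the JSON payload that follows.
class ChunkParser {
  constructor(onMessage) {
    this.buffer = '';
    this.onMessage = onMessage;
  }

  // Feed raw text whenever it arrives; complete payloads are parsed and emitted,
  // incomplete ones stay in the buffer until the rest shows up.
  push(text) {
    this.buffer += text;
    for (;;) {
      // Skip whitespace between chunks (e.g. a newline not counted in the length).
      this.buffer = this.buffer.replace(/^\s+/, '');
      const newline = this.buffer.indexOf('\n');
      if (newline === -1) return; // length line not complete yet
      const prefix = this.buffer.slice(0, newline);
      const size = Number(prefix);
      if (!Number.isInteger(size)) throw new Error(`Bad length prefix: ${prefix}`);
      const start = newline + 1;
      if (this.buffer.length < start + size) return; // payload not complete yet
      const payload = this.buffer.slice(start, start + size);
      this.buffer = this.buffer.slice(start + size);
      this.onMessage(JSON.parse(payload)); // JSON.parse ignores surrounding whitespace
    }
  }
}
```

Whatever source delivers the text (a CDP event, a replayed request), call push() with each new piece as it arrives.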

Right now my code only gets the full response when everything is done:

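page.on('response', async (res) => {
  // 'response' fires when the headers arrive, but res.text() only resolves once the whole body has loaded
  if (res.request().resourceType() === 'xhr') {
    console.log('Firebase URL:', res.url());
    try {
      const data = await res.text();
      console.log('Complete response:', data);
    } catch (err) {
      // res.text() throws for responses whose body can't be retrieved (e.g. redirects)
      console.error('Could not read body:', err.message);
    }
  }
});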

Is there a method to intercept streaming response data as it comes in rather than waiting for completion?

Yeah, it can be a real pain, but I think you might want to look into the Chrome DevTools Protocol. Try page._client.send(), but fair warning: it isn't the easiest thing to figure out.
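As a starting point, here is a sketch that opens a CDP session through Puppeteer's public page.target().createCDPSession() instead of the internal page._client, enables the Network domain, and reports XHR responses as soon as their headers arrive. watchXhrResponses and onXhr are hypothetical names; the fields read from the event (requestId, type, response.url, response.status) are those of the CDP Network.responseReceived event.

```javascript
// Sketch: `page` is an existing Puppeteer Page; onXhr is a hypothetical callback.
async function watchXhrResponses(page, onXhr) {
  // Public API for a raw CDP session (no reliance on page._client).
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');
  // responseReceived fires when the headers arrive, before the body has finished.
  client.on('Network.responseReceived', ({ requestId, type, response }) => {
    if (type === 'XHR') onXhr(requestId, response.url, response.status);
  });
  return client; // call client.detach() when you are done
}
```

The requestId passed to onXhr is the key that later Network events for the same request carry.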

Puppeteer’s response API doesn’t support streaming; that’s just how it is. Here’s what I did when I hit the same wall: enable request interception with page.setRequestInterception(true), grab the intercepted request (and call request.continue() on every request, or the page stalls), then replay it using axios or node-fetch. Copy the headers and cookies from the intercepted request and make the same call outside Puppeteer. That gives you full control over the response stream, so you can process chunks as they come in. Yes, it’s more work, since you have to handle auth tokens and session cookies manually, but you get the real-time access you need.
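A rough sketch of the replay approach, using Node 18+'s built-in fetch in place of axios or node-fetch. Assumptions: isTarget is a hypothetical URL filter you supply, captureRequest, readTextChunks and replayAndStream are illustrative names, and whether the server accepts a replayed request (Firestore sessions included) is not guaranteed.

```javascript
// Sketch. Assumes Node 18+ (global fetch, ReadableStream, TextDecoder).

// Resolve with the first XHR matching isTarget. Every request is continued,
// because with interception on, requests that are never continued stall the page.
function captureRequest(page, isTarget) {
  return new Promise((resolve) => {
    page.on('request', (req) => {
      if (req.resourceType() === 'xhr' && isTarget(req.url())) {
        resolve({ url: req.url(), method: req.method(), headers: req.headers(), body: req.postData() });
      }
      req.continue();
    });
  });
}

// Read a web ReadableStream of bytes and pass each decoded piece to onText.
async function readTextChunks(stream, onText) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps a multi-byte character split across chunks intact.
    const text = decoder.decode(value, { stream: true });
    if (text) onText(text);
  }
  const tail = decoder.decode(); // flush any bytes the decoder is still holding
  if (tail) onText(tail);
}

async function replayAndStream(page, isTarget, onText) {
  await page.setRequestInterception(true);
  const captured = await captureRequest(page, isTarget);
  // Intercepted headers may not include cookies, so copy them from the page.
  const cookies = await page.cookies(captured.url);
  // Drop content-length and let fetch compute it from the body.
  const { 'content-length': _omit, ...headers } = captured.headers;
  if (cookies.length) headers.cookie = cookies.map((c) => `${c.name}=${c.value}`).join('; ');
  const res = await fetch(captured.url, {
    method: captured.method,
    headers,
    body: ['GET', 'HEAD'].includes(captured.method) ? undefined : captured.body,
  });
  if (res.body) await readTextChunks(res.body, onText);
}
```

The request listener stays attached on purpose: interception remains enabled after the capture, so every later request still has to be continued.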

The Chrome DevTools Protocol approach is definitely worth trying. Use page._client.send('Network.enable') to access lower-level network events, specifically the Network.dataReceived event that fires for each chunk. I used this for a similar Firebase streaming scenario and it worked great. You’ll need to map request IDs to track which chunks belong to your XHR request, then parse the chunk data as it comes in. Be aware that dataReceived by itself only reports each chunk's size (dataLength); the bytes are included only after you enable streaming for that request with Network.streamResourceContent, an experimental method that needs a recent Chrome. The downside? page._client is an internal Puppeteer API that might change (page.target().createCDPSession() is the public way to get a CDP session). But for real-time streaming data, it’s the most reliable solution I’ve found. Just handle the chunk parsing carefully: Firebase sends those numbered prefixes before each JSON payload.
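A sketch of that route with per-request ID tracking. It relies on the experimental Network.streamResourceContent method (recent Chrome only), which is what makes later dataReceived events carry the chunk bytes (base64) in their data field. streamXhrBodies and isTarget are hypothetical names, and matching only type 'XHR' is an assumption; you may also need 'Fetch' depending on how the page loads its data.

```javascript
// Sketch only. Assumptions: Chrome supports the experimental
// Network.streamResourceContent; isTarget(url) is a hypothetical filter.
async function streamXhrBodies(page, isTarget, onText) {
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');
  // requestId -> per-response state, so chunks from different requests never mix.
  const streams = new Map();
  const decode = (state, base64) =>
    state.decoder.decode(Buffer.from(base64, 'base64'), { stream: true });

  client.on('Network.responseReceived', async ({ requestId, type, response }) => {
    if (type !== 'XHR' || !isTarget(response.url)) return;
    const state = { decoder: new TextDecoder(), ready: false, queue: [] };
    streams.set(requestId, state);
    try {
      // bufferedData holds whatever arrived before streaming was enabled.
      const { bufferedData } = await client.send('Network.streamResourceContent', { requestId });
      if (bufferedData) onText(requestId, decode(state, bufferedData));
      // Flush chunks that arrived while the command was in flight, in order.
      for (const base64 of state.queue) onText(requestId, decode(state, base64));
      state.queue = [];
      state.ready = true;
    } catch (err) {
      streams.delete(requestId);
      console.error(`streamResourceContent failed for ${requestId}:`, err.message);
    }
  });

  // Once streaming is on, dataReceived carries the new bytes (base64) in `data`.
  client.on('Network.dataReceived', ({ requestId, data }) => {
    const state = streams.get(requestId);
    if (!state || !data) return;
    if (state.ready) onText(requestId, decode(state, data));
    else state.queue.push(data);
  });

  const forget = ({ requestId }) => streams.delete(requestId);
  client.on('Network.loadingFinished', forget);
  client.on('Network.loadingFailed', forget);
  return client;
}
```

Each onText call delivers raw text that still contains the numbered prefixes, so split on those before calling JSON.parse.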
