Preventing Puppeteer from following redirect chains

Hey folks, I’m trying to figure out how to make Puppeteer stop after the first redirect. Right now, it seems to follow the whole chain of redirects by default.

I’m working on a project where I need to grab the HTML content of the first redirect page. But when I use the .goto() method, it keeps going until it reaches the final destination.

Is there a way to tell Puppeteer to stop after the first 3xx response? I want to be able to call page.content() and get the HTML from that initial redirect page.

Here’s a quick example of what I’m trying to do:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // I want this to stop after the first redirect
  await page.goto('https://example.com/redirect');

  // This should give me the HTML of the first redirect page
  const content = await page.content();

  console.log(content);
  await browser.close();
})();

Any ideas on how to achieve this? Thanks in advance for your help!

I faced a similar issue a while back, and enabling request interception in Puppeteer solved it. Turn interception on with page.setRequestInterception(true) before navigating, then attach a listener for the ‘request’ event. While interception is on, every request has to be either continued or aborted. A navigation request whose redirectChain() is non-empty is the browser following a 3xx Location header, so aborting the first such request stops the chain right after the first redirect response. Two caveats: page.goto() then rejects (typically with net::ERR_FAILED), so wrap it in try/catch; and Chromium does not render the body of a 3xx response, so page.content() will not contain that HTML. What you can read is the redirect response’s status and headers, via request.redirectChain()[0].response().

Below is an example:

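await page.setRequestInterception(true);
let firstRedirectHandled = false;

page.on('request', request => {
  // A navigation request with a non-empty redirectChain() is the browser
  // following a 3xx Location header. Abort the first one to stop the chain.
  if (request.isNavigationRequest() && !firstRedirectHandled &&
      request.redirectChain().length > 0) {
    firstRedirectHandled = true;
    request.abort();
  } else {
    // With interception enabled, every other request must be continued.
    request.continue();
  }
});

try {
  await page.goto('https://example.com/redirect', { waitUntil: 'networkidle0' });
} catch (err) {
  // Aborting the redirected request makes goto() reject;
  // only rethrow if the failure was something else.
  if (!firstRedirectHandled) throw err;
}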

This let me stop at the first redirect and inspect that response instead of landing on the final page. I hope you find it useful.

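I’ve run into this before, and Puppeteer’s navigation options got me part of the way: set the ‘waitUntil’ option to ‘networkidle0’ and pass a short custom timeout. Keep the limits in mind, though. HTTP 3xx redirects are followed by the browser’s network stack, so a timeout cannot stop them; it only caps how long page.goto() waits, which mainly matters for client-side redirects (meta refresh or JavaScript). When the timeout expires, page.goto() rejects with a TimeoutError, so catch it before calling page.content().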

Here’s how you can modify your code:

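const browser = await puppeteer.launch();
const page = await browser.newPage();

try {
  await page.goto('https://example.com/redirect', {
    waitUntil: 'networkidle0',
    timeout: 5000 // Adjust this value as needed
  });
} catch (err) {
  // When the timeout expires, goto() rejects with a TimeoutError;
  // keep whatever has loaded so far and rethrow anything else.
  if (err.name !== 'TimeoutError') throw err;
}

const content = await page.content();
console.log(content);
await browser.close();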

This has worked for me in similar scenarios. The key is picking a timeout long enough for the first page to load but short enough that page.goto() returns before a client-side redirect replaces it.

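hey there! have u tried page.setDefaultNavigationTimeout()? set it to a low value like 1000ms before ur goto() call. it only caps how long navigation waits, so it won't stop a server-side 3xx redirect, but it might help u grab the first page before a client-side redirect fires. goto() throws a TimeoutError when it hits the limit, so catch that. something like: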

page.setDefaultNavigationTimeout(1000);
await page.goto('https://example.com/redirect')
  .catch(err => { if (err.name !== 'TimeoutError') throw err; });

this could work for ya. lemme know if it helps!