I’m working on a web scraping project where I need to access a login page using Puppeteer. The issue is that after trying to navigate to the site, Chrome displays a “connection refused” error page. I need to capture the original URL that was being accessed before the error occurred.
I’ve tried several methods to retrieve the URL, but they all return the Chrome error page URL (chrome-error://chromewebdata/) instead of the actual target URL.
I don’t think this is a timing issue, because I can successfully interact with other elements on the error page, like clicking the “Details” button. That suggests the page has fully loaded by the time I read the URL.
How can I retrieve the original URL that caused the connection error instead of getting the Chrome error page URL?
Network event listeners work well for this, with one caveat: a refused connection never produces an HTTP response, so Puppeteer emits requestfailed for that request instead of response. Set up your listeners before navigating; the failed request still carries the original URL you requested.
I’ve had better luck with response listeners than request listeners for successful loads, since a response's url() is the URL that actually answered after any redirects, while the first request you see may be an earlier hop. For the connection-refused case, though, it is the requestfailed handler that gives you the actual target URL (and request.failure().errorText, e.g. net::ERR_CONNECTION_REFUSED) instead of the chrome-error page URL.
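A minimal sketch of the listener approach, assuming a Puppeteer Page and its response and requestfailed events. watchNavigation and the fields of the returned object are hypothetical names. It only looks at the main-frame navigation, so a failing image, script or iframe can't overwrite the result.

```javascript
// Hypothetical helper: watches the main-frame navigation and records either the
// response URL (success) or the failed request's URL and error text (failure).
// `page` is assumed to be a Puppeteer Page; only page.on() and page.mainFrame() are used.
function watchNavigation(page) {
  const result = { responseUrl: null, failedUrl: null, errorText: null };

  // True only for the top-level document load, not images, scripts or iframes.
  const isMainNavigation = (request) =>
    request.isNavigationRequest() && request.frame() === page.mainFrame();

  page.on('response', (response) => {
    if (isMainNavigation(response.request())) {
      result.responseUrl = response.url();
    }
  });

  page.on('requestfailed', (request) => {
    // A refused connection lands here; there is no HTTP response for it.
    if (isMainNavigation(request)) {
      const failure = request.failure();
      result.failedUrl = request.url();
      result.errorText = failure ? failure.errorText : null;
    }
  });

  return result;
}
```

Call watchNavigation(page) before page.goto(targetUrl); if goto() rejects, result.failedUrl should hold the URL that was refused, regardless of what page.url() reports afterwards.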
Another option: store the URL in a variable before navigating and wrap page.goto() in a try/catch block. When the connection is refused, goto() rejects, and your variable still holds the original URL:
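```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const targetUrl = 'https://example.com/login';
  try {
    await page.goto(targetUrl);
  } catch (error) {
    // goto() rejects on errors like net::ERR_CONNECTION_REFUSED; targetUrl is untouched
    console.log('Original URL that failed:', targetUrl);
    // Handle the error page here
  }
  await browser.close();
})();
```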
This is simpler than event listeners because you keep direct control over the URL variable, and it works well when connections are flaky and you need to retry with the same original URL.
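For the retry case, a sketch along the same lines. gotoWithRetry, attempts and delayMs are hypothetical names and defaults, not Puppeteer options; page is assumed to be a Puppeteer Page. Because the original URL lives in your own variable, every attempt reuses it no matter what the error page reports.

```javascript
// Hypothetical helper: retries page.goto() with the same original URL.
// attempts/delayMs are illustrative defaults, not Puppeteer settings.
async function gotoWithRetry(page, targetUrl, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError = null;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await page.goto(targetUrl);
      return { url: targetUrl, attempt, response };
    } catch (error) {
      // targetUrl is our own variable, so the chrome-error page cannot change it.
      lastError = error;
      console.log(`Attempt ${attempt} failed for ${targetUrl}: ${error.message}`);
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: report the original URL together with the last error.
  const failure = new Error(
    `Navigation to ${targetUrl} failed after ${attempts} attempts: ${lastError.message}`
  );
  failure.cause = lastError;
  throw failure;
}
```

A fixed delay keeps the sketch short; for a server that is down for longer, an increasing delay between attempts may be kinder.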
Had this issue too! I found that if you listen to the request event, like page.on('request', (request) => { ... });, you can save the URL as soon as the navigation request is issued. Then, if it fails, you still have the original URL.
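A sketch of that request-listener idea, assuming a Puppeteer Page; rememberNavigationUrls is a hypothetical helper. It keeps only main-frame navigation requests and uses request.redirectChain() to tell a fresh navigation (empty chain) from a redirect hop, so originalUrl stays the URL you asked for while chain records each hop.

```javascript
// Hypothetical helper: saves the URL of each main-frame navigation request
// as soon as it is issued, before any chrome-error page can replace it.
// `page` is assumed to be a Puppeteer Page.
function rememberNavigationUrls(page) {
  const record = { originalUrl: null, chain: [] };

  page.on('request', (request) => {
    // Skip images, scripts, XHRs and iframe navigations.
    if (!request.isNavigationRequest() || request.frame() !== page.mainFrame()) return;

    const url = request.url();
    // Defensive: ignore Chrome's internal error-page URL if it ever shows up here.
    if (url.startsWith('chrome-error://')) return;

    // redirectChain() lists earlier hops; an empty chain means a fresh navigation.
    if (request.redirectChain().length === 0) {
      record.originalUrl = url;
      record.chain = [url];
    } else {
      record.chain.push(url);
    }
  });

  return record;
}
```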