I’m working on a web scraping project where I need to collect all possible navigation paths from a webpage without actually leaving the current page.
What I’m trying to do:
I want to grab all outbound navigation routes from a page. These could be regular anchor tags, JavaScript-triggered navigations, or form submissions (both GET and POST methods).
Previous approach with PhantomJS:
I used the `onNavigationRequested` event handler, which let me click elements matching certain selectors, capture the destination URL and request details, and then prevent the actual navigation from happening.
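For reference, the PhantomJS setup looked roughly like this (a sketch; the clicked element is just a placeholder):

```js
var page = require('webpage').create();

page.onNavigationRequested = function (url, type, willNavigate, main) {
  // Fires before the navigation happens; `type` is e.g. 'LinkClicked'.
  console.log('Navigation target: ' + url + ' (trigger: ' + type + ')');
};

page.open('https://example.com', function () {
  page.navigationLocked = true; // block all navigation from here on
  page.evaluate(function () {
    document.querySelector('a').click(); // placeholder element to probe
  });
  phantom.exit();
});
```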
Current issue with Puppeteer:
When I try using request interception, the page has already changed by the time the request is intercepted, so I’d need to navigate back before I can continue my analysis.
My question:
Is there a method in Puppeteer to detect when navigation is about to happen while still being on the original page, and then block that navigation? I need to stay on the current page to continue testing other elements.
Any suggestions would be really helpful. Thanks!
Try using `page.setRequestInterception(true)` with `request.abort()`, but first capture the URL with `page.on('request')`. I had a similar issue and what worked was hooking into click events before they fire - you can inject JS that listens for clicks, grabs the href targets, and calls `preventDefault()` so the navigation never happens. Bit hacky but effective for staying on the same page while getting nav data.
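A minimal sketch of the interception half, with placeholder URL and selector. One gotcha: the initial `page.goto()` is itself a navigation request, so a flag lets it through, and `abort('aborted')` is used because `net::ERR_ABORTED` keeps the browser on the current page instead of rendering an error page:

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);

  const urls = [];
  let armed = false; // don't block the initial page load

  page.on('request', (request) => {
    if (armed && request.isNavigationRequest()) {
      urls.push(request.url()); // capture the target first...
      request.abort('aborted'); // ...then block, so the page never changes
    } else {
      request.continue();
    }
  });

  await page.goto('https://example.com');
  armed = true;

  await page.click('a'); // placeholder: click something that would navigate
  console.log(urls);
  await browser.close();
})();
```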
You can achieve this by combining request interception with the `page.evaluateOnNewDocument()` method, which lets you hook navigation attempts in the page context before they actually trigger. Set up a custom event listener in the page that captures click events on links and form submissions, then use `event.preventDefault()` to stop the navigation while still extracting the target URL information. Another approach that worked well for me was `page.on('framenavigated')` combined with `page.goBack()` immediately after capturing the navigation data, though this creates a brief flicker. The cleanest solution I found was overriding the `window.location` object and the `HTMLFormElement.prototype.submit` method within the page context using `page.evaluateOnNewDocument()` (Puppeteer's equivalent of Playwright's `page.addInitScript()`). This way you can capture all navigation attempts, including JavaScript-triggered ones, before they execute, giving you complete control over what actually navigates and what gets blocked.
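A sketch of the init-script approach. One caveat: `window.location` and its methods are non-configurable in current Chromium and can't reliably be overridden, so this version sticks to the hooks that do work - a capture-phase click listener, the `submit` event, and `HTMLFormElement.prototype.submit`. The `window.__navTargets` name is just an example:

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Runs in the page context before any of the page's own scripts.
  await page.evaluateOnNewDocument(() => {
    window.__navTargets = [];

    // Capture-phase listener sees link clicks before page handlers do.
    document.addEventListener('click', (event) => {
      const anchor = event.target.closest && event.target.closest('a[href]');
      if (anchor) {
        window.__navTargets.push({ type: 'link', url: anchor.href });
        event.preventDefault(); // stay on the current page
      }
    }, true);

    // Declarative form submissions (submit buttons, Enter key).
    document.addEventListener('submit', (event) => {
      window.__navTargets.push({
        type: 'form',
        method: event.target.method,
        url: event.target.action,
      });
      event.preventDefault();
    }, true);

    // Programmatic form.submit() bypasses the submit event, so patch it too.
    HTMLFormElement.prototype.submit = function () {
      window.__navTargets.push({ type: 'form', method: this.method, url: this.action });
      // The original submit is deliberately never called, so nothing navigates.
    };
  });

  await page.goto('https://example.com');
  await page.click('a'); // placeholder: probe an element
  console.log(await page.evaluate(() => window.__navTargets));
  await browser.close();
})();
```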
I ran into this exact problem last year and found a reliable solution using the `page.on('request')` event combined with strategic request blocking. The key is to set up request interception early and use `request.url()` to capture navigation targets before calling `request.abort()`. What made the difference, though, was also listening to the `page.on('response')` event to catch redirects that might otherwise slip through. For JavaScript navigations, I inject a script that monkey-patches the `window.location.assign` and `window.location.replace` methods to capture the destination URLs before preventing execution. This approach handles both standard link clicks and programmatic navigation attempts. The tricky part is distinguishing resource requests from actual navigation requests - I filter by checking whether the request is for the main frame and matches navigation patterns. It works consistently across different sites without the page-jumping issue you mentioned.
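A sketch of the filtering side of this, with placeholder URL and selector. Puppeteer exposes the navigation check directly as `request.isNavigationRequest()`, and comparing `request.frame()` against `page.mainFrame()` keeps iframe loads and subresources out of the results. (The `location.assign`/`location.replace` patching is omitted here: those properties are typically non-writable in current Chromium, so that part may not carry over.)

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);

  const captured = [];
  let blockNav = false; // allow the initial goto() through

  page.on('request', (request) => {
    // Only top-level navigations count; images, scripts, XHR, and
    // iframe loads all fall through to request.continue().
    const isMainFrameNav =
      request.isNavigationRequest() && request.frame() === page.mainFrame();

    if (blockNav && isMainFrameNav) {
      captured.push({
        url: request.url(),
        method: request.method(),
        postData: request.postData(), // set for POST form submissions
      });
      request.abort('aborted'); // ERR_ABORTED leaves the page in place
    } else {
      request.continue();
    }
  });

  // Record 3xx redirect targets from the Location header as well.
  page.on('response', (response) => {
    const status = response.status();
    if (status >= 300 && status <= 399) {
      captured.push({ url: response.headers()['location'], via: 'redirect' });
    }
  });

  await page.goto('https://example.com');
  blockNav = true;

  await page.click('a'); // placeholder: click something that would navigate
  console.log(captured);
  await browser.close();
})();
```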