My site keeps getting hit by scrapers that seem to be driving Chrome through the DevTools Protocol (CDP) directly instead of the usual Selenium or Puppeteer stack. This makes them much harder to catch, since they look almost exactly like real users.
What I already tried:
- Basic browser fingerprinting methods
- Looking for weird browser behaviors
The main problems:
- These CDP-controlled browsers don't expose the typical automation markers (no injected globals, no navigator.webdriver flag)
- As a result, the usual red flags that most bot-detection scripts look for never fire
My question is: does anyone know good ways to spot these more advanced scrapers? I need detection methods that target direct CDP usage specifically, rather than the standard automation frameworks. Any tips on signals to look for, or techniques that actually work against this kind of scraping, would be really helpful.
I’m running out of ideas here and could use some guidance from anyone who has dealt with similar issues.
Request patterns are the biggest giveaway. CDP scrapers make requests that real browsers never would: missing Referer headers, skipping assets that Chrome loads automatically, odd CORS behavior without proper preflight requests. They also reuse the exact same user-agent string across many sessions, where a real user population shows variety.

Memory usage is another tell. CDP instrumentation adds overhead that shows up in performance metrics. I sample JavaScript heap sizes and watch garbage-collection rhythm through Chrome's performance APIs; automation produces noticeably different allocation patterns than real browsing does.

Canvas fingerprinting works too, if you use complex graphics operations. CDP-driven browsers handle certain rendering tasks slightly differently from standard Chrome installs, particularly when they run headless with software rendering.
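For the canvas angle, here's a minimal sketch of a client-side probe. It only uses standard web APIs; how you collect the digest and what baselines you compare it against are up to you, and `canvasDigest` needs a secure (HTTPS) context because of `crypto.subtle`.

```js
// Canvas fingerprint probe: mixes text, gradients, and blending so small
// rendering differences (fonts, GPU vs. software raster) change the pixels.
function canvasFingerprint() {
  const canvas = document.createElement('canvas');
  canvas.width = 280;
  canvas.height = 60;
  const ctx = canvas.getContext('2d');
  const gradient = ctx.createLinearGradient(0, 0, 280, 60);
  gradient.addColorStop(0, '#f60');
  gradient.addColorStop(1, '#069');
  ctx.fillStyle = gradient;
  ctx.fillRect(0, 0, 280, 60);
  ctx.globalCompositeOperation = 'multiply';
  ctx.font = '18px Arial';
  ctx.fillStyle = '#102030';
  ctx.fillText('cdp-probe 😃 é∑', 8, 38);
  ctx.beginPath();
  ctx.arc(140, 30, 22, 0, Math.PI * 1.7);
  ctx.strokeStyle = 'rgba(120, 20, 200, 0.7)';
  ctx.stroke();
  return canvas.toDataURL(); // serialize the rendered pixels
}

// Hash the serialized pixels; send the hex digest wherever you collect signals.
async function canvasDigest() {
  const bytes = new TextEncoder().encode(canvasFingerprint());
  const hash = await crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(hash))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}
```

The digest alone doesn't say bot or human; its value comes from comparing the distribution of digests you see against what known-good Chrome builds on real hardware produce.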
If you inspect the browser environment, ChromeDriver-based setups leave traces like window.cdc_* variables, so it's worth scanning for those even though bare CDP doesn't inject them. Check navigator.webdriver too: modern Chrome always defines it (false for normal users, true under automation), so if it comes back undefined, something has tampered with the navigator object, and that's a red flag in itself. Network requests can be off as well; real users produce consistent load patterns, while CDP scripts skip or reorder resource loads. Finally, a debugger statement inside a setInterval can help, since sessions with the CDP Debugger domain attached handle breakpoints oddly and the timing around the statement gives them away.
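Roughly what those checks look like in code, as a sketch only: the 100 ms threshold is an arbitrary illustration, and the debugger trick only fires if the client actually enables the CDP Debugger domain. It will also trip for a human with DevTools open, so treat it as one weak signal among several.

```js
// Environment checks for automation leftovers.
function automationSignals() {
  const signals = [];

  // ChromeDriver injects globals whose names start with "cdc_" or "$cdc_";
  // the suffix varies per build, so scan for the prefix.
  for (const key of Object.getOwnPropertyNames(window)) {
    if (/^\$?cdc_/.test(key)) signals.push(`chromedriver-global:${key}`);
  }

  // navigator.webdriver is defined in all modern Chrome builds:
  // true under automation, false otherwise. undefined means tampering.
  if (navigator.webdriver === true) signals.push('webdriver-true');
  if (navigator.webdriver === undefined) signals.push('webdriver-deleted');

  return signals;
}

// Debugger-attachment probe: `debugger;` is a no-op unless something
// (DevTools, or a CDP client that enabled the Debugger domain) is attached,
// in which case execution pauses and the elapsed time across it balloons.
function debuggerAttached(thresholdMs = 100) {
  const start = performance.now();
  debugger;
  return performance.now() - start > thresholdMs;
}

// Run both checks periodically and collect the results.
setInterval(() => {
  const signals = automationSignals();
  if (debuggerAttached()) signals.push('debugger-paused');
  if (signals.length) console.log('suspicious:', signals); // swap in your reporting
}, 5000);
```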
Had the same problem last year. Timing analysis worked far better than I expected: CDP-controlled browsers show timing patterns that don't match real users, especially around JavaScript execution and DOM manipulation.

Watch the gaps between mouse movements and clicks (see the sketch at the end of this post). Bots either have perfect timing (too perfect) or strange micro-delays that humans never produce.

Also check viewport behavior; CDP automation leaves fingerprints in how the browser handles viewport changes and scrolling. WebGL rendering is another good place to look, since automated browsers often render graphics differently than normal sessions. In all of this you're looking for behavior that's technically possible but far too unlikely for real users.
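A minimal sketch of the mouse-gap idea, assuming the statistics get scored server-side; the /telemetry endpoint and the 20-sample minimum are placeholders, not tuned values.

```js
// Collect inter-event gaps between mousemove events, summarize at click time.
const gaps = [];
let lastMoveTs = 0;

document.addEventListener('mousemove', (e) => {
  if (lastMoveTs) gaps.push(e.timeStamp - lastMoveTs);
  lastMoveTs = e.timeStamp;
});

document.addEventListener('click', () => {
  if (gaps.length < 20) return; // too little movement data to summarize
  const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
  const variance =
    gaps.reduce((a, b) => a + (b - mean) ** 2, 0) / gaps.length;
  // Human movement is jittery; near-zero variance (metronomic events)
  // suggests scripted input.
  navigator.sendBeacon('/telemetry', JSON.stringify({
    kind: 'pointer-timing',
    mean,
    stddev: Math.sqrt(variance),
    samples: gaps.length,
  }));
  gaps.length = 0;
  lastMoveTs = 0;
});
```

Clicks synthesized through CDP's Input.dispatchMouseEvent tend to arrive with either no intermediate mousemove events at all or perfectly spaced ones, so both zero-sample clicks and near-zero variance are worth flagging.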