How to handle reCAPTCHA manually during automated web scraping with Puppeteer in Node.js?

I’m building a web scraper with Puppeteer to automate searches on Google. Everything works fine until I encounter Google’s reCAPTCHA protection.

I don’t want to bypass the reCAPTCHA since that’s against their terms. Instead, I want to pause the automation and let myself solve the CAPTCHA manually when it appears, then continue with the scraping process.

Is there a way to detect when reCAPTCHA shows up and pause the script so I can solve it? After solving it, the script should resume automatically.

Here’s my current approach:

const fs = require('fs');
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

const searchTerms = ['term1', 'term2', 'term3'];

async function performSearch() {
    for (let term of searchTerms) {
        const browser = await puppeteer.launch({headless: false});
        const page = await browser.newPage();
        
        try {
            await page.goto(`https://www.google.com/search?q=${encodeURIComponent(term)}`);
            const pageContent = await page.content();
            const $ = cheerio.load(pageContent);
            
            if (pageContent.includes('unusual traffic from your computer')) {
                console.log('CAPTCHA detected - need manual intervention');
                // How do I pause here for manual solving?
                // Then continue after it's solved?
            }
            
            // Process results here
            
        } catch (error) {
            console.error('Error during search:', error);
        } finally {
            await browser.close();
        }
    }
}

performSearch();

I’m fairly new to Node.js automation. Any suggestions on how to implement this manual CAPTCHA solving approach would be really helpful.

I had this exact issue when scraping job boards last year. Use page.waitForNavigation() or page.waitForSelector() to detect when the CAPTCHA gets resolved. Here’s what worked for me:

if (pageContent.includes('unusual traffic from your computer')) {
    console.log('CAPTCHA detected - solve it manually');
    
    // Wait for navigation after CAPTCHA is solved
    await page.waitForNavigation({ waitUntil: 'networkidle2', timeout: 0 });
    
    console.log('CAPTCHA solved, continuing...');
}

The timeout: 0 disables Puppeteer's default 30-second navigation timeout, so it waits indefinitely while you solve the CAPTCHA. You can also wait for a specific element that only shows up after the CAPTCHA is solved instead of waiting for navigation. I'd throw in some user-agent rotation and random delays between requests to hit fewer CAPTCHAs in the first place. This works on any site where solving the challenge triggers a navigation back to the real page.
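
If you'd rather key off an element than a navigation, here's a minimal sketch of that variant plus a random delay between searches. Treat '#search' as an assumption on my part for Google's results container - inspect the markup before relying on it:

if (pageContent.includes('unusual traffic from your computer')) {
    console.log('CAPTCHA detected - solve it manually');

    // Resolves once the results container shows up again after you solve it.
    // '#search' is assumed to be Google's results wrapper - verify it yourself.
    await page.waitForSelector('#search', { timeout: 0 });

    console.log('CAPTCHA solved, continuing...');
}

// Random 2-7 second pause between searches so requests look less uniform
const delay = 2000 + Math.floor(Math.random() * 5000);
await new Promise(resolve => setTimeout(resolve, delay));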

I’ve had good luck combining element detection with user input prompts. I check for the reCAPTCHA iframe and Google’s blocking messages, then use readline to pause until I manually confirm it’s done:

const readline = require('readline');

async function handleCaptcha(page) {
    // Either the reCAPTCHA iframe or Google's "sorry" block form means we're blocked
    const captchaExists = await page.$('iframe[src*="recaptcha"]') ||
                          await page.$('form[action*="sorry"]');

    if (captchaExists) {
        console.log('Please solve the CAPTCHA and press Enter to continue...');

        const rl = readline.createInterface({
            input: process.stdin,
            output: process.stdout
        });

        // Blocks here until you hit Enter in the terminal
        await new Promise(resolve => rl.question('', resolve));
        rl.close();
    }
}

You get complete control over when to resume instead of relying on automatic detection. Way more reliable than waiting for navigation changes - sometimes the URL doesn’t even change after solving the CAPTCHA.
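
Here's roughly how it slots into your search loop - just a sketch reusing the names from your question, not a drop-in replacement:

// Navigate, pause for manual solving if needed, then parse as before
await page.goto(`https://www.google.com/search?q=${encodeURIComponent(term)}`);

// Blocks here until you press Enter if a CAPTCHA was detected
await handleCaptcha(page);

// Safe to parse now - the block page is gone
const pageContent = await page.content();
const $ = cheerio.load(pageContent);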

You can also use page.waitForFunction() to watch for when the CAPTCHA disappears from the DOM. I'll do something like await page.waitForFunction(() => !document.querySelector('.captcha-container'), {timeout: 0}) - just swap '.captcha-container' for whatever selector the site actually uses for its challenge. Don't close the browser while it's waiting or the pending promise will reject and crash your script.
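
For Google's block page specifically, a more targeted version might look like the sketch below. The selectors are assumptions based on what the sorry page currently renders, so verify them yourself:

// Wait until both the "sorry" form and the reCAPTCHA iframe are gone.
// Selectors are assumptions - inspect the block page and adjust if needed.
await page.waitForFunction(
    () => !document.querySelector('form[action*="sorry"]') &&
          !document.querySelector('iframe[src*="recaptcha"]'),
    { timeout: 0 }
);

If solving the challenge triggers a full navigation and the wait errors out on the context change, fall back to the waitForNavigation approach from the first answer.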