I’m building a web scraper with Puppeteer to automate searches on Google. Everything works fine until I encounter Google’s reCAPTCHA protection.
I don’t want to bypass the reCAPTCHA since that’s against their terms. Instead, I want to pause the automation and let myself solve the CAPTCHA manually when it appears, then continue with the scraping process.
Is there a way to detect when reCAPTCHA shows up and pause the script so I can solve it? After solving it, the script should resume automatically.
Here’s my current approach:
const fs = require('fs');
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

const searchTerms = ['term1', 'term2', 'term3'];

async function performSearch() {
  for (const term of searchTerms) {
    // headless: false keeps a visible window, which manual solving requires
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    try {
      // Encode the term so spaces and special characters don't break the URL
      await page.goto(`https://www.google.com/search?q=${encodeURIComponent(term)}`);
      const pageContent = await page.content();
      const $ = cheerio.load(pageContent);
      if (pageContent.includes('unusual traffic from your computer')) {
        console.log('CAPTCHA detected - need manual intervention');
        // How do I pause here for manual solving?
        // Then continue after it's solved?
      }
      // Process results here
    } catch (error) {
      console.error('Error during search:', error);
    } finally {
      await browser.close();
    }
  }
}

performSearch();
I’m fairly new to Node.js automation. Any suggestions on how to implement this manual CAPTCHA solving approach would be really helpful.
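One direction I've been considering, as a rough sketch rather than a tested solution: keep polling the page's HTML until the "unusual traffic" text is gone, and only then carry on. The helper names `isCaptchaPage` and `waitForManualSolve` are made up for illustration, the marker string is just the one from my check above, and the poll interval and timeout are arbitrary guesses. The only Puppeteer call it relies on is `page.content()`.

```javascript
// Text that appears on the block page; taken from the check in my script.
const CAPTCHA_MARKER = 'unusual traffic from your computer';

// Small promise-based delay.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: true if the current page still shows the block text.
async function isCaptchaPage(page) {
  const html = await page.content();
  return html.includes(CAPTCHA_MARKER);
}

// Hypothetical helper: resolves once the block text is gone,
// rejects if nobody solves the CAPTCHA before the timeout.
async function waitForManualSolve(page, { pollMs = 2000, timeoutMs = 5 * 60 * 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (!(await isCaptchaPage(page))) {
        return; // CAPTCHA no longer visible, safe to continue
      }
    } catch (err) {
      // Assumption: reading the page can fail while it navigates
      // right after the solve, so just try again on the next tick.
    }
    await sleep(pollMs);
  }
  throw new Error('Timed out waiting for the CAPTCHA to be solved');
}
```

In my loop this would presumably be called as `await waitForManualSolve(page);` inside the `if` block, and I'd then need to read `page.content()` again before parsing, since the HTML captured earlier is the CAPTCHA page.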