Hello, I previously relied on headless browsers for scraping, but they consumed a lot of memory and were slow, and writing the automation code was frustrating. So I've switched to a plain HTTP client, specifically Node.js with the axios, axios-cookiejar-support, and cheerio libraries. That lets me either fetch the raw HTML or talk directly to the private APIs that many modern sites expose, usually returning JSON. I'm curious about the community's preferences: how many of you use headless browsers versus private APIs? Personally, I opt for private APIs and avoid headless browsers entirely.
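For reference, the basic setup looks roughly like this. It's only a minimal sketch: the URLs, the `h2.title` selector, and the `/api/v1/listings` endpoint are placeholders, and it assumes axios-cookiejar-support v4+ with tough-cookie.

```ts
// Sketch: axios + axios-cookiejar-support + cheerio (Node.js).
// URLs, selector, and API endpoint below are placeholders, not a real site.
import axios from 'axios';
import { wrapper } from 'axios-cookiejar-support';
import { CookieJar } from 'tough-cookie';
import * as cheerio from 'cheerio';

async function main() {
  // Cookie jar so session cookies persist across requests, like a browser would.
  const jar = new CookieJar();
  const client = wrapper(axios.create({ jar, timeout: 10_000 }));

  // Option 1: fetch raw HTML and parse it with cheerio.
  const { data: html } = await client.get('https://example.com/listings');
  const $ = cheerio.load(html);
  const titles = $('h2.title')
    .map((_, el) => $(el).text().trim())
    .get();
  console.log(titles);

  // Option 2: call the site's private JSON API directly (hypothetical endpoint).
  const { data: json } = await client.get('https://example.com/api/v1/listings', {
    params: { page: 1 },
    headers: { Accept: 'application/json' },
  });
  console.log(json);
}

main().catch(console.error);
```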
hey, both have their pros & cons, but I've found that frameworks like Puppeteer help a ton with headless browsing. It makes automating interactions smoother, and it handles sites with heavy JS. But yeah, I agree APIs are less resource-heavy. It really depends on the site's structure.
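For example, a basic Puppeteer run for a JS-rendered page looks something like this (just a sketch; the URL and selector are made up):

```ts
// Sketch: Puppeteer for a JS-heavy page (URL and selector are placeholders).
import puppeteer from 'puppeteer';

async function main() {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle so client-side rendering has finished.
    await page.goto('https://example.com/app', { waitUntil: 'networkidle2' });
    // Pull text out of the rendered DOM.
    const titles = await page.$$eval('h2.title', (els) =>
      els.map((el) => el.textContent?.trim()),
    );
    console.log(titles);
  } finally {
    await browser.close();
  }
}

main().catch(console.error);
```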