I have the task of accessing approximately 100,000 URLs daily to ensure that images and HTML are cached in Cloudflare, as the content updates quite often.
I suspect curl would be quicker than a headless browser, such as headless Chrome driven by Puppeteer.
Has anyone had experience with this scenario or can suggest more efficient methods?
In my experience, curl is indeed faster for straightforward requests: it only fetches HTTP headers and content and never renders the page, so it is the more efficient choice when rendering is not required. However, if you need to verify JavaScript execution, or images that load through JS, a headless browser could be more appropriate, albeit slower. Keep in mind that curl requests only the URL you give it, so images referenced in the HTML have to be listed and fetched as their own URLs. For 100,000 URLs, a script that runs curl requests in parallel will likely save you significant time and resources.
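To make the parallel idea concrete, here is a minimal sketch in Python rather than curl itself, using only the standard library. The names fetch_cache_status and warm_urls, the User-Agent string, and the worker count of 32 are my own placeholders, not anything from the original setup. It assumes the responses carry Cloudflare's CF-Cache-Status header, which tells you whether a request was a HIT or a MISS.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_cache_status(url, timeout=10):
    """GET one URL and return (HTTP status, CF-Cache-Status header or None)."""
    # The User-Agent value is an arbitrary placeholder.
    req = Request(url, headers={"User-Agent": "cache-warmer-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        resp.read()  # read the whole body so the full object is transferred
        return resp.status, resp.headers.get("CF-Cache-Status")


def warm_urls(urls, fetch=fetch_cache_status, max_workers=32):
    """Fetch all URLs in parallel; map each URL to its result or its exception."""
    urls = list(urls)  # allow generators; we iterate over the URLs twice

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # one failing URL should not abort the run
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

You would call it as warm_urls(list_of_urls) and then look for entries whose second value is not "HIT", or which are exceptions, to decide what to retry. The worker count is a guess; tune it to whatever your origin and your Cloudflare rate limits tolerate.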
I think curl would be faster. Headless browsers use more resources because they render the page, while curl just fetches the raw HTML and HTTP headers without any rendering. It's more lightweight, so it should suit your needs better for checking caches.