My web scraper is performing poorly. I use Crawlee together with Playwright, and I only send about three requests per second, plus one or two extra HTTP calls. Surprisingly, even this low load maxes out my Ryzen 5 3600. I’m not happy with these results and would appreciate any advice on improving efficiency.
In my experience with headless browsers, a request rate that looks low can still be expensive, because every page load carries rendering and initialization overhead. Launching full browser instances less often and reusing persistent sessions reduced the resource spikes for me. Profiling each component of the scraping workflow was key to finding unexpected bottlenecks. The changes were incremental, but together they significantly improved efficiency and kept the system stable during extended scraping sessions.
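As a rough illustration of the profiling step, here is a minimal TypeScript sketch that accumulates wall-clock and CPU time per named stage using only Node.js built-ins. The names `timed`, `report`, and `StageStats` are hypothetical helpers made up for this example, not part of Crawlee or Playwright. Two caveats: `process.cpuUsage()` only covers the Node process itself, so CPU burned inside the browser's own processes has to be checked with an OS-level tool, and when several stages run concurrently their CPU figures overlap.

```typescript
import { performance } from "node:perf_hooks";

// Accumulated measurements for one named stage of the scraping workflow.
interface StageStats {
  calls: number;
  wallMs: number; // total elapsed wall-clock time
  cpuMs: number; // total user + system CPU time of this Node process
}

const stats = new Map<string, StageStats>();

// Hypothetical helper: runs `fn` and adds its wall and CPU time to `stage`.
// The measurement is recorded even if `fn` throws.
async function timed<T>(stage: string, fn: () => Promise<T>): Promise<T> {
  const wallStart = performance.now();
  const cpuStart = process.cpuUsage();
  try {
    return await fn();
  } finally {
    const cpu = process.cpuUsage(cpuStart); // delta since cpuStart, in microseconds
    const entry = stats.get(stage) ?? { calls: 0, wallMs: 0, cpuMs: 0 };
    entry.calls += 1;
    entry.wallMs += performance.now() - wallStart;
    entry.cpuMs += (cpu.user + cpu.system) / 1000;
    stats.set(stage, entry);
  }
}

// One summary line per stage, with the most CPU-hungry stage first.
function report(): string[] {
  return Array.from(stats.entries())
    .sort(([, a], [, b]) => b.cpuMs - a.cpuMs)
    .map(
      ([stage, s]) =>
        `${stage}: calls=${s.calls} wall=${s.wallMs.toFixed(1)}ms cpu=${s.cpuMs.toFixed(1)}ms`,
    );
}
```

To use it, wrap each step of the request handler, for example `await timed("parse", () => extractData(page))` where `extractData` stands in for your own function, and print `report().join("\n")` every few hundred requests. A stage with high wall time but low CPU time is mostly waiting on the network or the browser; a stage with high CPU time is work done in Node itself.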