Puppeteer struggles with timeouts in Docker while interacting with web pages

I’m attempting to collect cookies from various websites using a Node.js script with Puppeteer. The script loads the web page, interacts with it, and saves the cookies to a specified file.
On my local machine, this works perfectly, including in headless mode. However, when I try to run it in a Docker container, it fails for certain sites. For instance, it successfully accesses Google, but it has issues with RentalCars.

The provided code is structured to load the page, handle captchas if necessary, ignore cookies, wait for a specific selector (like an input field), and save cookies to a file.

The script uses the packages puppeteer-extra, puppeteer-extra-plugin-recaptcha, and puppeteer-extra-plugin-stealth.

With Google, the script works in headless mode both locally and in Docker. With RentalCars, it runs fine locally but fails in Docker: the page never finishes loading, so captchas cannot be solved and page.content() cannot return the page. The run ends with ProtocolError: Runtime.callFunctionOn timed out.

Some network requests also appear to stall, and not consistently: GTM scripts, for example, load on some runs and not on others, which suggests that some JavaScript files are failing to load.

A key cookie I need is Reese84, which is used for fingerprinting. I’m not sure whether the fingerprinting itself is affecting the outcome or whether libraries are missing in Docker. I’ve verified that every package listed in the GitHub documentation is installed in the Dockerfile.
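
Because the fingerprinting cookie is set by a script some time after the page starts loading, it can help to wait for the cookie itself rather than for the network to go quiet. The following is a minimal sketch, not the script from the question: waitForCookie and saveCookies are hypothetical helper names, the timeout values are arbitrary, and the exact cookie name (including its casing) is an assumption to check against what the browser actually stores. It relies only on page.cookies(), which is available in Puppeteer 22.

```javascript
const fs = require('node:fs/promises');

// Hypothetical helper: poll page.cookies() until a cookie with the given name
// appears or the deadline passes. Resolves to the cookie, or null on timeout.
async function waitForCookie(page, name, { timeoutMs = 30000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const cookies = await page.cookies();
    const found = cookies.find((cookie) => cookie.name === name);
    if (found) return found;
    if (Date.now() >= deadline) return null;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical helper: write every cookie visible to the page to a JSON file
// and return how many were written.
async function saveCookies(page, filePath) {
  const cookies = await page.cookies();
  await fs.writeFile(filePath, JSON.stringify(cookies, null, 2), 'utf8');
  return cookies.length;
}
```

A null result from waitForCookie(page, 'reese84') inside Docker, with a non-null result locally, would point at the fingerprinting script not running to completion rather than at the cookie-saving code.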

The following versions are currently being used:

  • Puppeteer: 22.15.0
  • Node: 20.13.0
  • npm: 10.5.2
  • OS: macOS M3

I would greatly appreciate any insights or recommendations regarding missing packages in the Docker environment.

EDIT1: I modified the navigation method to this:

  await page.goto(url, {
    waitUntil: 'networkidle2',
  });

However, it resulted in a TimeoutError: Navigation timeout of 120000 ms exceeded.
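
'networkidle2' only resolves once there have been no more than two open network connections for at least 500 ms, so a page that keeps polling or streaming may never satisfy it. One alternative worth trying, sketched below under assumptions, is to wait only for 'domcontentloaded' and then for the element the script needs. gotoAndWaitFor is a hypothetical helper name and the default timeouts are arbitrary; it returns a result object instead of throwing so the log shows which stage timed out.

```javascript
// Hypothetical helper: navigate without waiting for network idle, then wait
// for a specific selector. Reports which stage failed rather than throwing.
async function gotoAndWaitFor(page, url, selector, options = {}) {
  const { navTimeoutMs = 60000, selectorTimeoutMs = 60000 } = options;
  try {
    // Resolves as soon as the HTML document has been parsed.
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: navTimeoutMs });
  } catch (err) {
    return { ok: false, stage: 'navigation', error: err.message };
  }
  try {
    // Wait for the element the rest of the script depends on.
    await page.waitForSelector(selector, { visible: true, timeout: selectorTimeoutMs });
  } catch (err) {
    return { ok: false, stage: 'selector', error: err.message };
  }
  return { ok: true, stage: 'done', error: null };
}
```

If the 'navigation' stage fails even with 'domcontentloaded', the problem is likely below the page level (network, DNS, or the browser process), not a matter of slow third-party scripts.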

Hey DancingFox, it sounds like the issue might be with network settings or missing dependencies in your Docker container. Here are a few steps you could try:

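  • Increase Timeout: Try extending the timeout for page.goto. Keep in mind that 'networkidle0' is stricter than 'networkidle2' (it waits for zero open connections), so a page that never goes quiet will still time out:
      await page.goto(url, { timeout: 300000, waitUntil: 'networkidle0' });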
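  • Check DNS: Make sure the container can resolve the site's hostnames. Appending to /etc/resolv.conf in a Dockerfile RUN step does not persist, because Docker generates that file when the container starts. Pass the DNS server at run time instead, for example Google's public DNS:
      docker run --dns 8.8.8.8 <image>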
  • Disable Web Security: As a diagnostic only, try launching Chrome with web security disabled. This turns off the same-origin policy, so don't leave it enabled for normal runs:
      args: ['--disable-web-security']
  • CAPTCHA Solver: Ensure your CAPTCHA solver is configured correctly within Docker.
  • Check User Agent & Plugins: Sometimes, changing the user-agent or ensuring the plugins are correctly set up helps evade bot detections.
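
To test the DNS point from inside the container before touching the Puppeteer script, a small resolver check can be run with the same Node binary. This is a sketch under assumptions: checkDns is a hypothetical helper, and the hostnames in the usage note are only examples. The lookup function is injectable so the helper can be exercised without network access.

```javascript
const dns = require('node:dns/promises');

// Hypothetical helper: resolve each hostname and report either the first
// address found or the error code (for example ENOTFOUND or EAI_AGAIN).
async function checkDns(hostnames, lookup = dns.lookup) {
  const results = [];
  for (const hostname of hostnames) {
    try {
      const { address } = await lookup(hostname);
      results.push({ hostname, ok: true, detail: address });
    } catch (err) {
      results.push({ hostname, ok: false, detail: err.code || err.message });
    }
  }
  return results;
}
```

Running checkDns(['www.google.com', 'www.rentalcars.com']).then(console.table) inside the container shows whether the failing site resolves at all. If both resolve, DNS is probably not the cause.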

Good luck, and hope this resolves your issue!

Hey DancingFox, dealing with Puppeteer in Docker can sometimes be tricky due to environment differences. Here are some additional optimization steps you could apply:

  • Use Headless Mode Carefully: Ensure you're explicitly setting the headless mode. To verify interactions first, try { headless: false } within Docker; since the container has no display, this requires a virtual display such as Xvfb.
  • Install All Dependencies: Ensure all necessary libraries are installed. Installing google-chrome-stable pulls in the shared libraries Chrome needs. Note that the [arch=amd64] entry means this only works in an amd64 image, so on an Apple Silicon host the image has to be built for linux/amd64. Include the following in your Dockerfile:
      RUN apt-get update \
          && apt-get install -y wget gnupg \
          && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
          && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
          && apt-get update \
          && apt-get install -y google-chrome-stable
  • Adjust Resource Limits: Make sure Docker has enough CPU and RAM allocated; too little of either can cause timeouts.
  • Run with Debug Logs: Use the debug feature to get more insights:
      PUPPETEER_EXECUTABLE_PATH=/path/to/chrome DEBUG="puppeteer:*" node your_script.js
  • Verify Network Requests: Use Puppeteer's request interception to debug problematic requests and check for blockages or redirects that could be handled differently in Docker.
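
For the last point, full request interception is only needed to block or modify requests. To observe which requests fail, listening to page events is usually enough. The sketch below assumes a hypothetical helper name, attachNetworkLogger, and an arbitrary log format. It uses the 'requestfailed' and 'response' page events and should be attached before page.goto is called.

```javascript
// Hypothetical helper: log failed requests and HTTP error responses.
// Returns the array that failed requests are collected into.
function attachNetworkLogger(page, log = console.log) {
  const failures = [];

  page.on('requestfailed', (request) => {
    const failure = request.failure(); // null if no failure details are known
    const entry = {
      url: request.url(),
      type: request.resourceType(),
      reason: failure ? failure.errorText : 'unknown',
    };
    failures.push(entry);
    log(`FAILED ${entry.type} ${entry.url} (${entry.reason})`);
  });

  page.on('response', (response) => {
    // Surface blocked or missing resources, such as 403 and 404 responses.
    if (response.status() >= 400) {
      log(`HTTP ${response.status()} ${response.url()}`);
    }
  });

  return failures;
}
```

Comparing the output of a local run with a Docker run for the same URL shows whether the same scripts (GTM, the fingerprinting script) fail in both places or only in the container.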

These tweaks should assist in resolving the inconsistencies. Simplifying the setup to the most basic working form and iteratively adding back features can often identify specific causes for the timeout issues.
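
As one possible "most basic" starting point, the launch options can be collected in one place. This is a sketch under assumptions, not a known fix: buildLaunchOptions is a hypothetical helper, the 300000 ms value is arbitrary, and the flags are ones commonly suggested for containers, not ones confirmed to help with this site. protocolTimeout is the launch setting that governs errors like Runtime.callFunctionOn timed out. --disable-dev-shm-usage makes Chrome use /tmp instead of /dev/shm, which Docker limits to 64 MB by default. --no-sandbox reduces isolation and is only appropriate when Chrome cannot start its sandbox in the container.

```javascript
// Hypothetical helper: build Puppeteer launch options for a container.
// All values are assumptions to adjust, not recommendations from the thread.
function buildLaunchOptions(env = process.env) {
  const options = {
    headless: true,
    // Upper bound for individual DevTools protocol calls, in milliseconds.
    protocolTimeout: 300000,
    args: [
      '--no-sandbox', // often required when Chrome runs as root in a container
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage', // avoid the small default /dev/shm
    ],
  };
  // Use a system-installed Chrome if the environment points at one.
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return options;
}
```

The result is passed straight to the launcher, for example puppeteer.launch(buildLaunchOptions()), and flags can then be removed one at a time to see which ones the container actually needs.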