Issues with Puppeteer and proxy integration - seeking alternatives for LinkedIn data scraping

I’m pulling my hair out trying to get Puppeteer to work with a proxy service. I’ve tried the example code from the proxy provider’s docs, but it keeps failing with WebSocket errors. Here’s a simplified version of what I’m dealing with:

const puppeteer = require('puppeteer');

async function scrapeWithProxy() {
  const browser = await puppeteer.launch({
    // Route all browser traffic through the proxy
    args: ['--proxy-server=proxy.example.com:8080'],
    headless: false,
  });

  const page = await browser.newPage();
  // Supply the proxy credentials (the proxy requires auth)
  await page.authenticate({ username: 'user', password: 'pass' });

  try {
    await page.goto('https://example.com');
    const content = await page.content();
    console.log(content);
  } catch (error) {
    console.error('Scraping failed:', error);
  }

  await browser.close();
}

scrapeWithProxy();

This keeps throwing errors, and I’m at my wit’s end. Does anyone have experience with a proxy service that plays nice with Puppeteer, especially for LinkedIn scraping? Or maybe there’s a better way to get public LinkedIn data without all this proxy hassle? Any tips would be super helpful!

I have encountered similar challenges with using Puppeteer alongside proxies. In my experience, transitioning to Selenium with Python has significantly reduced connection errors. Using a tool like selenium-wire, which provides better control over network traffic, allowed me to handle proxy settings more seamlessly while scraping dynamic content. This method proved reliable, especially when targeting sites like LinkedIn, where data loads dynamically. If you still prefer a JavaScript solution, you might experiment with Playwright, which often offers improved proxy handling and stability.
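If it helps, here's a rough sketch of what the Playwright version looks like. The proxy host, port, and credentials are placeholders from the original question, not real values. The main difference from Puppeteer is that Playwright accepts the proxy credentials directly in the launch options, so there's no separate page.authenticate() call:

```javascript
// Sketch of the Playwright approach; proxy host/port/credentials are placeholders.
let chromium;
try {
  ({ chromium } = require('playwright'));
} catch {
  // playwright not installed; the config below still shows the shape
}

// Unlike Puppeteer, Playwright takes proxy credentials at launch time,
// so no separate page.authenticate() call is needed.
const proxy = {
  server: 'http://proxy.example.com:8080',
  username: 'user',
  password: 'pass',
};

async function scrapeWithProxy(url) {
  const browser = await chromium.launch({ proxy });
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.content();
  } finally {
    await browser.close(); // always clean up, even if goto throws
  }
}
```

In my experience the single proxy object at launch is less error-prone than Puppeteer's split between a Chromium flag and a per-page authenticate call.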

I’ve been down this road before, and I feel your frustration. After countless hours of battling with Puppeteer and proxies, I stumbled upon a game-changer: Bright Data’s Web Unlocker. It handles the proxy rotation and browser fingerprinting for you, which makes it a good fit for heavily protected sites like LinkedIn. No more WebSocket errors or constant tweaking of settings.

The beauty of it is that you can use it with plain HTTP requests, which simplifies your code dramatically. Plus, their documentation is top-notch, and their support team actually knows their stuff.
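To illustrate the "plain HTTP requests" point: once a service like this exposes an authenticated proxy endpoint, your scraping code shrinks to a single HTTP GET through that proxy. The sketch below uses axios's built-in proxy config; the host, port, and credentials are placeholders, not Bright Data's actual endpoint - substitute whatever your provider's dashboard gives you:

```javascript
// Sketch of the plain-HTTP-through-a-proxy pattern; the proxy host, port,
// and credentials below are placeholders, not a real provider endpoint.
let axios;
try {
  axios = require('axios');
} catch {
  // axios not installed; the config below still shows the shape
}

const proxyConfig = {
  host: 'proxy.example.com', // placeholder - your provider's proxy host
  port: 8080,
  auth: { username: 'user', password: 'pass' },
};

async function fetchThroughProxy(url) {
  // One plain HTTP GET routed through the proxy - no browser involved
  const response = await axios.get(url, { proxy: proxyConfig, timeout: 30000 });
  return response.data; // the raw response body
}
```

No browser, no WebSockets, no page lifecycle to manage - which is exactly why it simplifies things so much.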

Just a word of caution though - make sure you’re complying with LinkedIn’s terms of service. They can be pretty strict about scraping. If you’re after public data only, you might want to look into their official API first. It’s more limited, but it’s the safest route if it meets your needs.

Have you tried using a residential proxy network? They’re usually more reliable for LinkedIn scraping. Also, check your proxy’s connection speed - slow ones can cause timeouts. Maybe try raising the timeout settings in Puppeteer? If nothing works, you could look into a headless browser API service; they handle the proxy stuff for you.
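For the timeout part, this is roughly what I mean - raise the navigation timeout above Puppeteer's 30-second default and wait for a lighter lifecycle event. The 60-second value is arbitrary (tune it to your proxy's actual latency) and the proxy address is the placeholder from the question:

```javascript
// Sketch of loosening Puppeteer's timeouts for a slow proxy.
let puppeteer;
try {
  puppeteer = require('puppeteer');
} catch {
  // puppeteer not installed; the snippet still shows where the knobs live
}

const NAV_TIMEOUT_MS = 60000; // up from the 30s default; pick a value for your proxy

async function gotoWithLongerTimeout(url) {
  const browser = await puppeteer.launch({
    args: ['--proxy-server=proxy.example.com:8080'], // placeholder proxy
  });
  const page = await browser.newPage();
  // Applies to goto, waitForNavigation, etc. on this page
  page.setDefaultNavigationTimeout(NAV_TIMEOUT_MS);
  try {
    // 'domcontentloaded' resolves earlier than the default 'load',
    // which also helps when the proxy is slow
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.content();
  } finally {
    await browser.close();
  }
}
```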