Trouble accessing LinkedIn pages with Splash headless browser

I’m running into problems trying to get the source code of LinkedIn pages using the Splash headless browser. Every time I try, I get a ‘Failed loading page’ error, and it happens for every LinkedIn URL I’ve tried.

For example, I’ve attempted to access company pages like Amazon and Apple, but no luck. I’m using Splash version 3.4 on Ubuntu.

Has anyone else encountered this issue? Any ideas on how to fix it? I’m not sure if it’s a problem with my setup or if LinkedIn is blocking Splash somehow.

I’d really appreciate any tips or suggestions to get this working. Thanks in advance for your help!
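For reference, here’s roughly how I’m requesting pages (a minimal sketch of my setup — Splash is on localhost:8050, and the helper names are just for this post):

```python
import urllib.parse
import urllib.request

def build_render_params(url, wait=2.0, timeout=30):
    """Query parameters for Splash's render.html endpoint."""
    return {"url": url, "wait": wait, "timeout": timeout}

def fetch_via_splash(url, splash="http://localhost:8050", **kwargs):
    """Return the rendered page source; raises HTTPError on failures
    like the 'Failed loading page' response I keep getting."""
    query = urllib.parse.urlencode(build_render_params(url, **kwargs))
    with urllib.request.urlopen(f"{splash}/render.html?{query}", timeout=60) as resp:
        return resp.read().decode("utf-8")
```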

I’ve dealt with this exact problem before. LinkedIn’s anti-scraping measures are quite robust. What worked for me was implementing a combination of techniques. First, I set up a pool of residential proxies to rotate my IP addresses. Then, I created a diverse set of user agent strings to cycle through.
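A rough sketch of that rotation setup — the proxy URLs and user-agent strings below are placeholders for your own pools, and it relies on Splash’s render.html accepting `proxy` and `headers` in an application/json POST:

```python
import itertools
import json
import random
import urllib.request

# Placeholder pools -- swap in real residential proxies and more UA strings.
PROXIES = [
    "http://user:pass@res-proxy-1.example.com:8000",
    "http://user:pass@res-proxy-2.example.com:8000",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

_proxy_pool = itertools.cycle(PROXIES)

def rotated_payload(url, wait=3.0):
    """render.html payload with the next proxy and a random user agent."""
    return {
        "url": url,
        "wait": wait,
        "proxy": next(_proxy_pool),
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
    }

def fetch(url, splash="http://localhost:8050"):
    """POST the payload as JSON, which is how Splash takes custom headers."""
    req = urllib.request.Request(
        f"{splash}/render.html",
        data=json.dumps(rotated_payload(url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```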

The game-changer, though, was adding realistic delays and randomizing my request patterns. I also found that logging in with a real LinkedIn account (or using a dummy one) and maintaining an active session helped immensely.
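The delay-and-randomization part can be sketched like this (the numbers and names are mine — tune them to your workload):

```python
import random
import time

def human_delay(base=4.0, jitter=3.0, floor=0.5):
    """A randomized, human-looking delay in seconds (base +/- jitter)."""
    return max(base + random.uniform(-jitter, jitter), floor)

def crawl(urls, fetch, sleep=time.sleep):
    """Fetch URLs in shuffled order with a randomized pause between requests."""
    urls = list(urls)
    random.shuffle(urls)          # avoid a predictable crawl order
    pages = {}
    for url in urls:
        pages[url] = fetch(url)
        sleep(human_delay())      # pause before the next request
    return pages
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting.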

For Splash specifically, make sure you’re using the latest version and consider adjusting the rendering wait time since dynamic content may take longer to load. If these methods don’t work, exploring other headless browsers like Puppeteer or Selenium might be the way forward.

Yeah, I’ve had similar issues with LinkedIn — they’re pretty aggressive with bot detection. Have you tried using a proxy or rotating user agents? Also, consider adding delays between requests to mimic human behavior; it might help bypass their blocks. Good luck!

I’ve encountered similar challenges with LinkedIn scraping. One approach that’s proven effective is using Selenium with undetected-chromedriver. This combination often bypasses LinkedIn’s bot detection more successfully than Splash.
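A minimal sketch of that approach — it assumes `undetected-chromedriver` is installed (`pip install undetected-chromedriver`) and a local Chrome is available; the import is done lazily so the helper below works without either:

```python
import time

def chrome_args(width=1920, height=1080):
    """Switches that make the browser session look like a normal desktop."""
    return [f"--window-size={width},{height}", "--lang=en-US"]

def get_page_source(url, wait_seconds=5):
    """Open the URL in a patched Chrome and return the rendered source."""
    import undetected_chromedriver as uc  # requires Chrome + the package

    options = uc.ChromeOptions()
    for arg in chrome_args():
        options.add_argument(arg)
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)   # crude wait for dynamic content to render
        return driver.page_source
    finally:
        driver.quit()
```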

Another tactic is to implement browser fingerprinting techniques. This involves mimicking real browser characteristics beyond just the user agent, such as screen resolution, installed plugins, and even mouse movements.
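Within Splash you can only cover the static side of a fingerprint — `viewport` is a real render.html argument, but mouse-movement simulation needs a full browser driver. A sketch, with illustrative profile values; the point is that UA, platform, and viewport should agree with each other:

```python
import random

# Each profile keeps user agent, platform, and viewport mutually consistent;
# a mismatched combination is itself a bot signal.
PROFILES = [
    {"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
     "viewport": "1920x1080", "language": "en-US,en;q=0.9"},
    {"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
                   "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
     "viewport": "1440x900", "language": "en-US,en;q=0.8"},
]

def fingerprinted_payload(url):
    """render.html JSON payload whose UA, viewport, and language agree."""
    p = random.choice(PROFILES)
    return {
        "url": url,
        "viewport": p["viewport"],   # Splash viewport argument, "<width>x<height>"
        "headers": {
            "User-Agent": p["user_agent"],
            "Accept-Language": p["language"],
        },
    }
```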

If you’re set on using Splash, try increasing the `wait` argument, or switch to the /execute endpoint with a Lua script that waits for the content you need before returning the HTML (render.html itself has no wait-for mechanism). Also, consider implementing a backoff strategy for failed requests.
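The backoff idea can be sketched as exponential delay with jitter (names and defaults are mine):

```python
import random
import time

def fetch_with_backoff(fetch, url, retries=4, base=2.0, cap=60.0, sleep=time.sleep):
    """Retry a failing fetch with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise                      # out of retries, surface the error
            # 2, 4, 8, ... seconds, capped, with jitter so retries don't align
            delay = min(base * (2 ** attempt), cap)
            sleep(delay + random.uniform(0, 1))
```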

Remember, LinkedIn’s terms of service prohibit scraping, so proceed with caution and respect their policies.