I’m having trouble with my web scraping project on CentOS 6.5. I’m using Python, Selenium, and a headless Firefox WebDriver. The script works fine on my Mac with a regular browser, but on the server, I keep getting connection errors.
When I try to load certain pages, I get errors like Connection reset by peer or Connection refused. It’s frustrating because it works sometimes, but fails often.
Here’s a basic example of what I’m trying to do:
from selenium import webdriver
driver = webdriver.Firefox()
url = 'https://example-shopping-site.com/daily-deals'
driver.get(url) # This is where it usually fails
I’ve tried adding waits, using different request headers, and other tricks, but nothing seems to fix it consistently. The weird part is that some AJAX-heavy pages load just fine.
Has anyone run into similar issues with headless browsers on CentOS? Any tips on how to make the connections more stable? I’m out of ideas and could really use some help!
Hey Alice45, have you tried using a proxy? Network issues can sometimes cause those connection errors. Also, check your firewall settings on CentOS; they might be blocking some connections.
Another thing: make sure you’re using the latest Selenium version. Older ones can be buggy with headless browsers.
Good luck with your scraping project!
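One way to tell a firewall or network problem apart from a browser problem is to try a plain TCP connection from the server, outside Selenium. This is only a minimal sketch, assuming Python 3 on the server; can_connect is a hypothetical helper name, and the host in the usage comment is just the one from the question.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers refused and reset connections, timeouts, and DNS failures.
        print("connect to %s:%d failed: %s" % (host, port, exc))
        return False


# Example (run on the CentOS box):
#   can_connect("example-shopping-site.com")
```

If this fails in the same intermittent way as driver.get(), the problem is probably in the network path (firewall, proxy, rate limiting) rather than in Firefox or Selenium.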
I’ve faced similar challenges with headless browsers on CentOS. One thing that significantly improved stability for me was increasing the timeout settings. Try something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.set_page_load_timeout(30)  # fail a page load that takes longer than 30 seconds
wait = WebDriverWait(driver, 20)  # explicit waits give up after 20 seconds
url = 'https://example-shopping-site.com/daily-deals'
driver.get(url)
wait.until(EC.presence_of_element_located((By.TAG_NAME, 'body')))
This sets an explicit 30-second limit on page loads and then waits up to 20 seconds for the body element to appear. It’s not foolproof, but it helped me deal with unreliable connections.
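Since the failures are intermittent, timeouts alone may not be enough, and retrying the load a few times can help. Here is a rough sketch of a retry wrapper with exponential backoff. get_with_retries and its parameters are hypothetical names, not part of Selenium, and it catches every exception only to keep the example short.

```python
import time


def get_with_retries(load, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call load(url), retrying with exponential backoff if it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return load(url)
        except Exception as exc:  # broad on purpose; narrow this in real code
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... between attempts.
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Example with a Selenium driver:
#   get_with_retries(driver.get, url)
```

In a real script you would probably catch selenium.common.exceptions.WebDriverException instead of Exception, so that unrelated bugs are not silently retried.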
Also, consider setting up a custom user agent string. Some websites block or limit requests from default WebDriver user agents. You can do this by adding:
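# Firefox reads the user agent from a preference, not from a command-line argument
options.set_preference('general.useragent.override', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36')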
to your WebDriver options. This can sometimes bypass restrictions that cause connection issues.
I’ve encountered similar issues with headless Firefox on CentOS. One thing that helped in my case was updating to a newer version of GeckoDriver and Firefox ESR. CentOS 6.5 is quite old, so you might be running into compatibility problems.
Another approach that worked for me was switching to headless Chrome instead. It tends to be more stable in my experience, especially on older systems. You’d need to install ChromeDriver and use webdriver.Chrome() instead.
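For reference, a headless Chrome setup could look roughly like the sketch below. It assumes Selenium 3.8 or newer and a chromedriver binary on PATH that matches the installed Chrome; whether such a Chrome build can be installed on a system as old as CentOS 6.5 is not something this thread confirms. chrome_arguments and make_chrome_driver are hypothetical helper names.

```python
def chrome_arguments():
    """Command-line switches often used for headless Chrome on servers."""
    return ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def make_chrome_driver():
    """Start headless Chrome through Selenium (needs Chrome and chromedriver)."""
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


# Example:
#   driver = make_chrome_driver()
#   driver.get('https://example-shopping-site.com/daily-deals')
#   driver.quit()
```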
If you’re still set on using Firefox, try adding some options to your WebDriver setup:
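from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('--headless')  # run Firefox without a display
# The next two are Chromium switches; Firefox is unlikely to act on them.
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Firefox(options=options)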
These flags can help with stability issues in some cases. Hope this helps!