Hey everyone! I’m trying to make a tool to export Group Bookshelves from Goodreads. It’s for groups that require membership, so I need to log in first. I’m using Python with a headless Selenium driver. My tool was functioning a while back, but after some refactoring, it’s not detecting the login link anymore. I’m uncertain if this is due to my XPATH or simply a timing issue. Testing the XPATH multiple times in the developer console confirms that it works, so I suspect a timing problem.
Here’s a snippet of my code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def get_login_link(self):
    try:
        # Wait up to 15 seconds for the email-login button to appear
        login_button = WebDriverWait(self.driver, 15).until(
            EC.presence_of_element_located(
                (By.XPATH, "//button[text()='Log in with email']")
            )
        )
        # The href lives on the button's parent element
        login_link = login_button.find_element(By.XPATH, "..").get_attribute("href")
        return login_link
    except TimeoutException:
        print("Could not locate the login button")
        return None

def login(self):
    login_link = self.get_login_link()
    if login_link:
        self.driver.get(login_link)
        # Additional login steps
    else:
        print("Login failed")
I would appreciate any pointers on resolving this timing issue. Also, if anyone has insights on working around Goodreads’ 100-page limit, that would be incredibly helpful. Thanks!
I’ve encountered similar issues with Selenium and Goodreads. Have you tried increasing the wait time or using a different locator strategy? Sometimes the site’s structure changes, causing XPaths to fail. You might want to try CSS selectors instead, as they’re often more robust against layout changes — just note that CSS selectors can’t match on element text the way your text()= XPath does, so you’d target a stable class or attribute instead.
For the 100-page limit, you could implement a pagination system in your script. Alternatively, consider using the Goodreads API if it’s available for your use case. It might provide a more stable and efficient way to access the data you need.
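A pagination loop can be as simple as building one URL per page and visiting them in order. A minimal sketch — note the URL pattern below is an assumption; check a real group bookshelf URL in your browser before relying on it:

```python
def shelf_page_urls(group_id, shelf="read", max_pages=100):
    # Goodreads caps shelf listings at 100 pages, so stop there.
    # NOTE: this URL pattern is a guess -- verify it against an actual
    # group bookshelf URL.
    base = f"https://www.goodreads.com/group/bookshelf/{group_id}"
    return [f"{base}?shelf={shelf}&page={page}" for page in range(1, max_pages + 1)]
```

You would then feed each URL to driver.get() and scrape the rows on each page.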
Lastly, ensure you’re not hitting the site too frequently, as that could trigger anti-scraping measures. Implementing delays between requests might help if that’s the case.
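A small throttling helper along these lines keeps requests from arriving at a machine-regular cadence (the delay bounds are arbitrary — tune them to taste):

```python
import random
import time

def polite_get(driver, url, min_delay=2.0, max_delay=5.0):
    # Sleep a randomized interval before each navigation so requests
    # are spaced out and don't follow a fixed rhythm.
    time.sleep(random.uniform(min_delay, max_delay))
    driver.get(url)
```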
hey mate, i’ve had similar issues. Try using a different browser like Firefox instead of Chrome — sometimes Goodreads acts weird with certain browsers. Also check that you’re on the latest Selenium version; older versions can be pretty buggy. Good luck!
I’ve been working with Goodreads automation for a while now, and I can tell you it can be tricky. One thing that’s helped me is using explicit waits instead of implicit ones. Try something like this:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)
login_button = wait.until(
    EC.element_to_be_clickable((By.XPATH, "//button[text()='Log in with email']"))
)
This waits for the element to be clickable, not just present. It’s been more reliable in my experience.
As for the 100-page limit, I’ve had some success by rotating user agents and IP addresses. It’s not perfect, but it helps. Just be careful not to overdo it and respect Goodreads’ terms of service.
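Rotating user agents can be as simple as picking one per session. The strings below are ordinary desktop-browser UAs, and the Selenium wiring is shown in comments since it needs a live browser:

```python
import random

# A small pool of common desktop user-agent strings to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def pick_user_agent(agents=USER_AGENTS):
    # Choose one user agent at random for this session.
    return random.choice(agents)

# With Chrome you would pass it via options, e.g.:
# from selenium import webdriver
# from selenium.webdriver.chrome.options import Options
# opts = Options()
# opts.add_argument(f"user-agent={pick_user_agent()}")
# driver = webdriver.Chrome(options=opts)
```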