Hello everyone! I’m trying to pull data automatically from a web-based software platform that shows information in a table format. I need to get this data into my Azure database regularly. The problem is that getting the data as a CSV file requires logging in with email verification codes and then waiting for the export to process through their website interface. Since they use email authentication, I can’t easily use automated browser tools. They also don’t offer any API for developers and don’t send the exported files by email. Are there any other methods I could try to get this data automatically?
I ran into something similar with a legacy inventory system that had zero API support. What ended up working was creating a virtual machine specifically for this task and running scheduled browser automation there. The key was browser session persistence: instead of logging in fresh each time, I kept the browser session alive and just refreshed the data page periodically. This sidestepped most of the authentication headaches, since in my case the session stayed valid for weeks. You could also try reaching out to other users of the same platform through LinkedIn or industry forums. Sometimes they have figured out workarounds or know about undocumented data export methods. Another angle is checking whether the platform has a mobile app. Mobile APIs are sometimes less protected and easier to reverse engineer than the main web platform, though check the terms of service before going that route.
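For anyone wanting to try the session-persistence idea, here is a minimal sketch using Playwright's persistent browser context. The URLs, the login path, and the profile directory are all placeholders I made up, and it assumes the platform redirects to a login page once the session expires. You would log in by hand once (with `headless=False`) so the profile holds a valid session.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical values: replace with the real platform's URL and login path.
DATA_URL = "https://platform.example.com/reports/table"
LOGIN_PATH_PREFIX = "/login"
PROFILE_DIR = Path.home() / ".export-bot-profile"


def redirected_to_login(current_url: str) -> bool:
    """Return True if the browser ended up on the (assumed) login page."""
    return urlparse(current_url).path.startswith(LOGIN_PATH_PREFIX)


def fetch_table_html() -> str:
    """Open the data page with a persistent profile and return its HTML."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # A persistent context keeps cookies and local storage in PROFILE_DIR,
        # so a login done once by hand is reused on later scheduled runs.
        context = p.chromium.launch_persistent_context(
            str(PROFILE_DIR), headless=True
        )
        try:
            page = context.pages[0] if context.pages else context.new_page()
            page.goto(DATA_URL)
            if redirected_to_login(page.url):
                raise RuntimeError("Session expired: log in again manually.")
            return page.content()
        finally:
            context.close()
```

Running `fetch_table_html()` from a scheduler on the VM gives you the page HTML for as long as the session lasts. The `RuntimeError` is the signal to log in again.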
have you thought about using Selenium? a captcha solving service can deal with any captcha on the login page, though the email verification part is tougher and needs something that can actually read the inbox. also, you might want to try scraping the HTML table directly if the data loads without JS. just check the page source (view source, not the inspector) first to see if the rows are actually in there.
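If the rows do show up in the page source, pulling them out takes only the standard library. This is a rough sketch rather than a robust scraper: it assumes a plain `<table>` with `<tr>`, `<th>`, and `<td>` tags and no nested tables, and `parse_table` is just a name I picked.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in the document as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text chunks of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> list[list[str]]:
    """Return all table rows found in the HTML, header row included."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

If `parse_table` returns an empty list on the saved page source, the table is most likely rendered by JavaScript, and you will need a real browser (Selenium, Playwright) to get at it.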
I faced a similar situation with a financial platform that required two-factor authentication. One approach that worked was setting up a dedicated email account specifically for these exports and using IMAP automation to retrieve verification codes programmatically. You can combine this with Playwright or Puppeteer to handle the login flow. Another option is checking if the platform has any webhook functionality or if they offer different export formats that might be more automation-friendly. Sometimes contacting their support team directly can reveal undocumented API endpoints or alternative data access methods they use for enterprise clients. The email automation route requires some initial setup but once configured, it runs quite reliably for scheduled data extraction.
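To make the IMAP part concrete, here is a minimal sketch using Python's built-in `imaplib` and `email` modules. It assumes the code is a 6-digit number in the plain-text body of the email, which may not match your platform. The host, credentials, and sender address are placeholders you would supply.

```python
from __future__ import annotations

import email
import imaplib
import re
from email.message import Message

# Assumption: the verification code is a standalone 6-digit number.
CODE_PATTERN = re.compile(r"\b(\d{6})\b")


def extract_code(msg: Message) -> str | None:
    """Return the first 6-digit code found in a text/plain part, if any."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        match = CODE_PATTERN.search(payload.decode(charset, errors="replace"))
        if match:
            return match.group(1)
    return None


def fetch_latest_code(host: str, user: str, password: str, sender: str) -> str | None:
    """Check unread mail from `sender`, newest first, and return a code."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN", "FROM", f'"{sender}"')
        for msg_id in reversed(data[0].split()):
            # Fetching the full message also marks it as read.
            _, fetched = imap.fetch(msg_id, "(RFC822)")
            code = extract_code(email.message_from_bytes(fetched[0][1]))
            if code:
                return code
    return None
```

In the login flow you would trigger the email from Playwright or Puppeteer, call `fetch_latest_code` in a short retry loop until it returns something (the email can take a few seconds to arrive), then type the code into the form. Using a dedicated mailbox keeps unrelated 6-digit numbers from being picked up by mistake.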