We’re currently using Fivetran but our costs are getting out of control. Two specific data pipelines from MySQL to Snowflake are eating up around 30M MAR (monthly active rows) each month. If we can find cheaper alternatives for these two pipelines, we can keep using Fivetran for our other data flows.
We’re considering these three approaches:
1. Self-hosted Airbyte on AWS EC2
2. DLTHub integration (we already run Airflow on EC2)
3. AWS DMS for MySQL to S3, then Snowpipe to load into Snowflake
Looking for feedback on these options or other suggestions. A few constraints to keep in mind:
We need SSO support for security compliance (rules out hosted solutions like Airbyte Cloud)
Our data team has 2 people with strong Python skills
Platform team has 4 engineers who handle EC2 management and already maintain our Airflow setup
Totally agree, Airbyte’s been great for us! We switched over and saw major cost savings. Running it on EC2 with Airflow is pretty straightforward. Just watch out for DMS complexity around binlog configuration if you go that route, but Airbyte should work well for what you need.
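On the binlog point: whichever CDC tool reads the MySQL binlog, it's worth sanity-checking the server settings first. Here's a minimal sketch of such a check. The expected values (`log_bin=ON`, `binlog_format=ROW`, `binlog_row_image=FULL`) are what binlog-based CDC typically requires, but treat them as assumptions and confirm them against the docs for the tool you pick. The function name and the input dict are made up for illustration.

```python
# Settings commonly required for binlog-based CDC from MySQL.
# Assumption: verify these expected values against your CDC tool's docs.
REQUIRED_BINLOG_SETTINGS = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def find_binlog_problems(variables):
    """Return a list of problems found in the given MySQL variables.

    `variables` is a plain dict, e.g. built from `SHOW VARIABLES` output.
    """
    problems = []
    for name, expected in REQUIRED_BINLOG_SETTINGS.items():
        actual = variables.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif str(actual).upper() != expected:
            problems.append(f"{name} is {actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    # Hypothetical values, as if read from the source database.
    current = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
    for problem in find_binlog_problems(current):
        print(problem)
```

Running a check like this before setting up a replication task surfaces the most common misconfiguration early, rather than halfway through a full load.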
Honestly, sounds like your team’s already set up perfectly for option 2. DLTHub with Airflow will be way less overhead than spinning up Airbyte infrastructure. We migrated from Fivetran last year and DLTHub was surprisingly easy to get running - your Python devs will love it. Plus no extra services to maintain unlike Airbyte.
Go with AWS DMS + Snowpipe. We built this exact setup for a client with similar volumes and saved them a ton of money. DMS handles MySQL replication great, especially with binlog, and you get solid AWS reliability. Your platform team won’t struggle much since they’re already running EC2 stuff. Just heads up - you’ll need to handle data transformation between S3 and Snowflake if Fivetran’s doing heavy lifting there now. Once it’s set up, there’s barely any operational overhead and scaling’s easy through DMS task tweaks. We’re seeing sub-minute latency on most changes.
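To make the "transformation between S3 and Snowflake" part concrete: DMS change files carry an `Op` column (`I`, `U`, `D`) per row, so something has to collapse those events into current table state. Below is a rough Python sketch of that logic only. In practice you would more likely do this as a merge inside Snowflake. The `id` key and the `ts` ordering column are hypothetical names, and the ordering column assumes you've configured DMS to add a timestamp to each row.

```python
def collapse_cdc_events(events, key="id", op_field="Op", order_field="ts"):
    """Reduce CDC events to the latest surviving row per primary key.

    Inserts and updates overwrite the stored row; deletes remove it.
    """
    latest = {}
    # sorted() is stable, so events with equal order values keep input order.
    for event in sorted(events, key=lambda e: e[order_field]):
        row_key = event[key]
        if event[op_field] == "D":
            latest.pop(row_key, None)
        else:
            # Drop the operation marker; keep the actual column values.
            latest[row_key] = {k: v for k, v in event.items() if k != op_field}
    return list(latest.values())


if __name__ == "__main__":
    # Hypothetical change events for an orders table.
    events = [
        {"Op": "I", "id": 1, "ts": 1, "status": "new"},
        {"Op": "I", "id": 2, "ts": 2, "status": "new"},
        {"Op": "U", "id": 1, "ts": 3, "status": "paid"},
        {"Op": "D", "id": 2, "ts": 4, "status": "new"},
    ]
    print(collapse_cdc_events(events))
```

Fivetran does this merge step for you, which is why it's easy to overlook when pricing out the DMS route.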
We hit the same cost issues with Fivetran 18 months ago. Since you’ve got strong Python skills, check out Meltano. It’s built on Singer taps and targets, gives you full pipeline control, and plays nice with Airflow. The MySQL to Snowflake connector handled our 25M+ monthly records without issues. Your platform team can run it on the same EC2 instances as Airflow, plus it does custom transformations through dbt. Our Python devs picked it up fast, and we’re saving about 70% vs Fivetran. Definitely worth looking at alongside DLTHub - both use your existing infrastructure without adding operational headaches.
I’d go with DLTHub since you’ve already got Airflow and solid Python skills. We rolled out something similar last year - way less maintenance headache than adding another service like Airbyte. DLT handles incremental loading from MySQL binlog really well, and the transformation features are solid. Best part? Your platform team doesn’t need to babysit extra infrastructure. It just runs as another Airflow DAG. We’re pushing 15M records monthly through DLT without any real bottlenecks. Our Python devs picked it up fast, and debugging’s much simpler when everything’s in your existing orchestration setup.
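For anyone weighing the dlt route, here's a rough sketch of what the load function could look like. It assumes dlt's `sql_database` source is installed (along with a MySQL driver) and that connection credentials for MySQL and Snowflake come from dlt's own config/secrets. The table names, the `updated_at` cursor column, and the pipeline and dataset names are all hypothetical. Note that this does cursor-based incremental loading rather than reading the binlog, so hard deletes in MySQL would not be picked up.

```python
# Hypothetical table list; swap in the tables behind the two expensive pipelines.
TABLES = ["orders", "order_items"]


def load_mysql_to_snowflake(tables=TABLES, cursor_column="updated_at"):
    """Incrementally load the given MySQL tables into Snowflake with dlt."""
    # Imported lazily so Airflow can parse the DAG file without loading dlt.
    import dlt
    from dlt.sources.sql_database import sql_database

    # Credentials are resolved from dlt config/secrets (assumption).
    source = sql_database(table_names=list(tables))

    # Only fetch rows whose cursor column advanced since the last run.
    for name in tables:
        source.resources[name].apply_hints(
            incremental=dlt.sources.incremental(cursor_column),
        )

    pipeline = dlt.pipeline(
        pipeline_name="mysql_to_snowflake",  # hypothetical name
        destination="snowflake",
        dataset_name="mysql_replica",  # hypothetical target schema
    )
    # "merge" upserts on the primary key instead of appending duplicates.
    return pipeline.run(source, write_disposition="merge")
```

In Airflow this could be used as the callable of a `PythonOperator` or wrapped in a `@task`-decorated function, which is what makes it "just another DAG" with no extra service to run.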