How do you handle live data synchronization from Shopify to external databases like MySQL, BigQuery, or PostgreSQL? What patterns work best?

Hi everyone! I’m Sarah and I’ve been dealing with Shopify data sync problems for a few years now. I want to know how other devs are handling this stuff.

The Main Problem

Most of us need to get Shopify data into our own databases or warehouses for reporting, custom logic, stock control, etc. But there are so many ways to do it and they all have pros and cons.

What I Want to Know

How Are You Doing It Now:

  • Webhooks with your own code? How do you deal with missed events and out-of-order delivery?
  • Tools like Airbyte or Stitch? What kind of delays do you get?
  • Polling with API calls? How do you stay under rate limits?
  • Message queues like RabbitMQ? Is it too complex?
  • No-code tools like Zapier? Does it get expensive fast?

Problems I Keep Seeing:

  • API limits: Does the 2-requests-per-second limit cause real problems? Any tricks?
  • Data getting out of sync: What happens when your systems don’t match?
  • Webhook issues: Missed events, duplicates, out-of-order delivery?
  • Testing: How do you test without breaking live data?

Scale Questions:

  • How much data are you moving around?
  • Do you need instant updates or can you wait a bit?
  • How much time does your team spend fixing sync issues?

The Big Question

If you could build the perfect Shopify sync system, what would it look like? And how close is what you have now?

I really want to hear from people who’ve done this with lots of data or tried different ways. What actually works? What was a waste of time?

I can share some patterns I’ve found too if people want.

Thanks for any help!

I totally get it! Missing data sucks! We rely on webhooks for the important updates and then do some API polling to grab anything that might've been missed. Using timestamps really helps keep everything aligned.
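Roughly what the polling side looks like for us, sketched in Python (not our real code — the shop domain, token, and table are placeholders, and a real job should follow the Link header for further pages):

```python
# A sketch of the polling catch-up, assuming a generic SQL table keyed by
# order id. Shop domain, token, and table are placeholders.
import json
import sqlite3            # stand-in for MySQL/Postgres/BigQuery
import requests

SHOP = "your-store.myshopify.com"     # placeholder
TOKEN = "shpat_xxx"                   # placeholder Admin API token

def fetch_orders_updated_since(since_iso):
    """Fetch one page (up to 250) of orders changed since the last sync.
    Real code should follow the Link header for further pages."""
    url = f"https://{SHOP}/admin/api/2024-01/orders.json"
    params = {"updated_at_min": since_iso, "status": "any", "limit": 250}
    headers = {"X-Shopify-Access-Token": TOKEN}
    resp = requests.get(url, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()["orders"]

def upsert_orders(conn, orders):
    """Idempotent upsert keyed on order id, so the webhook and polling
    paths can both write without clobbering each other."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(id INTEGER PRIMARY KEY, updated_at TEXT, payload TEXT)")
    for o in orders:
        conn.execute(
            "INSERT INTO orders (id, updated_at, payload) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET updated_at=excluded.updated_at, "
            "payload=excluded.payload",
            (o["id"], o["updated_at"], json.dumps(o)),
        )
    conn.commit()

# usage: upsert_orders(sqlite3.connect("sync.db"),
#                      fetch_orders_updated_since("2024-01-01T00:00:00Z"))
```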

Been dealing with this exact mess for years at our company. We process millions of Shopify events daily across multiple stores.

Here’s what actually works:

Webhooks are your main pipeline, but you need bulletproof retries. Failed webhook processing gets retried with exponential backoff; after 3 failures, the event lands in a dead-letter queue for manual review.
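Very roughly, the retry path looks like this (a simplified in-memory sketch; the real thing sits on proper queues):

```python
# Simplified sketch of the retry flow (in-memory stand-ins for real queues).
import time

MAX_ATTEMPTS = 3
manual_review_queue = []   # stand-in for the dead-letter / review queue

def handle_with_retries(event, process):
    """Process a webhook event, retrying with exponential backoff.
    After MAX_ATTEMPTS failures the event is parked for manual review."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            process(event)
            return True
        except Exception as exc:
            if attempt == MAX_ATTEMPTS:
                manual_review_queue.append({"event": event, "error": str(exc)})
                return False
            time.sleep(2 ** attempt)   # 2s, 4s between attempts
    return False
```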

For sync gaps, we run a daily reconciliation job comparing our database against Shopify's data using updated_at timestamps. It catches the roughly 0.1% of events that webhooks miss.
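Something like this, sketched out (shop/token are placeholders, and a real job would paginate via the Link header instead of taking one page):

```python
# Sketch of the daily reconciliation job. Shop/token are placeholders.
from datetime import datetime, timedelta, timezone
import requests

def shopify_ids_updated_since(shop, token, since_iso):
    """Return {order_id: updated_at} for orders changed since since_iso."""
    url = f"https://{shop}/admin/api/2024-01/orders.json"
    headers = {"X-Shopify-Access-Token": token}
    params = {"updated_at_min": since_iso, "status": "any",
              "fields": "id,updated_at", "limit": 250}
    resp = requests.get(url, headers=headers, params=params, timeout=30)
    resp.raise_for_status()
    return {o["id"]: o["updated_at"] for o in resp.json()["orders"]}

def reconcile(local_rows, shop, token):
    """local_rows: {order_id: updated_at_iso} pulled from your database."""
    since = (datetime.now(timezone.utc) - timedelta(days=1)).isoformat()
    remote = shopify_ids_updated_since(shop, token, since)
    missing = [oid for oid in remote if oid not in local_rows]
    stale = [oid for oid, ts in remote.items() if oid in local_rows and
             datetime.fromisoformat(local_rows[oid]) < datetime.fromisoformat(ts)]
    # anything missing or stale gets re-fetched through the normal sync path
    return missing, stale
```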

The 2 requests per second limit sucks. We use a token bucket in Redis to track rate usage across workers. Hit the limit? Requests go into a priority queue.
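A rough take on the Redis-backed token bucket with redis-py (the 2/sec rate and 40-token burst match Shopify's standard REST limits, but treat the numbers as placeholders for your plan):

```python
# Redis-backed token bucket shared by all workers. Rate/capacity below are
# placeholders for whatever your plan actually allows.
import time
import redis

r = redis.Redis()

# Refill the bucket based on elapsed time, then try to take one token.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])   -- tokens added per second
local capacity = tonumber(ARGV[2])   -- max burst size
local now      = tonumber(ARGV[3])

local data   = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or capacity
local ts     = tonumber(data[2]) or now

tokens = math.min(capacity, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 60)
return allowed
"""
take_token = r.register_script(TOKEN_BUCKET_LUA)

def acquire(shop_domain, rate=2, capacity=40):
    """Block until a request token is free for this shop, shared across
    every worker that talks to the same Redis."""
    while True:
        if take_token(keys=[f"bucket:{shop_domain}"],
                      args=[rate, capacity, time.time()]):
            return
        time.sleep(0.1)
```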

Biggest lesson: always store raw webhook payloads before processing. When your transformation logic changes (it will), you can replay events without losing data.
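The store-first, process-later idea, sketched with Flask and SQLite as stand-ins for whatever framework and database you actually use (HMAC verification omitted for brevity):

```python
# Persist the raw payload first, process second. Flask/SQLite are stand-ins.
# HMAC verification is omitted for brevity; real endpoints need it.
import sqlite3
from flask import Flask, request

app = Flask(__name__)
db = sqlite3.connect("webhooks.db", check_same_thread=False)
db.execute("""CREATE TABLE IF NOT EXISTS raw_webhooks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT, shop TEXT,
    received_at TEXT DEFAULT CURRENT_TIMESTAMP,
    body TEXT)""")

@app.route("/webhooks/shopify", methods=["POST"])
def shopify_webhook():
    # 1. Persist the untouched payload so it can be replayed later.
    db.execute(
        "INSERT INTO raw_webhooks (topic, shop, body) VALUES (?, ?, ?)",
        (request.headers.get("X-Shopify-Topic"),
         request.headers.get("X-Shopify-Shop-Domain"),
         request.get_data(as_text=True)),
    )
    db.commit()
    # 2. Ack fast; the transform runs in a separate worker that reads
    #    raw_webhooks, so schema changes never lose the original event.
    return "", 200
```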

For testing, we built a webhook simulator that replays production events against staging. Saved us tons of headaches when rolling out changes.
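The simulator is basically this (staging URL and table name are placeholders):

```python
# Rough shape of the replay tool: read stored raw payloads and POST them
# at staging. The staging URL and table name are placeholders.
import sqlite3
import requests

STAGING_URL = "https://staging.example.com/webhooks/shopify"   # placeholder

def replay(limit=100):
    db = sqlite3.connect("webhooks.db")
    rows = db.execute(
        "SELECT topic, shop, body FROM raw_webhooks ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    for topic, shop, body in rows:
        resp = requests.post(
            STAGING_URL,
            data=body,
            headers={"Content-Type": "application/json",
                     "X-Shopify-Topic": topic,
                     "X-Shopify-Shop-Domain": shop},
            timeout=10,
        )
        print(topic, resp.status_code)

if __name__ == "__main__":
    replay()
```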

Message queues add complexity but they’re worth it at scale. We use AWS SQS with separate queues for different event types. Orders get higher priority than product updates.
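Routing sketch with boto3 (queue URLs are placeholders; "priority" here just means the worker drains the orders queue first):

```python
# Queue-per-event-type routing on SQS. Queue URLs are placeholders.
import json
import boto3

sqs = boto3.client("sqs")

QUEUES = {  # checked in priority order by the worker
    "orders":   "https://sqs.us-east-1.amazonaws.com/123456789012/orders-events",
    "products": "https://sqs.us-east-1.amazonaws.com/123456789012/product-events",
}

def enqueue(topic, payload):
    """Route a webhook to its queue based on the Shopify topic prefix."""
    kind = "orders" if topic.startswith("orders/") else "products"
    sqs.send_message(QueueUrl=QUEUES[kind], MessageBody=json.dumps(payload))

def next_message():
    """Poll queues in priority order so order events are handled first."""
    for url in QUEUES.values():
        resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=1)
        for msg in resp.get("Messages", []):
            sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])
            return json.loads(msg["Body"])
    return None
```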

Skip Zapier for high volume. Gets expensive fast and you lose control when things break.