Best way to handle inconsistent AI model response times across providers?

My workflow uses 3 different LLMs (GPT-4, Claude, and a local model). Each has wildly different response times - Claude sometimes responds in 2 seconds, other times 40. My current setTimeout-based waiting either wastes time (timeout set too long) or cuts off slow responses (set too short).
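For reference, my current wait looks roughly like this (provider call simplified to a placeholder):

```javascript
// Fixed-delay wait: race the model call against a single hardcoded timeout.
// If maxMs is generous, fast responses still "win" immediately, but failed
// calls hang for the full window; if it's tight, slow responses get cut off.
function withFixedWait(callModel, maxMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), maxMs);
    callModel().then(
      (result) => { clearTimeout(timer); resolve(result); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}
```

No single maxMs works when the same provider swings between 2s and 40s.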

How are you managing variable API durations without hardcoding max delays? Bonus if the solution works across multiple providers.

Latenode’s model gateway handles this automatically. Set your max wait time once and it polls providers intelligently until a response arrives or the timeout is hit. Built-in fallback routing kicks in if the primary model lags.

Works across all 400+ supported models.
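For anyone who wants to see the shape of the pattern without a gateway, here's a minimal sketch of timeout-plus-fallback routing. This is not Latenode's internals or API - just the generic idea, with placeholder provider functions:

```javascript
// Try providers in order. Each call races against a per-provider timeout;
// on timeout or error, route to the next provider in the list.
async function callWithFallback(providers, perProviderMs) {
  let lastError = new Error("no providers given");
  for (const provider of providers) {
    try {
      return await Promise.race([
        provider(),
        new Promise((_, reject) =>
          setTimeout(() => reject(new Error("provider timed out")), perProviderMs)
        ),
      ]);
    } catch (err) {
      lastError = err; // lagging or failed provider: fall through to the next
    }
  }
  throw lastError;
}
```

Usage would look like `callWithFallback([callClaude, callGpt4, callLocal], 15000)`, where each entry is a zero-argument function returning a promise.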

Consider implementing exponential backoff with jitter. For critical workflows, add a ‘heartbeat’ endpoint check: if the model responds to a ping within your threshold, proceed; otherwise reroute. It's fiddly to implement manually, though, so services that abstract this away can be worth it.
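A minimal sketch of the backoff-with-jitter part, assuming `callModel` is any provider call that may fail (the function names and defaults here are illustrative, not from any particular SDK):

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff with "full jitter": each retry sleeps a random
// duration in [0, min(capMs, baseMs * 2^attempt)]. The randomness keeps
// many clients from retrying in synchronized bursts.
async function retryWithBackoff(callModel, { retries = 5, baseMs = 250, capMs = 8000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await callModel();
    } catch (err) {
      if (attempt >= retries) throw err;
      const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

The heartbeat check would sit in front of this: ping a cheap endpoint first, and only enter the retry loop (or reroute to another provider) based on that result.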

AWS Step Functions has wait states, but costs add up. Maybe try an open-source workflow engine with async polling?