What (actually) breaks when you try to scale autonomous AI agents across a real business workflow?

I keep seeing demos where multiple AI agents coordinate on a task—one analyzes data, another drafts content, a third handles approvals. It looks seamless in the demo.

But I’m curious about real deployment scenarios. When you’ve got three or four agents working together on an actual business process, where does coordination start to fall apart? Is it decision-making consistency? State management? Cost explosion? Latency?

I’m particularly interested in the financial side. If you’re orchestrating multiple AI agents instead of paying for Camunda licenses, are you actually saving money overall, or are you just shifting the cost from licensing to API calls and coordination overhead?

Has anyone hit a scaling wall with multi-agent workflows? What was the failure mode?