I’m dealing with a situation where my manager wants us to stick with crewAI for building our multi-agent system. I spent the entire past week working with this framework and ran into several issues that are making me question if it’s ready for real-world use.
First off, the virtual environment became massive - nearly 1GB in size. This makes deployment a nightmare. The framework feels very limiting and lacks proper monitoring tools. I can’t see what’s happening under the hood or what prompts are actually being sent to the language model.
The agents have weird behavior too - they’ll trigger the same function multiple times consecutively. A single crew execution takes around 10 minutes to complete. The documentation isn’t great either; I get better help from community forums.
What really concerns me is that I haven’t found any companies publicly stating they use crewAI in production. Meanwhile, other frameworks like LangGraph have clear case studies and success stories on their websites.
I’m considering suggesting we switch to LangGraph since it has better observability tools like LangSmith. Yes, there’s a learning curve with new abstractions, but it might be worth the investment.
Has anyone here actually deployed crewAI successfully in production? What has been your experience?
We tested crewAI six months ago for customer service automation and ultimately decided against deploying it to production. You’re right about the lack of monitoring; we couldn’t see the agents’ decision-making, which is crucial for troubleshooting when production issues arise. We also hit the same performance problems you mentioned - our proof of concept would often freeze without any useful error messages. It may be adequate for prototyping multi-agent applications, but the operational challenges make it hard to endorse for critical functions. Instead, we built our own solution on established tools, which let us add proper logging, metrics, and circuit breakers. Our architecture team was also worried about vendor lock-in given the framework’s nascent status, which made it hard to trust at an enterprise level.
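For what it’s worth, the circuit breaker part was simple. Here’s a minimal sketch of the kind of wrapper we put around agent/LLM calls - this is generic Python, not any CrewAI API, and the thresholds are made up:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures,
    stop forwarding calls ("open") until `reset_after` seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Still in cooldown: fail fast instead of hammering the agent.
                raise RuntimeError("circuit open: agent call suppressed")
            # Cooldown elapsed: allow one trial call ("half-open").
            self.opened_at = None
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

You wrap whatever actually invokes the agent (e.g. `breaker.call(run_agent_step, task)` - `run_agent_step` being your own function) so a wedged crew fails fast with a clear error instead of hanging silently.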
Your manager’s probably pushing crewAI because it looks simple on paper, but I’ve seen this before with hyped frameworks that aren’t ready.
Hit something similar two years ago with another agent framework. Management loved the demos, but scaling it was a disaster. Those 10-minute execution times you’re seeing? Huge red flag. Production users expect seconds, not minutes.
That 1GB environment will destroy your deployment costs. We learned this when our container registry bills went through the roof.
Here’s what worked - I built a quick proof of concept with LangGraph next to the crewAI version. Same functionality, tracked memory usage, execution time, and deployment size for both. Showed leadership the side-by-side comparison and the decision was obvious.
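The measuring part doesn’t need anything fancy. Here’s roughly the harness I used - stdlib only; the pipeline names are placeholders for whatever your two implementations are, and deployment size we measured separately with `du -sh` on each venv:

```python
import time
import tracemalloc


def benchmark(label, fn, *args, **kwargs):
    """Run fn once; report wall-clock time and peak Python heap allocation."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{label}: {elapsed:.2f}s wall time, {peak / 1e6:.1f} MB peak heap")
    return result

# Hypothetical usage - run the same task through both stacks:
#   benchmark("crewAI", run_crew_pipeline, task)
#   benchmark("LangGraph", run_graph_pipeline, task)
```

Same inputs, same task, two numbers side by side. Leadership doesn’t argue with a table.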
The lack of public case studies for crewAI production deployments isn’t coincidental. Most companies that tried it hit the same walls you’re seeing.
LangGraph has a real learning curve but it’s manageable. The observability alone makes it worth switching. You can actually debug issues instead of guessing.
Been using crewAI for about 8 months now - mostly internal automation, nothing customer-facing. The performance problems you’re seeing are real. We get the same slow execution times and yeah, the dependency bloat sucks for containers. I did find some workarounds though. Tweaking agent configs and getting really specific with tool definitions cut down on those annoying repetitive function calls. It works okay for us since we don’t need real-time responses and can live with slower speeds. But honestly? I wouldn’t touch it for anything mission-critical unless you build serious custom monitoring around it. The observability gap is huge - that’s the real killer for production use. Your gut feeling about LangGraph is probably right if you need better visibility and more predictable performance.
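One of those workarounds, in case it helps: since the repeated calls were usually the agent re-invoking a tool with identical arguments, we memoized the tool function itself. This is plain Python around whatever callable you register as the tool - not a CrewAI feature, just a sketch:

```python
import functools


def dedupe_tool(fn):
    """Cache tool results by arguments so an agent re-invoking the same
    tool with the same inputs gets the cached result instead of
    re-executing it. Assumes arguments are hashable."""
    cache = {}
    calls = {"total": 0, "executed": 0}

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        calls["total"] += 1
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            calls["executed"] += 1
            cache[key] = fn(*args, **kwargs)
        return cache[key]

    wrapper.calls = calls  # expose counters so you can see the duplication
    return wrapper
```

The `calls` counters were also a cheap way to quantify how often the agent was repeating itself, which helped us tune the tool descriptions.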
CrewAI might work for small internal projects, but the ecosystem feels pretty immature compared to other options. We tried it last year and the deployment issues were a nightmare - we bailed pretty quickly. If your manager’s pushing for it, maybe put together a side-by-side comparison with LangGraph performance metrics? Sometimes you need hard numbers when they’re not listening to technical debt concerns.