I’m trying to build something where people can create AI-generated content through Telegram and Discord bots. The backend would use ComfyUI hosted on cloud services. This is meant to be a community-scale project with these features:
Image generation from text prompts (both general and restricted content)
Face replacement for photos and videos
Quick AI video creation
Permission-based access for certain content types
Maybe some kind of credits or points system
Since this is for a bigger community and not just me, I need it to work smoothly and handle lots of requests automatically.
Anyone built something like this before? Would love to hear about your experience or get some advice on the technical side. Also open to working together or getting help from someone who knows this stuff well.
Been running something similar for 6 months but went with Telegram first - their inline keyboards are perfect for parameter selection. I used Docker containers with auto-scaling on AWS ECS instead of load balancers. Handles traffic spikes way better and keeps costs down when it’s quiet. One thing I wish I’d known earlier: ComfyUI workflow files get corrupted when multiple instances try to modify them at once. I made them read-only and pass parameters through the API instead. The permission system’s trickier than you’d think - you’ve got to validate at both the bot level and the ComfyUI level or people will bypass restrictions. For face replacement, preprocessing images to detect and crop faces before sending them to ComfyUI really improved speed and quality. Integrating the credit system with Stripe was actually pretty easy using webhooks.
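A rough sketch of the "read-only workflow, parameters through the API" idea, in case it helps. This is not the poster's actual code. It assumes a workflow exported in ComfyUI's API JSON format and a server on the default local address `http://127.0.0.1:8188`; the node ids `"3"` and `"6"` and the trimmed-down node inputs are made up for illustration, so substitute the ids from your own export.

```python
import copy
import json
import urllib.request
from types import MappingProxyType

# Hypothetical, heavily trimmed workflow in ComfyUI's API format (node id -> node).
# A real export has more nodes and inputs; the ids here are assumptions.
WORKFLOW_TEMPLATE = MappingProxyType({
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
})


def build_prompt(prompt_text: str, seed: int) -> dict:
    """Return a per-request deep copy of the template with parameters filled in."""
    workflow = copy.deepcopy(dict(WORKFLOW_TEMPLATE))
    workflow["6"]["inputs"]["text"] = prompt_text
    workflow["3"]["inputs"]["seed"] = seed
    return workflow


def submit(workflow: dict, base_url: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to ComfyUI's /prompt endpoint and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Each request gets its own copy, so nothing ever writes to the shared template or to the workflow file on disk. Something like `submit(build_prompt("a lighthouse at dusk", seed=1234))` would queue one job on a running instance.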
I built something like this 8 months ago for my gaming community. I started with Discord since their bot API is much easier than Telegram’s. Queue management will be your biggest headache, as ComfyUI struggles with multiple simultaneous requests. I used Redis for job queuing and ran several ComfyUI instances behind a load balancer. For the credits system, I stored user balances in PostgreSQL and tracked usage by generation type. Video processing is considerably more resource-intensive than basic image generation. A crucial tip: filter content before sending it to ComfyUI. I learned that lesson the hard way. Make sure your error handling is robust, since cloud instances often fail at random. Face replacement was challenging and required extensive tuning to get satisfactory results. Performance-wise, expect 30-45 seconds per image and about 3-5 minutes for short videos, depending on your cloud configuration.
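For the credits part, here is a minimal sketch of balances plus per-type usage tracking. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere; the table names, the `COSTS` prices, and the `charge` helper are all hypothetical, not what the poster actually ran.

```python
import sqlite3

# Hypothetical prices per generation type; the thread gives no real numbers.
COSTS = {"image": 1, "face_swap": 2, "video": 10}


def init_db(conn: sqlite3.Connection) -> None:
    """Create the balance and usage tables if they do not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS balances ("
        "user_id INTEGER PRIMARY KEY, credits INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage_log ("
        "user_id INTEGER NOT NULL, gen_type TEXT NOT NULL, cost INTEGER NOT NULL)"
    )


def charge(conn: sqlite3.Connection, user_id: int, gen_type: str) -> bool:
    """Debit the user for one generation; return False if they cannot afford it."""
    cost = COSTS[gen_type]
    with conn:  # commits on success, rolls back if an exception is raised
        # The balance check and the debit happen in one statement, so two
        # concurrent requests cannot both spend the same credits.
        cur = conn.execute(
            "UPDATE balances SET credits = credits - ? "
            "WHERE user_id = ? AND credits >= ?",
            (cost, user_id, cost),
        )
        if cur.rowcount != 1:
            return False
        conn.execute(
            "INSERT INTO usage_log (user_id, gen_type, cost) VALUES (?, ?, ?)",
            (user_id, gen_type, cost),
        )
    return True
```

The bot would call `charge(...)` before queuing a job and refuse the request when it returns `False`. The same conditional `UPDATE` pattern should carry over to PostgreSQL with that driver's placeholder syntax.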
Telegram’s webhook setup is way more annoying than Discord’s gateway. I switched to serverless with AWS Lambda + SQS - handles traffic spikes much better than running instances constantly. Watch out for ComfyUI’s memory leaks though, they’ll crash your containers after a few hundred generations. Set up auto-restart. Cache your common workflows in Redis - cuts loading time significantly.
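One simple way to handle the memory-leak issue is to have each worker stop after a fixed number of jobs and let the container's restart policy bring up a fresh one. A rough sketch of that loop follows; the in-process `queue.Queue` is a stand-in for SQS or Redis, and `run_worker`, `max_jobs=200` and the idle timeout are assumptions to tune against your own setup.

```python
import queue
from typing import Callable


def run_worker(
    jobs: "queue.Queue[dict]",
    handle: Callable[[dict], None],
    max_jobs: int = 200,
    idle_timeout: float = 1.0,
) -> int:
    """Process jobs until max_jobs are done or the queue stays empty, then return.

    Returning on an idle queue keeps this sketch terminating; a real worker
    would more likely keep polling and only stop at the job limit.
    """
    processed = 0
    while processed < max_jobs:
        try:
            job = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            break
        try:
            handle(job)
        except Exception as exc:
            # One bad job should not take the whole worker down.
            print(f"job failed: {exc!r}")
        finally:
            processed += 1
            jobs.task_done()
    return processed
```

In a container, the entry point would call `run_worker(...)` and then exit, so the orchestrator's restart policy starts a clean process (and a clean ComfyUI instance alongside it) before leaked memory becomes a problem.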