How to detect and block spam bots in Twitch chat

I’m running a moderation bot for my Twitch stream using JavaScript, and I’m facing issues with spam bots attacking my chat. These bots inundate the chat with messages that follow a specific pattern, often employing character substitutions to bypass basic filters. The messages typically contain inappropriate content with a mix of Latin and Cyrillic characters, along with random numbers and references to messaging apps.

Current detection method:

// Identifiers collected by hand from previous spam waves.
const spamMarkers = ['x123', 'y456', 'z789'];
const lowered = message.toLowerCase(); // lowercase once instead of per check

if (spamMarkers.some((marker) => lowered.includes(marker))) {
    await twitchApi.moderateUser(
        streamerId,
        {
            action: 'ban',
            userId: chatMessage.senderId,
            reason: `Spam detected: ${message}`
        }
    );
}

This method only works after I add identifiers from each spam wave manually. I’m looking for a more effective way to automatically identify these spam messages before they take over the chat. What patterns should I consider to enhance my spam detection?

Mixed character detection works great for catching these bots. I check for messages mixing Latin and Cyrillic characters in weird ways - catches most of them. This regex helps:

/[\u0400-\u04FF].*[a-zA-Z]|[a-zA-Z].*[\u0400-\u04FF]/

It fires on any message that contains both scripts anywhere, so a bilingual viewer can trip it too. Treat a hit as a signal, not an automatic ban.

Message entropy is another solid method. Spam has crazy high randomness from character swaps and random numbers. Calculate Shannon entropy on incoming messages and flag anything above your threshold.

Timing patterns matter too. These bots post within seconds of each other or at super regular intervals. Track timestamps and flag clusters of similar messages from different accounts.

The technique that saved me? Followers-only mode with a short minimum follow time. Most spam bots won’t follow first, so requiring a short follow period before posting kills tons of them without hurting real viewers.
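For the entropy idea, here is a minimal sketch of per-message Shannon entropy in bits per character. The function names, the 4.5 threshold, and the 20-character minimum are my own assumptions, not values from this thread, so tune them against real chat logs before acting on them.

```javascript
// Shannon entropy in bits per character, counted over Unicode code points.
function shannonEntropy(text) {
    const chars = Array.from(text); // code points, not UTF-16 units
    if (chars.length === 0) return 0;

    const counts = new Map();
    for (const ch of chars) {
        counts.set(ch, (counts.get(ch) || 0) + 1);
    }

    let entropy = 0;
    for (const count of counts.values()) {
        const p = count / chars.length;
        entropy -= p * Math.log2(p);
    }
    return entropy;
}

// Hypothetical defaults: both numbers are assumptions to be tuned.
const ENTROPY_THRESHOLD = 4.5;
const MIN_LENGTH = 20;

function looksHighEntropy(message, threshold = ENTROPY_THRESHOLD) {
    // Short messages cannot reach high per-character entropy, so skip them.
    return message.length >= MIN_LENGTH && shannonEntropy(message) > threshold;
}
```

Entropy on its own is a weak signal, since emote spam and links can also score high. It probably works best as one input among several.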

Pattern matching alone won’t cut it. You need multiple signals working together.

I’ve tackled this before - combining message frequency, character entropy, and account age beats static keyword matching every time.
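One way to combine signals is a simple weighted score. This is only a sketch: the signal names, weights, and cutoffs below are hypothetical, and each signal is assumed to be a boolean you compute elsewhere.

```javascript
// Hypothetical weights; every value here is an assumption to be tuned.
const SIGNAL_WEIGHTS = {
    mixedScript: 3,
    highEntropy: 1,
    freshAccount: 2,
    burstPosting: 2,
};

// Sums the weights of every signal that is truthy.
function spamScore(signals, weights = SIGNAL_WEIGHTS) {
    let score = 0;
    for (const [name, weight] of Object.entries(weights)) {
        if (signals[name]) score += weight;
    }
    return score;
}

// Hypothetical cutoffs mapping a score to a moderation action.
function decideAction(score) {
    if (score >= 5) return 'ban';
    if (score >= 3) return 'timeout';
    return 'allow';
}

// Usage: a fresh account posting mixed-script text scores 3 + 2 = 5.
const action = decideAction(spamScore({ mixedScript: true, freshAccount: true }));
```

Keeping a timeout tier below the ban tier gives a human moderator a chance to review borderline cases.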

For character mixing, watch for weird Unicode ranges in the same message. Spam bots love mixing Latin with Cyrillic or throwing in random symbols.
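A stricter variant of the mixed-script check looks for both scripts inside a single word, which is rare in legitimate text. A sketch using Unicode property escapes (supported by current Node.js and browsers); the function names are my own:

```javascript
const LATIN = /\p{Script=Latin}/u;
const CYRILLIC = /\p{Script=Cyrillic}/u;

// Returns the words that contain both Latin and Cyrillic letters.
function mixedScriptWords(message) {
    return message
        .split(/\s+/)
        .filter((word) => LATIN.test(word) && CYRILLIC.test(word));
}

function hasMixedScriptWord(message) {
    return mixedScriptWords(message).length > 0;
}

// Usage: "\u043E" is Cyrillic "о", which looks identical to Latin "o".
const suspicious = hasMixedScriptWord('buy f\u043Ell\u043Ewers now');
```

A message that simply contains one Russian word and one English word is not flagged, because no single word mixes scripts.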

Account age matters big time. Fresh accounts posting right away? Red flag.
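A small sketch of the account-age check. It assumes you already have the account's creation timestamp as an ISO 8601 string (the Twitch Helix user object exposes one as `created_at`); the helper names and the 7-day cutoff are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// createdAt: ISO 8601 timestamp string for when the account was created.
function accountAgeDays(createdAt, now = Date.now()) {
    return (now - Date.parse(createdAt)) / DAY_MS;
}

// Hypothetical cutoff of 7 days. An unparseable timestamp yields NaN,
// which compares as false, so such accounts are not treated as fresh.
function isFreshAccount(createdAt, now = Date.now(), maxAgeDays = 7) {
    return accountAgeDays(createdAt, now) < maxAgeDays;
}
```

Passing `now` explicitly keeps the check easy to test and lets you evaluate it against the message timestamp rather than the wall clock.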

Message similarity across users is golden. Multiple accounts posting nearly identical messages in a short window screams coordinated spam.
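Cross-account similarity could be sketched with a normalized Levenshtein distance over a sliding time window. The class name and the defaults (60-second window, 0.8 similarity, 3 accounts) are assumptions for illustration, not values from this thread.

```javascript
// Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
    const row = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        let diagonal = row[0]; // previous row, column j - 1
        row[0] = i;
        for (let j = 1; j <= b.length; j++) {
            const above = row[j]; // previous row, column j
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
            diagonal = above;
        }
    }
    return row[b.length];
}

// 1 means identical, 0 means nothing in common.
function similarity(a, b) {
    const longest = Math.max(a.length, b.length);
    return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

class SimilarityTracker {
    constructor({ windowMs = 60 * 1000, threshold = 0.8, minAccounts = 3 } = {}) {
        this.windowMs = windowMs;
        this.threshold = threshold;
        this.minAccounts = minAccounts;
        this.recent = []; // entries: { userId, text, time }
    }

    // Records the message. Returns the user IDs (sender included) that posted
    // near-identical text inside the window, or [] if fewer than minAccounts.
    check(userId, text, now = Date.now()) {
        this.recent = this.recent.filter((m) => now - m.time <= this.windowMs);
        const normalized = text.toLowerCase().trim();
        const matches = new Set([userId]);
        for (const m of this.recent) {
            if (m.userId !== userId && similarity(m.text, normalized) >= this.threshold) {
                matches.add(m.userId);
            }
        }
        this.recent.push({ userId, text: normalized, time: now });
        return matches.size >= this.minAccounts ? [...matches] : [];
    }
}
```

Requiring several distinct accounts before acting avoids punishing one viewer who repeats themselves, or a crowd all typing the same short emote reaction once you add a minimum message length.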

But managing all these detection rules manually? Total nightmare. Spam evolves constantly, so you’re always tweaking thresholds and adding new patterns.

I went full automation on the detection pipeline. When suspicious patterns pop up, the system auto-updates detection rules, pulls data from multiple sources, and connects with external spam databases.

Everything runs automatically - real-time analysis, updating ban lists across platforms. Way better than chasing patterns by hand.

You can build something like this at https://latenode.com

Rate limiting works best against these attacks. I set harsh limits for new accounts - 1 message per 30 seconds for accounts under a week old. Real users rarely hit these limits, but bots always do.

I built a similarity checker that compares new messages to recent chat using string distance algorithms. When multiple accounts post 80%+ similar messages within minutes, they get auto-timed out.

The real game changer? Tracking message templates. Spam ops reuse the same structure with different variables. I store normalized versions of flagged messages (strip numbers and common swaps) and match against this database. Once you spot a template, you catch every variant automatically.

Don’t sleep on Twitch’s AutoMod either. Raise its filter level to catch obvious spam while your custom stuff handles the clever attempts. This combo beats either method alone.
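A rough sketch of the rate limit and the template matching described in this answer. All names are hypothetical, the look-alike map is deliberately tiny, and replacing digit runs with a `#` placeholder (instead of deleting them) is my own choice to keep the message structure intact.

```javascript
// Allows one message per interval per user; meant for new accounts only.
class NewAccountRateLimiter {
    constructor({ minIntervalMs = 30 * 1000 } = {}) {
        this.minIntervalMs = minIntervalMs;
        this.lastAllowed = new Map(); // userId -> time of last allowed message
    }

    // Returns true if the message is within the limit, false if it came too soon.
    allow(userId, now = Date.now()) {
        const last = this.lastAllowed.get(userId);
        if (last !== undefined && now - last < this.minIntervalMs) return false;
        this.lastAllowed.set(userId, now);
        return true;
    }
}

// Small, incomplete map of Cyrillic look-alikes to Latin letters.
const HOMOGLYPHS = new Map([
    ['\u0430', 'a'], ['\u0435', 'e'], ['\u043E', 'o'], ['\u0440', 'p'],
    ['\u0441', 'c'], ['\u0443', 'y'], ['\u0445', 'x'],
]);

// Lowercases, undoes common swaps, and replaces digit runs with "#".
function normalizeTemplate(message) {
    return Array.from(message.toLowerCase())
        .map((ch) => HOMOGLYPHS.get(ch) || ch)
        .join('')
        .replace(/\d+/g, '#')
        .replace(/\s+/g, ' ')
        .trim();
}

const knownTemplates = new Set();

function rememberTemplate(message) {
    knownTemplates.add(normalizeTemplate(message));
}

function matchesKnownTemplate(message) {
    return knownTemplates.has(normalizeTemplate(message));
}
```

After a moderator flags one message with `rememberTemplate`, later variants that differ only in numbers or swapped letters should match via `matchesKnownTemplate`. An in-memory `Set` is lost on restart, so a real bot would likely persist the templates somewhere.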