I’m building a chat moderation system for my streaming platform using JavaScript. Recently we’ve been getting attacked by spam bots that flood our chat with inappropriate content. These messages all follow similar patterns but use character substitution to avoid detection.
The spam messages typically contain random numbers at the end and use mixed characters to hide certain words. Right now I’m using a basic keyword detection approach but it’s not very effective:
function detectSpam(message) {
  const spamIndicators = ['x123', 'y456', 'z789'];
  for (const indicator of spamIndicators) {
    if (message.toLowerCase().includes(indicator)) {
      return true;
    }
  }
  return false;
}

if (detectSpam(userMessage)) {
  moderationAPI.timeoutUser({
    userId: currentUser.id,
    duration: 600,
    reason: 'Spam detected'
  });
}
This method only works after I manually add identifiers from spam messages I’ve already seen. I need a way to identify these patterns automatically, before new variants get through. What would be a more robust approach to catch these types of spam messages?
ML might be overkill, but hash-based detection absolutely works. I built a rolling hash system that fingerprints message structure instead of content. Bots doing character substitution still keep the same length and word patterns. Hash the message length, word count, and character distribution - spam bots generate almost identical hashes even with different characters. Try shadow banning instead of instant timeouts. Let suspected bots think they’re posting while hiding their messages from everyone else. They won’t adapt as quickly since they don’t know they’re caught. Add a check for whether users actually read chat before posting. This combo cut our bot traffic by 90% with zero false positives.
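A minimal sketch of the structural fingerprint described above, assuming word-length patterns as the "shape" feature. The function name and the exact feature choices (message length, word count, per-word lengths) are illustrative, not the poster's actual code:

```javascript
// Fingerprint a message's structure rather than its characters.
// Illustrative sketch: real systems would add more features.
function structuralFingerprint(message) {
  const words = message.trim().split(/\s+/);
  // The word-length pattern survives character substitution:
  // "ch3ap" and "cheap" both contribute a 5, so substituted
  // variants of the same template collapse to the same key.
  const shape = words.map((w) => w.length).join('-');
  return `${message.length}:${words.length}:${shape}`;
}

// Two substituted variants of the same spam template share a fingerprint:
structuralFingerprint('buy ch3ap g0ld x123'); // "19:4:3-5-4-4"
structuralFingerprint('buy cheap gold y456'); // "19:4:3-5-4-4"
```

You'd then count how many times each fingerprint appears in a short window and flag keys that repeat across many accounts.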
regex patterns work great for this. try /\w+\d{3,}/g to catch words with multiple digits. I’d also set up a bayesian filter - feed it spam examples and it’ll learn patterns on its own. rate limiting helps too - cap new accounts at 3 messages per 10 seconds. way more effective than trying to guess every spam trick.
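A rough sketch combining the regex and rate-limit suggestions above. The window and threshold mirror the numbers given (3 messages per 10 seconds); the helper names and in-memory `Map` storage are made up for illustration:

```javascript
// Words ending in 3+ digits, e.g. "x123" or "spam789".
const DIGIT_SUFFIX = /\w+\d{3,}/;

function looksLikeSpam(message) {
  return DIGIT_SUFFIX.test(message);
}

// userId -> array of recent message timestamps (ms).
const recentMessages = new Map();

function isRateLimited(userId, now = Date.now()) {
  const windowMs = 10_000; // 10-second window
  const maxMessages = 3;   // cap for new accounts
  // Drop timestamps that fell out of the window, then record this message.
  const times = (recentMessages.get(userId) || []).filter(
    (t) => now - t < windowMs
  );
  times.push(now);
  recentMessages.set(userId, times);
  return times.length > maxMessages;
}
```

In production you'd keep the timestamps in something shared like Redis rather than process memory, and apply the cap only to accounts below some age threshold.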
I’ve dealt with similar bot attacks, and combining timing patterns with content detection works great. Bots have weird behavioral signatures - they post at regular intervals or spam multiple messages right after joining. I set up a reputation system where new users get stricter filtering until they build legitimate activity. Track message frequency, account age, and whether they actually respond to other users naturally. Real people don’t send identical message lengths over and over or post every 30 seconds like robots. For character substitution, just normalize the text first - replace @ with a, 3 with e, etc., then run detection. Not perfect, but it catches about 80% of obvious attempts without hitting legitimate users who use numbers or special characters.
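The normalization step above can be sketched like this. The substitution table is a small illustrative sample, not an exhaustive leetspeak map:

```javascript
// Map common substitution characters back to the letters they mimic,
// so a plain keyword list runs against the "intended" text.
const SUBSTITUTIONS = {
  '@': 'a',
  '3': 'e',
  '0': 'o',
  '1': 'i',
  '$': 's',
  '5': 's',
  '7': 't',
};

function normalize(message) {
  return message
    .toLowerCase()
    .replace(/[@3015$7]/g, (ch) => SUBSTITUTIONS[ch]);
}

// "fr33 g0ld" normalizes to "free gold", which a basic
// keyword filter will catch.
normalize('fr33 g0ld'); // "free gold"
```

Note the trade-off the poster mentions: this also rewrites legitimate digits (a user typing "10" becomes "io"), so normalize a copy for detection only and never display the normalized text.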