I’m working on a Discord moderation bot and need help with the word filtering system. Right now my bot can catch basic bad words, but users are getting around it by putting spaces, dots, underscores and other symbols between the letters.
import discord
from discord.ext import commands

client = commands.Bot(command_prefix='!', intents=discord.Intents.all())
filter_enabled = True

# Load the banned-term list once at startup
with open('blocked_terms.txt', 'r') as file:
    forbidden_words = file.read().splitlines()

@client.event
async def on_message(msg):
    # Ignore other bots and our own messages
    if msg.author.bot or msg.author == client.user:
        return
    if filter_enabled:
        for term in forbidden_words:
            if term in msg.content.lower():
                await msg.delete()
                await msg.channel.send(f'{msg.author.mention}, please watch your language!')
                return  # the message is gone, so don't also run it as a command
    await client.process_commands(msg)
The current setup only catches exact matches. How can I modify this to detect words like ‘b@d’ or ‘w o r d’ or ‘te-st’ that are clearly trying to bypass the filter? I tried adding spaced versions to my text file but that’s not practical since there are so many possible combinations.
Just use Levenshtein distance against your wordlist. It counts the character changes needed to turn one string into another. Someone types ‘b4d.w0rd’? That’s 3 edits from ‘badword’ (4→a, drop the dot, 0→o). Set your threshold to 2-3 edits max and you’ll catch most attempts without messy regex patterns.
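Here’s a rough sketch of that idea. The names levenshtein and is_bypass are made up for illustration, and I’ve added two things the bare distance check needs in practice: separators get stripped first so only symbol swaps eat into the edit budget, and a term-sized window slides across the message, since the distance from a whole sentence to a single word is always large.

import re

def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance, computed one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]

def is_bypass(content: str, banned: list[str], max_edits: int = 2) -> bool:
    # strip common separators so only swaps like 4 -> a use up the budget
    text = re.sub(r'[\s.\-_]+', '', content.lower())
    for term in banned:
        n = len(term)
        # slide a term-sized window so 'xxb4dw0rdxx' still gets compared
        for start in range(max(len(text) - n + 1, 1)):
            if levenshtein(text[start:start + n], term) <= max_edits:
                return True
    return False

One caution: scale the threshold with term length. Allowing 2 edits against a 4-letter word flags almost any 4-letter window.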
You’re dealing with the classic cat-and-mouse game of content moderation. Been there way too many times.
You need a smart preprocessing pipeline that handles sneaky variations automatically. Apply character substitutions first (@ to a, 3 to e, 1 to i, etc.), then strip the remaining non-alphabetic characters, convert to lowercase, and check against your word list. The order matters: if you strip non-alphabetic characters before substituting, the @ and 3 are gone before they can be mapped back to letters.
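A minimal sketch of that pipeline (the substitution table is a starting point, not an exhaustive list):

import re

# common look-alike substitutions; extend this as new tricks show up
LEET_MAP = str.maketrans({'@': 'a', '4': 'a', '3': 'e', '1': 'i', '0': 'o', '$': 's', '5': 's'})

def preprocess(text: str) -> str:
    text = text.lower().translate(LEET_MAP)  # map symbols back to letters first
    return re.sub(r'[^a-z]', '', text)       # then drop everything non-alphabetic

def is_dirty(text: str, banned: list[str]) -> bool:
    cleaned = preprocess(text)
    return any(term in cleaned for term in banned)

# preprocess('B@d w.0.r.d!') -> 'badword'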
But maintaining all this logic becomes a nightmare fast. Keeping substitution dictionaries updated, handling edge cases - you’ll spend countless hours tweaking patterns and chasing down false positives.
I’ve automated this entire headache using Latenode. Set up a workflow that takes Discord messages, runs them through multiple cleaning steps, checks against updated word lists, and even uses AI for context checking. The bot just sends the message to the workflow and gets back a clean/dirty response.
No more maintaining complex filtering code in your bot. Just one API call handles all the heavy lifting. The workflow can even log attempts and update your word lists automatically when new bypass patterns emerge.
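If you go that route, the bot side is just one HTTP POST. Here’s a sketch using aiohttp (which discord.py already depends on) - the URL and the response shape are hypothetical placeholders for whatever your workflow actually exposes:

import aiohttp

WEBHOOK_URL = 'https://example.latenode.com/hook/abc123'  # hypothetical endpoint

async def check_message(content: str) -> bool:
    # POST the raw message; assume the workflow answers {"verdict": "clean"|"dirty"}
    async with aiohttp.ClientSession() as session:
        async with session.post(WEBHOOK_URL, json={'content': content}) as resp:
            data = await resp.json()
            return data.get('verdict') == 'dirty'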
Had this exact issue building my server’s moderation system. Here’s what worked for me: normalize the message text before running it against your word list. Strip out common separators and convert symbols back to letters - ‘@’ becomes ‘a’, ‘3’ becomes ‘e’, ‘0’ becomes ‘o’. Remove spaces, dots, dashes, and underscores completely. Then check the cleaned text against your banned words. Watch out though - I made my character mapping too aggressive at first and caught tons of legit words. Also check substrings instead of exact matches since people love adding random junk at the start or end. It’s all about balancing catching the sneaky stuff without flagging innocent messages.
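Roughly what my version looked like - the mapping and separator list are trimmed down here, and the names are placeholders:

import re

SYMBOL_MAP = str.maketrans({'@': 'a', '3': 'e', '0': 'o', '1': 'i', '$': 's'})
SEPARATORS = re.compile(r'[\s._\-]+')

def normalize(text: str) -> str:
    text = text.lower().translate(SYMBOL_MAP)  # '@' -> 'a', '3' -> 'e', '0' -> 'o'
    return SEPARATORS.sub('', text)            # drop spaces, dots, dashes, underscores

def is_banned(text: str, banned: list[str]) -> bool:
    cleaned = normalize(text)
    # substring check catches padded junk like 'xxbadwordxx', but short terms
    # will hide inside innocent words ('ass' in 'classic') - that's the
    # too-aggressive trap I fell into at first
    return any(term in cleaned for term in banned)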
Regex will handle this way better than character mapping. Build a pattern that matches your banned words with optional separators between letters. For example: re.compile(r'b[^a-z]*a[^a-z]*d[^a-z]*w[^a-z]*o[^a-z]*r[^a-z]*d', re.IGNORECASE) for each word you want to catch. The [^a-z]* grabs any non-letter junk people stick between the actual letters. I’ve used this on my 15k server for 8 months - catches about 90% of bypass attempts with zero false positives. Build the patterns dynamically from your word list instead of hardcoding everything. Performance is solid since regex engines are built for this stuff. Just escape special regex characters in your banned words first.
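Building those patterns dynamically is only a few lines (function names here are just for illustration):

import re

def build_patterns(banned: list[str]) -> list[re.Pattern]:
    gap = r'[^a-z]*'  # any run of non-letters wedged between the real letters
    return [
        re.compile(gap.join(re.escape(ch) for ch in term), re.IGNORECASE)
        for term in banned
    ]

def contains_banned(text: str, patterns: list[re.Pattern]) -> bool:
    return any(p.search(text) for p in patterns)

# build_patterns(['badword'])[0] is equivalent to the hardcoded pattern above
# and matches 'b a d w o r d', 'b.a.d-w_o.r.d', 'xxbadwordxx', etc.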