How can I compare Twitch chat messages to a list of words in Python?

Alex_Brave · May 10, 2025, 12:00am

I’m working on a Twitch bot in Python. My goal is to filter chat messages using a list of words or phrases stored in a CSV. The CSV has words and their associated categories, and the bot should match chat messages against these lists.

I’ve managed to:

Retrieve chat messages
Block users
Remove messages
Post new messages

Now I’m trying to work out the best way to compare a chat message against the words in the CSV and report the category of whatever matched. Is CSV still the best option, or is there a better format, such as an INI file with sections and properties?
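For reference, a two-column CSV like the one described maps directly onto a category-to-words dictionary with the standard `csv` module. This is a minimal sketch that assumes a header row of `word,category`; the column names, sample rows, and the `load_word_categories` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents: a header row, then one "word,category" pair per line.
SAMPLE_CSV = """word,category
unwanted,spam
nasty,insult
improper,insult
"""

def load_word_categories(csv_file):
    """Group words by category from a CSV with 'word' and 'category' columns."""
    categories = defaultdict(list)
    for row in csv.DictReader(csv_file):
        word = (row.get("word") or "").strip().lower()
        category = (row.get("category") or "").strip()
        if word and category:  # skip incomplete rows
            categories[category].append(word)
    return dict(categories)

categories = load_word_categories(io.StringIO(SAMPLE_CSV))
print(categories)  # {'spam': ['unwanted'], 'insult': ['nasty', 'improper']}
```

With a real file you would pass an open file object instead, e.g. `with open("words.csv", newline="", encoding="utf-8") as f: categories = load_word_categories(f)`.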

Below is a simplified example of how I currently check messages:

def verify_message(msg, list_of_words):
    for w in list_of_words:
        if w in msg.lower():
            return True
    return False

incoming_message = 'Hello, how are you?'
lists = ['unwanted', 'nasty', 'improper']

if verify_message(incoming_message, lists):
    print('The message contains a banned word')
else:
    print('The message is clean')

Any advice on extending this check so it also tells me which category matched would be appreciated.

joec · May 18, 2025, 12:27pm

I’ve been working with Twitch bots for a while, and I’ve found that using a database like SQLite can be really effective for this kind of word filtering. It’s fast, lightweight, and allows for more complex queries than flat files.

You could structure your database with tables for categories and words, linking them together. This approach scales well as your list grows and makes it easy to update or query specific categories.
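A minimal sketch of that linked-table layout, assuming two hypothetical tables, `categories` and `words`, where `words.category_id` points at `categories.id`. It uses an in-memory database so it runs standalone; `add_word` and `words_in_category` are illustrative helper names, not part of any library. With this layout, anything that needs the category name joins the two tables.

```python
import sqlite3

# In-memory database for illustration; a real bot would use a file path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE words (
    id          INTEGER PRIMARY KEY,
    word        TEXT NOT NULL UNIQUE,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
""")

def add_word(word, category):
    """Insert a word under a category, creating the category if it is new."""
    conn.execute("INSERT OR IGNORE INTO categories (name) VALUES (?)", (category,))
    conn.execute(
        "INSERT OR IGNORE INTO words (word, category_id) "
        "SELECT ?, id FROM categories WHERE name = ?",
        (word.lower(), category),
    )
    conn.commit()

def words_in_category(category):
    """Return all words linked to one category, sorted alphabetically."""
    rows = conn.execute(
        "SELECT w.word FROM words AS w "
        "JOIN categories AS c ON w.category_id = c.id "
        "WHERE c.name = ? ORDER BY w.word",
        (category,),
    )
    return [word for (word,) in rows]

add_word("nasty", "insult")
add_word("improper", "insult")
add_word("unwanted", "spam")
print(words_in_category("insult"))  # ['improper', 'nasty']
```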

For the actual checking, consider using regular expressions. They’re more powerful than simple string matching and can handle variations (like plurals or common misspellings) more easily.

Here’s a rough idea:

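import re
import sqlite3

conn = sqlite3.connect('banned_words.db')

# SQLite has no built-in REGEXP implementation: "X REGEXP Y" calls a
# user function regexp(Y, X), so one must be registered before the query runs.
def regexp(pattern, text):
    return text is not None and re.search(pattern, text) is not None

conn.create_function('REGEXP', 2, regexp)
cursor = conn.cursor()

def check_message(msg):
    msg = msg.lower()
    # Assumes a words table with 'word' (a regex pattern) and 'category' columns
    cursor.execute('SELECT category FROM words WHERE ? REGEXP word', (msg,))
    result = cursor.fetchone()
    return result[0] if result else None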

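This method is flexible and allows for more sophisticated filtering as your bot evolves. Keep in mind that a REGEXP condition can’t use an index, so SQLite runs the pattern against every row for each incoming message.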
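On the plurals point from the regex suggestion above: one way to handle simple plurals and avoid partial-word hits (such as "nasty" inside "dynasty") is to compile every banned word into a single pattern with word boundaries. This is a sketch assuming a plain Python list; the `BANNED` list and `find_banned` name are hypothetical, and irregular plurals like "nasties" would still need their own entries. Compiling once at startup also avoids running a separate regex per database row.

```python
import re

BANNED = ["unwanted", "nasty", "improper"]

def build_pattern(words):
    """Compile one case-insensitive pattern that matches any word as a whole
    word, optionally followed by a plural 's' or 'es'."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(rf"\b(?:{alternatives})(?:e?s)?\b", re.IGNORECASE)

PATTERN = build_pattern(BANNED)

def find_banned(msg):
    """Return the first banned word as typed in msg, or None."""
    match = PATTERN.search(msg)
    return match.group(0) if match else None

print(find_banned("What a NASTY comment"))    # NASTY
print(find_banned("The dynasty ended"))       # None (no partial-word match)
print(find_banned("so many improper links"))  # improper
```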

Claire29 · May 17, 2025, 9:40am

hey there! i’ve dealt with similar stuff before. instead of CSV, try using a JSON file. it’s easier to work with in Python and can handle nested structures better. you could have categories as keys and banned words as values. then use json.load() to read it in. hope this helps!
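A small sketch of the JSON approach, assuming a hypothetical `banned_words.json` whose top level is an object mapping each category to a list of words (for example `{"insult": ["nasty", "improper"], "spam": ["unwanted"]}`). The `load_banned_words` helper is illustrative; the demo writes the file to a temporary folder only so the snippet runs on its own.

```python
import json
import os
import tempfile

def load_banned_words(path):
    """Read a JSON object that maps each category to a list of banned words."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Lowercase once so later checks can compare against msg.lower()
    return {category: [word.lower() for word in words] for category, words in data.items()}

# Demo only: write a sample file to a temp folder, then load it back.
example = {"insult": ["Nasty", "improper"], "spam": ["unwanted"]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "banned_words.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f)
    banned_words = load_banned_words(path)

print(banned_words)  # {'insult': ['nasty', 'improper'], 'spam': ['unwanted']}
```

The result is a plain category-to-words dictionary, so it can feed any of the dictionary-based checks in this thread.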

Liam_25Meditation · May 17, 2025, 2:40am

yo, have u thought about using regex? it’s pretty powerful for this kinda thing. you could make patterns that match variations of words, like plurals or misspellings. might be overkill for simple stuff, but could be useful if ur list gets complex. just an idea!
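To make the misspellings idea concrete, here is a sketch of a hypothetical `fuzzy_pattern` helper that lets each letter be swapped for a common look-alike and stretched by repetition. The `LOOKALIKES` table is an assumption to tune against what your chat actually sees, and looser patterns like this also raise the chance of false positives.

```python
import re

# Hypothetical substitution table: characters people often swap in to dodge filters.
LOOKALIKES = {
    "a": "a@4",
    "e": "e3",
    "i": "i1!",
    "o": "o0",
    "s": "s5$",
}

def fuzzy_pattern(word):
    """Build a regex where each letter may be a look-alike and may repeat."""
    parts = []
    for ch in word.lower():
        chars = LOOKALIKES.get(ch, ch)
        parts.append("[" + re.escape(chars) + "]+")
    # Lookarounds instead of \b, because '@', '$' and '!' are not word characters.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)

NASTY = fuzzy_pattern("nasty")

for msg in ["so nasty", "n@$ty!", "naaasty", "dynasty"]:
    print(msg, bool(NASTY.search(msg)))
# prints True for the first three messages and False for "dynasty"
```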

JackHero77 · May 15, 2025, 1:39pm

For your Twitch bot, I’d recommend using a dictionary to store the words grouped by category, so every match can be traced back to its category. You can load this from a JSON file as suggested, or define it directly in your code. Here’s a quick example:

banned_words = {
    'unwanted': ['word1', 'word2'],
    'nasty': ['word3', 'word4'],
    'improper': ['word5', 'word6'],
}

def check_message(msg):
    msg = msg.lower()
    for category, words in banned_words.items():
        if any(word in msg for word in words):
            return category
    return None

This approach returns the category of the matched word instead of a plain True/False, and you can easily expand it to handle more complex filtering logic if needed.
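The loop above still visits every word in every category for each message. A sketch of the inverse layout, assuming single-word entries: build a word-to-category dictionary once, split the message into tokens, and look each token up directly, so the work per message grows with the message length rather than with the size of the list. Note that this switches from substring matching to whole-token matching ("password1" no longer trips "word1"), and multi-word phrases would need a separate check.

```python
import re

banned_words = {
    "unwanted": ["word1", "word2"],
    "nasty": ["word3", "word4"],
    "improper": ["word5", "word6"],
}

# Invert once at startup: word -> category, so each lookup is one dict access.
word_to_category = {
    word.lower(): category
    for category, words in banned_words.items()
    for word in words
}

def check_message(msg):
    """Return the category of the first banned token in msg, or None."""
    for token in re.findall(r"\w+", msg.lower()):
        category = word_to_category.get(token)
        if category is not None:
            return category
    return None

print(check_message("this has WORD3 in it"))  # nasty
print(check_message("password1 is fine"))     # None (whole tokens only)
```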

benmoore · May 15, 2025, 12:18pm

Having worked on similar projects, I’d suggest using a trie data structure for efficient word matching. It’s particularly effective for prefix-based searches: checking a token takes time proportional to the token’s length rather than to the number of banned words, which matters most with a large word list.

Here’s a basic implementation idea:

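class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_end = False   # True if a banned word ends at this node
        self.category = None  # category of the word ending here


def insert(root, word, category):
    node = root
    for char in word.lower():  # store lowercase to match the lowercased message
        if char not in node.children:
            node.children[char] = TrieNode()
        node = node.children[char]
    node.is_end = True
    node.category = category


def search(root, message):
    # Checks each whitespace-separated token. This returns as soon as a banned
    # word is a *prefix* of a token, so "nastygram" also matches "nasty".
    words = message.lower().split()
    for word in words:
        node = root
        for char in word:
            if char not in node.children:
                break
            node = node.children[char]
            if node.is_end:
                return node.category
    return None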

This approach offers fast lookups and can be easily extended to handle more complex filtering requirements.
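The `search` above splits on whitespace, so a multi-word phrase from the CSV can never match, and it flags any token that merely starts with a banned word. A sketch of a variant, using a hypothetical `find_phrase` function, that walks the trie from the start of every word in the raw message and accepts a match only when it also ends at a word boundary. Worst-case cost is the message length times the longest phrase length.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_end = False   # True if a banned word or phrase ends here
        self.category = None


def insert(root, phrase, category):
    """Add a word or multi-word phrase, stored lowercase."""
    node = root
    for char in phrase.lower():
        node = node.children.setdefault(char, TrieNode())
    node.is_end = True
    node.category = category


def find_phrase(root, message):
    """Return the category of the first banned word/phrase found as whole words."""
    text = message.lower()
    for start in range(len(text)):
        # Only begin a match at the start of a word.
        if start > 0 and text[start - 1].isalnum():
            continue
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break
            # Accept a match only if it also ends at a word boundary.
            at_boundary = end + 1 == len(text) or not text[end + 1].isalnum()
            if node.is_end and at_boundary:
                return node.category
    return None


root = TrieNode()
for phrase, category in [("nasty", "insult"), ("get lost", "insult"), ("buy followers", "spam")]:
    insert(root, phrase, category)

print(find_phrase(root, "Buy followers now!"))  # spam
print(find_phrase(root, "what a nastygram"))    # None
```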