Implementing wildcard search in a Discord bot for image filenames

I’m working on a Discord bot that looks for images by filename. Right now it only finds exact matches. For example, if I use -s house_art, it’ll only find house_art in filenames.

I want to add wildcard functionality. So if I type house%art, it should find filenames like house_anything_art or house__art. The % would represent any characters in between.

Here’s what I’m using now:

required_keywords.add(keyword.lower())
has_required_keywords = all(keyword in filename for keyword in required_keywords)

I tried this approach with regex, but it’s not working when I use %:

pattern = keyword.replace('%', '.*')
required_keywords.add(pattern.lower())

has_required_keywords = all(keyword.search(filename) if isinstance(keyword, re.Pattern) else keyword in filename for keyword in required_keywords)

Any ideas on how to make this work? Thanks!

hey, have you tried using fnmatch? it’s built for this kinda stuff. here’s a quick example:

from fnmatch import fnmatch

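has_required_keywords = all(fnmatch(filename.lower(), '*' + keyword.lower().replace('%', '*') + '*') for keyword in required_keywords)

this should work with your % wildcard. fnmatch has to match the whole filename, so the extra * on each end keeps the "found anywhere in the name" behaviour you have now. just remember to convert everything to lowercase first!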
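
in case it helps, here's a small self-contained sketch of the same idea. the helper name matches_all and the sample filenames are made up for illustration. it uses fnmatchcase instead of fnmatch so the result doesn't depend on the operating system's case rules, and it pads each pattern with * so the keyword can sit anywhere in the name:

```python
from fnmatch import fnmatchcase

def matches_all(filename, keywords):
    """Return True if every keyword (with % as a wildcard) occurs in filename."""
    name = filename.lower()
    for keyword in keywords:
        # Translate the bot's % wildcard to fnmatch's *, and pad with * on both
        # ends because fnmatchcase must match the whole string.
        pattern = '*' + keyword.lower().replace('%', '*') + '*'
        if not fnmatchcase(name, pattern):
            return False
    return True

print(matches_all('My_House_Blue_Art.png', ['house%art']))  # True
print(matches_all('house__art.png', ['house%art']))         # True
print(matches_all('art_of_the_house.png', ['house%art']))   # False
```

one thing to watch: ?, [ and ] are also special in fnmatch patterns, so a keyword containing them won't be matched literally.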

I’ve worked on a similar issue before, and here’s what worked for me:

Instead of using ‘%’ as a wildcard, I found it more effective to use ‘*’. This aligns better with standard wildcard conventions in many systems.

Here’s a modified version of your code that should do the trick:

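import re

def create_pattern(keyword):
    # Escape regex metacharacters in each piece, then join the pieces with '.*'
    parts = keyword.lower().split('*')
    return re.compile('.*'.join(re.escape(part) for part in parts))

# user_input: the raw search text from the command (assumed to be defined elsewhere)
required_keywords = {create_pattern(keyword) for keyword in user_input.split()}

has_required_keywords = all(pattern.search(filename.lower()) for pattern in required_keywords)

This approach compiles each pattern once, up front, instead of once per filename, which is more efficient. re.escape() makes sure characters such as . or ( in a keyword are matched literally rather than treated as regex syntax. It also calls search() rather than match() on the compiled pattern, so a match can start anywhere in the filename.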

Remember to update your user instructions to use ‘*’ instead of ‘%’ for wildcards. Hope this helps!
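
To make the search() versus match() point concrete, here is a minimal sketch. The pattern and filename are made-up examples, not taken from your bot:

```python
import re

# Hypothetical compiled pattern for the search term house*art
pattern = re.compile('house.*art')

filename = 'my_house_blue_art.png'

# search() scans the whole string for the first place the pattern fits
print(pattern.search(filename) is not None)   # True

# match() only succeeds if the pattern fits starting at index 0
print(pattern.match(filename) is not None)    # False

# fullmatch() requires the entire string to fit the pattern
print(pattern.fullmatch('house_blue_art') is not None)  # True
```

If you ever want the wildcard term to describe the whole filename rather than a fragment of it, fullmatch() would be the method to reach for.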

Having dealt with similar search functionality, I can suggest an alternative approach. Instead of using ‘%’ or ‘*’, consider fuzzy matching based on a similarity measure such as Levenshtein distance or cosine similarity. This would allow users to find matches even with slight misspellings or variations in filenames.
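
As a rough illustration of the fuzzy idea, here is a sketch using difflib from the standard library. Note that difflib scores similarity with its own sequence-matching ratio, not Levenshtein distance or cosine similarity, so it only stands in for those here. The function name fuzzy_find, the 0.6 cutoff and the sample filenames are assumptions for illustration:

```python
import difflib
import os

def fuzzy_find(query, filenames, cutoff=0.6):
    """Return filenames whose stem is similar to query, best match first."""
    # Group filenames by lowercased stem (the name without its extension)
    stems = {}
    for filename in filenames:
        stem = os.path.splitext(filename)[0].lower()
        stems.setdefault(stem, []).append(filename)
    # get_close_matches returns the stems scoring at least cutoff, best first
    close = difflib.get_close_matches(query.lower(), list(stems), n=5, cutoff=cutoff)
    return [filename for stem in close for filename in stems[stem]]

files = ['house_art.png', 'hose_art.png', 'cat_photo.jpg']
print(fuzzy_find('house_art', files))  # ['house_art.png', 'hose_art.png']
```

The cutoff would need tuning against real filenames: too low and unrelated images show up, too high and small typos stop matching.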

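For a simpler solution, you could split the search term on ‘%’ and check that every part appears in the filename. Note that this quick example only checks that each part is present, not that the parts appear in order, so house%art would also match art_house: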

def wildcard_match(filename, search_term):
    parts = search_term.lower().split('%')
    return all(part in filename.lower() for part in parts)

has_required_keywords = all(wildcard_match(filename, keyword) for keyword in required_keywords)

This method avoids regex entirely, so characters like . or ( in a search term are matched literally and nothing needs escaping. It’s also easy to extend with additional wildcard characters if needed in the future.
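
The wildcard_match function above checks only that each part appears somewhere in the filename. If the parts also have to appear in order, one possible sketch is to search for each part starting where the previous one ended. The name wildcard_match_ordered and the sample filenames are made up for illustration:

```python
def wildcard_match_ordered(filename, search_term):
    """Return True if the %-separated parts occur in filename in order."""
    name = filename.lower()
    position = 0
    for part in search_term.lower().split('%'):
        # Look for this part only after the end of the previous one
        found = name.find(part, position)
        if found == -1:
            return False
        position = found + len(part)
    return True

print(wildcard_match_ordered('house_blue_art.png', 'house%art'))  # True
print(wildcard_match_ordered('art_house.png', 'house%art'))       # False
print(wildcard_match_ordered('house_art.png', 'house_art'))       # True
```

A term with no ‘%’ in it falls back to a plain substring check, so existing exact searches keep working.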