Problem with Sending Files from Python to Telegram Bot

I need assistance with a problem I’m facing. I’m developing a script that monitors files and sends them to a Telegram bot. The naming convention for my files is as follows:

fantasyworld.xyz_8080_🅺🅰️🅼🅱️🅾️_A‌b‌c‌D‌e‌F‌g‌

However, when I transmit these files using Python, they appear as:

fantasyworld.xyz_8080___AbcDef
(Note that the special characters are missing.)

I’ve tried various encoding options and asked several AI tools, but nothing has worked. This isn’t critical, but it is frustrating, since the rest of the script works perfectly. I may need to strip such characters entirely, or perhaps there is a fix in code. Below is the part of the code where I suspect the problem lies:

import requests

def upload_file(path_to_file, file_name):
    # system_is_windows, bot, chat_id and api_token are defined elsewhere in the script.
    try:
        if system_is_windows:
            with open(path_to_file, 'rb') as file:
                bot.send_document(chat_id, file, caption=file_name)
        else:
            endpoint = f'https://api.telegram.org/bot{api_token}/sendDocument'
            with open(path_to_file, 'rb') as file:
                # requests takes the uploaded filename from file.name (the path),
                # not from file_name, which is only used as the caption here.
                files_data = {'document': file}
                data_payload = {'chat_id': chat_id, 'caption': file_name}
                response = requests.post(endpoint, files=files_data, data=data_payload)
                if response.status_code != 200:
                    raise Exception(f'Error: {response.text}')
        print(f'Successfully sent {file_name}!')
    except Exception as error:
        print(f'Error sending {file_name}: {error}')

I’ve run into similar issues where special characters in filenames cause unexpected behavior when sent over APIs. Sometimes the underlying problem is how file paths are constructed on different operating systems, before the file ever reaches the code that sends it. Double-check whether the path or filename is being altered earlier in your script, or by any middleware, and log the exact filename (for example with repr()) right before the API call. If you find discrepancies, a temporary workaround is to replace the problematic characters with URL-safe or ASCII equivalents just before sending, keeping the original filename structure recognizable, and to map them back on receipt if necessary.
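To make the logging and workaround suggestions above concrete, here is a minimal sketch using only the standard library. The helper names (describe_filename, to_url_safe, from_url_safe) are made up for illustration, and percent-encoding is just one possible reversible scheme, not something Telegram requires.

```python
import unicodedata
from urllib.parse import quote, unquote


def describe_filename(name: str) -> list[str]:
    """Return one line per character: code point plus Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in name]


def to_url_safe(name: str) -> str:
    """Percent-encode everything except unreserved ASCII (reversible)."""
    return quote(name, safe="")


def from_url_safe(encoded: str) -> str:
    """Undo to_url_safe on the receiving side."""
    return unquote(encoded)


if __name__ == "__main__":
    # Hypothetical sample: one enclosed letter and one zero-width non-joiner.
    sample = "fantasyworld.xyz_8080_\U0001F17A_A\u200cb"
    print(ascii(sample))  # shows invisible characters as escapes
    for line in describe_filename(sample):
        print(line)
    print(to_url_safe(sample))
```

Logging with ascii() or repr() right before the upload makes invisible characters such as zero-width joiners show up as escapes, so you can tell whether the name was already altered before the request was made.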

In my experience, this kind of issue usually comes down to how Python or the environment handles Unicode characters, especially when filenames or other text are sent across platforms or APIs. Try making sure the script explicitly encodes your filenames as UTF-8 before they are sent, for example with file_name.encode('utf-8') when constructing your payload. Keep in mind that in Python 3 a str is already Unicode, so an explicit encode only matters where bytes are expected. Also verify whether the Telegram API supports these characters, and check how your system’s default locale handles encoding. I used to face similar issues, and adjusting the encoding usually did the trick.
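A quick way to check the environment side of this is to print the encodings Python is actually using and confirm that the filename survives a UTF-8 round trip. This is only a diagnostic sketch; encoding_report and survives_utf8 are hypothetical helper names, and a clean result here does not prove the server keeps the characters.

```python
import locale
import sys


def encoding_report() -> dict[str, str]:
    """Collect the encodings Python uses for paths, text files and stdout."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None) or "unknown",
    }


def survives_utf8(name: str) -> bool:
    """True if name encodes to UTF-8 and decodes back unchanged."""
    try:
        return name.encode("utf-8").decode("utf-8") == name
    except UnicodeEncodeError:
        # Lone surrogates (e.g. from undecodable bytes in a path) end up here.
        return False


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
    # Hypothetical sample name with an emoji and a zero-width non-joiner.
    print(survives_utf8("fantasyworld.xyz_8080_\U0001F17A_A\u200cb"))
```

If the round trip succeeds and the reported encodings are UTF-8, the name is probably intact on the Python side, and the change is more likely happening in the upload request or on the receiving end.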

You might want to check whether Windows is stripping special characters from filenames by default. Maybe try a different extension, or zip the files before sending; as they say, ‘when in doubt, zip it out’. This might help preserve the file’s original name structure.
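If you go the zip route, something like the sketch below should work: the original name is stored inside the archive, and the archive itself can be given a plain ASCII name for the upload. The function names are made up for illustration, and this assumes whoever receives the file is fine with unzipping it.

```python
import io
import zipfile


def zip_in_memory(inner_name: str, payload: bytes) -> bytes:
    """Build a zip archive in memory holding payload under inner_name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # zipfile stores non-ASCII member names as UTF-8.
        archive.writestr(inner_name, payload)
    return buffer.getvalue()


def names_in_zip(zip_bytes: bytes) -> list[str]:
    """List member names, to confirm the original name survived."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical sample name with an emoji and a zero-width non-joiner.
    original = "fantasyworld.xyz_8080_\U0001F17A_A\u200cb"
    archive_bytes = zip_in_memory(original, b"example contents")
    print(names_in_zip(archive_bytes) == [original])
```

The resulting bytes could then be sent as the document under an ASCII name such as upload.zip, so even if the visible filename gets mangled, the original name is still recoverable from inside the archive.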