I’m building a Telegram bot with aiogram that uses Google’s generative AI to respond to user messages. The bot works when I wait for the full response, but I want to stream the response in real-time using chunks.
The issue happens when I try to update the message with each new chunk using bot.edit_message_text(). I get this error: aiogram.exceptions.TelegramBadRequest: Telegram server says - Bad Request: can't parse entities: Can't find end of the entity starting at byte offset
Here’s my code:
from aiogram import Bot, Dispatcher, types
from aiogram.enums import ParseMode

from ai_service import get_ai_response  # streaming helper, defined below

bot = Bot(token=BOT_TOKEN)
dp = Dispatcher()

@dp.message()
async def process_user_input(msg: types.Message):
    user_input = msg.text
    loading_msg = await msg.answer("Processing your request...")
    full_response = ""
    try:
        async for text_chunk in get_ai_response(user_input):
            full_response += text_chunk
            await loading_msg.edit_text(
                full_response,
                parse_mode=ParseMode.MARKDOWN,
            )
    except Exception as error:
        print(f"Error occurred: {error}")
        await loading_msg.edit_text("Something went wrong")
# ai_service.py
from typing import AsyncGenerator

import google.generativeai as genai

genai.configure(api_key=API_KEY)
ai_model = genai.GenerativeModel("gemini-1.5-flash")

async def get_ai_response(user_prompt: str) -> AsyncGenerator[str, None]:
    try:
        # convert_to_async wraps the SDK's sync iterator (helper not shown)
        stream_response = convert_to_async(
            ai_model.generate_content(user_prompt, stream=True)
        )
        async for response_part in stream_response:
            yield response_part.text
    except Exception as error:
        print(f"AI error: {error}")
        yield "Error generating response"
The problem is that markdown symbols get split across chunks. When a chunk contains an opening * or _ but the closing one hasn't arrived yet, Telegram can't parse the entity. I tried both MARKDOWN and MARKDOWN_V2 modes, but the same issue occurs.
How can I handle this streaming while avoiding the markdown parsing errors? Should I buffer the chunks somehow or strip markdown formatting?
Honestly, just disable markdown parsing entirely for streaming responses and use plain text mode. I had the same headache and wasted hours trying to fix broken markdown chunks. Set parse_mode=None in your edit_message_text call and the streaming works flawlessly. You lose formatting but gain reliability - users care more about getting fast responses than bold text anyway.
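The change is just the parse_mode argument; a tiny self-contained simulation of the plain-text loop (stream_plain and the edit callback are stand-ins I invented for the handler loop and loading_msg.edit_text - no aiogram needed to see the idea):

```python
import asyncio

async def stream_plain(chunks, edit):
    """Accumulate chunks and push each intermediate state via `edit`.

    With parse_mode=None Telegram stores the text verbatim, so an
    unclosed * or _ in an intermediate state can never fail to parse.
    """
    full_response = ""
    for chunk in chunks:  # in the real handler: async for chunk in get_ai_response(...)
        full_response += chunk
        # real call: await loading_msg.edit_text(full_response, parse_mode=None)
        await edit(full_response)
    return full_response
```

Every intermediate edit goes through, even the ones with a dangling asterisk, because nothing is being parsed.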
I ran into this exact problem last year and found that buffering is the most reliable approach. Instead of updating the message with every chunk, I accumulate the text and only update when the markdown is balanced. Here's what worked for me: create a function that checks whether markdown symbols are properly paired before sending updates. Count opening and closing asterisks, underscores, and backticks, and only call edit_message_text() when they match or when you hit a sentence boundary.

Alternatively, you can strip all markdown formatting from streamed responses using a regex and add it back later, though this loses some formatting. Another option is to update less frequently - maybe every 3-5 chunks instead of every single one. Users still get the streaming feel but with fewer parsing errors.

The key insight is that Telegram validates markdown on each update, so incomplete formatting will always fail. You need to ensure each update contains valid markdown or none at all.
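The balance check described above can be sketched like this (a deliberately naive version for Telegram's legacy Markdown; markdown_is_balanced is a name I made up):

```python
def markdown_is_balanced(text: str) -> bool:
    """Return True when *, _, and ` each appear an even number of times,
    i.e. every opening token has a closing one.

    Triple-backtick fences are counted separately so they don't skew
    the single-backtick count.
    """
    fences = text.count("```")
    remainder = text.replace("```", "")
    if fences % 2 != 0:
        return False
    return all(remainder.count(tok) % 2 == 0 for tok in ("*", "_", "`"))
```

In the handler you would accumulate chunks and only call edit_text() when markdown_is_balanced(full_response) is true. Note the caveat: legacy Markdown allows a literal * inside a code span, so a real implementation may need to track context rather than just count tokens.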
The root cause is that streaming chunks break markdown syntax midway through formatting tokens. What I discovered while working on a similar implementation is a debounced update approach combined with text validation. Set up a timer that delays the message update by 200-300ms after each chunk arrives. If another chunk arrives before the timer expires, reset the timer and append the new text. This way you avoid rapid-fire updates while still maintaining responsiveness.

For the markdown validation part, create a simple parser that tracks nested formatting states - when you detect incomplete markdown pairs, either wait for the next chunk or temporarily close them with placeholder text like "…" until the closing token arrives. I also recommend implementing a fallback mechanism: if parsing fails multiple times consecutively, switch to plain text mode for that particular response. This prevents the entire conversation from breaking due to malformed markdown.

The performance impact is minimal since you're just delaying updates slightly, but the user experience improvement is significant.
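A debounced updater along those lines might look like this (a sketch, not aiogram-specific: `send` stands in for an async wrapper around edit_message_text, and DebouncedUpdater is my own name):

```python
import asyncio

class DebouncedUpdater:
    """Accumulates streamed text and calls `send(full_text)` only after
    `delay` seconds of silence; each new chunk resets the timer."""

    def __init__(self, send, delay: float = 0.25):
        self._send = send    # async callable, e.g. wrapping edit_message_text
        self._delay = delay
        self._buffer = ""
        self._task = None

    async def add_chunk(self, chunk: str) -> None:
        self._buffer += chunk
        if self._task is not None:
            self._task.cancel()  # a newer chunk arrived: restart the timer
        self._task = asyncio.create_task(self._flush_after_delay())

    async def _flush_after_delay(self) -> None:
        try:
            await asyncio.sleep(self._delay)
            await self._send(self._buffer)
        except asyncio.CancelledError:
            pass  # superseded by a newer chunk

    async def close(self) -> None:
        """Cancel any pending timer and flush the final text immediately."""
        if self._task is not None:
            self._task.cancel()
        await self._send(self._buffer)
```

Chunks that arrive in quick succession collapse into a single edit, which is also gentler on Telegram's edit limits; combine it with a markdown balance check before each send for the full approach.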