How to format real-time text chunks from OpenAI API in Next.js application

I’m working on a Next.js project that integrates with OpenAI’s API. When I disable streaming (stream: false), the response comes back with proper markdown formatting that I can easily parse and display using existing libraries.

However, when I enable streaming to get real-time responses, the text arrives in small chunks that are difficult to format properly. The streaming data doesn’t maintain the markdown structure consistently across chunks.

How can I handle formatting for streaming text responses from OpenAI? I need to process each chunk as it arrives while maintaining proper text structure.

Here’s my current implementation:

// Assumes handleFormattedChunk, updateMessageDisplay, and currentMessageId
// are defined in the surrounding scope.
const processStreamingResponse = async (messageData) => {
  const apiResponse = await fetch('/api/chat/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(messageData),
  });
  
  if (!apiResponse.body) {
    throw new Error('Response body not available');
  }
  
  const streamReader = apiResponse.body.getReader();
  const textDecoder = new TextDecoder();
  let textBuffer = '';
  let formattedContent = '';
  
  while (true) {
    const { value, done } = await streamReader.read();
    if (done) break;
    
    // Decode the chunk and keep any partial trailing line in the buffer
    // until the rest of it arrives in the next chunk.
    textBuffer += textDecoder.decode(value, { stream: true });
    const dataLines = textBuffer.split('\n');
    textBuffer = dataLines.pop() || '';
    
    for (const dataLine of dataLines) {
      if (!dataLine.trim()) continue;
      
      // Strip SSE framing (id:/data: prefixes) and unescape the payload.
      // The last two replacements also strip markdown markers (** and ```),
      // which destroys the formatting I actually want to keep.
      const processedLine = dataLine
        .replace(/id:\s*\d+\s*/g, '')
        .replace(/^data:\s*/, '')
        .replace(/\\n/g, '\n')
        .replace(/\\"/g, '"')
        .replace(/\\t/g, '\t')
        .replace(/\*\*/g, '')
        .replace(/```/g, '')
        .trim();
      
      if (processedLine) {
        formattedContent += processedLine;
        handleFormattedChunk(processedLine);
      }
      
      if (formattedContent) {
        updateMessageDisplay({
          messageId: currentMessageId,
          content: formattedContent,
        });
        formattedContent = '';
      }
    }
  }
};

I faced a similar challenge when building a streaming chat application. The key is to avoid formatting each chunk the moment it arrives, because individual chunks rarely contain complete markdown structures.

Instead, use a dual-buffer approach: one buffer collects the raw incoming chunks, while the other holds formatted text ready for display. Only format content once you have a complete markdown block, such as a full paragraph or a closed code fence. A small amount of state tracking, e.g. whether you are currently inside a code fence, tells you when a block is complete.

Additionally, consider debouncing the formatting step so it runs after a short pause in the stream, say 50ms. This limits unnecessary re-renders and keeps the user experience smooth. Remember, streaming doesn't have to mean instant formatting: a delay of a few tens of milliseconds goes unnoticed as long as the content keeps flowing.
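Here is a minimal sketch of that idea. The names (`createMarkdownStreamBuffer`, `render`) are my own, and `render` stands in for whatever updates your UI; a complete block is approximated as "ends at a blank line with no open code fence", which you may need to refine for nested structures like lists:

```javascript
function createMarkdownStreamBuffer(render, debounceMs = 50) {
  let rawBuffer = ''; // buffer 1: raw chunks as they arrive
  let timer = null;

  // True when every ``` fence opened in the text has been closed again.
  const fencesClosed = (text) =>
    ((text.match(/```/g) || []).length % 2) === 0;

  // Flush everything left in the buffer (used on pause and at end of stream).
  const flushAll = () => {
    if (rawBuffer) {
      render(rawBuffer); // buffer 2: hand off text ready for display
      rawBuffer = '';
    }
  };

  // Flush only up to the last blank line, and only if that prefix
  // doesn't end inside an open code fence.
  const flushCompleteBlocks = () => {
    const cut = rawBuffer.lastIndexOf('\n\n');
    if (cut === -1) return;
    const complete = rawBuffer.slice(0, cut + 2);
    if (!fencesClosed(complete)) return; // still inside a code block
    rawBuffer = rawBuffer.slice(cut + 2);
    render(complete);
  };

  return {
    push(chunk) {
      rawBuffer += chunk;
      flushCompleteBlocks();
      // Debounce: if the stream pauses, show whatever is left anyway.
      if (timer) clearTimeout(timer);
      timer = setTimeout(flushAll, debounceMs);
    },
    end() {
      if (timer) clearTimeout(timer);
      flushAll();
    },
  };
}
```

In your loop, you would call `push()` with each decoded chunk and `end()` when the reader reports `done`, and `render` receives whole markdown blocks that are safe to pass through a markdown parser.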