Building an AI chatbot with a document knowledge base using OpenAI, Pinecone and LangChain in Node.js

SilentSailing34 · August 18, 2025, 4:00pm

I’m building a chatbot that answers questions based on documents that users upload. The flow works like this: users upload files (PDF, plain text, Markdown), then my app splits these files into smaller chunks and turns them into vectors using OpenAI embeddings. These vectors are stored in a Pinecone index.

When someone asks a question, the bot searches for the most relevant chunk and uses that context to generate an answer with ChatGPT.

The issue I’m facing is that my current setup only retrieves one matching document chunk, which often doesn’t provide enough information for good answers. I need to figure out how to get more relevant chunks so the chatbot has better context to work with.

import { Request, Response } from "express";
import asyncHandler from 'express-async-handler';
import { v4 as uuidv4 } from 'uuid';
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { PineconeClient } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { ChatOpenAI } from "langchain/chat_models/openai";
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { HumanMessage, SystemMessage } from "langchain/schema";

/**
 * Process uploaded documents and create vector embeddings
 */
export const createDocumentEmbeddings = asyncHandler(
  async (req: Request, res: Response) => {
    const userId: string = req.body.userId as string;
    const fileContent: string = req.body.fileContent as string;
    const fileId: string = uuidv4();

    const pineconeClient = new PineconeClient();
    const apiKey: string = process.env.PINECONE_KEY ?? '';
    await pineconeClient.init({
      environment: process.env.PINECONE_ENV ?? '',
      apiKey: apiKey,
    });

    try {
      const textSplitter = new RecursiveCharacterTextSplitter();

      const documents = await textSplitter.createDocuments([fileContent], [{
        file: fileId,
        user: userId
      }]);

      console.log('building vector database...');
      const embedder = new OpenAIEmbeddings();
      // This is the index handle itself, not just its name.
      const pineconeIndex = pineconeClient.Index(process.env.PINECONE_INDEX_NAME ?? '');
      const storeConfig = {
        pineconeIndex,
        namespace: process.env.PINECONE_NS ?? '',
        textKey: 'content',
      };

      // Embed the chunks and upsert them; the returned store is not needed here.
      await PineconeStore.fromDocuments(documents, embedder, storeConfig);
      res.json({ success: true, documentId: fileId });
    } catch (error) {
      console.error('Document processing failed:', error);
      res.status(500).send('Failed to process document upload.');
    }
  }
);

/**
 * Search for relevant documents and generate AI response
 */
export const queryKnowledgeBase = async (req: Request, res: Response) => {
  const userId: string = req.body.userId as string;
  const userQuestion: string = req.body.userQuestion as string;
  const maxResults: number = (req.body.maxResults as number) || 1;

  const pineconeClient = new PineconeClient();
  await pineconeClient.init({
    apiKey: process.env.PINECONE_KEY ?? '',
    environment: process.env.PINECONE_ENV ?? '',
  });

  const index = pineconeClient.Index(process.env.PINECONE_INDEX_NAME ?? '');
  const store = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    { 
      pineconeIndex: index,
      namespace: process.env.PINECONE_NS ?? '',
      textKey: 'content'
    }
  );

  try {
    const matchingDocs = await store.similaritySearch(userQuestion, maxResults, {
      user: userId,
    });
    const aiChat = new ChatOpenAI({ temperature: 0.3 });

    const aiResponse = await aiChat.call([
      new SystemMessage(
        `Here is the relevant information from the user's documents:


        ${matchingDocs[0].pageContent}


        Please use this context to answer the user's question. Respond in the same language as the question.`
      ),
      new HumanMessage(userQuestion),
    ]);

    res.json(aiResponse);
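  } catch (error) {
    console.error('Search failed:', error);
    // Send an error response so the request does not hang.
    res.status(500).send('Failed to query the knowledge base.');
  }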
};

Tom_Artist · August 31, 2025, 3:00am

The retrieval strategy needs more than just pulling extra chunks. Semantic search alone misses crucial context that’s scattered across different sections.

What worked for me: add a re-ranking step after your initial similarity search. I use a cross-encoder model to score retrieved chunks against the query - way better relevance.
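A minimal sketch of the re-ranking step described above, assuming the chunks have already been retrieved. The `Chunk` shape, `RelevanceScorer` type, and `wordOverlapScorer` are hypothetical names for illustration; the toy scorer only stands in for a real cross-encoder call and is not what the poster used.

```typescript
// Hypothetical shapes for illustration; not part of LangChain or Pinecone.
interface Chunk {
  id: string;
  text: string;
}

// Any function that scores how well a chunk answers the query.
// In practice this would wrap a call to a cross-encoder model.
type RelevanceScorer = (query: string, chunkText: string) => number;

// Re-score every retrieved chunk against the query and keep the best topN.
function rerank(
  query: string,
  chunks: Chunk[],
  score: RelevanceScorer,
  topN: number
): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, relevance: score(query, chunk.text) }))
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topN)
    .map((entry) => entry.chunk);
}

// Toy stand-in scorer: fraction of query words that appear in the chunk.
const wordOverlapScorer: RelevanceScorer = (query, chunkText) => {
  const queryWords = query.toLowerCase().split(/\s+/).filter(Boolean);
  const chunkWords = new Set(chunkText.toLowerCase().split(/\s+/));
  if (queryWords.length === 0) return 0;
  const hits = queryWords.filter((word) => chunkWords.has(word)).length;
  return hits / queryWords.length;
};
```

The usual pattern is to over-fetch from the vector store (say 10 to 20 chunks) and pass only the top few after re-ranking to the chat model.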

Also try query expansion. Users don’t always use the same terms as your documents. I preprocess queries by generating similar phrasings with OpenAI before hitting Pinecone. This catches relevant chunks that pure semantic similarity misses.
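One way to combine the results of several query phrasings is sketched below. It assumes each search returns a stable chunk id and a similarity score where higher is better; the `Hit` shape and `mergeHits` name are hypothetical, and generating the alternative phrasings themselves is left out.

```typescript
// Hypothetical result shape for illustration.
interface Hit {
  id: string; // unique chunk id
  text: string;
  score: number; // similarity score, higher is better
}

// Merge result lists from several query phrasings: keep each chunk once,
// with its best score, then sort by score and keep the top `limit`.
function mergeHits(resultLists: Hit[][], limit: number): Hit[] {
  const bestById = new Map<string, Hit>();
  for (const hits of resultLists) {
    for (const hit of hits) {
      const existing = bestById.get(hit.id);
      if (!existing || hit.score > existing.score) {
        bestById.set(hit.id, hit);
      }
    }
  }
  return Array.from(bestById.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Deduplicating by id matters here because different phrasings of the same question tend to return heavily overlapping chunks.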

Chunk size and overlap in RecursiveCharacterTextSplitter matter a lot. After testing different configs, I settled on 500-800 character chunks with a 100 character overlap. Smaller chunks give more precise retrieval, but you’ll need more of them for complete answers.
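To make the effect of overlap concrete, here is a simplified fixed-window chunker. It is only an illustration: `chunkWithOverlap` is a hypothetical helper, and LangChain's RecursiveCharacterTextSplitter additionally tries to split on separators such as paragraphs and sentences rather than cutting at exact character offsets.

```typescript
// Fixed-size character chunking with overlap (illustrative only).
function chunkWithOverlap(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With the LangChain splitter from the question, the corresponding settings are the `chunkSize` and `chunkOverlap` constructor options, for example `new RecursiveCharacterTextSplitter({ chunkSize: 700, chunkOverlap: 100 })`; the exact values are worth tuning against your own documents.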

danwilson85 · August 31, 2025, 1:42am

you’re only using matchingDocs[0].pageContent in the system message - that’s why the model only ever sees one chunk, even when the search returns several. try concatenating all the docs instead: matchingDocs.map(doc => doc.pageContent).join('\n\n') rather than grabbing only the first one. also note that maxResults falls back to 1 when the request doesn’t set it, so pass a higher value (or raise the default).
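The suggestion above could look roughly like this as a pair of small helpers. `RetrievedDoc`, `buildContext`, and `buildSystemPrompt` are hypothetical names; the only assumption about the retrieved documents is the `pageContent` field already used in the question's code.

```typescript
// Minimal shape of a retrieved document; only pageContent is needed here.
interface RetrievedDoc {
  pageContent: string;
}

// Join every retrieved chunk into one context string, skipping empty chunks.
function buildContext(docs: RetrievedDoc[]): string {
  return docs
    .map((doc) => doc.pageContent.trim())
    .filter((content) => content.length > 0)
    .join("\n\n");
}

// Build the system prompt from the question's wording, using all chunks.
function buildSystemPrompt(docs: RetrievedDoc[]): string {
  return [
    "Here is the relevant information from the user's documents:",
    buildContext(docs),
    "Please use this context to answer the user's question. Respond in the same language as the question.",
  ].join("\n\n");
}
```

In the handler, `buildSystemPrompt(matchingDocs)` would replace the template string that currently reads only `matchingDocs[0].pageContent`. It also avoids a crash when the search returns no documents.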

jade_journey · August 29, 2025, 5:08pm

This screams automation. I’ve built similar systems - manual retrieval gets messy fast.

You’ve got multiple moving parts: chunking, embedding, retrieving, concatenating. Each needs tweaking. Don’t hardcode this logic - automate the whole pipeline.

I use Latenode for RAG workflows. It handles document processing automatically, manages vector retrieval intelligently, and dynamically adjusts chunk count based on question complexity. No more guessing maxResults or manually concatenating strings.

Best part? Latenode handles token management automatically. It knows when to trim context before hitting OpenAI limits - no more annoying API errors from chunky documents.

Set up the workflow once, let it optimize retrieval and response generation. Way cleaner than managing LangChain components manually.

Check it out: https://latenode.com

SwimmingShark · August 29, 2025, 10:54am

It’s more than just grabbing multiple chunks. Hit this same issue last year. Bumping maxResults to 3-5 documents helps a lot, but you’ve got to watch those token limits. When you smash all the matching docs together, you’ll blow past ChatGPT’s context window fast - especially with longer docs.

Here’s what fixed it for me: added a token counter before hitting OpenAI’s API and chopped the context if it got too fat. Also tweaked my RecursiveCharacterTextSplitter settings - went with smaller chunks and added overlap. Way better retrieval accuracy. The defaults usually make chunks too big and you lose those subtle connections between related info.
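A rough sketch of the "count tokens, then trim" idea follows. The 4-characters-per-token estimate is an assumption that only loosely holds for English text, and `estimateTokens` and `fitToTokenBudget` are hypothetical helpers; a real tokenizer would give exact counts.

```typescript
// Rough estimate: about 4 characters per token for English text.
// This is an assumption; use a real tokenizer for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep chunks in ranked order until the next one would exceed the budget.
function fitToTokenBudget(chunks: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk);
    if (used + cost > maxTokens) break;
    kept.push(chunk);
    used += cost;
  }
  return kept;
}
```

Because chunks arrive sorted by relevance, dropping from the end discards the least relevant context first. Leave headroom in the budget for the system prompt, the question, and the model's answer.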