How to calculate OpenAI API costs for a RAG system using LangChain and a vector database

I'm working on a retrieval-augmented generation (RAG) system and need help figuring out my total OpenAI spend. I want to monitor token consumption and cost before running my pipeline.

Here’s my setup:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.callbacks import get_openai_callback

embedding_model = 'text-embedding-ada-002'
vector_embeddings = OpenAIEmbeddings(
    model=embedding_model,
    openai_api_key=api_key
)

def build_vector_store(documents, embeddings, storage_path):
    try:
        # Build FAISS vector database
        vector_db = FAISS.from_documents(documents, embeddings)
        
        # Store index locally
        vector_db.save_local(storage_path)
        
        # Load saved index
        vector_db = FAISS.load_local(storage_path, embeddings)
        
        return vector_db
    except Exception as error:
        print(f"Error occurred: {str(error)}")
        return None

vector_store = build_vector_store(document_chunks, vector_embeddings, save_path)
search_tool = vector_store.as_retriever()

prompt_format = ChatPromptTemplate.from_template(template=my_template)
chat_model = ChatOpenAI(model_name="gpt-4", temperature=0)

rag_pipeline = (
    {"context": search_tool, "question": RunnablePassthrough()}
    | prompt_format
    | chat_model
    | StrOutputParser()
)

user_query = "sample question"
response = rag_pipeline.invoke(user_query)

Cost tracking question: I know about using get_openai_callback() for the GPT-4 part, but how do I track embedding costs? Does the vector search itself consume tokens?

with get_openai_callback() as callback:
    response = rag_pipeline.invoke(user_query)
    print(callback)

This shows:

Tokens Used: 42
Prompt Tokens: 8
Completion Tokens: 34
Successful Requests: 1
Total Cost (USD): $8.1e-05

Does this include embedding costs or just the chat model usage?

Your callback only tracks GPT-4 chat completion costs - it misses embedding generation. When you create embeddings with text-embedding-ada-002, those API calls happen separately and won’t show up in your current setup.

I hit this same problem last month building something similar. You need to track the embedding step with its own callback to see the full picture - but print the callback and verify it actually reports tokens, because many LangChain versions only hook chat/LLM calls, not embedding calls. The FAISS similarity search itself is just math on your local vectors and uses no OpenAI tokens, though note that the retriever does make one small ada-002 call per query to embed the question, and that call won't show up in get_openai_callback either.
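Something like this for the setup-time tracking, reusing build_vector_store and document_chunks from your question (those names come from your snippet, not a library API):

with get_openai_callback() as embed_cb:
    vector_store = build_vector_store(document_chunks, vector_embeddings, save_path)
print(embed_cb)
# If this prints 0 tokens on your LangChain version, the handler isn't
# hooking embedding calls - count tokens yourself instead (sketch below).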

For complete cost tracking, monitor embeddings during document processing, then add that to your query-time chat costs. Embedding costs are usually way lower than GPT-4, but they pile up fast if you're processing big document sets regularly: text-embedding-ada-002 runs about $0.0001 per 1K tokens, while gpt-4 charges $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens - several hundred times more per token.
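If you want the embedding number up front, you can estimate it locally with tiktoken before spending anything. A sketch, assuming document_chunks holds LangChain Document objects as in your snippet and the ada-002 rate above:

import tiktoken

# ada-002 (and the GPT-4 chat models) use the cl100k_base encoding
encoding = tiktoken.get_encoding("cl100k_base")

ADA_002_RATE = 0.0001  # USD per 1K tokens - verify against current pricing

total_tokens = sum(len(encoding.encode(chunk.page_content))
                   for chunk in document_chunks)
print(f"{total_tokens} tokens -> ~${total_tokens / 1000 * ADA_002_RATE:.4f}")

Because this counts with the same tokenizer the API uses, the estimate lands very close to what you'd actually be billed.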

Quick breakdown - your callback only captures GPT-4 chat costs, not embeddings. The FAISS similarity search runs locally, so there's no API cost for the search itself (each query does still get embedded through the API first, but that's a handful of tokens).

Hit this same issue last year building a doc Q&A system. Document embeddings get created once during processing, while each query only adds a tiny embed_query call, so you need to track the setup phase separately from runtime.

Here’s what I do:

from langchain.callbacks import get_openai_callback

# Track embedding costs during setup
with get_openai_callback() as embed_callback:
    vector_store = build_vector_store(document_chunks, vector_embeddings, save_path)
    embed_cost = embed_callback.total_cost
    # Caveat: some LangChain versions only count chat/LLM calls here,
    # so confirm embed_callback reports nonzero tokens on your version

# Track query costs during runtime
with get_openai_callback() as query_callback:
    response = rag_pipeline.invoke(user_query)
    query_cost = query_callback.total_cost

total_cost = embed_cost + query_cost
print(f"setup: ${embed_cost:.6f}, query: ${query_cost:.6f}, total: ${total_cost:.6f}")

Embedding costs are usually tiny vs GPT-4 but add up if you're reprocessing docs frequently. Once your vectors are saved, each query only burns chat tokens plus a tiny query-embedding call.
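To make sure you only pay the document-embedding bill once, load the saved index when it exists instead of rebuilding. A sketch using save_path from the question (depending on your version, FAISS.load_local may also require allow_dangerous_deserialization=True):

import os

if os.path.exists(save_path):
    # Reload existing vectors - no embedding API calls, so zero cost
    vector_store = FAISS.load_local(save_path, vector_embeddings)
    embed_cost = 0.0
else:
    # First run: pay for embeddings once and persist them
    with get_openai_callback() as embed_callback:
        vector_store = build_vector_store(document_chunks, vector_embeddings, save_path)
    embed_cost = embed_callback.total_cost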

Yeah, the callback doesn't track embedding costs at all. ada-002 embeddings cost about $0.10 per million tokens when you first process docs. After that, the FAISS search itself is free - just local operations - though each query still spends a few embedding tokens to encode the question. Wrap your build_vector_store function with another callback (or count tokens with tiktoken if your version doesn't report embedding usage) to catch those initial embedding costs. That'll show you the real total spend.
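To put rough numbers on it (hypothetical corpus size, using the rates mentioned above):

# Hypothetical corpus: 10,000 chunks averaging 500 tokens each
doc_tokens = 10_000 * 500                    # 5,000,000 tokens total
print(doc_tokens / 1_000_000 * 0.10)         # ~$0.50 of ada-002, paid once

# A single query embedding is negligible by comparison
print(20 / 1_000_000 * 0.10)                 # ~$0.000002 for a 20-token question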