How to calculate OpenAI API costs for embedding and retrieval in RAG systems using LangChain and vector databases

I’m working on a RAG system and need help figuring out the total OpenAI costs. I want to monitor token usage and expenses for both the embedding creation and the chat completion parts.

Here’s my current setup:

from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import FAISS
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.callbacks import get_openai_callback

embedding_model = "text-embedding-ada-002"
vector_embeddings = OpenAIEmbeddings(
    model=embedding_model,
    openai_api_key=api_key
)

def build_vector_store(documents, embeddings, storage_path):
    try:
        # The embedding API calls (and their token charges) happen here
        vector_db = FAISS.from_documents(documents, embeddings)
        vector_db.save_local(storage_path)
        # Reload to verify the save; newer LangChain versions also require
        # allow_dangerous_deserialization=True on load_local
        loaded_db = FAISS.load_local(storage_path, embeddings)
        return loaded_db
    except Exception as error:
        print(f"Error occurred: {str(error)}")
        return None

vector_store = build_vector_store(document_chunks, vector_embeddings, save_path)
doc_retriever = vector_store.as_retriever()

chat_template = """..."""
prompt = ChatPromptTemplate.from_template(template=chat_template)

chat_model = ChatOpenAI(model_name="gpt-4", temperature=0)

rag_pipeline = (
    {"context": doc_retriever, "question": RunnablePassthrough()}
    | prompt
    | chat_model
    | StrOutputParser()
)

user_query = "..."
result = rag_pipeline.invoke(user_query)

Question about cost tracking: I know about using get_openai_callback() to track GPT-4 costs during the generation step. But how do I track the embedding costs from the text-embedding-ada-002 model? And when GPT-4 receives context retrieved from the vector database, that doesn't count as additional tokens, right?

with get_openai_callback() as callback:
    result = rag_pipeline.invoke(user_query)
    print(callback)

This gives me output like:

Tokens Used: 37
Prompt Tokens: 4  
Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05

How can I get the complete cost breakdown for my entire RAG workflow?

I had the same problem tracking costs in my RAG pipeline. The tricky part is that embedding costs are incurred when you build the vector store, which is completely separate from your retrieval queries.

You can wrap your build_vector_store call with get_openai_callback() too, but be aware that many LangChain versions only report chat/completion usage through that callback and show zero for embedding calls. text-embedding-ada-002 bills per input token, so a reliable fallback is what I do: count the tokens in your document chunks with tiktoken before embedding and multiply by the per-token price.

You're mostly right about the second question: the similarity search itself runs locally once the embeddings are in FAISS. Note, though, that each query string still gets embedded once with text-embedding-ada-002 at retrieval time (a handful of tokens, a tiny cost). The retrieved context then just becomes part of your prompt tokens to GPT-4, which you're already seeing in the callback.

For complete cost tracking, run the callback around both phases: once during vector store creation for embedding costs, and once around each query for completion costs. Remember that embedding costs are usually one-time unless you're continually adding new documents.
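To keep both phases in one running total, something like this sketch works. The RAGCostLedger class and its record API are my own names, not part of LangChain; you feed it the total_tokens and total_cost attributes that get_openai_callback() exposes after each with-block:

```python
class RAGCostLedger:
    """Accumulates token counts and costs per pipeline phase."""

    def __init__(self):
        self.phases = {}  # phase name -> (tokens, cost)

    def record(self, phase, tokens, cost):
        t, c = self.phases.get(phase, (0, 0.0))
        self.phases[phase] = (t + tokens, c + cost)

    @property
    def total_cost(self):
        return sum(c for _, c in self.phases.values())

    def report(self):
        lines = [f"{p}: {t} tokens, ${c:.6f}"
                 for p, (t, c) in self.phases.items()]
        lines.append(f"total: ${self.total_cost:.6f}")
        return "\n".join(lines)
```

Usage against the code in the question would look like:

ledger = RAGCostLedger()
with get_openai_callback() as cb:
    vector_store = build_vector_store(document_chunks, vector_embeddings, save_path)
ledger.record("embedding", cb.total_tokens, cb.total_cost)
with get_openai_callback() as cb:
    result = rag_pipeline.invoke(user_query)
ledger.record("query", cb.total_tokens, cb.total_cost)
print(ledger.report())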