I’m working on a RAG system and need help figuring out the total OpenAI costs. I want to monitor token usage and expenses for both the embedding creation and the chat completion parts.
Here’s my current setup:
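# Imports assume a recent LangChain split-package layout; adjust to your installed version.
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Embedding model used to index the document chunks
embedding_model = 'text-embedding-ada-002'
vector_embeddings = OpenAIEmbeddings(
    model=embedding_model,
    openai_api_key=api_key
)

def build_vector_store(documents, embeddings, storage_path):
    """Embed the documents into a FAISS index, save it, and reload it from disk."""
    try:
        vector_db = FAISS.from_documents(documents, embeddings)
        vector_db.save_local(storage_path)
        loaded_db = FAISS.load_local(storage_path, embeddings)
        return loaded_db
    except Exception as error:
        print(f"Error occurred: {str(error)}")
        return None

vector_store = build_vector_store(document_chunks, vector_embeddings, save_path)
doc_retriever = vector_store.as_retriever()

chat_template = """..."""
prompt = ChatPromptTemplate.from_template(template=chat_template)
chat_model = ChatOpenAI(model_name="gpt-4", temperature=0)

# Retrieve context for the question, fill the prompt, call the chat model, return plain text
rag_pipeline = (
    {"context": doc_retriever, "question": RunnablePassthrough()}
    | prompt
    | chat_model
    | StrOutputParser()
)

user_query = "..."
result = rag_pipeline.invoke(user_query)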
Question about cost tracking: I know I can use get_openai_callback() to track GPT-4 costs during the RAG phase, but how do I track the embedding costs from the text-embedding-ada-002 model? Also, when GPT-4 receives context from the vector database, that doesn’t count as additional tokens, right?
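from langchain_community.callbacks import get_openai_callback

# Record token usage and cost for the calls made inside the block
with get_openai_callback() as callback:
    result = rag_pipeline.invoke(user_query)
    print(callback)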
This gives me output like:
Tokens Used: 37
Prompt Tokens: 4
Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05
How can I get the complete cost breakdown for my entire RAG workflow?
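To make the goal concrete, here is a rough sketch of the kind of breakdown I'm after. Everything in it is an assumption for illustration: the per-1K-token prices are placeholders rather than confirmed rates, `estimate_costs` and `CostBreakdown` are hypothetical names, and the token counts in the usage example are made up. In practice the chat counts would come from the callback, and the embedding count would have to come from somewhere else, since as far as I can tell the callback does not report it.

```python
from dataclasses import dataclass

# Placeholder per-1K-token prices in USD (assumptions; check the current pricing page).
EMBEDDING_PRICE_PER_1K = 0.0001
CHAT_PROMPT_PRICE_PER_1K = 0.03
CHAT_COMPLETION_PRICE_PER_1K = 0.06


@dataclass
class CostBreakdown:
    """Hypothetical container for the three cost components of one RAG workflow."""
    embedding_cost: float
    prompt_cost: float
    completion_cost: float

    @property
    def total(self) -> float:
        return self.embedding_cost + self.prompt_cost + self.completion_cost


def estimate_costs(embedding_tokens: int, prompt_tokens: int, completion_tokens: int) -> CostBreakdown:
    """Convert token counts into USD using the placeholder prices above."""
    return CostBreakdown(
        embedding_cost=embedding_tokens / 1000 * EMBEDDING_PRICE_PER_1K,
        prompt_cost=prompt_tokens / 1000 * CHAT_PROMPT_PRICE_PER_1K,
        completion_cost=completion_tokens / 1000 * CHAT_COMPLETION_PRICE_PER_1K,
    )


if __name__ == "__main__":
    # Made-up token counts, purely for illustration.
    breakdown = estimate_costs(embedding_tokens=50_000, prompt_tokens=1_200, completion_tokens=33)
    print(f"Embedding cost:  ${breakdown.embedding_cost:.6f}")
    print(f"Prompt cost:     ${breakdown.prompt_cost:.6f}")
    print(f"Completion cost: ${breakdown.completion_cost:.6f}")
    print(f"Total cost:      ${breakdown.total:.6f}")
```

What I'm missing is a reliable way to get `embedding_tokens` for the indexing step, and to confirm what exactly is included in the prompt token count.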