Hugging Face local embeddings still requesting OpenAI API key in llama-index setup

I’m building a document-based Q&A application with llama-index and trying to switch from OpenAI to a completely local setup using Hugging Face models. I followed the documentation for setting up local embeddings, but I keep getting an error asking for an OpenAI API key even though I’m not using OpenAI anymore.

Here’s my implementation:

from pathlib import Path
import gradio as gr
import logging
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext

def build_vector_index(docs_folder):
    context_size = 4096
    output_tokens = 512
    overlap_ratio = 0.1
    max_chunk_size = 600

    custom_prompt = """<|SYSTEM|># StableLM Assistant
    - StableLM is a helpful open-source AI developed by StabilityAI.
    - StableLM provides accurate information and refuses harmful requests.
    - StableLM can write creative content and answer questions.
    """

    input_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

    local_llm = HuggingFaceLLM(
        context_window=context_size,
        max_new_tokens=output_tokens,
        generate_kwargs={"temperature": 0.7, "do_sample": True},  # sampling must be on for temperature to apply
        system_prompt=custom_prompt,
        query_wrapper_prompt=input_prompt,
        tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
        model_name="StabilityAI/stablelm-tuned-alpha-3b",
        device_map="auto",
        stopping_ids=[50278, 50279, 50277, 1, 0],
        tokenizer_kwargs={"max_length": context_size}
    )
    
    context = ServiceContext.from_defaults(chunk_size=1024, llm=local_llm)
    docs = SimpleDirectoryReader(docs_folder).load_data()
    vector_index = VectorStoreIndex.from_documents(docs, service_context=context)
    
    return vector_index

def process_query(user_input):
    engine = vector_index.as_query_engine(streaming=True)
    result = engine.query(user_input)
    return result.get_response()

vector_index = build_vector_index("documents")

The error I’m getting is:

ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.

What am I missing to make this completely local without any OpenAI dependencies?

This happens because llama-index defaults to OpenAI embeddings even when you’re using a local LLM. You’ve got your local model set up, but the embeddings are still hitting OpenAI’s API. You need to add a local embedding model to your ServiceContext:

from llama_index.embeddings import HuggingFaceEmbedding

embedding_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2")

context = ServiceContext.from_defaults(
    chunk_size=1024, 
    llm=local_llm,
    embed_model=embedding_model
)

I ran into this same issue when switching from OpenAI to local models. The all-mpnet-base-v2 works great for document retrieval and downloads pretty fast. Just make sure you’ve got enough RAM since you’ll be running both the LLM and embedding model at once.
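If it helps to see what the embedding model is actually doing during retrieval, here’s a toy sketch with made-up 3-dimensional vectors (a real model like all-mpnet-base-v2 outputs 768-dimensional ones, and the chunk names here are invented for illustration). The index embeds each document chunk once, then ranks chunks by cosine similarity to the embedded query — which is why the embed_model gets called at index-build time and triggers the OpenAI error even before you run a query:

```python
import math

# Toy sketch of embedding-based retrieval. The vectors are made up;
# a real embedding model would produce them from text. The ranking
# step is the same idea a vector index uses under the hood.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# One embedding per document chunk (hypothetical chunks).
chunk_embeddings = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "privacy notice": [0.0, 0.2, 0.9],
}

# Pretend this is the embedded user question.
query_embedding = [0.85, 0.2, 0.05]

# Rank chunks by similarity to the query, most similar first.
ranked = sorted(
    chunk_embeddings,
    key=lambda name: cosine_similarity(query_embedding, chunk_embeddings[name]),
    reverse=True,
)
print(ranked[0])  # → refund policy
```

The point is that this similarity step runs entirely on vectors, so once a local model produces the embeddings, no external API is involved at query time either.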

You’re experiencing this issue because llama-index defaults to OpenAI’s text-embedding-ada-002 for embeddings, even when using a local LLM. To resolve this, you must explicitly specify a HuggingFace embedding model. Import the HuggingFaceEmbedding and update your ServiceContext accordingly. Here’s what you need to add:

from llama_index.embeddings import HuggingFaceEmbedding

local_embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

context = ServiceContext.from_defaults(
    chunk_size=1024, 
    llm=local_llm,
    embed_model=local_embed_model
)

I have utilized this setup in production successfully. The model BAAI/bge-small-en-v1.5 is efficient for most document retrieval tasks. If you require a smaller model, consider sentence-transformers/all-MiniLM-L6-v2, but note that quality may vary based on your application.
