I’m building a fully offline retrieval-augmented generation (RAG) system in Google Colab that reads all of its files from my Google Drive. The goal is to avoid any external API calls to hosted language-model services.
I’ve downloaded a language model to a folder on my Drive and set up a Chroma vector database over my documents. However, I keep running into path-related errors when trying to load the local model.
Here’s my current setup:
from google.colab import drive
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from llama_cpp import Llama
drive.mount('/content/drive')
# Model configuration
gpu_layers = 1
batch_size = 512
# Local model file path
model_file_path = "my_folder/llama-model-7b.Q4_K_M.gguf"
# Set up the Llama model
llm_instance = Llama(
    model_path=model_file_path,
    n_gpu_layers=gpu_layers,
    n_batch=batch_size,
    n_ctx=2048,
    f16_kv=True,
    verbose=True,
)
# Function for handling conversations
def handle_chat(user_input: str, history: list) -> tuple:
    try:
        buffer_memory = ConversationBufferMemory(
            memory_key='chat_history',
            return_messages=False
        )
        conversation_chain = ConversationalRetrievalChain.from_llm(
            llm=llm_instance,
            retriever=doc_retriever,  # retriever from my Chroma store (setup omitted here)
            memory=buffer_memory,
            get_chat_history=lambda h: h,
        )
        response = conversation_chain({'question': user_input, 'chat_history': history})
        history.append((user_input, response['answer']))
        return '', history
    except Exception as error:
        history.append((user_input, str(error)))
        return '', history
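To rule out a corrupt or missing file before blaming the loaders, I put together a small sanity check (check_gguf is my own helper, not part of any library); as far as I know, valid GGUF files begin with the 4-byte ASCII magic b"GGUF":

```python
import os

def check_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    if not os.path.exists(path):
        # Relative paths resolve against the working directory, so show it
        print(f"Not found: {path} (cwd: {os.getcwd()})")
        return False
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

This at least separates "wrong path" failures from "corrupt file" failures.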
I’m getting these errors:
Repo id must be in the form 'repo_name' or 'namespace/repo_name'
and
OSError: Incorrect path_or_model_id. Please provide either the path to a local folder or the repo_id of a model on the Hub.
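While debugging, I noticed that a bare relative path like mine resolves against the notebook's working directory (/content in a fresh Colab session), not against the Drive mount, so the loader may simply be looking in a place that doesn't exist. A minimal illustration (the my_folder layout below is just how my Drive happens to be organized):

```python
import os

relative_path = "my_folder/llama-model-7b.Q4_K_M.gguf"

# What a loader actually opens: the relative path joined onto the
# current working directory (/content in a fresh Colab session)...
resolved = os.path.abspath(relative_path)
print(resolved)

# ...versus where the file really lives after drive.mount('/content/drive')
drive_path = "/content/drive/MyDrive/my_folder/llama-model-7b.Q4_K_M.gguf"
print(drive_path)
```

I'm not certain this is the whole story, though, given the repo_id-shaped error messages above.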
I also attempted loading the model with AutoModelForCausalLM, passing a repo_type parameter set to “local”, but hit the same errors. When I instead download models directly from the Hugging Face Hub, everything works fine.
Is it actually feasible to run a fully local RAG system with the model and vector store living on Google Drive? What’s the correct way to reference local model files from a Colab notebook?