Getting help with custom RAG implementation using Mistral 7B for theater booking system

Hey everyone, I’m pretty new to Langchain and need some guidance.

I’m building an AI assistant for a movie theater app called CinemaHub. The theater has two screening rooms (Room X and Room Y) and shows each movie twice daily, a matinee and an evening show, over a specific date range.

I want the assistant to help users with movie info, pricing, showtimes, seat availability, and age ratings. Eventually I’d like it to handle bookings and cancellations too, but right now I’m stuck on basic retrieval.

I thought about using function calling but it seems like most of that stuff costs money. Am I wrong about this?

I’m using Mistral 7B as my chat model and built this setup:

from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline

query_pipeline = pipeline(
    model=mistral_llm,
    tokenizer=token_processor,
    task="text-generation",
    temperature=0.0,
    do_sample=False,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=800,
)
query_llm = HuggingFacePipeline(pipeline=query_pipeline)

answer_pipeline = pipeline(
    model=mistral_llm,
    tokenizer=token_processor,
    task="text-generation",
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=4000,
)
answer_llm = HuggingFacePipeline(pipeline=answer_pipeline)

Then I chain everything together with memory:

from operator import itemgetter

from langchain.memory import ConversationBufferMemory
from langchain_core.messages import get_buffer_string
from langchain_core.prompts import PromptTemplate, format_document
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

DOC_TEMPLATE = PromptTemplate.from_template(template="{page_content}")

def merge_docs(documents, doc_template=DOC_TEMPLATE, separator="\n\n"):
    formatted_docs = [format_document(doc, doc_template) for doc in documents]
    return separator.join(formatted_docs)

chat_memory = ConversationBufferMemory(
    return_messages=True, output_key="response", input_key="user_query"
)

with_memory = RunnablePassthrough.assign(
    history=RunnableLambda(chat_memory.load_memory_variables) | itemgetter("history"),
)

rewrite_query = {
    "rewritten_query": {
        "user_query": lambda x: x["user_query"],
        "history": lambda x: get_buffer_string(x["history"]),
    }
    | QUERY_REWRITE_PROMPT
    | query_llm,
}

get_docs = {
    "documents": itemgetter("rewritten_query") | doc_retriever,
    "user_query": lambda x: x["rewritten_query"],
}

format_inputs = {
    "context": lambda x: merge_docs(x["documents"]),
    "user_query": itemgetter("user_query"),
}

generate_response = {
    "response": format_inputs | RESPONSE_PROMPT | answer_llm,
    "user_query": itemgetter("user_query"),
    "context": lambda x: merge_docs(x["documents"]),
}

rag_chain = with_memory | rewrite_query | get_docs | generate_response

I call it with this function:

def ask_rag_system(user_question, chain, memory):
    query_input = {"user_query": user_question}
    output = chain.invoke(query_input)
    print(output)
    memory.save_context(query_input, {"response": output["response"]})
    return output

For my data, I have a JSON file with 21 movies that looks like this:

[
  {
    "title": "Inception Dreams",
    "tagline": "Reality Is Just The Beginning",
    "plot": "A mind-bending thriller about a team of specialists who enter people's dreams to steal secrets. When they're tasked with the impossible mission of planting an idea instead of stealing one, reality and dreams blur together.",
    "actors": [
      "Leonardo DiCaprio",
      "Marion Cotillard",
      "Tom Hardy"
    ],
    "category": "Sci-Fi Thriller",
    "duration": "148 minutes",
    "showDates": {
      "from": "2024-07-01",
      "to": "2024-07-15"
    },
    "matinee": {
      "startTime": "15:30",
      "regularPrice": 28,
      "discountPrice": 15
    },
    "evening": {
      "startTime": "20:30",
      "regularPrice": 38,
      "discountPrice": 20
    },
    "standardSeats": {
      "open": 180,
      "booked": 145
    },
    "accessibleSeats": {
      "open": 9,
      "booked": 6
    },
    "allSeats": {
      "open": 189,
      "booked": 151
    },
    "screeningRoom": "Room X",
    "minAge": "16+"
  }
]

I convert each movie to text like this and create documents:

def create_movie_text(movie_data):
    movie_text = f"""Movie Information: {movie_data['title']}
Tagline: {movie_data['tagline']}
Plot Summary: {movie_data['plot']}
Starring: {', '.join(movie_data['actors'])}
Genre: {movie_data['category']}
Runtime: {movie_data['duration']}
Showing from {movie_data['showDates']['from']} to {movie_data['showDates']['to']}
Screening Times:
- Matinee at {movie_data['matinee']['startTime']}: Regular tickets {movie_data['matinee']['regularPrice']}€, Discount tickets {movie_data['matinee']['discountPrice']}€
- Evening at {movie_data['evening']['startTime']}: Regular tickets {movie_data['evening']['regularPrice']}€, Discount tickets {movie_data['evening']['discountPrice']}€
Seating Availability:
Standard seats: {movie_data['standardSeats']['open']} available, {movie_data['standardSeats']['booked']} sold
Accessible seats: {movie_data['accessibleSeats']['open']} available, {movie_data['accessibleSeats']['booked']} sold
Screening in {movie_data['screeningRoom']}
Age restriction: {movie_data['minAge']}"""
    return movie_text

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

chunker = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
vector_db = Chroma.from_documents(movie_docs, embedding_model)
doc_retriever = vector_db.as_retriever(search_kwargs={"k": 4})

My main issues:

  1. The retrieval doesn’t seem to find the right documents
  2. Sometimes the model gets confused with long prompts
  3. I’m not sure if I’m doing something fundamentally wrong

I’ve seen others do this with more complex data and it works fine. What am I missing here? Any help would be awesome!

I had the same retrieval problems when I started with RAG. Your setup looks good, but I think it’s your chunking strategy. With just 21 movies and chunk_size=1000, you’re probably getting each movie as one huge chunk. That confuses the embedding model when people ask about specific stuff like pricing or showtimes.

Drop your chunk_size to 300-400 and add chunk_overlap of 50. This gives specific info like showtimes and pricing its own embedding space. Your k=4 retrieval might also be grabbing too much irrelevant stuff - try k=2 first and see if that’s more precise.

What really helped me was adding metadata to each chunk (movie title, screening room, etc.) so the retriever can filter better. The long prompt issue usually fixes itself once retrieval gets more accurate, since you’re not dumping irrelevant context into the model.
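The smaller-chunk-plus-metadata idea can be sketched without any LangChain machinery at all. Here plain dicts stand in for LangChain Document objects, and the 350/50 sizes are just the ballpark suggested; a plain character split like this is a simplification of what RecursiveCharacterTextSplitter does (it also tries to break on separators):

```python
# Rough sketch: smaller overlapping chunks, each tagged with metadata
# the retriever can later filter on. Plain dicts stand in for
# LangChain Document objects; sizes are the suggested ballpark.

def split_movie_text(movie_text, title, room, chunk_size=350, overlap=50):
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(movie_text), step):
        piece = movie_text[start:start + chunk_size]
        if not piece.strip():
            continue
        chunks.append({
            "page_content": piece,
            "metadata": {"movie_title": title, "screening_room": room},
        })
    return chunks

# A ~1000-character movie description yields four overlapping chunks.
docs = split_movie_text("A" * 1000, "Inception Dreams", "Room X")
```

With the real splitter you'd pass the same metadata per source document and let it propagate to every chunk.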

You’re hitting a common wall with structured data. I ran into the same thing last year building a restaurant recommendation system.

Your query rewriting step is probably the culprit. Mistral 7B sucks at understanding what specific info to pull from movie data when rewriting queries. Skip that completely - just pass the original user question straight to retrieval.

Your movie text format’s too dense. When someone asks “what’s showing tonight?”, the embedding model has to dig through plot summaries and actor lists to find showtimes. Split each movie into separate docs:

  • Basic info (title, genre, plot)
  • Pricing and showtimes
  • Seating availability

Now “show me ticket prices” hits pricing docs directly instead of getting tangled up with plot summaries.
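That three-way split might look something like this; plain dicts stand in for LangChain Document objects, and the field names follow the JSON schema from the question:

```python
# Hedged sketch of splitting one movie record into three focused
# documents (basic info / pricing & showtimes / seating), so a price
# question retrieves pricing text instead of plot summaries.

def movie_to_section_docs(movie):
    title = movie["title"]
    basic = (f"{title} ({movie['category']}): {movie['plot']} "
             f"Starring {', '.join(movie['actors'])}.")
    pricing = (f"{title} pricing and showtimes: matinee at "
               f"{movie['matinee']['startTime']} for "
               f"{movie['matinee']['regularPrice']}€, evening at "
               f"{movie['evening']['startTime']} for "
               f"{movie['evening']['regularPrice']}€.")
    seating = (f"{title} seating: {movie['standardSeats']['open']} standard "
               f"and {movie['accessibleSeats']['open']} accessible seats "
               f"open in {movie['screeningRoom']}.")
    return [
        {"page_content": text,
         "metadata": {"movie_title": title, "section": name}}
        for name, text in [("basic_info", basic),
                           ("pricing_showtimes", pricing),
                           ("seating", seating)]
    ]
```

The `section` metadata also lets you filter retrieval directly, e.g. restricting a price question to the pricing docs.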

You’re using two different LLMs for query rewriting and answering, but they’re the same model with different configs. That’s just adding complexity for no reason. Use one pipeline with temperature=0.1.

For function calling - yeah, most hosted APIs cost money, but you can roll your own simple version. Add some if/else logic after retrieval to catch booking requests and route them to your booking system.
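The if/else routing could be as simple as a keyword check before the question ever reaches the RAG chain. The word lists and labels below are made up; swap in whatever your booking system expects:

```python
# Hedged sketch of "roll your own" routing without paid function
# calling: classify the message by keywords, then dispatch. The labels
# stand in for calls to your own booking/cancellation handlers.

BOOKING_WORDS = {"book", "reserve", "booking", "reservation"}
CANCEL_WORDS = {"cancel", "refund", "cancellation"}

def route_request(user_question):
    words = set(user_question.lower().replace("?", "").split())
    if words & CANCEL_WORDS:
        return "cancellation"   # hand off to your cancellation handler
    if words & BOOKING_WORDS:
        return "booking"        # hand off to your booking handler
    return "rag"                # fall through to the retrieval chain
```

It's crude (no fuzzy matching, no multi-intent handling), but it's free and easy to debug before you invest in real function calling.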

Try these changes and your retrieval should get way more accurate. The confusion with long prompts should clear up once you’re pulling better context.

your embedding model might be the problem. BAAI/bge-large-en-v1.5 is solid, but i’ve had better luck with sentence-transformers/all-MiniLM-L6-v2 for structured data like this. try bumping your temperature to 0.1 instead of 0.0 - the model sometimes needs a tiny bit of randomness to generate better rewritten queries. what really helped me was adding explicit metadata when creating documents: metadata={'movie_title': title, 'room': room}. then switch to MMR search instead of basic similarity.
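if you haven’t met MMR (maximal marginal relevance) before, here’s a toy illustration of the trade-off it makes - relevance to the query vs. similarity to results already picked. this is not LangChain’s actual implementation, just the idea with made-up 2-D vectors:

```python
# Toy MMR: greedily pick documents, penalizing each candidate by its
# similarity to documents already selected. Vectors are fake embeddings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = dot(query_vec, doc_vecs[i])
            redundancy = max((dot(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate docs plus one diverse doc: pure similarity
# (lambda_mult=1.0) keeps both duplicates, while a lower lambda_mult
# trades the second duplicate for the diverse one.
docs = [[0.9, 0.1], [0.85, 0.2], [0.1, 0.95]]
```

in LangChain you’d get this via `as_retriever(search_type="mmr")`, which helps when several chunks from the same movie crowd out the one you actually need.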