How to handle large documents with OpenAI API in multi-user application

Need help with document processing setup

I’m working on a web application that uses OpenAI’s API to analyze and extract information from lengthy documents. These files are quite substantial - around 40-50 pages each with roughly 40k+ words.

I’m running into two main issues:

  1. User isolation problem - When multiple users upload their documents simultaneously, the system seems to mix information between different files, leading to incorrect results for those users

  2. Token limitations - Given the size of these documents, I’m hitting daily token limits very fast, which makes the service unreliable

I’ve tested this with the GPT-4 model through the Python API, and the accuracy is good for single documents. But I’m stuck on how to properly architect this for a multi-user environment.

What’s the best approach to handle document isolation and manage token usage efficiently? Has anyone dealt with similar challenges?

Thanks for any guidance!

I’ve built similar document analysis systems and ran into these same issues. For user isolation, set up database-backed queues where each document gets a unique processing ID that follows it through your pipeline. Keeps different users’ requests from mixing up. For tokens, try a tiered approach - summarize the document first to find the important parts, then only do detailed analysis on those sections. Cut my token usage by 60-70% without losing accuracy. You can also use GPT-3.5-turbo for initial filtering, then send only the good stuff to GPT-4. Async processing with job queues saved me too. Users upload, get a processing ID, then get pinged when it’s done. No more timeouts and way better resource management when you’ve got multiple users hitting the system.
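
Here’s a rough sketch of that tiered pass, assuming the openai>=1.0 Python SDK; the model names and the YES/NO relevance prompt are placeholders you’d tune for your own extraction task:

```python
from openai import OpenAI  # assumes the openai>=1.0 Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def filter_relevant_sections(sections, extraction_goal):
    """Cheap first pass: GPT-3.5-turbo decides which sections matter."""
    relevant = []
    for section in sections:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": (
                    f"Does this section contain information about {extraction_goal}? "
                    f"Answer YES or NO.\n\n{section}"
                ),
            }],
        )
        if "YES" in resp.choices[0].message.content.upper():
            relevant.append(section)
    return relevant


def detailed_analysis(relevant_sections, extraction_goal):
    """Expensive second pass: GPT-4 only sees the sections that survived filtering."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Extract {extraction_goal} from:\n\n" + "\n\n".join(relevant_sections),
        }],
    )
    return resp.choices[0].message.content
```

The filter pass is cheap enough that even a blunt YES/NO prompt pays for itself, since you only pay GPT-4 prices for the sections that survive.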

Try chunking your docs into smaller parts. Process them one at a time and combine the results when done. For user isolation, give each request a unique session ID. Maybe also start with cheaper models to save some tokens!
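
Something like this, roughly (the chunk size and overlap are made-up numbers, tune them for your docs):

```python
import uuid


def chunk_document(text, chunk_size=3000, overlap=200):
    """Split a document into overlapping word-based chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]


def new_processing_job(user_id, text):
    """Tag every upload with a unique session ID so results never cross users."""
    return {
        "session_id": str(uuid.uuid4()),
        "user_id": user_id,
        "chunks": chunk_document(text),
    }
```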

Processing docs at scale needs solid resource pooling and state management. I built something similar last year - you’ve got to sandbox each document job completely. Keep separate processing contexts per user session.

Don’t just rely on chunking to optimize tokens. Try a hybrid approach instead: use embedding models first to find the most relevant sections for your specific extraction needs. This preprocessing costs way less than dumping everything into GPT-4.
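
A minimal sketch of that embedding pre-filter, assuming the openai Python SDK and numpy; text-embedding-3-small is just one reasonable model choice:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()


def embed(texts, model="text-embedding-3-small"):
    """Embed a list of texts with an inexpensive embedding model."""
    resp = client.embeddings.create(model=model, input=texts)
    return np.array([item.embedding for item in resp.data])


def top_k_sections(sections, extraction_query, k=5):
    """Rank document sections by cosine similarity to the extraction query."""
    section_vecs = embed(sections)
    query_vec = embed([extraction_query])[0]
    sims = section_vecs @ query_vec / (
        np.linalg.norm(section_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(sims)[::-1][:k]
    return [sections[i] for i in best]
```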

For multi-user isolation, set up proper request queuing with Redis or similar in-memory storage to track job states. Each doc gets its own worker with isolated state, so nothing shared can leak between jobs. Token management becomes way more predictable when you can estimate usage upfront through document analysis.
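
Roughly the shape of it with redis-py (a simplified sketch; the key names and queue layout are illustrative):

```python
import json
import uuid

import redis

r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)


def enqueue_document(user_id, doc_text):
    """Register a job keyed by a unique ID, scoped to the uploading user."""
    job_id = str(uuid.uuid4())
    r.hset(f"job:{job_id}", mapping={"user_id": user_id, "status": "queued"})
    r.set(f"job:{job_id}:payload", doc_text)
    r.rpush("doc_queue", job_id)  # workers pop job IDs from this list
    return job_id


def update_status(job_id, status, result=None):
    """Workers call this as the job moves through the pipeline."""
    r.hset(f"job:{job_id}", "status", status)
    if result is not None:
        r.set(f"job:{job_id}:result", json.dumps(result))


def get_status(job_id, user_id):
    """Only return a job's state if it belongs to the requesting user."""
    job = r.hgetall(f"job:{job_id}")
    if not job or job.get("user_id") != user_id:
        return None
    return job["status"]
```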

Also set up monitoring on your API usage patterns. You’ll probably find certain doc types or sections consistently burn more tokens, which helps with capacity planning.
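
Estimating tokens locally with tiktoken makes that monitoring cheap; here's a small sketch (the per-document-type logging is just one way to slice it):

```python
from collections import defaultdict

import tiktoken

usage_by_doc_type = defaultdict(int)


def estimate_tokens(text, model="gpt-4"):
    """Count tokens locally before sending anything to the API."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))


def record_usage(doc_type, text):
    """Track which document types are the heavy token consumers."""
    tokens = estimate_tokens(text)
    usage_by_doc_type[doc_type] += tokens
    return tokens
```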

Vector databases completely transformed how I handle document processing. I embed documents first with OpenAI’s models instead of sending full docs through the API - way cheaper. When someone searches, I grab only the relevant chunks for GPT-4 to analyze. Cuts token costs massively since you’re only processing what actually matters. For keeping users separate, I just namespace everything by user ID in the vector store. Dead simple and works perfectly. Each user’s docs stay in their own bubble with zero bleed-over. Background indexing does the heavy work after upload, so users see their doc is queued right away. Query responses are fast enough for real-time use. I’ve been running this setup for eight months with hundreds of users. No major problems.
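
The namespacing idea looks roughly like this. The sketch uses Chroma as the vector store, but any store with per-user collections or namespaces works, and the collection naming and embedding model here are illustrative:

```python
import chromadb
from openai import OpenAI

oai = OpenAI()
chroma = chromadb.Client()  # in-memory for the sketch; use a persistent client in production


def embed(texts):
    resp = oai.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]


def index_document(user_id, doc_id, chunks):
    """One collection per user keeps each user's chunks in their own namespace."""
    collection = chroma.get_or_create_collection(name=f"user_{user_id}")
    collection.add(
        ids=[f"{doc_id}_{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embed(chunks),
    )


def query_user_docs(user_id, question, k=5):
    """Queries only ever touch the requesting user's collection."""
    collection = chroma.get_or_create_collection(name=f"user_{user_id}")
    hits = collection.query(query_embeddings=embed([question]), n_results=k)
    return hits["documents"][0]  # the relevant chunks to hand to GPT-4
```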

Try streaming responses instead of processing everything at once. Prevents timeouts and shows users what’s happening. Cache common document patterns too - you’ll save a ton of tokens when people upload similar files.
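
Streaming is just a flag in the openai Python SDK; a minimal sketch:

```python
from openai import OpenAI

client = OpenAI()


def stream_analysis(prompt):
    """Yield partial output as it arrives so the UI can show progress immediately."""
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```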

I’ve hit this exact problem at scale - automation is your answer. Manual chunking gets messy fast with multiple users.

You need an automated workflow that handles everything end-to-end. Build a pipeline that auto-splits documents, processes them in isolated containers, and manages token usage across different API keys.

I built something similar with three stages: extraction and categorization first, then heavy analysis on relevant sections only, finally combining everything back together.

Isolation problems vanish when each document gets its own processing thread with unique identifiers. Token management becomes predictable - just queue requests and spread them across time or different API accounts.

Users upload and get notified when done. No waiting or refresh button spam. Set it up right and it runs itself.

Latenode handles this multi-step document processing perfectly. Build the entire pipeline visually and it manages all the API calls, queuing, and user isolation automatically.

The Problem: You’re struggling to efficiently process large documents (40-50 pages, 40k+ words) using OpenAI’s API in a multi-user web application. You’re encountering two main issues: data mixing between users’ documents and hitting daily token limits quickly. Current methods are inefficient and lack robust error handling and user feedback.

:thinking: Understanding the “Why” (The Root Cause):

The core issues stem from a lack of proper process isolation and intelligent token management. Processing large documents directly with powerful models like GPT-4 is expensive and resource-intensive. Without proper queuing and isolation, concurrent requests can lead to data contamination, resulting in incorrect analysis for users. Naive chunking alone might not solve token limitations without a comprehensive strategy for processing relevant information efficiently.

:gear: Step-by-Step Guide:

  1. Implement an Automated Workflow: This is the most crucial step. Instead of managing document processing within your main application logic, create a separate, automated workflow. This workflow will handle the entire process from document upload to result delivery, including:

    • Document Ingestion and Queuing: Each uploaded document should receive a unique ID and be added to a queue (e.g., using Redis or a similar message broker). This ensures isolation between users’ requests.
    • Preprocessing and Intelligent Chunking: Before sending the document to the OpenAI API, implement an intelligent preprocessing stage. This might involve:
      • Embedding Models: Use an inexpensive embedding model (such as text-embedding-3-small or text-embedding-ada-002) to create embeddings for the entire document.
      • Section Identification: Using those embeddings, identify the sections most relevant to the user’s analysis goals, so you avoid sending irrelevant information to more powerful models.
      • Chunking Based on Relevance: Divide the document into smaller, relevant chunks suitable for GPT-4 processing. This dramatically reduces token usage.
    • Asynchronous Processing: Process each chunk asynchronously to maximize throughput. Assign each document/chunk its own processing thread or worker to avoid data mixing. Consider using a task queue system like Celery to manage this efficiently (see the sketch after this list).
    • GPT-4 Analysis: Send only the identified relevant chunks to the GPT-4 API. This focused approach drastically minimizes token consumption.
    • Result Aggregation: After the chunks are processed, aggregate the results to provide a complete analysis for the user.
    • Result Delivery and Notification: Once the analysis is complete, notify the user and deliver the results. This notification should include the document ID/tracking ID.
    • Error Handling and Retries: Implement robust error handling at every stage. This includes retries for API failures, handling corrupted documents, and alerting on critical errors.
  2. Choose the Right Tools: Consider using workflow automation tools like Latenode (or similar services) to simplify this complex pipeline. These tools provide visual interfaces to build pipelines and handle queue management, API interactions, error handling, and scaling much more easily than manual coding.

  3. Monitor and Optimize: Continuously monitor resource usage (token consumption, processing time) and API call patterns to further optimize your workflow. Identify patterns of documents or sections that are more costly and refine your preprocessing and chunking algorithms to address them.
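
As a concrete reference for the asynchronous processing and retry steps above, here is a minimal sketch using Celery with a Redis broker; the queue URLs, model name, and prompt are placeholders rather than a drop-in implementation:

```python
from celery import Celery
from openai import OpenAI

app = Celery(
    "doc_pipeline",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)


@app.task(bind=True, max_retries=3, retry_backoff=True)
def analyze_chunk(self, job_id, user_id, chunk, extraction_goal):
    """Each chunk runs as its own task, tagged with the job and user it belongs to."""
    try:
        client = OpenAI()
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Extract {extraction_goal} from:\n\n{chunk}",
            }],
        )
        return {
            "job_id": job_id,
            "user_id": user_id,
            "result": resp.choices[0].message.content,
        }
    except Exception as exc:
        # Retry transient API failures with exponential backoff
        raise self.retry(exc=exc)
```

You would dispatch one task per relevant chunk (for example via a Celery group), then aggregate the returned results under the document’s job ID before notifying the user.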

:mag: Common Pitfalls & What to Check Next:

  • Queue Management: Choose a robust and scalable message queue. Incorrectly managed queues can lead to processing delays and data loss.
  • Error Handling: Implement comprehensive error handling and logging throughout the workflow. Track the frequency and type of errors to quickly diagnose and address bottlenecks or failures.
  • Token Management: Continuously monitor token consumption and adjust your processing strategy accordingly. Experiment with different embedding models and chunking algorithms to find the optimal balance between cost and accuracy.
  • Scaling: Ensure your chosen workflow automation tool and infrastructure can scale to handle increasing numbers of users and document uploads.

:speech_balloon: Still running into issues? Share your (sanitized) config files, the exact command you ran, and any other relevant details. The community is here to help!
