Can I use OpenAI API to query multiple uploaded documents including PDF and Word files?

Can OpenAI handle PDF and Word document uploads?

I’m trying to upload several files to OpenAI using their file upload endpoint. I managed to get JSON lines format working, but I’m wondering if it also accepts PDF and DOCX files.

What OpenAI API should I use to query multiple uploaded documents?

I have a lot of lengthy documents that I need to process. Reading through all of them manually and then asking questions isn’t practical for my use case. I need an automated way to upload these files and then ask questions about their content.

Can someone point me to the right API endpoints or methods to accomplish this? I’d really appreciate any guidance on handling multiple file uploads and querying them effectively.

Thanks in advance!

you cant upload docx or pdf files directly - they need to be converted to jsonl format first. use pypdf2 for pdfs or python-docx for word docs to handle the conversion. after that, you can use the api to work with your data. good luck!

The Files API can upload documents, but there are limits to know about. It accepts different file formats like text files, but processing depends on what you’re doing. For document querying, check out the Assistants API - it’s got built-in file search that works great. I’ve found it helps to preprocess documents first. Extract the text content, then chunk it properly before uploading. You get way more control over how everything gets processed. RAG works really well with multiple long documents since it searches all your uploaded files for relevant info before generating responses. Just remember there are token limits and costs for processing lots of text, so factor in your budget before uploading huge document collections.

OpenAI’s Assistant API with file search is your best bet here. The API doesn’t handle PDFs or DOCX files directly, but there’s an easy workaround. I’ve done this with research papers - extract the text first and you’ll get way better results than uploading raw files. Just make sure you keep the document structure intact so the AI understands the context. Once you’ve got text files, the Assistant searches across all of them at once when answering questions. Heads up though - file search has monthly limits and gets pricey with big document collections. If you’ve got really long docs, split them into logical chunks before uploading to save money.