token
For your issue:
Small file → answer comes
Large file → no answer
The usual solution is to avoid sending the whole file to Ollama.
Use this approach:
Large File
↓
Extract text
↓
Split into chunks
(500–1000 tokens)
↓
Create embeddings
↓
Store in Vector DB
(FAISS / Chroma)
↓
Retrieve only relevant chunks
↓
Send to Ollama
↓
Get answer
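The "retrieve only relevant chunks" step can be sketched in plain Python. This is a minimal illustration, not a production retriever: toy_embed is a stand-in that counts words, where a real setup would call an embedding model and keep the vectors in FAISS or Chroma. All function names here are hypothetical.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding (word counts). Assumption: replace with a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, top_k=2, embed=toy_embed):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Invoices are paid within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2 percent fee.",
]
relevant = retrieve("When are invoices paid?", chunks, top_k=1)
```

Only the chunks in relevant would go to Ollama, so the prompt stays small no matter how large the original file was.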
If you want a quick fix (without redesign)
Try these:
1. Increase context size
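Example (the server reads this variable, so set it where the server starts):
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
Inside an ollama run session, /set parameter num_ctx 8192 changes it for that session.
Or use a model with a larger context window.
But this is only a temporary fix: a larger context needs more memory, and a big enough file will still overflow it.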
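The context size can also be requested per call through Ollama's HTTP API, using the num_ctx option. A minimal sketch, assuming a local server at the default address http://localhost:11434. The function names are illustrative, not part of any library.

```python
import json
import urllib.request

def build_generate_request(prompt, model="llama3", num_ctx=8192,
                           host="http://localhost:11434"):
    """Build a POST to /api/generate that asks for a specific context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON reply instead of a stream
        "options": {"num_ctx": num_ctx},  # context window for this request
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(prompt, **kwargs):
    """Send the request and return the model's text. Needs a running Ollama server."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

Calling generate("Summarize: ...", num_ctx=8192) would then use an 8192-token window for that request only.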
---
2. Reduce input size
Instead of:
Send a 200-page PDF
Do:
Pages 1–20
Pages 21–40
Pages 41–60
Process each range separately.
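The page ranges can be generated rather than typed by hand. A small sketch, where page_batches is a hypothetical helper and the batch size of 20 simply mirrors the example above:

```python
def page_batches(total_pages, batch_size=20):
    """Return inclusive, 1-based (first, last) page ranges covering the document."""
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]

batches = page_batches(200)  # ten ranges: (1, 20), (21, 40), ..., (181, 200)
```

Each batch is answered in isolation, so a question whose answer spans two batches may need a final pass that combines the per-batch answers.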
---
3. Add chunking in code
Example logic:
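text = extract_pdf()              # placeholder: returns the PDF text as one string
chunk_size = 1000                 # target size per chunk (tokens)
chunks = split(text, chunk_size)  # pass chunk_size so the split actually uses it
for chunk in chunks:
    send_to_ollama(chunk)         # placeholder: one request per chunk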
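A runnable version of the split step might look like the sketch below. It counts words as a rough proxy for tokens, which is an assumption, since real token counts depend on the model's tokenizer. The overlap parameter is an addition so that a sentence cut at a chunk boundary still appears whole in one of the chunks.

```python
def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

For example, split_into_chunks(text, chunk_size=4, overlap=1) on a ten-word text yields three chunks, each starting with the last word of the previous one.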
---
4. Use RAG (recommended for company/intranet AI)
Stack:
Ollama
LangChain
FAISS or ChromaDB
Embedding model (nomic-embed-text)
Install example:
ollama pull llama3
ollama pull nomic-embed-text
pip install langchain chromadb
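Once both models are pulled, the two Ollama calls a RAG setup depends on (embedding a chunk, then answering from retrieved chunks) can be tried with the standard library alone, before adding LangChain or Chroma. A hedged sketch, assuming a local server at http://localhost:11434. The helper names and the prompt wording are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local address

def _post(path, payload):
    """POST JSON to the Ollama server and return the decoded reply."""
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def embed(text, model="nomic-embed-text"):
    """Return the embedding vector for one piece of text."""
    return _post("/api/embeddings", {"model": model, "prompt": text})["embedding"]

def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def answer(question, context_chunks, model="llama3"):
    """Ask the chat model, passing only the retrieved chunks as context."""
    payload = {"model": model, "prompt": build_prompt(question, context_chunks),
               "stream": False}
    return _post("/api/generate", payload)["response"]
```

In a full setup the vectors from embed would be stored in FAISS or ChromaDB, and answer would receive only the few chunks the vector search returns.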
---
5. Check RAM
If using:
7B → preferably 8–16 GB RAM
13B → 16–32 GB
Bigger models need more
Check:
Windows → Task Manager → Performance → Memory
If memory usage hits 100% while a large file is being processed, the request may fail or hang without returning an answer.
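A rough way to see where these figures come from is to estimate the memory taken by the weights alone. The sketch below assumes the model is stored at a fixed number of bits per weight (4-bit quantization is a common choice, but check your own model). It ignores the context cache and everything else running on the machine, which is why the practical ranges above are higher.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

seven_b_4bit = weight_memory_gb(7, 4)       # 3.5
thirteen_b_4bit = weight_memory_gb(13, 4)   # 6.5
seven_b_16bit = weight_memory_gb(7, 16)     # 14.0
```

Raising the context size adds to this baseline, so a machine that runs short prompts comfortably can still run out of memory on long ones.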
---
Industry recommendation
For internal enterprise AI on intranet:
User Upload
↓
Parser
↓
Chunking
↓
Embedding
↓
Vector DB
↓
Retriever
↓
Ollama
Do not use:
PDF → Ollama directly
Use:
PDF → RAG → Ollama
That is the actual solution.
If you tell me the file type (PDF/DOCX), the file size, and the Ollama model name, I can point to the exact bottleneck.