Part 3 of 8

RAG Pipeline — Qdrant + Document Ingestion

Make your AI answer questions from your actual documents, codebase, and knowledge base.

40 minutes

4GB+ recommended

Prerequisites

Completed Parts 1–2 (Ollama + Open WebUI), Docker, basic Python familiarity

Time to Complete

35–45 minutes

Recommended Plan

4GB ($20/mo) minimum. 8GB ($40/mo) for larger document sets

Introduction

LLMs are powerful but they hallucinate and don't know your data. RAG (Retrieval-Augmented Generation) fixes this by feeding your actual documents into the LLM's context at query time. This part builds a production-grade pipeline — not a toy demo.

💰 Cost comparison: Pinecone Serverless costs $70+/month for vector storage. Qdrant on your VPS: $0 — unlimited vectors, unlimited queries.

RAG Architecture Overview

RAG Pipeline Flow

Documents → Chunking → Embedding → Vector DB → Query → Context Injection → LLM Response

┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌─────────┐
│  Upload   │───▸│ Chunking │───▸│  Embedding   │───▸│ Qdrant  │
│  PDFs,    │    │ Split    │    │ nomic-embed  │    │ Vector  │
│  Docs,    │    │ into     │    │ via Ollama   │    │   DB    │
│  Code     │    │ segments │    │              │    │         │
└──────────┘    └──────────┘    └──────────────┘    └────┬────┘
                                                         │
┌──────────┐    ┌──────────┐    ┌──────────────┐         │
│   LLM    │◂───│ Context  │◂───│   Retrieve   │◂────────┘
│ Response │    │ Inject   │    │  Top-K docs  │
└──────────┘    └──────────┘    └──────────────┘

The embedding model (nomic-embed-text) runs locally via Ollama — no external API calls needed.

Pull the embedding model

ollama pull nomic-embed-text

Deploying Qdrant

Add Qdrant to your AI stack:

mkdir -p ~/ai-stack/qdrant && cd ~/ai-stack/qdrant

docker-compose.yml

version: "3.8"

services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"
      - "6334:6334"
    environment:
      - QDRANT__SERVICE__API_KEY=your-qdrant-api-key-change-this
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  qdrant-data:

docker compose up -d

Qdrant dashboard is available at http://your-server-ip:6333/dashboard.

Why Qdrant over ChromaDB?

Feature	Qdrant	ChromaDB
Performance at scale	Excellent	Degrades
Filtering	Advanced payload filtering	Basic metadata
Production readiness	Built for production	Better for prototyping
Memory efficiency	On-disk + quantization	Primarily in-memory

Document Ingestion Pipeline

Create a Python-based ingestion script using LangChain:

Install dependencies

pip install langchain langchain-community qdrant-client pypdf2 pdfplumber

ingest.py

#!/usr/bin/env python3
"""Document ingestion pipeline for Qdrant + Ollama RAG."""

import os
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import uuid

# Configuration
OLLAMA_URL = "http://localhost:11434"
QDRANT_URL = "http://localhost:6333"
QDRANT_API_KEY = "your-qdrant-api-key-change-this"
COLLECTION_NAME = "documents"
EMBEDDING_MODEL = "nomic-embed-text"
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50

# Initialize
embeddings = OllamaEmbeddings(
    base_url=OLLAMA_URL,
    model=EMBEDDING_MODEL
)
qdrant = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

# Create collection if it doesn't exist
collections = [c.name for c in qdrant.get_collections().collections]
if COLLECTION_NAME not in collections:
    qdrant.create_collection(
        collection_name=COLLECTION_NAME,
        vectors_config=VectorParams(size=768, distance=Distance.COSINE)
    )

# Text splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP,
    separators=["\n\n", "\n", ". ", " "]
)

def ingest_file(filepath: str):
    """Ingest a single file into the vector database."""
    ext = os.path.splitext(filepath)[1].lower()
    
    if ext == ".pdf":
        loader = PyPDFLoader(filepath)
    elif ext in [".txt", ".md", ".py", ".js", ".ts"]:
        loader = TextLoader(filepath)
    else:
        print(f"Skipping unsupported file: {filepath}")
        return
    
    docs = loader.load()
    chunks = splitter.split_documents(docs)
    
    for chunk in chunks:
        vector = embeddings.embed_query(chunk.page_content)
        qdrant.upsert(
            collection_name=COLLECTION_NAME,
            points=[PointStruct(
                id=str(uuid.uuid4()),
                vector=vector,
                payload={
                    "text": chunk.page_content,
                    "source": filepath,
                    "page": chunk.metadata.get("page", 0)
                }
            )]
        )
    
    print(f"Ingested {len(chunks)} chunks from {filepath}")

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        ingest_file(path)

Usage

# Ingest a single PDF
python3 ingest.py /path/to/document.pdf

# Ingest multiple files
python3 ingest.py docs/*.pdf docs/*.md

Chunking Strategy

Chunk Size	Overlap	Best For
200–300	30	FAQ, short-form Q&A
500 (default)	50	General documents, best balance
1000–1500	200	Technical docs, code files

Connecting RAG to Open WebUI

Configure Open WebUI to use your Qdrant-based RAG pipeline:

Navigate to Admin Panel → Settings → Documents
Set the Embedding Model to nomic-embed-text
Configure the vector database connection to Qdrant
Set Top-K to 5 (retrieve 5 most relevant chunks)
Set Similarity Threshold to 0.7 (filter out weak matches)

Test by uploading a document through Open WebUI and asking questions about its content. Compare the quality of responses with and without RAG.

Optimizing Retrieval Quality

Troubleshooting Common Issues

Problem	Cause	Fix
Irrelevant results	Chunks too large	Reduce chunk size to 300–500
Missing context	Chunks too small	Increase chunk size, add overlap
Hallucination despite RAG	Low similarity threshold	Increase threshold to 0.75+
Slow retrieval	Large collection	Enable on-disk storage, add indexes

Batch Ingestion Script

A production-ready script that watches a directory for new documents:

watch-and-ingest.sh

#!/bin/bash
# Watch a directory and auto-ingest new documents
WATCH_DIR="/home/user/documents"
LOG_FILE="/var/log/rag-ingestion.log"

inotifywait -m -r -e create -e moved_to "$WATCH_DIR" |
while read dir action file; do
    filepath="$dir$file"
    echo "$(date): New file detected: $filepath" >> "$LOG_FILE"
    python3 /home/user/ai-stack/rag/ingest.py "$filepath" >> "$LOG_FILE" 2>&1
done

sudo apt install -y inotify-tools
chmod +x watch-and-ingest.sh

What's Next?

Your AI now answers from your actual documents — company wikis, technical docs, contracts, code. Zero data sent to third parties. In Part 4: AnythingLLM, we'll put this power in the hands of non-technical team members with a no-code AI app builder.

← Part 2: Open WebUI Part 4: AnythingLLM