Self-Hosted Paperless-ngx + Optional Local AI: Private Documents with Improved OCR & Search (Full Setup)
Build a complete Paperless-ngx stack in Docker and take control of your documents. We’ll get Paperless running first (works great on its own), then optionally add local AI with Ollama + Open WebUI and upgrade OCR using Paperless-GPT and Paperless-AI for more accurate, searchable text and tags - no cloud required.
This post walks through a complete, repeatable Docker Compose for Paperless-ngx, plus an optional local AI layer that runs entirely on your own hardware.
Paperless-ngx works great without AI. The AI pieces are optional add-ons you can easily disable by commenting out those parts in the stack.
What Paperless-ngx Is
Paperless-ngx is a self-hosted document inbox that:
- Ingests scans and PDFs (via upload, a “consume” folder, or even apps)
- Runs OCR so documents become searchable
- Organizes using tags, correspondents, and document types
- Lets you search your archive like a personal document index
If you want to keep personal paperwork (tax docs, medical, contracts, receipts) under your control, Paperless is the way to go.
What We’re Building
This stack is intentionally “batteries included”: it splits parts of the backend into their own services, yet is still easy to run.
Core services
- Paperless-ngx – the document scanning and index system
- Postgres – database backend
- Redis – queue/cache
- Gotenberg – document conversion (Office/Excel → PDF, etc.)
- Tika – text extraction helpers
Optional local AI services
- Ollama – local model runtime
- Open WebUI – model management + testing UI + chat
- Paperless-AI – metadata suggestions (tags/titles/etc.)
- Paperless-GPT – vision-model OCR + metadata suggestions
Optional utility
- Dozzle – lightweight log viewer, highly recommended for container troubleshooting
Document flow
Data flow (no AI):
Documents → Paperless-ngx → OCR/index/search
Data flow (optional AI):
Paperless-ngx ↔ (Paperless-AI / Paperless-GPT) ↔ Ollama
- Paperless stores and indexes your docs.
- Ollama runs the LLMs locally.
- Paperless-AI / Paperless-GPT are add-ons that call Ollama and write results back to Paperless.
- AI can enhance correspondents, tags, and text recognition (OCR)
Prerequisites
- A Linux server (VM, mini PC, NAS — really anything that can run Docker)
- Docker + Docker Compose
- Optional but recommended for vision OCR: an NVIDIA GPU
  - If using an NVIDIA GPU, be sure you have the proper drivers and the NVIDIA Container Toolkit installed
  - You can run Docker with a GPU other than NVIDIA, though some extra configuration is needed
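Before going further, it can help to confirm the prerequisites are actually on the host. The sketch below is only a rough check: `check_cmd` is a helper name I made up for illustration, and it only tests that a command is on your PATH, not that drivers or the NVIDIA Container Toolkit are configured correctly.

```shell
# Hypothetical helper: report whether a command is available on PATH.
# Prints "ok: <name>" or "missing: <name>" and returns 0 or 1 accordingly.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

check_cmd docker || true

# The Compose plugin is a docker subcommand, so it is only checked when docker exists.
if command -v docker >/dev/null 2>&1; then
  docker compose version || echo "missing: docker compose plugin"
fi

# Only relevant if you plan to use an NVIDIA GPU.
check_cmd nvidia-smi || true
```

If `nvidia-smi` is reported as missing and you do not plan to use an NVIDIA GPU, that is expected.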
Ports Used in This Guide
This is the exact port map used in the stack below:
| Service | Port | URL |
|---|---|---|
| Paperless-ngx | 8000 | http://<server-ip>:8000 |
| Paperless-AI | 3000 | http://<server-ip>:3000 |
| Open WebUI | 3001 | http://<server-ip>:3001 |
| Paperless-GPT | 3002 | http://<server-ip>:3002 |
| Dozzle (logs) | 8080 | http://<server-ip>:8080 |
Docker Compose Stack
Everything runs from a single Compose file. I keep a separate .env per service so secrets stay scoped to only the services that need them.
I’m not going to paste every .env file inline here (it gets long fast). Instead, the repository includes copy/paste-ready .env files for each service.
Complete configuration + files: https://github.com/timothystewart6/paperless-stack
Compose file
Create a compose.yaml file in a folder like paperless-stack and paste in the following contents:
services:
  # paperless-ngx main service
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless-ngx
    restart: unless-stopped
    env_file:
      - ./paperless/.env
    depends_on:
      - postgres
      - redis
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - ./paperless/data:/usr/src/paperless/data
      - ./paperless/media:/usr/src/paperless/media
      - ./paperless/export:/usr/src/paperless/export
      - ./paperless/consume:/usr/src/paperless/consume

  # postgres database for paperless-ngx
  postgres:
    image: postgres:18
    restart: unless-stopped
    container_name: postgres
    env_file:
      - ./postgres/.env
    volumes:
      - ./postgres/data:/var/lib/postgresql

  # redis database for paperless-ngx
  redis:
    image: docker.io/library/redis:8
    container_name: redis
    restart: unless-stopped
    env_file:
      - ./redis/.env
    volumes:
      - ./redis/data:/data

  # gotenberg service that paperless uses for document conversion
  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.25
    container_name: gotenberg
    env_file:
      - ./gotenberg/.env
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  # tika service that paperless uses for document text extraction
  tika:
    image: docker.io/apache/tika:latest
    container_name: tika
    restart: unless-stopped
    env_file: ./tika/.env

  # open-webui service for LLM interaction
  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    env_file:
      - ./open-webui/.env
    depends_on:
      - ollama
    ports:
      - "3001:8080"
    volumes:
      - ./open-webui/data:/app/backend/data

  # ollama service for local LLMs
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    env_file:
      - ./ollama/.env
    volumes:
      - ./ollama/data/:/root/.ollama
      - ./ollama/models:/ollama-models
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  # paperless-ai service
  paperless-ai:
    image: clusterzx/paperless-ai:latest
    container_name: paperless-ai
    restart: unless-stopped
    depends_on:
      - ollama
      - paperless
    ports:
      - "3000:3000"
    env_file:
      - ./paperless-ai/.env
    volumes:
      - ./paperless-ai/data:/app/data

  # paperless-gpt service
  paperless-gpt:
    image: icereed/paperless-gpt:latest
    container_name: paperless-gpt
    restart: unless-stopped
    depends_on:
      - ollama
      - paperless
    ports:
      - "3002:8080"
    env_file:
      - ./paperless-gpt/.env
    volumes:
      - ./paperless-gpt/prompts:/app/prompts

  # optional but helpful log viewer
  dozzle:
    image: amir20/dozzle:latest
    restart: unless-stopped
    container_name: dozzle
    env_file:
      - ./dozzle/.env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./dozzle/data:/data
    ports:
      - "8080:8080"
Without local AI
If you don’t want your stack to use local AI, comment out or remove these services:
- open-webui
- ollama
- paperless-ai
- paperless-gpt
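If you would rather not edit the Compose file, another option is to start only the core services by name. This is a sketch, not the only way to do it: `CORE_SERVICES` and `start_core` are names I made up, and the service list assumes the service names from the Compose file in this guide.

```shell
# Core services from the Compose file in this guide; the AI services are left out on purpose.
CORE_SERVICES="paperless postgres redis gotenberg tika dozzle"

# Start only the named services. Compose also starts anything they depend on.
start_core() {
  # $CORE_SERVICES is intentionally unquoted so it splits into separate arguments.
  docker compose up -d $CORE_SERVICES
}

# Usage (from the folder that holds your compose file):
#   start_core
```

Keep in mind that a plain `docker compose up -d` with no service names will still start everything defined in the file, including the AI services.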
Create the folder structure
If you’re using bind mounts (like this stack does), it’s nice to create the folder structure up front so Docker doesn’t create directories as root.
mkdir -p \
  paperless/{data,media,export,consume} \
  postgres/data \
  redis/data \
  gotenberg \
  tika \
  paperless-ai/data \
  open-webui/data \
  ollama/{data,models} \
  paperless-gpt/prompts \
  dozzle/data
Project layout
Here’s the folder/file layout used in this guide, with the project root named paperless-stack:
paperless-stack/
├── compose.yaml
├── dozzle/
│   ├── .env
│   └── data/
├── gotenberg/
│   └── .env
├── ollama/
│   ├── .env
│   ├── data/
│   └── models/
├── open-webui/
│   ├── .env
│   └── data/
├── paperless/
│   ├── .env
│   ├── consume/
│   ├── data/
│   ├── export/
│   └── media/
├── paperless-ai/
│   ├── .env
│   └── data/
├── paperless-gpt/
│   ├── .env
│   └── prompts/
├── postgres/
│   ├── .env
│   └── data/
├── redis/
│   ├── .env
│   └── data/
└── tika/
    └── .env
Step 1 — Start Everything
Inside of the folder with your compose file:
docker compose up -d
If you want to watch logs while services come up:
docker compose logs -f
Or open Dozzle at http://<server-ip>:8080.
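Paperless can take a little while to answer on first start, so if you script the setup you may want to wait for the web UI before moving on. Here is a minimal sketch using curl; `wait_for_url` is a hypothetical helper name, and the retry count and delay are arbitrary defaults, not recommended values.

```shell
# Hypothetical helper: poll a URL until it answers without an HTTP error, or give up.
# Usage: wait_for_url URL [tries] [delay_seconds]
wait_for_url() {
  url=$1
  tries=${2:-30}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors (4xx/5xx); -sS keeps it quiet except for errors.
    if curl -fsS -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url"
  return 1
}

# Usage (replace <server-ip> with your server's address):
#   wait_for_url "http://<server-ip>:8000"
```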
Step 2 — Set Up Paperless-ngx
Open Paperless:
http://<server-ip>:8000
Create your admin account and sign in.
At this point, Paperless-ngx is fully functional without any AI. You can upload documents and start building your library.
Step 3 — Optional: Local AI Backend (Ollama + Open WebUI)
If you want local AI features, start by confirming Ollama is running and then use Open WebUI to pull a model.
Open WebUI:
http://<server-ip>:3001
Pull a model
In Open WebUI:
- Admin Panel → Settings → Models
A good lightweight model to start with:
llama3.2:3b
Once it’s downloaded, do a quick chat test to confirm it responds.
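If you prefer the command line over Open WebUI, the Ollama CLI inside the container can pull the same model. This is a sketch that assumes the service is named `ollama` as in the Compose file in this guide; `pull_model` and `list_models` are wrapper names I made up.

```shell
# Pull a model using the Ollama CLI inside the running "ollama" service container.
pull_model() {
  docker compose exec ollama ollama pull "$1"
}

# List the models Ollama has downloaded so far.
list_models() {
  docker compose exec ollama ollama list
}

# Usage (from the folder that holds your compose file):
#   pull_model llama3.2:3b
#   list_models
```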
If you’re using a GPU other than NVIDIA, you’ll need to update your .env and docker compose accordingly.
Step 4 — Optional: Paperless-AI (Tagging/Titles/Metadata)
Paperless-AI needs an API token from Paperless.
Create a Paperless API token
In Paperless:
- Profile → API Tokens → Generate
Copy the token and place it in your paperless-ai/.env.
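Then recreate the containers so they pick up the new value (a plain docker compose restart does not re-read changes to env_file):
docker compose up -d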
Open Paperless-AI:
http://<server-ip>:3000
Create an account and confirm it can connect to Paperless.
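If Paperless-AI cannot connect, it can help to test the token against the Paperless API directly. The sketch below sends the token in an `Authorization: Token ...` header to the documents endpoint and prints only the HTTP status code. `auth_header` and `check_token` are hypothetical helper names, and the base URL is whatever address you use to reach Paperless.

```shell
# Build the Authorization header used for Paperless-ngx token authentication.
auth_header() {
  printf 'Authorization: Token %s' "$1"
}

# Request the document list and print only the HTTP status code.
# A 200 suggests the token was accepted; a 401 or 403 suggests it was not.
# Usage: check_token BASE_URL TOKEN
check_token() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "$(auth_header "$2")" \
    "${1%/}/api/documents/"
}

# Usage (replace <server-ip> and the token with your own values):
#   check_token "http://<server-ip>:8000" "your-token-here"
```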
Step 5 — Optional: Paperless-GPT (Vision OCR + Metadata)
This is the piece that made the biggest difference for me. The OCR difference is night and day.
Paperless-GPT can run OCR using a vision-capable LLM (via Ollama), which can be dramatically more accurate on certain scans than traditional OCR.
Be sure a vision model like minicpm-v:8b is set in your paperless-gpt/.env.
You will also need to download this model for Ollama using Open WebUI:
http://<server-ip>:3001
- In Admin Panel → Settings → Models, pull the model:
minicpm-v:8b
Open Paperless-GPT:
http://<server-ip>:3002
How Paperless-GPT finds documents
Paperless-GPT watches for tags. Add a tag like paperless-gpt-ocr-auto (or your configured tag) to a document in Paperless, and it will show up in the Paperless-GPT UI.
Manual OCR test
- Open a document in Paperless
- Copy the document ID from the URL
- In Paperless-GPT, go to OCR and run a job using that ID
- Save the result back to the document
Now check the Paperless Content tab — the extracted text should be far more accurate and actually searchable.
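You can also read the stored text back through the Paperless API instead of the UI. This is a rough sketch: `doc_url` and `show_content` are hypothetical helper names, it reuses the API token from Step 4, and it assumes `jq` is installed on the machine you run it from.

```shell
# Build the API URL for a single document, tolerating a trailing slash on the base URL.
doc_url() {
  printf '%s/api/documents/%s/' "${1%/}" "$2"
}

# Print the text Paperless has stored for a document (its "content" field).
# Usage: show_content BASE_URL DOCUMENT_ID TOKEN
show_content() {
  curl -fsS -H "Authorization: Token $3" "$(doc_url "$1" "$2")" | jq -r '.content'
}

# Usage (replace the placeholders with your own values):
#   show_content "http://<server-ip>:8000" 42 "your-token-here"
```

Running it before and after the OCR job gives you a quick way to compare the two versions of the text.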
Automate OCR with Workflows
If you want OCR to happen automatically, create a Paperless workflow that applies tags on upload.
Example workflow:
- Trigger: On document added
- Action: Add tags:
  - paperless-gpt-auto
  - paperless-gpt-ocr-auto
Once those tags are applied, Paperless-GPT can automatically process new documents. This will process both tags and content (OCR).
Security Note: Remote Access
Paperless contains personal documents. I recommend using a VPN for remote access rather than exposing this directly to the internet. The same goes for all other services in this stack.
Troubleshooting
A few common checks:
- It doesn’t work: open Dozzle at http://<server-ip>:8080 and read the logs.
- Paperless won’t load: run docker compose ps and check the logs for paperless
- AI add-ons can’t connect: verify URLs are using Compose service names (like http://paperless:8000) when running inside Docker
- Models are slow: start with a small model, and consider GPU acceleration for vision OCR
- Nothing shows in Paperless-GPT: confirm the correct tag is applied to a document in Paperless
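If you would rather stay in the terminal than open Dozzle, a small filter over the Compose logs can surface the obvious problems. This is only a sketch: `only_errors` and `svc_errors` are names I made up, and the keyword list is a guess at what is worth looking for, not an exhaustive pattern.

```shell
# Keep only lines that look like problems (case-insensitive match on a few keywords).
only_errors() {
  grep -iE 'error|exception|traceback|refused' || true
}

# Show likely problem lines from the recent logs of one Compose service.
# Usage: svc_errors SERVICE [lines]
svc_errors() {
  docker compose logs --no-color --tail "${2:-200}" "$1" 2>&1 | only_errors
}

# Usage (from the folder that holds your compose file):
#   docker compose ps
#   svc_errors paperless
#   svc_errors paperless-ai 500
```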
Summary
With this you now have:
- A full self-hosted document archive with OCR + search
- A repeatable Docker Compose deployment
- Optional local AI features that don’t depend on the cloud
- Vision-model OCR (via Paperless-GPT) for improved text extraction on tough scans
