Paperless-ngx: A Paperless Office That Actually Works
Scan once, find anything, never lose a document again

I have owned three filing cabinets in my life. Each one followed the same arc: pristine and hopeful for a fortnight, then a graveyard of bank statements I will never read, slowly fossilising under a pile of takeaway menus. The promise of the “paperless office” was sold to me decades ago and never delivered, because the missing piece was never the scanner. It was knowing where anything went afterwards.
Paperless-ngx is the piece that was missing. It is the first system I have used that turns a heap of scanned PDFs into something I can actually search, and it has quietly replaced every filing cabinet, shoebox, and “important_FINAL_v2.pdf” folder I once relied on.
1 What it actually is
Paperless-ngx is a self-hosted document management system. You feed it documents — scans, PDFs, the odd email export — and it does three useful things in a row. It runs OCR over every page so the text inside is searchable. It indexes that text for full-text search. And it lets you organise everything with tags, correspondents (who sent it), and document types (invoice, payslip, warranty, that sort of thing).
The clever bit is that you rarely file anything by hand. Paperless watches a consume folder: drop a file in, walk away, and a minute later it appears in your library, OCR’d and tagged. Your scanner does the scanning, Paperless does the filing.
It is the spiritual successor to the original Paperless and Paperless-ng projects, both of which went quiet. The “-ngx” fork is the one that is actively maintained, and the one you want.
2 Standing it up
It runs in Docker, and it needs a few moving parts: the app itself, a Redis broker for its task queue, and a database. You can run on SQLite for a small single-user setup, but Postgres is the sensible default once you have more than a few hundred documents. Here is a trimmed docker-compose.yml that gets you the lot:
```yaml
services:
  broker:
    image: redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data
  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: change-me
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPASS: change-me
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_URL: https://paperless.example.com
      PAPERLESS_CONSUMPTION_DIR: /usr/src/paperless/consume
volumes:
  redisdata:
  pgdata:
  data:
  media:
```

Bring it up with docker compose up -d, then create your first user from the host:

```sh
docker compose run --rm webserver createsuperuser
```

Log in on port 8000 and you have an empty, slightly intimidating library staring back at you. The PAPERLESS_OCR_LANGUAGE matters — set it to whatever your documents are actually in (eng, deu, fra, or several at once like eng+deu), because OCR quality is the foundation everything else is built on.
3 The actual workflow
Here is the loop I run every Sunday with the week’s post.
- Scan to the consume folder. My document scanner dumps PDFs straight onto a network share that maps to ./consume. No app, no manual upload. If you have not got a fancy scanner, your phone’s scan feature and a synced folder do the same job.
- Paperless consumes. Within a minute, the file is OCR’d and pulled into the library. The original is preserved untouched; Paperless stores a separate archive copy that carries the searchable text layer.
- Auto-tagging. This is where it earns its keep. You teach Paperless matching rules on tags, correspondents, and types. A correspondent like “British Gas” can be set to match automatically whenever those words appear in a document. There is also an “auto” matching mode that learns from how you have filed things before, so the more you correct it, the less you have to.
- Search. Full-text, instant, fuzzy enough to forgive a wonky OCR character. “council tax 2023”, “boiler warranty”, “that invoice from the plumber” — all of it findable in seconds, by content, not by remembering which folder I buried it in.
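The matching rules in the auto-tagging step are easier to picture with a toy version. The sketch below is an illustration only, not Paperless-ngx's own matching code: the function names are made up, and it assumes three rule styles (any word, all words, exact phrase) applied case-insensitively to the OCR text.

```python
import re


def _has_word(text: str, word: str) -> bool:
    """True if `word` occurs in `text` as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


def matches_any(text: str, pattern: str) -> bool:
    """Hypothetical 'any word' rule: one word of the pattern is enough."""
    return any(_has_word(text, word) for word in pattern.split())


def matches_all(text: str, pattern: str) -> bool:
    """Hypothetical 'all words' rule: every word must appear, in any order."""
    return all(_has_word(text, word) for word in pattern.split())


def matches_exact(text: str, pattern: str) -> bool:
    """Hypothetical 'exact' rule: the whole phrase must appear as written."""
    return _has_word(text, pattern)
```

For a correspondent like “British Gas”, the exact rule is the one that avoids false positives: a gas safety certificate from someone else would satisfy the “any word” rule but not the exact phrase.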
The thing that genuinely changed my habits is that I stopped caring about folder structure entirely. Tags are not a hierarchy you have to plan; they are labels you can stack, and search covers everything else.
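The same search is available outside the web interface. Here is a minimal sketch using only the Python standard library, assuming the REST API's documents endpoint takes a query parameter for full-text search and that you have created an API token for your user. The helper names, the base URL, and the token are placeholders, not anything from my setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(base_url: str, query: str) -> str:
    """Build the full-text search URL for a Paperless-ngx instance."""
    return f"{base_url.rstrip('/')}/api/documents/?{urlencode({'query': query})}"


def search(base_url: str, token: str, query: str) -> list[tuple[int, str]]:
    """Return (id, title) pairs for matching documents.

    Assumes token authentication and a JSON response with a "results" list.
    """
    request = Request(
        build_search_url(base_url, query),
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [(doc["id"], doc["title"]) for doc in payload["results"]]


# Example call (needs a running instance and a real token):
# search("http://localhost:8000", "your-api-token", "boiler warranty")
```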
4 The honest trade-offs
It is not free in effort. The first few weeks involve correcting the matcher, and bulk-importing a backlog of old documents is a tedious afternoon you will not enjoy. OCR is CPU-hungry, so a Raspberry Pi will technically work but will chew through a big batch slowly; anything with a real processor is happier. And because it is self-hosted, your documents are now your responsibility: back up the media volume and take a database dump, or you have built a single point of failure for your entire paper life. I run a nightly export to a second machine and sleep fine.
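For the nightly export, a rough sketch of the sort of script you could hang off cron. It relies on the document_exporter management command that ships with Paperless-ngx, and it assumes something the compose file above does not include: an extra volume such as ./export:/usr/src/paperless/export on the webserver service. The function names are my own, and copying the export to a second machine is left out.

```python
import subprocess


def build_export_command(service: str = "webserver", target: str = "../export") -> list[str]:
    """Build the docker compose call that runs the exporter inside the container.

    -T disables the pseudo-TTY so the command also works from cron.
    `target` is resolved inside the container; ../export assumes the extra
    export volume described above is mounted.
    """
    return ["docker", "compose", "exec", "-T", service, "document_exporter", target]


def run_export() -> None:
    """Run the export and raise CalledProcessError if it fails."""
    subprocess.run(build_export_command(), check=True)
```

Run it from the directory that holds docker-compose.yml, since docker compose looks there for the project.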
There is also a soft lock-in worry that is, on inspection, unfounded: Paperless stores your original files on disk in plain folders. If the project vanished tomorrow, your PDFs would still be right there.
5 Is it worth it?
If you deal with more than a trickle of paper — anyone running a household with bills and warranties, a freelancer drowning in invoices, a small business without a fancy DMS — yes, unreservedly. The payoff is the day you need a receipt for a warranty claim and you find it in ten seconds instead of forty minutes and a strop.
If you get three letters a year and live happily out of a Gmail account, this is overkill and you should not bother. Paperless-ngx rewards people with a volume problem and a tolerance for a weekend of Docker. I am, regrettably, exactly that person, and my filing cabinets have gone to the tip. Good riddance.




