Self-Hosted AI Search: Replacing Google with Perplexica and a Local Model
An answer engine that runs on your own box and doesn't sell your queries

Searching the web in 2025 has become a chore. You type a question, scroll past a screen of ads, then past four articles that are themselves just AI-generated SEO sludge, and somewhere on the second page you find the actual answer. The cloud “answer engines” fix the experience but trade away your privacy: every query goes to someone else’s server to be logged, profiled, and monetised. I wanted the good bit — a model that reads the web and answers the question — without the surveillance. That’s where Perplexica comes in.
Perplexica is a fairly new open-source project, an answer engine you host yourself. Point it at a local language model and a self-hosted search backend, and you get a private little research assistant that reads live web results and writes you a cited answer. I’ve been running it for a few weeks, and it’s earned its place in my browser’s keyword bar.
1 What it actually is
Perplexica is not a search engine of its own and not a chatbot. It’s the glue between three things: a metasearch engine that fetches real web results, a language model that reads and synthesises them, and an embedding model that helps rank which results are actually relevant to your question.
The flow is essentially retrieval-augmented generation pointed at the live web. You ask a question; it queries the search backend; it pulls the top results; it ranks them with embeddings; it feeds the relevant chunks to the language model; and the model writes an answer with citations back to the sources. Those citations are the difference between a useful tool and a confident liar — you can click through and check, which you absolutely should.
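The last step of that flow, handing the ranked text to the model so it can cite it, can be sketched in a few lines. This is a minimal illustration, not Perplexica's actual prompt: the `Source` type, the `build_prompt` helper, the prompt wording and the example.com pages are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One retrieved page (hypothetical type, for illustration only)."""
    title: str
    url: str
    text: str


def build_prompt(question: str, sources: list[Source]) -> str:
    """Number each source so the model can cite it as [1], [2], ..."""
    numbered = [
        f"[{i}] {s.title} ({s.url})\n{s.text}"
        for i, s in enumerate(sources, start=1)
    ]
    context = "\n\n".join(numbered)
    return (
        "Answer the question using only the sources below. "
        "Cite each claim with its source number in square brackets.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Placeholder sources standing in for ranked search results.
sources = [
    Source("Example page A", "https://example.com/a", "Lima is the capital of Peru."),
    Source("Example page B", "https://example.com/b", "Peru borders five countries."),
]
prompt = build_prompt("What is the capital of Peru?", sources)
print(prompt)
```

Because every source carries a number and a URL, a bracketed citation in the answer can be mapped straight back to a page you can open and check.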
2 The pieces you need
The search half is provided by SearXNG, a self-hosted metasearch engine that queries dozens of sources and returns results without tracking you. Perplexica leans on its JSON API, which SearXNG does not enable by default, so add json to the search formats in its settings.yml. The model half can point at a cloud provider, but the whole appeal here is keeping it local, so I point Perplexica at Ollama running on the same machine. One model does the chat synthesis; a small embedding model does the relevance ranking.
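To see what the JSON API hands back, you can query SearXNG directly. The sketch below assumes SearXNG is published on localhost port 4000, as in the Compose file that follows; `search_url`, `top_results` and `search` are hypothetical helper names, and the field names (`title`, `url`, `content`) are what a SearXNG JSON response normally contains.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: SearXNG is published on host port 4000.
SEARXNG_URL = "http://localhost:4000"


def search_url(query: str, base: str = SEARXNG_URL) -> str:
    """Build a SearXNG search URL that asks for JSON instead of HTML."""
    return f"{base}/search?{urlencode({'q': query, 'format': 'json'})}"


def top_results(payload: dict, limit: int = 5) -> list[dict]:
    """Keep the title, URL and snippet of the first few results."""
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "snippet": r.get("content", ""),
        }
        for r in payload.get("results", [])[:limit]
    ]


def search(query: str) -> list[dict]:
    """Run a live query. A 403 here usually means the json format
    is not enabled in SearXNG's settings.yml."""
    with urlopen(search_url(query), timeout=10) as resp:
        return top_results(json.load(resp))
```

Calling `search("perplexica docker compose")` against a running instance should return a short list of dictionaries, which is roughly the raw material Perplexica ranks and summarises.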
The cleanest way to stand it all up is Docker Compose, with Perplexica and SearXNG side by side and Ollama reachable on the host:
    services:
      searxng:
        image: searxng/searxng:latest
        ports:
          - "4000:8080"
        volumes:
          - ./searxng:/etc/searxng
      perplexica:
        image: itzcrazykns1337/perplexica:main
        ports:
          - "3000:3000"
        environment:
          - SEARXNG_API_URL=http://searxng:8080
        depends_on:
          - searxng
        extra_hosts:
          - "host.docker.internal:host-gateway"

The extra_hosts line matters: it lets the container reach Ollama running on the host machine. In Perplexica’s own config you then set the Ollama API base to http://host.docker.internal:11434 and choose your chat and embedding models from the dropdown.
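If the model dropdown comes up empty, it is worth checking that Ollama is reachable at that address before blaming Perplexica. The sketch below uses Ollama's `/api/tags` endpoint, which lists installed models; `model_names` and `list_models` are hypothetical helper names. One likely culprit on Linux, offered as an assumption about your setup rather than a diagnosis: Ollama listens on 127.0.0.1 by default, so containers may not reach it until it is started with OLLAMA_HOST=0.0.0.0.

```python
import json
from urllib.request import urlopen

# From the config above. On the host itself, use http://localhost:11434.
OLLAMA_BASE = "http://host.docker.internal:11434"


def model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_models(base: str = OLLAMA_BASE) -> list[str]:
    """Ask Ollama which models are installed. Raises OSError
    (for example URLError) if the server cannot be reached."""
    with urlopen(f"{base}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

Running `list_models()` from inside the Perplexica container, or `list_models("http://localhost:11434")` on the host, should print the chat and embedding models you have pulled. A connection error points at networking rather than at Perplexica's config.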
3 Choosing the model
This is where expectations need managing. The quality of the answer is bounded by the model doing the synthesis, and a small local model is not going to match a frontier cloud model at complex reasoning. But — and this is the pleasant surprise — the task here is easier than open-ended chat. The model isn’t being asked to know things; it’s being handed relevant text and asked to summarise and cite it. That’s a job a modest local model does perfectly well.
I run a 7-to-8-billion-parameter instruction model for synthesis and a dedicated small embedding model for ranking. On a machine with a mid-range GPU, answers come back in a handful of seconds. The embedding model is doing quiet but important work: get the ranking wrong and the synthesiser is summarising irrelevant pages, so don’t skip it in favour of letting the chat model do everything.
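To make the ranking step concrete, here is a toy version of embedding-based relevance ranking using cosine similarity. It is a sketch of the general technique, not Perplexica's code: the three-dimensional vectors are invented for the example, whereas a real embedding model returns vectors with hundreds of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float],
         chunks: list[tuple[str, list[float]]],
         top_k: int = 2) -> list[str]:
    """Return the top_k chunk texts most similar to the query vector."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]


# Made-up embeddings: the first two dimensions loosely mean "Docker",
# the third loosely means "geography".
chunks = [
    ("Docker networking basics", [0.9, 0.1, 0.0]),
    ("History of Peru", [0.0, 0.2, 0.9]),
    ("Compose extra_hosts explained", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]
print(rank(query, chunks))
```

With these vectors the off-topic page about Peru drops out and only the two Docker pages go forward to the synthesiser, which is the filtering the embedding model exists to do.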
4 The honest limitations
It is slower than Google. There’s no getting around the fact that fetching results, embedding them, ranking them and generating prose takes longer than a cached search page. For a quick “what’s the capital of Peru” lookup, it’s overkill and you’ll feel the latency.
Where it shines is the messy, multi-source question — “compare these three approaches,” “what changed between these two versions,” “summarise the current thinking on X” — the sort of query where you’d otherwise open eight tabs. There, the cited synthesis genuinely saves time. And it’s only as good as the underlying search; if SearXNG’s sources are thin on a topic, the answer is thin too. Garbage in, confident garbage out.
5 Is it worth it?
If you value your search privacy and already self-host, this is one of the more rewarding things you can stand up. The whole stack runs on hardware you own, your questions and the model’s answers never leave your network (only the search terms SearXNG forwards to its upstream engines do, with no account or profile attached), and the cited-answer format is genuinely better than wading through SEO sludge for the right kind of question. It won’t replace a quick reflexive search (keep a plain SearXNG bookmark for those), but as a private research assistant for the questions that actually need thinking about, it’s excellent.
For the privacy-indifferent who just want fast answers, the cloud engines are faster and smarter and you should use them. For everyone who’d rather their curiosity wasn’t a data point in someone else’s ad model, Perplexica plus a local model is a quietly brilliant bit of kit.




