Semantic Search on Your Own Documents: Embeddings, Vector DBs, and Practical Limits

How to search by meaning, and where it quietly lets you down

Keyword search has a glaring weakness: it only finds documents containing the words you typed. Search your notes for “how to back up the database” and you’ll miss the page titled “nightly Postgres dump cron,” because it shares not a single word with your query. Semantic search fixes this by matching on meaning rather than spelling, and you can run the whole thing on your own hardware over your own documents. I did exactly that for a few thousand markdown notes, and it’s genuinely changed how I find things. It has also taught me, painfully, where the approach breaks.

The trick is embeddings: a model converts each chunk of text into a vector — a list of, typically, a few hundred numbers — positioned so that texts with similar meaning land near each other in that high-dimensional space. “Back up the database” and “nightly Postgres dump” produce vectors that are close together even with zero shared words. To search, you embed the query the same way and find the nearest document vectors. That’s the entire concept. Everything else is plumbing.

You don’t need an API for the embedding model. Small sentence-transformer models run fine on a CPU:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Nightly Postgres dump via cron", "Annual leave policy", "ZFS scrub schedule"]
embeddings = model.encode(docs)   # shape: (3, 384)
print(embeddings.shape)
```

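all-MiniLM-L6-v2 is the workhorse: 384 dimensions, fast, and good enough for most homelab-scale corpora. Larger models such as bge-base (768 dimensions) usually give better results at a speed cost. If you switch models, re-embed everything and resize your vector store to match, because vectors from different models aren’t comparable.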

For a few thousand documents you genuinely don’t need a database: a brute-force cosine-similarity scan over a NumPy array is effectively instant. But once you’re into the tens of thousands, or want metadata filtering and persistence, a proper vector database earns its place. I run Qdrant, which is a single Rust binary, self-hostable, and refreshingly free of ceremony.
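For the no-database route, the search itself is a single matrix-vector product. This is a sketch rather than a description of the setup in this article: `cosine_top_k` is a hypothetical helper, and the tiny 3-dimensional vectors are made-up stand-ins for real embeddings such as the `(n, 384)` array that `model.encode` returns.

```python
import numpy as np


def cosine_top_k(query_vec, doc_matrix, k=3):
    """Return (row index, cosine score) for the k rows most similar to the query.

    doc_matrix has shape (n_docs, dim) and query_vec has shape (dim,).
    Both are scaled to unit length first, so a dot product equals cosine similarity.
    """
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = docs @ q                    # one cosine score per document
    top = np.argsort(-scores)[:k]        # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in top]


# Made-up 3-dimensional vectors standing in for real embeddings (hypothetical values)
doc_matrix = np.array([
    [0.9, 0.1, 0.0],   # pretend: "Nightly Postgres dump via cron"
    [0.0, 0.2, 0.9],   # pretend: "Annual leave policy"
    [0.5, 0.8, 0.1],   # pretend: "ZFS scrub schedule"
])
query_vec = np.array([1.0, 0.2, 0.0])  # pretend: "how do I back up the db"

for idx, score in cosine_top_k(query_vec, doc_matrix, k=2):
    print(idx, round(score, 3))
# 0 0.996
# 2 0.682
```

Because every stored vector is compared with the query, the result is exact rather than approximate. What a vector database adds is persistence, filtering, and an index that keeps queries fast as the corpus grows.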

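A minimal Docker Compose service for Qdrant:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"                      # REST API (the Python client connects here)
    volumes:
      - ./qdrant-data:/qdrant/storage    # persist collections across restarts
    restart: unless-stopped
```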

Inserting and querying is a few lines through its client:

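```python
# Continues from the previous snippet: reuses model, docs and embeddings.
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

client = QdrantClient(url="http://localhost:6333")

# Start from a fresh collection. recreate_collection() does this in one call
# but is deprecated in recent qdrant-client releases.
if client.collection_exists("notes"):
    client.delete_collection("notes")
client.create_collection(
    "notes",
    # size must match the embedding model: 384 for all-MiniLM-L6-v2
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert("notes", points=[
    PointStruct(id=i, vector=v.tolist(), payload={"text": d})
    for i, (v, d) in enumerate(zip(embeddings, docs))
])

# query_points() supersedes the older, deprecated search() method
query = model.encode("how do I back up the db").tolist()
hits = client.query_points("notes", query=query, limit=3, with_payload=True).points
for h in hits:
    print(round(h.score, 3), h.payload["text"])
```

```text
0.612 Nightly Postgres dump via cron
0.241 ZFS scrub schedule
0.118 Annual leave policy
```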

Note the top hit shares no words with the query. That’s the whole payoff.

You don’t embed whole documents — you embed chunks, because a single vector can only represent so much meaning before it turns to mush. A 4,000-word page averaged into one vector retrieves badly: the signal you wanted is diluted by everything else on the page. So you split documents into passages of a few hundred tokens, often with a little overlap so a sentence isn’t cut mid-thought, and embed each one. Get chunking wrong and your search quality collapses regardless of how good the model is. This is the single biggest lever, and it’s almost never the model’s fault when results are poor — it’s the chunks.
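A minimal chunker along these lines, assuming plain text and using whitespace-separated words as a rough proxy for tokens (a real pipeline would count tokens with the embedding model’s own tokenizer). `chunk_words` and its default sizes are hypothetical and illustrative, not tuned values:

```python
def chunk_words(text, chunk_size=150, overlap=30):
    """Split text into overlapping windows of up to chunk_size words.

    Consecutive chunks share `overlap` words, so a sentence no longer than the
    overlap that straddles a boundary still appears whole in at least one chunk.
    Defaults are assumptions: ~150 words is meant to stay roughly under
    MiniLM's 256-word-piece input limit; check with the real tokenizer.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical usage on a synthetic 500-word "note"
note = " ".join(f"word{i}" for i in range(500))
for i, chunk in enumerate(chunk_words(note, chunk_size=200, overlap=40)):
    words = chunk.split()
    print(i, words[0], "...", words[-1], len(words))
# 0 word0 ... word199 200
# 1 word160 ... word359 200
# 2 word320 ... word499 180
```

Keep chunks comfortably inside the model’s input limit: all-MiniLM-L6-v2 truncates input longer than 256 word pieces by default, so anything past that point is silently ignored. Splitting on headings or paragraphs first, then windowing only the long sections, is a common refinement over a blind word count.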

Now the honest part, because the demos never show this.

Semantic search is bad at exact matches. Search for an error code, a part number, or a specific function name, and embeddings will cheerfully return things that are thematically similar while missing the literal string you needed. The fix is hybrid search: run a keyword (BM25) search alongside the vector search and merge the two ranked lists. Qdrant and several other vector databases support this, for example by storing sparse keyword vectors next to the dense ones and fusing the results at query time, but it’s more plumbing, and most “just use embeddings” tutorials skip it. For anything with identifiers, names, or codes, pure semantic search will betray you.
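To make the merge step concrete, here is a from-scratch sketch of Okapi BM25 keyword ranking combined with reciprocal rank fusion (RRF). RRF is one common choice for the merge, not the only one. The helper names, the plain whitespace tokenisation (punctuation stays attached to words), and the `vector_ranking` list standing in for a semantic-search result are all hypothetical; in practice a search library or the database’s built-in hybrid support would do this work.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return indices of docs that match the query, best Okapi BM25 score first."""
    if not docs:
        return []
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    avgdl = sum(len(t) for t in tokenised) / n
    # Document frequency: in how many docs each term appears
    df = Counter(term for t in tokenised for term in set(t))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0]  # drop docs with no keyword match


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc indices: each list adds 1 / (k + rank) per doc."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "Nightly Postgres dump via cron",
    "Fixing error E1234 when the backup job fails",   # hypothetical note
    "ZFS scrub schedule",
]
keyword_ranking = bm25_rank("E1234", docs)   # only doc 1 contains the literal code
vector_ranking = [0, 2, 1]                   # hypothetical semantic ranking that buries doc 1
print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))  # prints [1, 0, 2]
```

RRF looks only at rank positions, so it needs no tuning to reconcile BM25 scores with cosine scores, which live on different scales; k = 60 is the constant conventionally used.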

It’s also only as good as the embedding model’s training. A general model knows little about your domain’s jargon, so two terms that are synonyms in your world may sit far apart in vector space. And there’s no notion of recency or authority — the most semantically similar chunk might be an outdated note you wrote three years ago. Embeddings have no idea which document is correct, only which is similar.

Finally, it can feel slightly opaque. When keyword search misses, you know why. When semantic search returns something baffling, debugging means staring at cosine scores and second-guessing your chunk boundaries.

For searching a personal knowledge base — notes, documentation, a wiki, anything where you remember the gist but not the words — self-hosted semantic search is a genuine upgrade, and the whole stack runs on a modest machine with no cloud dependency. Build it if your problem is “I can never find the note I know I wrote.” But go in clear-eyed: combine it with keyword search for anything involving exact identifiers, sweat the chunking, and don’t expect it to know which answer is right — only which is related. Treat it as a better way to find documents, not a source of truth, and it’ll serve you well for years.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.