Loki: Log Aggregation for People Who Can't Afford Splunk

Grep-able logs from every box, indexed by labels instead of by every word

There are two kinds of homelabber: the ones who SSH into each box and run journalctl when something breaks, and the ones who got tired of doing that around the fourth machine. I crossed that line a while ago. Once you have a handful of hosts and a stack of Docker containers, “which log, on which box, from which container?” becomes a small archaeological dig every single time, usually conducted in a hurry while something is on fire.

The grown-up answer to this is log aggregation: ship every log line to one place, and search them all at once. The grown-up tool for that has historically been Splunk, which is superb and, once your volume gets serious, costs roughly the GDP of a small island. Loki, from the Grafana people, is the answer for the rest of us.

The reason traditional log systems are expensive is that they index everything. Every word in every line goes into a full-text index so you can search for it later, and that index is big, sometimes rivalling the size of the logs themselves. It’s powerful, and it’s why your wallet hurts.

Loki does something deliberately different and a bit cheeky: it doesn’t index the log content at all. It only indexes a small set of labels (things like which host, which container, which job), exactly the way Prometheus indexes metrics. The actual log text is just compressed and dumped into chunks in object storage, or on the local filesystem for a small setup. When you search, Loki first uses the labels to narrow things down to a small set of streams, then brute-force greps through only those chunks.
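To make the label-first idea concrete, here is a toy model in Python. It is a hypothetical sketch of the design, not Loki’s code or storage format: each distinct label set is a stream, the log text goes into zlib-compressed chunks that are never indexed word by word, and a query first picks streams by label and only then greps inside them. The class and method names are invented for illustration.

```python
import zlib
from collections import defaultdict


class ToyLoki:
    """Hypothetical toy model of Loki's design, not its real implementation.

    Only label sets are indexed; log text lives in compressed chunks that
    are never indexed word by word.
    """

    def __init__(self):
        # The "index": one entry per distinct label set (a stream),
        # mapping to that stream's list of compressed chunks.
        self.streams = defaultdict(list)

    def push(self, labels, lines):
        key = frozenset(labels.items())
        chunk = zlib.compress("\n".join(lines).encode("utf-8"))
        self.streams[key].append(chunk)

    def query(self, selector, needle):
        wanted = set(selector.items())
        hits = []
        for key, chunks in self.streams.items():
            # Step 1: cheap label match; non-matching streams are never opened.
            if not wanted <= key:
                continue
            # Step 2: brute-force grep, but only inside the selected chunks.
            for chunk in chunks:
                for line in zlib.decompress(chunk).decode("utf-8").split("\n"):
                    if needle in line:
                        hits.append(line)
        return hits


if __name__ == "__main__":
    loki = ToyLoki()
    loki.push({"host": "nas", "container": "caddy"},
              ["GET / 200", "GET /api 502 upstream error"])
    loki.push({"host": "nas", "container": "postgres"},
              ["error: deadlock detected"])
    # Only the caddy stream is decompressed and scanned.
    print(loki.query({"container": "caddy"}, "error"))
```

The postgres line also contains "error", but its chunk is never decompressed because its labels don’t match the selector. That, in miniature, is Loki’s whole bargain.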

The trade-off is explicit: cheap storage and cheap ingestion, in exchange for searches that are fast if you label well and slow if you ask it to grep across everything. For a homelab, where “everything” is modest, this is a brilliant bargain.

A Loki setup has three moving parts, and it helps to name them:

  • Loki itself — the server that stores chunks and answers queries.
  • Promtail (or, increasingly, the Grafana Alloy agent) — the thing that runs on each host, tails log files, attaches labels, and ships lines to Loki.
  • Grafana — the same Grafana you already run for metrics, which gets a “Logs” view and a query language called LogQL.

That last point is the killer feature: logs and metrics live in the same Grafana, so you can spot a spike on a graph and pivot straight to the log lines from that exact minute. No context switch, no second tool.
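Whatever agent you pick, its job ultimately boils down to POSTing batches of lines to Loki’s push endpoint, /loki/api/v1/push. The Python sketch below builds that JSON body by hand, which is handy for understanding the wire format or for pushing a test line from a script. It assumes an unauthenticated single-tenant Loki (auth_enabled: false, typical for a homelab); build_push_payload and push are my own helper names, not part of any Loki client library.

```python
import json
import time
import urllib.request


def build_push_payload(labels, lines, start_ns=None):
    """Build a JSON body for Loki's /loki/api/v1/push endpoint.

    One label set means one stream; each value is a [timestamp, line] pair,
    where the timestamp is a *string* of nanoseconds since the Unix epoch.
    """
    ts = time.time_ns() if start_ns is None else start_ns
    # Offset each line by 1 ns so they keep their order within the stream.
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push(base_url, payload, timeout=5.0):
    """POST a payload to Loki and return the HTTP status (204 on success).

    Assumes no authentication; a multi-tenant Loki would also need an
    X-Scope-OrgID header.
    """
    req = urllib.request.Request(
        f"{base_url}/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Against a Loki listening on localhost:3100 (as in the compose file in the next section), push("http://localhost:3100", build_push_payload({"job": "manual"}, ["hello from python"])) should return 204, after which {job="manual"} in Grafana shows the line.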

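Loki ships sensible single-binary defaults now, so you don’t need to understand its internal microservices. Here’s a compose file that runs Loki and a Promtail that scrapes Docker container logs. It mounts your own loki-config.yml; if you don’t have one yet, the default config the image ships at /etc/loki/local-config.yaml is a reasonable starting point to copy and adapt: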

services:
  loki:
    image: grafana/loki:latest
    command: -config.file=/etc/loki/config.yml
    volumes:
      - ./loki-config.yml:/etc/loki/config.yml
      - loki_data:/loki
    ports:
      - "3100:3100"

  promtail:
    image: grafana/promtail:latest
    command: -config.file=/etc/promtail/config.yml
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro

volumes:
  loki_data:
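Loki can take a little while after docker compose up before it accepts writes and queries. It exposes a /ready endpoint that returns HTTP 200 once it is ready and an error status before that, so a small poller like the one below is a handy way to script “wait until Loki is up” against the port 3100 published above. It is only a sketch: the function name, timeout, and polling interval are arbitrary choices.

```python
import time
import urllib.request


def wait_until_ready(base_url, timeout=60.0, interval=2.0):
    """Poll Loki's /ready endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/ready", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeouts and HTTP errors such as 503
            # while Loki is still starting all land here (URLError is an OSError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("ready" if wait_until_ready("http://localhost:3100") else "gave up")
```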

Promtail’s config is where you decide what labels exist — and this is the part that actually matters, because your labels are your whole search index:

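positions:
  filename: /tmp/positions.yaml

# Where Promtail ships lines; "loki" is the compose service name.
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Docker reports names as "/caddy"; keep only the part after the slash.
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'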
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
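Promtail’s relabel_configs follow Prometheus-style rules: by default a rule joins the source label values with ;, matches the result against a fully anchored regex that defaults to (.*), and writes capture group 1 into target_label. The hypothetical toy below models only that default replace action, and shows why the container rule wants a regex: Docker reports container names with a leading slash, so without one the label comes out as /caddy rather than caddy, and {container="caddy"} matches nothing.

```python
import re


def relabel(meta, rules):
    """Toy version of the default 'replace' action in Prometheus-style relabelling.

    Only the default action is modelled (no keep/drop/labelmap). Only target
    labels are returned, mirroring how __meta_* labels are discarded afterwards.
    """
    out = {}
    for rule in rules:
        # Join the source label values with the default separator ';'.
        value = ";".join(meta.get(name, "") for name in rule["source_labels"])
        # The regex must match the whole value (Prometheus anchors it).
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if match:
            # Default replacement is $1, i.e. the first capture group.
            out[rule["target_label"]] = match.group(1)
    return out


if __name__ == "__main__":
    meta = {
        "__meta_docker_container_name": "/caddy",
        "__meta_docker_container_log_stream": "stdout",
    }
    naive = [{"source_labels": ["__meta_docker_container_name"], "target_label": "container"}]
    fixed = [
        {"source_labels": ["__meta_docker_container_name"], "regex": "/(.*)", "target_label": "container"},
        {"source_labels": ["__meta_docker_container_log_stream"], "target_label": "stream"},
    ]
    print(relabel(meta, naive))  # {'container': '/caddy'}
    print(relabel(meta, fixed))  # {'container': 'caddy', 'stream': 'stdout'}
```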

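In Grafana you add Loki as a data source (http://loki:3100 if Grafana runs in the same Compose project, otherwise port 3100 on the Docker host) and start querying. LogQL looks pleasantly like a hybrid of grep and PromQL. A query to find errors from a specific container in the last hour: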

{container="caddy"} |= "error" | json | status >= 500

That {container="caddy"} part picks the stream by label; |= "error" greps; the rest parses JSON and filters. You can even turn logs into metrics on the fly: count_over_time({container="caddy"} |= "error" [5m]) graphs how many matching lines arrived in the five minutes leading up to each point, which is genuinely magic the first time you do it.
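To see what each stage of that query does, here is a toy Python version that runs the same steps over an in-memory list, plus a rough model of count_over_time. It is an illustration only: real Loki evaluates LogQL server-side, its json stage handles nested fields and parse failures in ways this sketch does not, and run_query, count_over_time, and the sample entries are all invented for the example.

```python
import json


def run_query(entries, container, needle="error", min_status=500):
    """Toy model of {container="caddy"} |= "error" | json | status >= 500.

    entries: iterable of (labels_dict, unix_seconds, raw_line) tuples.
    """
    hits = []
    for labels, ts, line in entries:
        if labels.get("container") != container:  # {container="..."}: stream selector
            continue
        if needle not in line:                    # |= "error": substring filter
            continue
        try:
            fields = json.loads(line)             # | json: extract fields from the line
        except json.JSONDecodeError:
            continue                              # simplification: skip non-JSON lines
        if not isinstance(fields, dict):
            continue
        status = fields.get("status")
        if isinstance(status, int) and status >= min_status:  # | status >= 500
            hits.append((ts, fields))
    return hits


def count_over_time(timestamps, range_s, step_s, start, end):
    """Rough model of count_over_time(...[range]) for one stream: at each
    evaluation step t, count entries with t - range_s < ts <= t."""
    points = []
    t = start
    while t <= end:
        points.append((t, sum(1 for ts in timestamps if t - range_s < ts <= t)))
        t += step_s
    return points


if __name__ == "__main__":
    entries = [
        ({"container": "caddy"}, 10, '{"level": "error", "status": 502}'),
        ({"container": "caddy"}, 20, '{"level": "info", "status": 200}'),
        ({"container": "caddy"}, 30, '{"level": "error", "status": 404}'),
        ({"container": "postgres"}, 40, '{"level": "error", "status": 500}'),
        ({"container": "caddy"}, 80, '{"level": "error", "status": 503}'),
    ]
    hits = run_query(entries, "caddy")
    print([ts for ts, _ in hits])  # [10, 80]
    print(count_over_time([ts for ts, _ in hits], range_s=60, step_s=60, start=0, end=120))
    # [(0, 0), (60, 1), (120, 1)]
```

Note how cheap the first check is compared with the JSON parse: that ordering (labels, then substring, then parsing) is why well-chosen labels and an early |= filter make real queries fast.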

The one rule you must internalise: never put high-cardinality values in labels. Every distinct combination of label values becomes its own stream, with its own index entries and its own chunks, so putting a user ID, a request ID, or a timestamp into a label explodes the number of streams (one per request, in the request-ID case) and brings Loki to its knees. It’s one of the most common ways people wreck their installation. Labels are for the handful of dimensions you slice by; everything else stays in the log line, where the grep can find it.
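The arithmetic behind that rule is easy to sketch, since Loki creates one stream per distinct combination of label values it sees. The Python below counts streams for an invented homelab of three hosts and four containers, first with sensible labels and then with a per-request ID promoted to a label. The host names, container names, and helper functions are made up for illustration.

```python
def stream_count(entries, label_names):
    """Number of distinct streams if only these keys were used as labels."""
    return len({tuple(entry.get(name) for name in label_names) for entry in entries})


def make_entries(requests_per_container=1000):
    """Invented sample data: every log line carries a unique request ID."""
    entries = []
    request_id = 0
    for host in ["nas", "pi", "nuc"]:
        for container in ["caddy", "postgres", "grafana", "loki"]:
            for _ in range(requests_per_container):
                request_id += 1
                entries.append({"host": host, "container": container,
                                "request_id": f"req-{request_id}"})
    return entries


if __name__ == "__main__":
    entries = make_entries()
    print(stream_count(entries, ["host", "container"]))                # 12
    print(stream_count(entries, ["host", "container", "request_id"]))  # 12000
```

Twelve streams of a thousand lines each is exactly what Loki is built for; twelve thousand single-line streams is the slow road to an unhappy installation. The request ID loses nothing by staying in the line, because {container="caddy"} |= "req-1234" will still find it.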

Loki is not Splunk, and it’s honest about that. If your daily job is needle-in-a-haystack full-text search across terabytes with no idea which service produced the line, Loki’s brute-force model will feel slow and you’d genuinely be better served by something with a full index. Its query language, while improving, still has rough edges. And early Loki setups had a reputation for fiddly configuration that scared people off — that’s much better now, but the folklore lingers.

For a self-hoster who already runs Grafana and Prometheus, Loki is close to a no-brainer. It’s cheap to run, it puts every log from every box behind one search bar, and it lives in the dashboard you already have open. The discipline it demands — keep labels low-cardinality — is the same discipline that keeps Prometheus healthy, so you probably already have the right instincts. I added it to my stack expecting a weekend of pain and got a working “search all my logs” box by lunchtime. The first time an outage took two minutes to diagnose instead of twenty, it had paid for itself.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.