AI & Machine Learning on vo.rs

Fine-Tuning vs Prompting vs RAG: Picking the Right Tool Without Wasting GPU Hours

Tue, 09 Jun 2026 12:00:00 +0000

I once watched someone spend the better part of a weekend, and roughly £40 of rented A100 time, fine-tuning a 7B model to “answer questions about our internal wiki.” The result was a model that confidently invented policy numbers that had never existed. The fix took twenty minutes: paste the relevant wiki page into the prompt. That was a retrieval problem the whole time, and no amount of training was ever going to solve it. It is the single most common and most expensive mistake I see people make with local LLMs, and it comes from a simple confusion about what each tool actually does.

Talking to Your Documents: A Practical RAG Pipeline with Open-Source Tools

Wed, 27 May 2026 09:00:00 +0000

There is a particular kind of frustration in knowing that the answer you need is somewhere in a forty-page PDF, and that finding it means reading all forty pages. Retrieval-Augmented Generation turns that pile of documents into something you can simply talk to. Ask a question in plain English, and the system finds the relevant passages and answers from them. The very best part is that you can build a working version yourself, on your own machine, using only open-source tools and a modest Python script. This guide walks through exactly that — a small but complete RAG pipeline that lets you interrogate your own documents.

Prompt Injection: The SQL Injection of the AI Era

Wed, 20 May 2026 10:30:00 +0000

Every generation of software gets the vulnerability it deserves. The web era handed us SQL injection, a flaw so persistent it still ranks near the top of vulnerability lists decades after the fix was well understood. The large language model era has produced its own signature weakness, and it rhymes almost perfectly with the old one. It is called prompt injection, and if you are building anything that lets a model read untrusted text, you need to understand it.

What Is Agentic AI, and Why Is Everyone Suddenly Talking About It?

Wed, 13 May 2026 16:30:00 +0000

If you have spent any time near the technology press recently, you will have noticed that the word “agentic” has quietly taken over. Where last year everyone wanted a chatbot, this year everyone wants an agent. The shift is real, but the hype has run well ahead of the substance, and it is worth slowing down to ask what agentic AI actually means, what it can genuinely do today, and where the marketing outpaces reality.

Your First Local AI Coding Assistant: Wiring Ollama into Your Editor

Wed, 29 Apr 2026 14:30:00 +0000

Cloud coding assistants are wonderful right up until you remember where your code is going. Every keystroke, every half-finished function, every comment grumbling about a colleague’s API design is shipped off to someone else’s server. For a side project that scarcely matters; for proprietary code under a strict NDA it can be a genuine problem. The good news is that you can run a capable coding assistant entirely on your own machine, with no network round-trips and no data leaving the building. If you have already met Ollama in our introductory piece, this guide takes the next step: wiring a local model directly into your editor so it suggests code as you type.

When Your AI Agent Goes Rogue: Securing Autonomous Agents in Production

Tue, 21 Apr 2026 12:00:00 +0000

A chatbot answers a question and goes quiet. An agent reads the question, decides on a plan, calls a few tools, checks the result, and tries again until it considers the job done. That loop is enormously useful, and it is also exactly why a misbehaving agent can do real damage before anyone notices. When software can act on its own, securing it stops being a matter of sanitising inputs and becomes a question of bounding behaviour.
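The simplest form of bounding behaviour is a hard cap on how many times the loop may run. The sketch below is a minimal illustration under stated assumptions, not any particular framework's API: `run_agent`, `flaky_step` and `stuck_step` are hypothetical names, and each `step` stands in for one round of "plan, call tools, check the result".

```python
from typing import Callable


def run_agent(step: Callable[[int], bool], max_steps: int = 5) -> tuple[bool, int]:
    """Call step(attempt) until it reports success or the step budget runs out.

    Returns (finished, attempts_used).
    """
    for attempt in range(1, max_steps + 1):
        if step(attempt):
            return True, attempt
    # Budget exhausted: stop and report failure instead of looping forever.
    return False, max_steps


# Hypothetical stand-in for one agent round: succeeds on its third try.
def flaky_step(attempt: int) -> bool:
    return attempt >= 3


# Hypothetical stand-in for an agent that never considers the job done.
def stuck_step(attempt: int) -> bool:
    return False


print(run_agent(flaky_step))  # (True, 3)
print(run_agent(stuck_step))  # (False, 5)
```

The cap does nothing to make a single step safe. It only guarantees that a confused agent stops after a known number of attempts.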

RAG Explained: How AI Stops Making Things Up

Tue, 07 Apr 2026 11:30:00 +0000

Imagine a brilliant colleague who has read most of the internet, speaks with unshakeable confidence, and occasionally invents a fact so smoothly that you only catch it because you happen to know the truth. That is a large language model on a bad day. It is not lying, exactly; it simply does not know what it does not know. Retrieval-Augmented Generation, or RAG, is the technique that hands that colleague a library card and a quiet instruction: before you answer, go and look it up. The result is an AI that grounds its words in real documents rather than in the foggy recollections of its training data.

What Is an AI Agent, and Should You Trust It with Your Inbox?

Sat, 28 Mar 2026 09:30:00 +0000

“AI agent” is the phrase of the moment, and like most phrases of the moment it is doing a lot of work for a term few people can define. The simplest way to understand it is by contrast: a chatbot talks, an agent acts. One answers your question; the other goes off and tries to get the job done. That difference sounds small and turns out to be enormous, especially once the job in question is something as personal and consequential as managing your email. Let us unpack what an agent really is, and then ask the question in the title properly.

What Is a Token, Really? How LLMs Read, Reason, and Bill You

Fri, 13 Mar 2026 16:00:00 +0000

Every conversation you have with a language model is quietly measured, chopped, and counted in a unit you almost never see. It is not the word, nor quite the letter. It is the token: the atom of AI text, the thing the model actually reads, the thing your bill is calculated from, and the reason your carefully crafted prompt sometimes behaves in ways that feel slightly arbitrary. Understand tokens and a great deal about how these systems read, reason, and charge suddenly clicks into place.

Local AI on Your Own Metal: Running LLMs Offline with Ollama

Tue, 24 Feb 2026 11:00:00 +0000

Not so long ago the idea of a capable language model running on the computer under your desk, with no internet connection and no monthly bill, sounded faintly absurd. The assumption baked into the whole industry was that the clever part lived in someone else’s datacentre, reachable only through an API and a credit card. That assumption no longer holds. A tool called Ollama has made running open-weight language models on your own hardware about as difficult as installing a music player. This guide shows you how to do it, what to expect from the machine you already own, and where the honest limits lie.

Label Studio: Self-Hosted Data Annotation for Training Your Own Models

Thu, 25 Dec 2025 10:00:00 +0000

There’s a comforting lie in machine learning circles that the model is the hard part. It isn’t. The model is the bit with the nice papers and the GitHub stars. The hard part — the part that determines whether your classifier works or quietly humiliates you in production — is the labels. Garbage labels, garbage model, no exceptions. And labelling is tedious, error-prone, and almost always done in some horror of a spreadsheet that loses your work when the browser crashes.

Semantic Search on Your Own Documents: Embeddings, Vector DBs, and Practical Limits

Sat, 28 Jun 2025 10:00:00 +0000

I once spent twenty minutes hunting for a note I knew I had written about backing up a database. I searched “how to back up the database” and got nothing, because the note was titled “nightly Postgres dump cron” and shared not a single word with my query. Keyword search has that glaring weakness baked in: it only finds documents that literally contain the words you typed. Semantic search fixes this by matching on meaning rather than spelling, and — this is the part that surprised me — you can run the whole stack on your own hardware over your own documents, with no cloud API in sight. I built exactly that over a few thousand markdown notes on a modest home server, and it has genuinely changed how I find things. It has also taught me, painfully and repeatedly, where the approach quietly breaks.
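The keyword failure is easy to reproduce in a few lines. This is a toy sketch, not the search stack from the post: `keyword_hits` is a hypothetical helper that matches on whole words only, and the second note title is invented for contrast.

```python
def keyword_hits(query: str, titles: list[str]) -> list[str]:
    """Return the titles that share at least one whole word with the query."""
    query_words = set(query.lower().split())
    return [t for t in titles if query_words & set(t.lower().split())]


titles = [
    "nightly Postgres dump cron",          # the note being looked for
    "how to rotate the TLS certificates",  # invented, unrelated note
]

hits = keyword_hits("how to back up the database", titles)
print(hits)  # ['how to rotate the TLS certificates']
```

The unrelated note matches on "how", "to" and "the", while the backup note shares no word with the query and never appears.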

Running Gemma 3 Locally: Google's Small Model on Consumer Hardware

Wed, 25 Jun 2025 11:00:00 +0000

Every few months a new open-weights model lands and the homelab forums fill with breathless claims that this one finally dethrones the cloud. Most of the time it’s hype. Gemma 3, which Google released on 12 March 2025, is one of the rare cases where the claims are roughly fair — not because it beats the frontier models, but because it’s the first small model I’ve run that I actually leave switched on.

AI-Powered Git Commit Messages: Useful or Just Annoying

Sat, 14 Jun 2025 14:00:00 +0000

There’s a particular flavour of laziness that git commit messages bring out in people. You’ve just spent an hour on a fiddly change, the work is done, and now a text editor opens demanding you explain yourself. So you type “fix stuff” and move on, and three months later you’re spelunking through git log cursing your past self. We’ve all done it, and we’ve all been on the receiving end of someone else’s — the commit that touches forty files and explains itself with a single shrug of a word. The promise of offloading that drudgery to a model is genuinely appealing precisely because the task is one humans reliably do badly when tired. The pitch for AI commit messages is simple: feed the staged diff to a model, get back a tidy conventional-commit summary, accept it, done. I’ve been running this on my own repos for a while. It’s genuinely useful and quietly dangerous, and which one depends entirely on how you wire it up.

Open WebUI Pipelines: Chaining Local Models with Tools and RAG

Thu, 05 Jun 2025 16:00:00 +0000

Open WebUI is the front end most people slap in front of Ollama and call it a day — a tidy ChatGPT-alike that talks to local models. That’s fine right up until you want the model to do something: hit your internal docs, call an API, run a query, or chain a couple of models together. I hit that wall the week I wanted my local assistant to answer from a folder of PDFs instead of making things up. The built-in RAG and function features cover some of this, but the real escape hatch is Pipelines: a mechanism that lets you insert arbitrary Python into the request flow. It’s the difference between “chat with a model” and “wire a model into your systems.”

Stable Diffusion Workflows: Turning ComfyUI into an Image API

Mon, 02 Jun 2025 08:00:00 +0000

ComfyUI is usually sold as a node editor — a sprawling graph of boxes and wires you drag around to build a Stable Diffusion pipeline. That’s how most people meet it, and it’s genuinely the most flexible front end for local image generation. But the canvas is the boring part. The interesting part is that everything you build on it is just a JSON document, and ComfyUI happily executes that JSON over an HTTP API. Once you realise that, ComfyUI stops being a toy you click and becomes an image-generation service you can call from anything — a cron job, a build pipeline, a webhook handler.

Building Psychological Safety in DevOps: Lessons from Flight Decks and Firefighting

Mon, 26 May 2025 16:30:00 +0000

On 28 December 1978, United Airlines Flight 173 ran out of fuel and crashed into a Portland suburb while the crew was preoccupied with a landing-gear light. The flight engineer knew the fuel was critically low. He mentioned it. He did not push, because the captain outranked him and the cockpit culture of the era did not encourage a junior officer to insist. Ten people died within sight of the runway. That accident, more than any management theory, is why modern aviation drilled into itself a discipline called crew resource management — the deliberate construction of a cockpit where the most junior person can say “we are about to crash” and be heard.

AI-Driven Incident Response: Can Machine Learning Beat Human Intuition?

Thu, 22 May 2025 12:45:00 +0000

At 03:14 one night my monitoring stack lit up: a burst of failed SSH logins against a box that has no business accepting SSH from outside, followed by one success. A correlation rule had already fired, opened a ticket, and — because I’d wired it that way — pulled the host’s recent auth log into the alert. The “AI” part flagged it as a credential-stuffing pattern with 0.91 confidence. The human part (me, bleary, on the sofa) took one look and realised it was my own laptop on a flaky VPN reconnecting forty times before the password manager filled in correctly. The model was right about the pattern and wrong about the world. That gap is the whole subject of this post: machine learning is genuinely good at parts of incident response and confidently useless at others, and the engineering question is where you draw the line.

The Hidden Compliance Risks in Generative AI—and How to Mitigate Them

Fri, 21 Mar 2025 11:15:00 +0000

In January 2024 the Italian data-protection authority, the Garante, told OpenAI it had breached the GDPR by training ChatGPT on personal data without a valid legal basis and without telling anyone. The fine, confirmed in December 2024, was €15 million. What makes that number worth remembering is not its size (it is small by the standards of the regulation, which tops out at 4% of global annual turnover) but how ordinary the underlying mistake was. Nobody set out to break the law. They scraped the web, trained a model, and shipped it, exactly the way most teams now bolt a generative feature onto a product. The compliance risk did not arrive as a dramatic event. It was baked in from the first training run and nobody noticed until a regulator did.

Self-Hosted AI Search: Replacing Google with Perplexica and a Local Model

Wed, 19 Mar 2025 09:00:00 +0000

Searching the web has become a chore. You type a question, scroll past a screen of ads, then past four articles that are themselves just AI-generated SEO sludge, and somewhere on the second page you find the actual answer — if it’s there at all. The cloud “answer engines” fix the experience but trade away your privacy: every query goes to someone else’s server to be logged, profiled, and monetised, which for the kind of questions I actually search (health, finances, half-formed project ideas) is not a trade I want to make. I wanted the good bit — a model that reads the web and answers the question with sources — without the surveillance. That’s where Perplexica comes in.

MCP Servers: Giving Language Models Hands and Eyes

Tue, 18 Feb 2025 09:00:00 +0000

A language model on its own is a brain in a jar. It can reason, summarise, and write you a sonnet about your firewall rules, but it cannot read a file, query a database, or check whether your website is actually up. It only knows what was in its training data and whatever you paste into the chat window. The first time I asked a local model to “check my backups ran last night” and it confidently invented a plausible-sounding answer with fabricated timestamps, the problem crystallised: the gap between knowing things and doing things is the most interesting problem in applied AI right now, and the Model Context Protocol — MCP — is the most sensible attempt I’ve seen at closing it.

Running AI Inference on Kubernetes: GPU Scheduling, Ollama, and Resource Sharing

Thu, 09 Jan 2025 09:00:00 +0000

Kubernetes was designed for a world of stateless web services you could scale by adding more identical replicas. GPUs are the opposite of that: scarce, expensive, and absolutely not interchangeable with CPU. So the moment you decide to run model inference on your cluster, you discover that Kubernetes treats your graphics card as a curious unknown — it doesn’t schedule on it, it can’t see it, and your pods come up GPU-less and confused.

Local LLMs: A Practical Comparison of Llama, Mistral, and Gemma for Real Work

Tue, 24 Sep 2024 09:00:00 +0000

There is a particular flavour of disappointment unique to running a local LLM for the first time. You’ve read the benchmarks, you’ve stared at the leaderboard, you spin a model up on your own GPU, ask it something real — and it produces confidently structured nonsense. Then you try a different model and it nails the same task on the first go. The benchmarks didn’t lie, exactly. They just don’t tell you which model is good at your work, on your prompts, in your format. I’ve spent well over a year now using all three of the big open families as everyday tools rather than toys, and this is what I’ve learned about Llama, Mistral and Gemma once the novelty has thoroughly worn off.

LoRA Fine-Tuning on Consumer Hardware: Adding Skills to a Model Without Retraining It

Tue, 16 Jul 2024 09:00:00 +0000

“Fine-tuning” used to be a word that came with a server room attached. Retraining a multi-billion-parameter model meant a rack of data-centre GPUs, weeks of compute, and a budget that no homelab tinkerer was ever going to get signed off. Then a technique called LoRA quietly rewrote the maths, and now you can teach a large model a genuinely new skill on the same graphics card you otherwise use to render explosions. I’ve done it on a single 24GB GPU over one long evening — the fans loud, a pot of coffee going cold — and the result was good enough to actually put to work the next morning.

ComfyUI: Node-Based Image Generation for People Who Want Control

Tue, 21 May 2024 09:00:00 +0000

The first time you open ComfyUI, you will hate it. There’s no friendly prompt box waiting for your words, no big orange Generate button — just a tangle of boxes connected by coloured spaghetti, like someone wired up a modular synthesiser and walked off. I closed it the first time too, went back to the form-based UI I’d been using, and got on with my evening. Then I went back, a week later, because the people producing the most consistent, repeatable, genuinely controllable images locally were all using it, and there’s usually a reason a difficult tool refuses to die. The reason, it turned out, was worth the friction. This is the post I wish I’d read before I closed the tab.

LangChain vs LlamaIndex: Orchestrating LLMs Without Going Mad

Tue, 30 Apr 2024 09:00:00 +0000

The moment you try to build anything real with a language model, you discover the hard part isn’t the model. It’s everything around it: loading documents, splitting them sensibly, embedding them, stuffing the right context into a prompt, calling a tool, parsing the reply, and doing it all again. You can write this yourself — I did, twice, badly — or you can reach for a framework. The two that dominate are LangChain and LlamaIndex, and the internet will cheerfully tell you to use both, neither, or that one is bloated and the other is a toy. Here’s what I actually think after building with each.

Whisper: Self-Hosted Speech-to-Text That Runs on a Raspberry Pi

Tue, 26 Mar 2024 09:00:00 +0000

I have a drawer full of Raspberry Pis that I bought with grand plans and then quietly retired. So when OpenAI released Whisper as an open model — actual weights, MIT licence, no API key required — my first thought was not “this will revolutionise transcription”. It was “can I make the saddest Pi in the drawer earn its keep”. The answer, with some caveats I’ll be honest about, is yes.

Running Stable Diffusion on a Budget GPU: What Actually Works Below 8GB VRAM

Tue, 27 Feb 2024 09:00:00 +0000

Every thread about running Stable Diffusion locally eventually arrives at the same smug conclusion: just buy a 4090. This is wonderful advice if you have a spare grand and a power supply that doesn’t sound like a hairdryer. The rest of us are sitting on a 6GB laptop card, an old GTX 1060, or a 4GB GPU that the internet has decided is e-waste. Good news: the internet is wrong, and I have spent enough late nights proving it to write this down.

Harnessing the Power of ChatGPT to Generate Stunning Images with DALL-E 2

Tue, 28 Mar 2023 15:14:16 +0000

I want to be straight with you up front: as of May 2026, OpenAI has retired both DALL-E 2 and DALL-E 3 from its API, and ChatGPT moved its built-in image generation onto the newer gpt-image models back in late 2025. So this post is partly a period piece. But the workflow it describes — using a language model to write prompts for an image model, then iterating — has not died at all. It has become the default way everyone generates images, baked so deeply into the tools that most people no longer notice they’re doing it. Understanding the loop on its own terms, separate from whichever model is currently fashionable, is the durable skill. The model names change every eighteen months; the technique does not.

Why ChatGPT Can't Pick Stocks: The Limits of LLMs for Market Predictions

Wed, 15 Mar 2023 16:32:05 +0000

I asked a large language model, in a moment of weakness, which stock it would buy tomorrow. It gave me a confident, well-structured, thoroughly-reasoned answer complete with a target price and a neat paragraph on the company’s “strong fundamentals”. It was also, I am fairly sure, drawing on financial data that was over a year out of date, had no idea what the share had done that morning, and would have produced an equally confident answer if I had asked about a company that went bankrupt in 2019. That is the whole problem in one anecdote: the fluency is real, and it is precisely what makes the model dangerous as a trading tool.

Is ChatGPT at a tipping point on the hype scale?

Thu, 22 Dec 2022 16:46:52 +0000

ChatGPT went public on 30 November 2022 and crossed a million users in five days — faster than Instagram, faster than Spotify, faster than anything I can remember watching. By the time I sat down to write this, three weeks later, half my feed was either declaring the end of Google or declaring the whole thing a parlour trick. Both camps are wrong in the same way: they’re reacting to the demo instead of the mechanism. So let me try to do the unfashionable thing and ask what’s actually in the box, where it earns its keep, and where the hype is writing cheques the model can’t cash.

Brexit

Sun, 10 Apr 2022 11:57:56 +0000

Back in 2019, OpenAI released GPT-2, and for a few weeks it felt like the most exciting and faintly alarming thing on the internet. The largest model, 1.5 billion parameters, was held back at first — OpenAI staged the release across the year, citing worries about mass-produced misinformation, and only published the full weights on 5 November 2019. So naturally, the moment I could run it, I gave it the most misinformation-friendly prompt I could think of: a couple of sentences about Brexit, and a request to keep going.

Ibex: anatomy of a GPT-2 hallucination

Wed, 16 Mar 2022 11:24:03 +0000

A few years ago I fed the word “ibex” to a GPT-2 model and let it run. What came back was a few hundred words of confident, grammatically immaculate prose claiming, among other things, that the ibex is “the most common wild goat in North America”, that it was “primarily introduced by the Romans during the fourth century”, and that it can weigh “roughly 400 pounds”. Every one of those is false. The animal it described does not exist. And yet it reads like an encyclopaedia entry, which is precisely the problem worth dwelling on — because the failure mode that produced it is the same one sitting underneath today’s far larger models.

Playing around with GPT-2 - the sequel

Tue, 04 Jan 2022 08:24:06 +0000

When I played with the smaller GPT-2 models, the output was fun but obviously a machine talking to itself — grammatical, topically vague, and prone to collapsing into nonsense after a sentence or two. OpenAI had deliberately held back the full model, worried it was too good and too easy to abuse for spam and fake news. On 5 November 2019 they changed their minds and released the lot: the full 1.5-billion-parameter model, 1558M, having “seen no strong evidence of misuse so far”. So of course I ran it against exactly the same prompts, to see how much the extra parameters actually buy you.

Playing around with GPT-2

Wed, 06 Oct 2021 11:40:42 +0000

In early 2019 OpenAI announced a language model good enough that they were, they said, too nervous to release it: GPT-2, described in their paper Language Models are Unsupervised Multitask Learners. The worry was that a text generator this fluent would be a gift to spammers and fake-news mills. Rather than ship the full 1.5-billion-parameter model, they staged the release, starting with a much smaller version. That smaller model is the toy I got my hands on, and this is what happened when I actually ran it.

Mediterranean Diet

Wed, 09 Jun 2021 16:28:00 +0000

A confession about this page, because the honesty matters more than the tidiness. The article that used to sit here was generated by GPT-2, an early language model, as a deliberate experiment in machine-written text back in 2021. It read plausibly and it was almost entirely wrong — a smooth cascade of invented citations, garbled biochemistry, and confident nonsense about “meat-lover’s fries” and cancers of the “colon, colon, liver.” I’ve kept the URL and rewritten the substance from real sources, partly because the diet deserves an accurate write-up and partly because that old text is a perfect, preserved specimen of why you cannot trust fluent prose to be true prose. If you want the longer version of that lesson, I wrote about the practical limits of running your own models in a comparison of Llama, Mistral and Gemma for real work; the short version is that a model’s confidence and a model’s correctness are entirely unrelated quantities. So here is the actual Mediterranean diet, with names, dates and studies you can check.