ComfyUI: Node-Based Image Generation for People Who Want Control

Trading a friendly form for a graph that shows you exactly what your pixels are doing

Smarc

21-05-2024

The first time you open ComfyUI, you will hate it. There’s no friendly prompt box waiting for your words, no big orange Generate button — just a tangle of boxes connected by coloured spaghetti, like someone wired up a modular synthesiser and walked off. I closed it the first time too, went back to the form-based UI I’d been using, and got on with my evening. Then I went back, a week later, because the people producing the most consistent, repeatable, genuinely controllable images locally were all using it, and there’s usually a reason a difficult tool refuses to die. The reason, it turned out, was worth the friction. This is the post I wish I’d read before I closed the tab.

ComfyUI is a node-based interface for Stable Diffusion. Instead of a form that hides the pipeline behind sensible defaults, it exposes the pipeline itself as a graph you build by hand. Every stage — loading the model, encoding the prompt, sampling, decoding — is a node, and the wires between them are the data flowing through. It is more work, and that is entirely the point. The form-based tools optimise for getting a picture quickly; ComfyUI optimises for getting the picture you actually want, repeatably, and understanding why it came out the way it did. Those are different goals, and which one you care about decides whether you’ll love this tool or bounce off it.

Why a graph instead of a form

A traditional web UI is a wrapper around a fixed pipeline. You get the options the developers chose to surface, arranged the way they decided. That’s fine until you want to do something they didn’t anticipate: run two prompts through the same noise seed, swap the VAE only for the upscale pass, feed one image’s composition into another’s style. In a form, you’re stuck. In a graph, you just route a wire somewhere new.

The graph is also brutally honest about what’s happening. When a render looks wrong, you can drop a preview node anywhere in the chain, inspect the intermediate output, and see which node produced the bad result. Is the problem the prompt encoding, the sampler, or the VAE decode at the end? In a form you guess; in a graph you look. I have debugged more bad generations in ten minutes of ComfyUI than in an hour of toggling checkboxes in a form and re-rolling the dice. This is the same instinct that makes good infrastructure tooling worthwhile: seeing the actual state instead of trusting a black box. It’s why I value a debuggable graph the way I value a real editor over a magic IDE button, a preference I went into in my post on a Neovim setup for people who also have work to do.

The anatomy of a basic workflow

The default text-to-image graph has about seven nodes, and once you understand them you understand the whole tool. A Load Checkpoint node reads the model and outputs three things: the model itself, the CLIP text encoder, and the VAE. Two CLIP Text Encode nodes turn your positive and negative prompts into conditioning. An Empty Latent Image node defines your canvas size. The KSampler is the heart — it takes the model, both conditionings, and the empty latent, and does the actual denoising. Finally a VAE Decode turns the finished latent into a picture, and Save Image writes it out.
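To make the wiring concrete, here is a rough sketch of that seven-node skeleton as a Python dict, in the shape I’d expect from ComfyUI’s API-format export: each key is a node id, and a wire is a `[source_node_id, output_index]` pair. The node ids, the prompts and the checkpoint filename are placeholders I made up, and the class and input names are my assumptions about a stock install, so check them against your own export before relying on them.

```python
# Sketch of the default text-to-image graph in API-format style.
# Node ids ("1".."7"), the prompts and "model.safetensors" are placeholders.
# A wire is written as [source_node_id, output_index].
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a lighthouse at dusk", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 875623491, "steps": 25, "cfg": 7.0,
                     "sampler_name": "dpmpp_2m", "scheduler": "karras",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}


def links(graph):
    """Yield (source_id, target_id, input_name) for every wire in the graph."""
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            # Anything that is not a [node_id, index] pair is a literal setting.
            if isinstance(value, list) and len(value) == 2 and value[0] in graph:
                yield value[0], node_id, name


for src, dst, name in links(workflow):
    print(f"{workflow[src]['class_type']} -> {workflow[dst]['class_type']}.{name}")
```

Printing the wires this way gives the same picture as the canvas: the loader fans out to both prompt encoders, the sampler and the decoder, and everything converges on Save Image.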

That’s it. Every fancy workflow you’ll ever see — LoRA stacks, ControlNet rigs, multi-pass upscalers, regional prompting — is this seven-node skeleton with extra limbs bolted on. Learn to read the skeleton and the most intimidating community graph becomes legible: you trace the wires from Load Checkpoint to Save Image and everything in between is just a detour you can follow. The KSampler is where the parameters people obsess over live — steps, CFG scale, sampler and scheduler — and because they’re all visible inputs on one node rather than buried in a settings panel, you can wire them to controls, animate them, or feed one render’s seed into the next.
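Tracing the wires is mechanical enough to do in code. The sketch below walks backwards from any node and collects everything that feeds it, assuming the graph is in the API-style dict form where a wire is a `[source_node_id, output_index]` pair. The function name `upstream` and the trimmed example graph are mine, purely for illustration.

```python
def upstream(graph, node_id):
    """Return the ids of every node that feeds, directly or indirectly, into node_id."""
    seen = set()
    stack = [node_id]
    while stack:
        current = stack.pop()
        for value in graph[current]["inputs"].values():
            # A wire is [source_node_id, output_index]; other values are settings.
            is_wire = isinstance(value, list) and len(value) == 2 and value[0] in graph
            if is_wire and value[0] not in seen:
                seen.add(value[0])
                stack.append(value[0])
    return seen


# Hypothetical trimmed graph: loader, two prompts, canvas, sampler, decode.
graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0]}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
}

print(sorted(upstream(graph, "2")))  # what sits behind the positive prompt
print(sorted(upstream(graph, "6")))  # what sits behind the decode
```

Run against a big community graph, the same walk tells you which detours actually reach the output you care about and which are side branches you can ignore for now.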

Workflows are just files

Here’s the feature that won me over completely: a ComfyUI workflow is a JSON file. The entire graph — every node, every connection, every setting — serialises to plain text you can save, diff, and share. Even better, ComfyUI embeds the full workflow into the PNG metadata of every image it generates.

{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 875623491,
      "steps": 25,
      "cfg": 7.0,
      "sampler_name": "dpmpp_2m",
      "scheduler": "karras",
      "denoise": 1.0
    }
  }
}

This means you can drag any ComfyUI image back into the canvas and it rebuilds the exact graph that made it. No more “what settings did I use for that one good render three weeks ago” — the answer is baked into the file. For a tinkerer who never writes anything down, this is borderline miraculous.
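You can pull that recipe out without opening the canvas at all. Below is a minimal sketch that reads PNG text chunks using only the standard library. It assumes the graph is stored as JSON in an uncompressed `tEXt` chunk, and as far as I know ComfyUI writes two of them, `prompt` and `workflow`; treat those key names as an assumption and print the keys of a real file to confirm. The demo fakes a tiny image-less PNG so the block runs without a real render to hand.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict:
    """Collect the tEXt chunks of a PNG as a {keyword: text} dict."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if kind == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
    return chunks


def png_chunk(kind: bytes, body: bytes) -> bytes:
    """Build one PNG chunk; used here only to fake a file for the demo."""
    crc = zlib.crc32(kind + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + kind + body + struct.pack(">I", crc)


# Demo input: not a viewable image, just a signature, one text chunk and IEND.
embedded = {"3": {"class_type": "KSampler",
                  "inputs": {"seed": 875623491, "steps": 25}}}
fake_png = (PNG_SIGNATURE
            + png_chunk(b"tEXt", b"prompt\x00" + json.dumps(embedded).encode("latin-1"))
            + png_chunk(b"IEND", b""))

recovered = json.loads(read_png_text(fake_png)["prompt"])
print(recovered["3"]["inputs"]["steps"])
```

On a real render you would pass the bytes of the file instead of `fake_png`. Compressed or international text chunks would need extra handling that this sketch leaves out.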

It also changes how you collaborate and learn. When someone posts a striking image from ComfyUI, they’re often posting the recipe along with it without realising — drop their PNG into your canvas and you have their exact pipeline to dissect, tweak, and learn from. And because the workflow is plain JSON, it lives happily in version control. I keep my serious workflows in a git repo, so I can see precisely what changed between “the version that worked” and “the version I broke at midnight,” and roll back with a single command. Image generation, made reproducible. That’s not a phrase you get to use about a form with a Generate button.
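A plain `git diff` already does the job, but when a graph is large a diff at the level of node inputs is easier to read. Here is a small sketch of that idea, assuming both versions are in the API-style dict form; `diff_inputs` and the two toy workflows are hypothetical names and values for illustration.

```python
def diff_inputs(old, new):
    """List (node_id, input_name, old_value, new_value) for every changed input."""
    changes = []
    for node_id in sorted(set(old) | set(new)):
        old_inputs = old.get(node_id, {}).get("inputs", {})
        new_inputs = new.get(node_id, {}).get("inputs", {})
        for name in sorted(set(old_inputs) | set(new_inputs)):
            before, after = old_inputs.get(name), new_inputs.get(name)
            if before != after:
                changes.append((node_id, name, before, after))
    return changes


# Toy data: the version that worked, and the version broken at midnight.
worked = {"3": {"class_type": "KSampler",
                "inputs": {"seed": 875623491, "steps": 25, "cfg": 7.0}}}
broken = {"3": {"class_type": "KSampler",
                "inputs": {"seed": 875623491, "steps": 60, "cfg": 12.5}}}

for node_id, name, before, after in diff_inputs(worked, broken):
    print(f"node {node_id}: {name} {before} -> {after}")
```

An input that exists on only one side shows up with `None` on the other, which also catches nodes that were added or removed.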

Getting it running

Installation is refreshingly undramatic. Clone, install dependencies, point it at your existing model folder, and go:

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
# reuse your existing checkpoints instead of re-downloading
ln -s /mnt/models/checkpoints models/checkpoints
python main.py --listen 0.0.0.0

The --listen 0.0.0.0 flag binds it to all interfaces so you can drive it from a laptop while the GPU box does the work in another room — exactly how I run mine, with the heat and fan noise safely elsewhere. One caveat worth stating plainly: that flag puts an unauthenticated web UI on your network. ComfyUI ships with no login, so don’t expose that port to anything you don’t trust. Keep it on the LAN, or front it with a reverse proxy that handles auth, or reach it over a private overlay like Tailscale rather than binding it somewhere the wider world can poke. Treating a tool with no auth as if it had some is how people end up with their GPU quietly generating images for a stranger.

There’s also ComfyUI Manager, a community extension that turns custom-node installation from a manual git-cloning chore into a search box, and it’s the first thing I install on any fresh setup. It also handles updating nodes and, importantly, flags missing nodes when you load a workflow that depends on something you don’t have — which saves you the puzzle of a graph full of red error boxes and no clue what they want.

When it breaks: troubleshooting

The failures are recognisable once you’ve hit them a few times.

Red nodes on loading a shared workflow. The graph references custom nodes you don’t have installed. Open ComfyUI Manager, choose “Install Missing Custom Nodes,” restart, and they go green. This is the single most common confusion for newcomers, and it isn’t a bug — it’s a dependency that didn’t travel with the file.
CUDA out of memory. The GPU ran out of VRAM, usually from too large a resolution or batch size, or a model that doesn’t fit. Drop the resolution, generate one image at a time, or launch with --lowvram (or --novram if that still isn’t enough) to trade speed for headroom. Note that --medvram is a flag from other web UIs, not a ComfyUI one. Tiled VAE decode nodes help with the spike at the decode stage specifically.
A custom node broke after an update. The community ecosystem moves fast and breaks things. ComfyUI Manager lets you pin or roll back a node to a known-good commit; do that rather than fighting the latest version at 1am. This is the tax for living on the community edge.
Everything is mysteriously slow on first run. ComfyUI loads models lazily and caches them — the first generation after launch pays the full model-load cost, subsequent ones don’t. That initial pause is expected, not a hang.
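The red-node case in particular is just a set difference: the node types a workflow asks for, minus the ones your install provides. A minimal sketch, assuming an API-style workflow dict; `missing_node_types` is a hypothetical helper and `FancyUpscaler` stands in for some custom node you don’t have. As far as I know, a running instance lists its node classes at the `/object_info` endpoint, which is where the `installed` set could come from, but treat that as an assumption.

```python
def missing_node_types(workflow, available):
    """Return the class_type names the workflow uses that are not in `available`."""
    used = {node["class_type"] for node in workflow.values()}
    return sorted(used - set(available))


# Toy data: "FancyUpscaler" is a made-up custom node that isn't installed.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {}},
    "2": {"class_type": "KSampler", "inputs": {}},
    "3": {"class_type": "FancyUpscaler", "inputs": {}},
    "4": {"class_type": "SaveImage", "inputs": {}},
}
installed = {"CheckpointLoaderSimple", "KSampler", "VAEDecode", "SaveImage"}

print(missing_node_types(workflow, installed))
```

This is roughly the check you want before blaming the workflow itself: an empty list means every node type is present and the problem lies elsewhere.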

The honest downsides

ComfyUI is power-user software and makes no apology for it. The learning curve is a wall, not a slope. Sharing a workflow with someone who isn’t technical means handing them a JSON file and a prayer. Custom nodes from the community are fantastic right up until one breaks after an update and takes your whole graph down with it. And for the simple case — “give me a picture of a cat” — it is objectively slower and fiddlier than just typing into a box. None of these are dealbreakers if control is what you’re after, but pretending they don’t exist would be doing you a disservice.

Is it worth it?

It depends entirely on what you want from the machine. If you generate images occasionally and want them to be good, a form-based UI will serve you better and faster. If you want to understand what image generation is doing, build repeatable pipelines you can version-control, and chain together effects nobody packaged for you, ComfyUI is the best thing going and nothing else is close.

There’s a further payoff once you’re fluent: because a workflow is just JSON and ComfyUI exposes an HTTP API, you can stop clicking Generate entirely and call your graph from code. That’s how you turn a hand-built pipeline into a service that produces images on demand — feeding it prompts from a script, a queue, or another application — which is exactly the leap I cover in turning ComfyUI into an image API. The graph you debugged by hand becomes the backend you never have to touch again.
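As a taste of what that looks like, here is a minimal sketch that queues a graph over HTTP with nothing but the standard library. It assumes the server listens on the default address `http://127.0.0.1:8188` and accepts a POST to `/prompt` with the API-format graph under a `prompt` key, which is my understanding of a stock install rather than something this post has verified. The one-node workflow is a stub, so a real server would reject it; swap in a complete export.

```python
import json
import urllib.request


def build_prompt_request(workflow, base_url="http://127.0.0.1:8188"):
    """Build (but do not send) a POST request for the assumed /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(workflow, base_url="http://127.0.0.1:8188"):
    """Send the workflow to a running instance and return its JSON reply."""
    request = build_prompt_request(workflow, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Stub graph for illustration only; it is not a complete, runnable workflow.
workflow = {"3": {"class_type": "KSampler",
                  "inputs": {"seed": 875623491, "steps": 25}}}

request = build_prompt_request(workflow)
print(request.get_method(), request.full_url)
```

Splitting the request builder from the sender keeps the part you can inspect and test separate from the part that needs a live GPU box on the other end.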

I think of it as the difference between driving an automatic and learning a manual gearbox. Most people are happier with the automatic, and that’s a perfectly reasonable choice, no judgement. But if you’re the sort who self-hosts more than is sensible and enjoys knowing exactly where every photon went, you already know which one you are.

Who is ComfyUI for? Anyone who wants repeatable, version-controllable, automatable image generation and is willing to pay a steep up-front learning cost for it. Who should skip it? Anyone who generates the occasional image for fun and wants the path of least resistance. Both answers are correct; they’re just answers to different questions about what you want from the machine in front of you. If you’re in the first camp, give it the genuine second chance I very nearly didn’t.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.

Tagged: #machine-learning #stable-diffusion #comfyui #self-hosting
