Stable Diffusion Workflows: Turning ComfyUI into an Image API

Stop clicking the canvas and start POSTing JSON

By Smarc, 02-06-2025

ComfyUI is usually sold as a node editor — a sprawling graph of boxes and wires you drag around to build a Stable Diffusion pipeline. That’s how most people meet it, and it’s genuinely the most flexible front end for local image generation. But the canvas is the boring part. The interesting part is that everything you build on it is just a JSON document, and ComfyUI happily executes that JSON over an HTTP API. Once you realise that, ComfyUI stops being a toy you click and becomes an image-generation service you can call from anything — a cron job, a build pipeline, a webhook handler.

The moment this clicked for me was the third evening in a row I’d sat clicking “Queue Prompt”, changing one word in the prompt, and clicking again. That’s not a creative act, it’s a for loop performed by a human. Every hero image on this blog is now generated by a script calling ComfyUI over HTTP, and the mouse never enters the picture. This post is how to get there.

This post assumes you already know what ComfyUI is and have a workflow that produces images you like. If you’re still building that workflow on the canvas, the companion piece, “ComfyUI: Node-Based Image Generation for People Who Want Control”, is the place to start; here the goal is to stop touching the mouse and drive the whole thing headlessly.

The two JSON formats

This trips everyone up once, so let’s get it out of the way. When you save a workflow from the UI, you get the UI format — it includes node positions, link metadata, and other editor cruft. The API does not want that. It wants the API format, which is a flat map of node IDs to their class and inputs.

To get it: in the ComfyUI settings, enable “Dev mode,” then use Save (API Format). You’ll get something like this (trimmed to two nodes; a real export also contains the nodes these wires point at, such as 4, 5 and 7):

{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 42,
      "steps": 25,
      "cfg": 7.0,
      "sampler_name": "euler",
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    }
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": { "text": "a lighthouse at dusk", "clip": ["4", 1] }
  }
}

The ["4", 0] notation is a wire: “take output slot 0 of node 4.” That’s the entire graph, expressed as data. Anything you can edit in the UI — the prompt, the seed, the steps — is a field you can overwrite from code before you submit. The mental model that makes the rest of this easy: the API-format JSON is a template, and the handful of input fields you care about are its parameters. Everything else is fixed structure you leave alone.
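To make the template-and-parameters idea concrete, here is a minimal sketch that walks an API-format dict and separates literal values (the things you can override) from wires (the structure you leave alone). The helper name describe_inputs is hypothetical, and the rule "a two-item list whose first item is a string is a wire" is my assumption based on the sample above, not something ComfyUI guarantees.

```python
def describe_inputs(workflow):
    """Split every node's inputs into literal values and wires."""
    report = {}
    for node_id, node in workflow.items():
        literals, wires = {}, {}
        for key, value in node.get("inputs", {}).items():
            # Assumption: a wire is a two-item list [source_node_id, output_slot].
            is_wire = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
            )
            if is_wire:
                wires[key] = (value[0], value[1])
            else:
                literals[key] = value
        report[node_id] = {
            "class_type": node["class_type"],
            "literals": literals,
            "wires": wires,
        }
    return report


# The trimmed sample from above, reused as test data.
example = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42, "steps": 25, "cfg": 7.0, "sampler_name": "euler",
            "model": ["4", 0], "positive": ["6", 0],
            "negative": ["7", 0], "latent_image": ["5", 0],
        },
    },
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a lighthouse at dusk", "clip": ["4", 1]},
    },
}

for node_id, info in describe_inputs(example).items():
    print(node_id, info["class_type"],
          "params:", sorted(info["literals"]),
          "wires:", sorted(info["wires"]))
```

Run against your own export, the "params" column is a reasonable first guess at which fields are worth exposing to calling code.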

Submitting a job

Generation is asynchronous. You POST the workflow to /prompt and get back a prompt_id; the actual render happens on the GPU queue. Here’s the minimal Python loop — load the template, patch the prompt and seed, queue it, then poll history for the result:

import json, random, time, urllib.parse, urllib.request

HOST = "http://127.0.0.1:8188"

def queue(workflow):
    # POST the API-format workflow; ComfyUI replies with a prompt_id
    data = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(f"{HOST}/prompt", data=data)
    return json.load(urllib.request.urlopen(req))["prompt_id"]

with open("workflow_api.json") as f:
    wf = json.load(f)

# Override the parts we care about
wf["6"]["inputs"]["text"] = "a foggy harbour, cinematic, golden hour"
wf["3"]["inputs"]["seed"] = random.randint(0, 2**32)

pid = queue(wf)

# Poll until the job shows up in history
while True:
    hist = json.load(urllib.request.urlopen(f"{HOST}/history/{pid}"))
    if pid in hist:
        outputs = hist[pid]["outputs"]
        break
    time.sleep(1)

# Fetch the rendered image; urlencode escapes spaces and slashes in the values
img = outputs["9"]["images"][0]
query = urllib.parse.urlencode(
    {"filename": img["filename"], "subfolder": img["subfolder"], "type": img["type"]}
)
urllib.request.urlretrieve(f"{HOST}/view?{query}", "out.png")
print("saved out.png")

That’s the whole API surface you need for batch work: /prompt to queue, /history/{id} to collect results, /view to download. No node editor in sight.

One habit worth adopting early: randomise the seed on every call, as the code above does. ComfyUI caches aggressively: if the entire workflow, seed included, is identical to a previous run, it may hand you the cached result rather than re-rendering. That’s a feature when you want it and a baffling “why did I get the same image again” bug when you don’t. Changing the seed (or any other input) invalidates the cache for that input’s node and everything downstream of it.

Handling errors properly

The minimal loop above is fine for a demo and dangerous in a service, because it assumes every job succeeds. In reality a job can fail (a bad prompt, an out-of-memory error on the GPU, a missing model), and a naive poll on /history can loop forever waiting for outputs that never come. Two fixes matter.

First, POST errors surface immediately. If /prompt rejects the workflow (a malformed node, an unknown class), it returns a non-200 with a JSON body explaining what’s wrong. Catch it:

def queue(workflow):
    data = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(f"{HOST}/prompt", data=data)
    try:
        return json.load(urllib.request.urlopen(req))["prompt_id"]
    except urllib.error.HTTPError as e:
        raise RuntimeError(f"queue rejected: {e.read().decode()}") from e

Second, bound the poll. A job that errors after queuing shows up in history with a status that isn’t success, or never produces the expected output node. Cap the poll with a deadline or a maximum number of iterations, and check the history entry’s status field rather than assuming presence-in-history means success. A render that should take twenty seconds and has been polling for five minutes has almost certainly failed; give up and report it, don’t wait forever.
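A bounded poll might look something like the sketch below. Treat it as one way to do it under stated assumptions: the function names fetch_history and wait_for_outputs are mine, the five-minute default is arbitrary, and I am assuming the history entry carries its result under status.status_str with "success" as the happy value. Print a real history entry from your own install and adjust if it differs.

```python
import json
import time
import urllib.request

HOST = "http://127.0.0.1:8188"  # same address as the earlier examples


def fetch_history(prompt_id):
    """Return the /history/{id} response as a dict."""
    with urllib.request.urlopen(f"{HOST}/history/{prompt_id}") as resp:
        return json.load(resp)


def wait_for_outputs(prompt_id, fetch=fetch_history, timeout=300.0, interval=1.0):
    """Poll until the job finishes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        entry = fetch(prompt_id).get(prompt_id)
        if entry is not None:
            # Assumption: the entry looks like {"status": {"status_str": ...}, "outputs": {...}}.
            status = entry.get("status", {}).get("status_str")
            if status not in (None, "success"):
                raise RuntimeError(f"job {prompt_id} ended with status {status!r}")
            outputs = entry.get("outputs", {})
            if not outputs:
                raise RuntimeError(f"job {prompt_id} finished without outputs")
            return outputs
        time.sleep(interval)
    raise TimeoutError(f"job {prompt_id} not in history after {timeout}s")
```

Passing the fetch function in as an argument is a deliberate choice: it lets you test the timeout and failure paths with a fake, without a GPU or a running ComfyUI.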

Going properly headless

For a server you’ll want ComfyUI running without a display, listening on something other than localhost:

python main.py \
  --listen 0.0.0.0 \
  --port 8188 \
  --output-directory /srv/comfy/out \
  --disable-auto-launch

Put it behind a reverse proxy with authentication. The API has no auth of its own, so anything that can reach the port can run jobs on your GPU. This is not a hypothetical: an exposed ComfyUI port is a free GPU for whoever finds it, and image-generation endpoints get scanned. --listen 0.0.0.0 binds every interface, so pass a private interface address instead where you can, front it with a proxy that enforces auth, and never expose the port straight to the internet. The same reverse-proxy-plus-auth pattern I use for every other homelab service applies here without modification.

For real-time progress rather than polling, there’s a WebSocket at /ws that streams execution events and even preview images mid-render. It’s genuinely useful when you want a live progress bar or to abort a bad render early — you subscribe with the client ID you passed on the /prompt call and receive executing, progress, and executed events as the graph runs. For pure batch generation, though, the poll-the-history pattern above is simpler and perfectly adequate, and I reach for the WebSocket only when a human is watching.
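Python's standard library has no WebSocket client, so the sketch below covers only the part that is independent of whichever client library you pick: interpreting the frames once you have them. The function name track_progress is hypothetical. The message shapes are assumptions on my part: JSON text frames with a type and a data object, progress carrying value and max, and an executing event whose node is null marking the end of a prompt. Binary frames (the previews) are skipped.

```python
import json


def track_progress(messages, prompt_id):
    """Consume frames received from /ws; return (finished, last_progress)."""
    last_progress = None
    for raw in messages:
        if not isinstance(raw, str):
            continue  # binary frames carry preview images; ignored in this sketch
        event = json.loads(raw)
        kind = event.get("type")
        data = event.get("data", {})
        if kind == "progress":
            # Assumed shape: {"value": current_step, "max": total_steps}
            last_progress = (data.get("value"), data.get("max"))
        elif kind == "executing" and data.get("prompt_id") == prompt_id:
            # Assumption: node == None means this prompt has finished executing.
            if data.get("node") is None:
                return True, last_progress
    return False, last_progress
```

Feed it the iterable of frames from your WebSocket client, connected with the same client ID you include in the /prompt request body, and it returns as soon as your job is done.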

The pattern that scales well: keep a library of API-format workflow templates as files, treat each as a function whose “arguments” are the handful of input fields you override, and wrap the queue-and-poll loop in a small service. Now “generate a hero image for this article” is one HTTP call from your publishing pipeline. If you’re doing this on limited hardware, it pairs naturally with the tuning in running Stable Diffusion on a budget GPU — the API doesn’t change your VRAM ceiling, it just lets you hit it unattended.

Parameterising templates cleanly

The one design decision that keeps this maintainable is not reaching into the workflow dict by raw node ID all over your codebase. Node 6 being the positive prompt is an accident of how you built the graph, and scattering wf["6"]["inputs"]["text"] through your application means a workflow redesign turns into a find-and-replace hunt. Give each overridable field a name once, in one place:

# a single table mapping friendly names to (node_id, input_key)
PARAMS = {
    "prompt":   ("6", "text"),
    "negative": ("7", "text"),
    "seed":     ("3", "seed"),
    "steps":    ("3", "steps"),
}

def apply(wf, **kwargs):
    for name, value in kwargs.items():
        node, key = PARAMS[name]
        wf[node]["inputs"][key] = value
    return wf

wf = apply(wf, prompt="a foggy harbour, cinematic",
               seed=random.randint(0, 2**32), steps=30)

Now the whole rest of your code says apply(wf, prompt=...) and never mentions a node ID. When you redesign the graph, you update the PARAMS table — one dictionary — and everything downstream keeps working. This is the difference between a script you rewrite every time you touch ComfyUI and a stable service you build on top of it.

The same idea extends to loading models by name, swapping schedulers, or wiring in a second sampler pass: each is just another entry in the parameter table pointing at the node and input that controls it. Keep the mapping honest and the workflow JSON becomes a genuine function you call, not a fragile blob you poke at.
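One way to keep the mapping honest is to check the table against the template when the service starts, rather than discovering a stale node ID halfway through a batch. The sketch below is an illustration, not part of ComfyUI: check_params is a hypothetical helper, and the small workflow and table are made up so the block runs on its own. Flagging list-valued inputs as wires is my assumption about the API format.

```python
def check_params(workflow, params):
    """Return a list of problems; an empty list means the table matches the graph."""
    problems = []
    for name, (node_id, key) in params.items():
        node = workflow.get(node_id)
        if node is None:
            problems.append(f"{name}: node {node_id!r} not in workflow")
        elif key not in node.get("inputs", {}):
            problems.append(f"{name}: node {node_id!r} has no input {key!r}")
        elif isinstance(node["inputs"][key], list):
            # Assumption: list values are wires, which should not be overwritten.
            problems.append(f"{name}: input {key!r} on node {node_id!r} is a wire, not a value")
    return problems


# Hypothetical two-node workflow and table, just to exercise the check.
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 25, "positive": ["6", 0]}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a lighthouse at dusk"}},
}
params = {
    "prompt": ("6", "text"),
    "seed": ("3", "seed"),
    "negative": ("7", "text"),      # node 7 is missing from this workflow
    "positive": ("3", "positive"),  # points at a wire, not a literal
}

for problem in check_params(workflow, params):
    print(problem)
```

Failing fast on a non-empty list turns a workflow redesign from a silent KeyError in production into a clear message at startup.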

Troubleshooting: the failures you’ll actually hit

KeyError on outputs["9"]. The output node ID is specific to your graph. Node 9 in my workflow is a SaveImage; in yours it might be node 14, or the images might be under a different key. Print the whole outputs dict once and read off the real structure rather than copying an ID from a blog post.

The API accepts the job but no file appears. Check that your workflow actually contains a SaveImage (or PreviewImage) node. A graph that ends at a VAEDecode with nothing saving the result renders happily and writes nothing to disk. The /view fetch then has nothing to fetch.

Everything worked, then a workflow redesign broke the script. This is the sharpest edge of the whole approach: the API format is tied to your exact node graph, so when you redesign a workflow the node IDs your code references can change out from under you. Keep the template JSON and the calling code together in the same repo, treat the node IDs as a contract, and re-export both at once whenever you touch the graph. A test that queues one known job and asserts an image comes back catches this the moment it breaks.

A long queue under concurrency. If you fire many jobs at once, remember ComfyUI processes the GPU queue serially: the API is async but the GPU isn’t. Queuing a hundred jobs doesn’t parallelise them; it just makes a long queue. For real throughput you scale by running more ComfyUI instances behind the proxy, not by hammering one harder.
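The first two failures above are easy to guard against in code. The sketch below uses two hypothetical helpers: save_nodes checks, before queuing, that the template contains something that writes an image, and collect_images gathers every image in a history outputs dict without hardcoding a node ID. The sample data is invented, and the outputs shape (an images list per node) is assumed from the earlier example.

```python
def save_nodes(workflow):
    """IDs of nodes that emit images (SaveImage or PreviewImage)."""
    return [
        node_id
        for node_id, node in workflow.items()
        if node.get("class_type") in ("SaveImage", "PreviewImage")
    ]


def collect_images(outputs):
    """Flatten every image entry in an outputs dict, whatever the node ID."""
    found = []
    for node_id, node_output in outputs.items():
        for img in node_output.get("images", []):
            found.append((node_id, img))
    return found


# Invented sample data: a graph with one SaveImage node and its history outputs.
sample_workflow = {
    "8": {"class_type": "VAEDecode", "inputs": {}},
    "14": {"class_type": "SaveImage", "inputs": {}},
}
sample_outputs = {
    "14": {"images": [{"filename": "out_00001_.png", "subfolder": "", "type": "output"}]},
}

if not save_nodes(sample_workflow):
    raise SystemExit("workflow has no SaveImage/PreviewImage node; nothing would be written")

for node_id, img in collect_images(sample_outputs):
    print(node_id, img["filename"])
```

The same two functions also make a decent smoke test: queue one known job, then assert that collect_images returns at least one entry.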

Is it worth it?

If you generate images one at a time for fun, stay on the canvas — the API buys you nothing. The moment you find yourself doing the same workflow repeatedly with different prompts, or wanting images produced by some other system, flipping to the API is transformative. The investment is small: enable dev mode, save the API-format JSON, and learn the three endpoints above.

The catch worth flagging, again because it’s the one that’ll bite you, is that the API format is welded to your exact node graph — keep the template and the calling code together, and re-export both at once. Handle errors properly, put it behind auth, and don’t expose the port. For anyone running ComfyUI on a homelab GPU and wanting it to do useful work unattended, this is the unlock: the same box that was a fun toy on the canvas becomes a genuine image service the rest of your infrastructure can call.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.

Tagged: #comfyui #automation
