GPU - Tag - vo.rs

Running AI Inference on Kubernetes: GPU Scheduling, Ollama, and Resource Sharing

Thu, 09 Jan 2025 09:00:00 +0000

Kubernetes was designed for a world of stateless web services you could scale by adding more identical replicas. GPUs are the opposite of that: scarce, expensive, and absolutely not interchangeable with CPU. So the moment you decide to run model inference on your cluster, you discover that Kubernetes treats your graphics card as a curious unknown: it can’t see it, it doesn’t schedule on it, and your pods come up GPU-less and confused.

Running Stable Diffusion on a Budget GPU: What Actually Works Below 8GB VRAM

Tue, 27 Feb 2024 09:00:00 +0000

Every thread about running Stable Diffusion locally eventually arrives at the same smug conclusion: just buy a 4090. This is wonderful advice if you have a spare grand and a power supply that doesn’t sound like a hairdryer. The rest of us are sitting on a 6GB laptop card, an old GTX 1060, or a 4GB GPU that the internet has decided is e-waste. Good news: the internet is wrong, and I have spent enough late nights proving it to write this down.