# Docker Image

What's inside the Docker image, how it's structured, and why.
## Base Image
The `devel` variant of the CUDA base image is used (not `runtime`) because some components need the CUDA compiler and headers at build time:
- SageAttention compiles Triton kernels
- FlashAttention builds from source
- llama-server compiles with CUDA support
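The distinction shows up in the Dockerfile's `FROM` line; a sketch (the exact CUDA version tag below is an assumption, not taken from the repo):

```dockerfile
# devel ships nvcc and the CUDA headers; runtime ships only the shared libraries
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
# FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04   # would break the Triton/FlashAttention/llama.cpp builds
```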
## Layer Order
The Dockerfile layers are ordered for optimal build cache efficiency. Layers that change less frequently are placed first, so that adding a custom node doesn't trigger a 60-minute llama-server recompilation.
```text
Layer  1: Base CUDA image                       (changes: ~never)
Layer  2: System packages (apt-get)             (changes: ~never)
Layer  3: llama-server (conditional, ~60 min)   (changes: ~never, pinned version)
Layer  4: Python venv                           (changes: ~never)
Layer  5: PyTorch                               (changes: ~never)
Layer  6: xformers                              (changes: ~never)
Layer  7: SageAttention (conditional)           (changes: ~never)
Layer  8: FlashAttention (conditional)          (changes: ~never)
Layer  9: FastAPI + backend dependencies        (changes: rarely)
Layer 10: ComfyUI core                          (changes: rarely)
Layer 11: COPY nodes.txt + install_nodes.sh     ← CACHE BREAK when nodes change
Layer 12: Custom nodes group 1                  (rebuilt if nodes.txt changes)
Layer 13: Custom nodes group 2                  (rebuilt if nodes.txt changes)
Layer 14: Custom nodes group 3                  (rebuilt if nodes.txt changes)
Layer 15: Custom nodes group 4                  (rebuilt if nodes.txt changes)
Layer 16: Cleanup
Layer 17: COPY bootstrap.py                     ← CACHE BREAK when bootstrap changes
Layer 18: COPY start.sh                         ← CACHE BREAK when start.sh changes
```
**Key insight:** llama-server sits at layer 3, right after system packages. It depends only on CUDA + cmake + git, nothing from Python or ComfyUI. Adding a custom node, adding a pip package, or updating PyTorch never triggers a llama-server recompilation; only changing the base CUDA image or system packages does (which ~never happens).
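The ordering principle can be sketched as a skeletal Dockerfile (stage contents, versions, and paths are illustrative, not copied from the actual Dockerfile):

```dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04                # layer 1
RUN apt-get update && apt-get install -y cmake git ...   # layer 2

# Layer 3: depends only on the two layers above, so nothing
# in the Python/ComfyUI stack below can ever invalidate it.
RUN git clone https://github.com/ggml-org/llama.cpp ...

# Python stack
RUN python3.12 -m venv /opt/venv                         # layer 4
RUN pip install torch torchvision torchaudio             # layer 5
...

# First COPY of source-tree files: the earliest point a repo change can break the cache
COPY nodes.txt install_nodes.sh /build/                  # layer 11
RUN /build/install_nodes.sh
```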
## What's Installed
### System Packages

| Package | Purpose |
|---|---|
| `python3.12`, `python3.12-venv`, `python3.12-dev` | Python runtime |
| `build-essential`, `cmake`, `ninja-build` | Compile C/C++ extensions (dlib, llama.cpp) |
| `gfortran`, `libopenblas-dev`, `liblapack-dev` | BLAS/LAPACK for dlib, scipy, numpy |
| `ffmpeg` | Video encode/decode (VideoHelperSuite) |
| `libgl1`, `libglib2.0-0`, `libsm6`, `libxext6`, `libxrender1` | OpenCV runtime dependencies |
| `git`, `wget`, `curl` | Download tools |
### Python Packages

| Package | Purpose |
|---|---|
| `torch`, `torchvision`, `torchaudio` | PyTorch (matched to CUDA via `PYTORCH_INDEX`) |
| `xformers` | Memory-efficient attention (all GPUs) |
| `triton`, `sageattention` | SageAttention v1/v2 (Ampere+ only, conditional) |
| `flash-attn` | FlashAttention v2/v3 (Ampere+ only, conditional, built from source) |
| `fastapi`, `uvicorn[standard]` | Web framework for Studio backend |
| `httpx` | Async HTTP client (for ComfyUI API calls) |
| `python-multipart` | File upload handling |
| `huggingface_hub` | HuggingFace model downloads |
| `aiofiles` | Async file operations |
| `requests` | HTTP client (for downloads) |
| `pyyaml` | YAML parsing (workflow manifests) |
### ComfyUI + Custom Nodes

ComfyUI is cloned from GitHub and installed with all of its requirements. 38 custom nodes are installed from `docker/production/nodes.txt`, organized in categories:
| Category | Nodes | Purpose |
|---|---|---|
| Fundamentals / QoL | 10 | Manager, essentials, seed control, utilities, login |
| Image Generation | 14 | ControlNet, IP-Adapter, face swap, upscale, GGUF, segmentation |
| Performance | 1 | WaveSpeed (FBCache + torch.compile) |
| Video Generation | 8 | VHS, AnimateDiff, WAN, Hunyuan, LTX, CogVideo, Mochi |
| CivitAI Integration | 3 | CivitAI model loader, browser, toolkit |
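A plausible sketch of how `install_nodes.sh` might consume `nodes.txt`. The file format (one repo URL per line, `#` for comments) is an assumption, and `echo` stands in for the real `git clone` so the loop can be shown without network access:

```shell
# Hypothetical nodes.txt: one git URL per line, '#' lines are comments
cat > /tmp/nodes.txt <<'EOF'
# Fundamentals / QoL
https://github.com/ltdrdata/ComfyUI-Manager.git
https://github.com/cubiq/ComfyUI_essentials.git
EOF

while IFS= read -r repo; do
    # skip blank lines and comments
    case "$repo" in ''|'#'*) continue ;; esac
    name=$(basename "$repo" .git)
    echo "git clone --depth 1 $repo custom_nodes/$name"
done < /tmp/nodes.txt
```

In the real image this runs in four groups (layers 12-15), so a change to one group of nodes only rebuilds from that group onward.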
## llama-server (Optional)

When `ENABLE_LLM=true` (the default), llama.cpp is compiled from source with CUDA support:

- Source: cloned from `ggml-org/llama.cpp` at a pinned release tag (`LLAMA_CPP_VERSION`, default `b8505`)
- CUDA architectures: SM 75 through SM 100 (Turing to Blackwell)
- Binary location: `/opt/llama-server`
- Build time: ~10 minutes with a local GPU, ~60 minutes on CI without a GPU
- Build flags: based on llama.cpp's official CUDA Dockerfile, using `--allow-shlib-undefined` for the CUDA stubs
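A hedged sketch of such a build step; the cmake invocation below is modeled on llama.cpp's CUDA Dockerfile, not copied from this repo, so treat the exact flags and paths as assumptions:

```dockerfile
ARG LLAMA_CPP_VERSION=b8505
RUN git clone --depth 1 --branch ${LLAMA_CPP_VERSION} \
        https://github.com/ggml-org/llama.cpp /tmp/llama.cpp \
 && cmake -S /tmp/llama.cpp -B /tmp/llama.cpp/build \
        -DGGML_CUDA=ON \
        # SM 75 (Turing) through SM 100 (Blackwell)
        -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90;100" \
        # tolerate unresolved symbols from the CUDA driver stubs at link time
        -DCMAKE_EXE_LINKER_FLAGS="-Wl,--allow-shlib-undefined" \
 && cmake --build /tmp/llama.cpp/build --target llama-server -j$(nproc) \
 && cp /tmp/llama.cpp/build/bin/llama-server /opt/llama-server
```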
## Build Cache
The CI uses registry-based caching instead of GitHub Actions cache:
```yaml
cache-from: type=registry,ref=ghcr.io/diego-devita/comfyui-studio:cache
cache-to: type=registry,ref=ghcr.io/diego-devita/comfyui-studio:cache,mode=max
```
This avoids the 10 GB limit of GHA cache, which was causing llama-server to be recompiled on every build (the layer is ~5-6 GB). With registry cache, there's no size limit.
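In a GitHub Actions workflow, these options sit on the `docker/build-push-action` step; a sketch in which the step name, action version, and build context are assumptions:

```yaml
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    context: docker/production
    push: true
    tags: ghcr.io/diego-devita/comfyui-studio:latest
    cache-from: type=registry,ref=ghcr.io/diego-devita/comfyui-studio:cache
    cache-to: type=registry,ref=ghcr.io/diego-devita/comfyui-studio:cache,mode=max
```

`mode=max` exports all intermediate layers to the cache ref, not just the final ones, which is what lets the expensive llama-server layer be reused.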
## Image Tags

| Tag | Meaning |
|---|---|
| `:latest` | Always points to the most recent successful build from `main` |
| `:sha-<commit>` | Pinned to a specific commit (e.g., `:sha-4adfae8`) |
| `:cache` | Internal; stores build cache layers, not meant for running |
## Image Size
The compressed image is approximately 12 GB; decompressed on disk it occupies roughly 25-30 GB. Image layers are managed by the container runtime, so they do not count against the pod's container disk allocation.