
Build Arguments

The Dockerfile accepts seven build arguments that control the CUDA version, attention optimizations, and LLM support. All have sensible defaults targeting the latest GPU generations (Blackwell/Hopper).

Reference

CUDA_VERSION

NVIDIA CUDA base image version. Determines the CUDA toolkit used for compilation and runtime.

| Value | Target GPUs | Base Image |
|---|---|---|
| 12.8.1 (default) | Blackwell (B200, B100), Hopper (H200, H100) | nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04 |
| 12.4.1 | Ampere (A100, A6000, RTX 30xx), Ada Lovelace (L40S, RTX 40xx) | nvidia/cuda:12.4.1-cudnn-devel-ubuntu24.04 |
| 12.1.1 | Turing (RTX 20xx, T4), Volta (V100) | nvidia/cuda:12.1.1-cudnn-devel-ubuntu24.04 |

Must be paired with the matching PYTORCH_INDEX.


PYTORCH_INDEX

PyTorch wheel index tag. Selects the correct pre-built PyTorch wheels for your CUDA version.

| Value | CUDA Version | PyTorch Install URL |
|---|---|---|
| cu128 (default) | 12.8.1 | https://download.pytorch.org/whl/cu128 |
| cu124 | 12.4.1 | https://download.pytorch.org/whl/cu124 |
| cu121 | 12.1.1 | https://download.pytorch.org/whl/cu121 |

This must always match CUDA_VERSION. Mismatched pairs will produce a broken image (PyTorch compiled for one CUDA version running on another).
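The mapping is mechanical (major and minor digits of the CUDA version), so the pair can be validated before starting a long build. A minimal sketch; the helper names are illustrative, not part of the Dockerfile:

```shell
# Derive the expected PYTORCH_INDEX tag from a CUDA_VERSION string,
# e.g. 12.8.1 -> cu128. Helper names are illustrative only.
expected_index() {
  local major minor
  major=${1%%.*}                     # "12.8.1" -> "12"
  minor=${1#*.}; minor=${minor%%.*}  # "12.8.1" -> "8"
  printf 'cu%s%s\n' "$major" "$minor"
}

# Fail fast if the CUDA_VERSION / PYTORCH_INDEX pair is mismatched.
check_pair() {
  [ "$(expected_index "$1")" = "$2" ] || {
    echo "MISMATCH: CUDA $1 expects $(expected_index "$1"), got $2" >&2
    return 1
  }
  echo "OK: CUDA $1 <-> $2"
}
```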


PYTHON_VERSION

Python interpreter version installed in the image.

| Value | Notes |
|---|---|
| 3.12 (default) | Recommended for ComfyUI as of 2026. Used in the venv at /opt/venv. |

Change only if a specific custom node requires a different Python version. The Dockerfile installs python${PYTHON_VERSION}, python${PYTHON_VERSION}-venv, and python${PYTHON_VERSION}-dev from the Ubuntu package repositories.
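The corresponding Dockerfile step typically looks like the fragment below — a sketch of the usual pattern, not the verbatim Dockerfile:

```dockerfile
ARG PYTHON_VERSION=3.12
RUN apt-get update && apt-get install -y --no-install-recommends \
        python${PYTHON_VERSION} \
        python${PYTHON_VERSION}-venv \
        python${PYTHON_VERSION}-dev \
    && rm -rf /var/lib/apt/lists/* \
    && python${PYTHON_VERSION} -m venv /opt/venv
```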


ENABLE_SAGE_ATTENTION

Install SageAttention and Triton for optimized attention computation.

| Value | Behavior |
|---|---|
| true (default) | Installs triton and sageattention via pip |
| false | Skips installation entirely |

GPU requirements:

| GPU Generation | Compute Capability | SageAttention Version |
|---|---|---|
| Hopper, Blackwell | SM 90+ | v2 with FP8 kernels -- fastest, uses 8-bit floating point for attention |
| Ampere, Ada Lovelace | SM 80-89 | v1 -- optimized CUDA kernels, 2-3x faster than default attention |
| Turing, Volta | SM 70-75 | Not supported -- set to false |

SageAttention provides 2-3x faster attention computation during image and video generation. It installs from pre-built wheels, so there is minimal impact on build time.

Set to false for SM 75 and below

On Turing (RTX 20xx, T4) and Volta (V100) GPUs, SageAttention will fail at runtime. Always set ENABLE_SAGE_ATTENTION=false for these GPUs.
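Inside the Dockerfile, a boolean build argument like this is typically consumed as a guarded RUN step — a sketch of the pattern, not the verbatim Dockerfile:

```dockerfile
ARG ENABLE_SAGE_ATTENTION=true
RUN if [ "$ENABLE_SAGE_ATTENTION" = "true" ]; then \
        pip install --no-cache-dir triton sageattention; \
    fi
```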


ENABLE_FLASH_ATTENTION

Install FlashAttention for memory-efficient fused attention kernels.

| Value | Behavior |
|---|---|
| true (default) | Installs flash-attn from source via pip (--no-build-isolation) |
| false | Skips installation entirely |

GPU requirements:

| GPU Generation | Compute Capability | FlashAttention Version |
|---|---|---|
| Hopper, Blackwell | SM 90+ | v3 -- newest, optimized for Hopper architecture |
| Ampere, Ada Lovelace | SM 80-89 | v2 -- memory-efficient fused kernels |
| Turing, Volta | SM 70-75 | Not supported -- set to false |

FlashAttention lets you run larger batch sizes or higher resolutions without running out of VRAM by reducing the memory footprint of attention computation.

Builds from source -- 20-30 minutes

Unlike SageAttention, FlashAttention builds from source during pip install. This adds 20-30 minutes to the Docker build. The CI/CD pipeline defaults to false for this reason.

Set to false for SM 75 and below

On Turing (RTX 20xx, T4) and Volta (V100) GPUs, FlashAttention is not supported. Always set ENABLE_FLASH_ATTENTION=false for these GPUs.
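The flash-attn step follows the same guarded pattern, with --no-build-isolation so the source build sees the already-installed torch. A sketch; MAX_JOBS is an illustrative knob to bound parallelism (and memory use) during compilation:

```dockerfile
ARG ENABLE_FLASH_ATTENTION=true
RUN if [ "$ENABLE_FLASH_ATTENTION" = "true" ]; then \
        MAX_JOBS=4 pip install --no-cache-dir --no-build-isolation flash-attn; \
    fi
```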


ENABLE_LLM

Compile llama.cpp server for local LLM inference on the pod GPU.

| Value | Behavior |
|---|---|
| true (default) | Clones llama.cpp, compiles llama-server with CUDA, installs the binary to /opt/llama-server |
| false | Skips the entire llama.cpp clone and compilation step |

The binary supports GPU-accelerated inference with GGUF models (Qwen, Llama, Mistral, etc.) and is managed through the ComfyUI Studio LLM page.

Build time impact

llama-server compilation takes ~10 minutes when a GPU is present during the build (single architecture) or ~60 minutes on CI runners without a GPU (compiles for all architectures SM 75 through SM 100). Set to false to save this time if you do not need LLM support.
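A multi-architecture compile of llama-server generally looks like the fragment below — a sketch, with an illustrative architecture list spanning SM 75 through SM 100; the flags are standard llama.cpp CMake options, but the actual Dockerfile step may differ:

```dockerfile
ARG LLAMA_CPP_VERSION=b8505
RUN git clone --depth 1 --branch ${LLAMA_CPP_VERSION} \
        https://github.com/ggml-org/llama.cpp /tmp/llama.cpp \
    && cmake -S /tmp/llama.cpp -B /tmp/llama.cpp/build \
        -DGGML_CUDA=ON \
        -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90;100" \
        -DCMAKE_BUILD_TYPE=Release \
    && cmake --build /tmp/llama.cpp/build --target llama-server -j"$(nproc)" \
    && install -Dm755 /tmp/llama.cpp/build/bin/llama-server /opt/llama-server/llama-server \
    && rm -rf /tmp/llama.cpp
```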

See llama-server for details on the compilation process.


LLAMA_CPP_VERSION

The llama.cpp git tag to clone and build. Only relevant when ENABLE_LLM=true.

| Value | Behavior |
|---|---|
| b8505 (default) | Pinned release, tested and known to build correctly. Pinned on 2026-03-24. |
| latest | Clones HEAD of the llama.cpp repository (--depth 1, no tag) |
| Any tag (e.g., b8400) | Clones that specific release tag |

Pin for reproducibility

The default pinned version ensures reproducible builds. Using latest gets the newest features and fixes but risks build failures from upstream breaking changes. If a latest build fails, switch back to the pinned version or specify a known-good tag.
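The latest-vs-tag behavior can be expressed as a small flag-selection helper (illustrative only; the real Dockerfile inlines this logic):

```shell
# Return the git clone flags for a given LLAMA_CPP_VERSION value.
# "latest" means shallow-clone HEAD with no tag; anything else is
# treated as a release tag. Function name is illustrative.
clone_args() {
  if [ "$1" = "latest" ]; then
    echo "--depth 1"
  else
    echo "--depth 1 --branch $1"
  fi
}
```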

Build Examples

Default (Blackwell/Hopper, all features)

```shell
docker build -f docker/production/Dockerfile -t comfyui-studio .
```

Uses all defaults: CUDA 12.8.1, cu128, SageAttention v2, FlashAttention v3, llama-server b8505.

Ampere / Ada Lovelace (A100, RTX 30xx/40xx, L40S)

```shell
docker build -f docker/production/Dockerfile -t comfyui-studio \
  --build-arg CUDA_VERSION=12.4.1 \
  --build-arg PYTORCH_INDEX=cu124 \
  .
```

SageAttention v1 and FlashAttention v2 are installed (both default to true).

Turing / Volta (RTX 20xx, T4, V100)

```shell
docker build -f docker/production/Dockerfile -t comfyui-studio \
  --build-arg CUDA_VERSION=12.1.1 \
  --build-arg PYTORCH_INDEX=cu121 \
  --build-arg ENABLE_SAGE_ATTENTION=false \
  --build-arg ENABLE_FLASH_ATTENTION=false \
  .
```

No attention optimizations. xformers and torch.compile still work on these GPUs.

Fast CI build (no FlashAttention, no LLM)

```shell
docker build -f docker/production/Dockerfile -t comfyui-studio \
  --build-arg ENABLE_FLASH_ATTENTION=false \
  --build-arg ENABLE_LLM=false \
  .
```

Skips both source compilations. Build time drops from ~85-115 minutes to ~15-25 minutes.

Custom llama.cpp version

```shell
docker build -f docker/production/Dockerfile -t comfyui-studio \
  --build-arg LLAMA_CPP_VERSION=b8400 \
  .
```

Latest llama.cpp (bleeding edge)

```shell
docker build -f docker/production/Dockerfile -t comfyui-studio \
  --build-arg LLAMA_CPP_VERSION=latest \
  .
```

Argument Pairing Quick Reference

| GPU Family | CUDA_VERSION | PYTORCH_INDEX | SAGE_ATTENTION | FLASH_ATTENTION |
|---|---|---|---|---|
| Blackwell / Hopper | 12.8.1 | cu128 | true (v2 FP8) | true (v3) |
| Ampere / Ada Lovelace | 12.4.1 | cu124 | true (v1) | true (v2) |
| Turing / Volta | 12.1.1 | cu121 | false | false |
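The pairing above can be condensed into a small selection helper keyed on compute capability (e.g. 90 for H100, 86 for RTX 3090, 75 for T4). A sketch with illustrative names, assuming the SM number is already known from nvidia-smi or the tables above:

```shell
# Print the recommended --build-arg values for a given SM number.
build_args_for_sm() {
  case "$1" in
    9[0-9]|1[0-9][0-9])   # Hopper (SM 90) and Blackwell (SM 100+)
      echo "CUDA_VERSION=12.8.1 PYTORCH_INDEX=cu128" ;;
    8[0-9])               # Ampere / Ada Lovelace
      echo "CUDA_VERSION=12.4.1 PYTORCH_INDEX=cu124" ;;
    7[0-9])               # Turing / Volta: disable both attention libs
      echo "CUDA_VERSION=12.1.1 PYTORCH_INDEX=cu121 ENABLE_SAGE_ATTENTION=false ENABLE_FLASH_ATTENTION=false" ;;
    *)
      echo "unsupported compute capability: $1" >&2
      return 1 ;;
  esac
}
```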

See GPU Compatibility for the full per-GPU breakdown.