MEMO #2: PyTorch + xformers Environment Setup on Blackwell Architecture GPU
As of 2025-07-21
- Only tested on
sdxl_gen_img.py
(inference) of sd-scripts.
Python 3.12.x + CUDA 12.8
Environment:
- Python 3.12.10
- CUDA 12.8
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -U xformers --index-url https://download.pytorch.org/whl/cu128
torch==2.7.1+cu128
torchvision==0.22.1+cu128
xformers @ git+https://github.com/facebookresearch/xformers.git@0f0bb9d93b466927d99fb43a311622b7682c6e9a
.env
: (Temporary set back to CUDA 12.8)
export CUDA_HOME=/usr/local/cuda-12.8
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:${PATH}
Python 3.12.x + CUDA 12.9 (Nightly build PyTorch)
Environment:
- Python 3.12.10
- CUDA 12.9
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0
Install PyTorch nightly build with CUDA 12.9 support:
pip install https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250720%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl https://download.pytorch.org/whl/nightly/cu129/torchvision-0.24.0.dev20250720%2Bcu129-cp312-cp312-manylinux_2_28_x86_64.whl --index-url https://download.pytorch.org/whl/nightly --extra-index-url https://download.pytorch.org/whl/nightly/torch --extra-index-url https://download.pytorch.org/whl/nightly/torchvision
Build xformers
from the latest commit of Added Blackwell Support#1262
Reference: https://github.com/facebookresearch/xformers/issues/1251
export TORCH_CUDA_ARCH_LIST="12.0"
pip install ninja
pip install --no-build-isolation --pre -v -U git+https://github.com/facebookresearch/xformers.git@cbd127ce86f5a42319734ca219b2268e0926d895
After the build:
$ python -m xformers.info
xFormers 0.0.31+cbd127c.d20250721
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.fa2F@2.5.7-pt: available
memory_efficient_attention.fa2B@2.5.7-pt: available
memory_efficient_attention.fa3F@0.0.0: unavailable
memory_efficient_attention.fa3B@0.0.0: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm_search@0.7.1: available
sp24._cslt_sparse_mm@0.7.1: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.9.0.dev20250720+cu129
pytorch.cuda: available
gpu.compute_capability: 12.0
gpu.name: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1209
build.hip_version: None
build.python_version: 3.12.10
build.torch_version: 2.9.0.dev20250720+cu129
build.env.TORCH_CUDA_ARCH_LIST: 12.0
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.9.86
source.privacy: open source
Test sd-script
I modified the sd-script
a little bit to make the requirements and import compatible.
Install:
pip install -U -r requirements.txt
pip install -e .
Python 3.13.x + CUDA 12.9
Environment:
- Python 3.13.5
- CUDA 12.9
Basically the same as above, only change the Python version of the wheels:
pip install https://download.pytorch.org/whl/nightly/cu129/torch-2.9.0.dev20250720%2Bcu129-cp313-cp313-manylinux_2_28_x86_64.whl https://download.pytorch.org/whl/nightly/cu129/torchvision-0.24.0.dev20250720%2Bcu129-cp313-cp313-manylinux_2_28_x86_64.whl --index-url https://download.pytorch.org/whl/nightly --extra-index-url https://download.pytorch.org/whl/nightly/torch --extra-index-url https://download.pytorch.org/whl/nightly/torchvision
$ python -m xformers.info
xFormers 0.0.31+cbd127c.d20250721
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.fa2F@2.5.7-pt: available
memory_efficient_attention.fa2B@2.5.7-pt: available
memory_efficient_attention.fa3F@0.0.0: unavailable
memory_efficient_attention.fa3B@0.0.0: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm_search@0.7.1: available
sp24._cslt_sparse_mm@0.7.1: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.9.0.dev20250720+cu129
pytorch.cuda: available
gpu.compute_capability: 12.0
gpu.name: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1209
build.hip_version: None
build.python_version: 3.13.5
build.torch_version: 2.9.0.dev20250720+cu129
build.env.TORCH_CUDA_ARCH_LIST: 12.0
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.9.86
source.privacy: open source
Driver
$ nvidia-smi
Mon Jul 21 16:16:06 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.03 Driver Version: 575.64.03 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX PRO 6000 Blac... Off | 00000000:02:00.0 On | Off |
| 30% 36C P0 48W / 600W | 6163MiB / 97887MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+