Setup¶
For users that wish to make use of Docker or another container orchestration platform, see this document first.
Installation¶
For users operating on Windows 10 or newer, an installation guide based on Docker and WSL is available in this document.
Pip installation method¶
You can simply install SimpleTuner using pip, which is recommended for most users:
# for CUDA
pip install 'simpletuner[cuda]'
# for CUDA 13 / Blackwell (NVIDIA B-series GPUs)
pip install 'simpletuner[cuda13]' --extra-index-url https://download.pytorch.org/whl/cu130
# for ROCm
pip install 'simpletuner[rocm]' --extra-index-url https://download.pytorch.org/whl/rocm7.1
# for Apple Silicon
pip install 'simpletuner[apple]'
# for CPU-only (not recommended)
pip install 'simpletuner[cpu]'
# for JPEG XL support (optional)
pip install 'simpletuner[jxl]'
# development requirements (optional, only for submitting PRs or running tests)
pip install 'simpletuner[dev]'
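If you script installs across several machines, the extra can be chosen from the platform. The following is a minimal sketch, not SimpleTuner's own detection logic: the `pick_extra` helper and its checks (whether `nvidia-smi` or `rocminfo` is on the PATH) are assumptions made for illustration, and it never selects `cuda13`, which still has to be chosen by hand.

```shell
# Hypothetical helper, not part of SimpleTuner.
# $1 = OS name, $2 = CPU architecture,
# $3 = "yes" if NVIDIA tools were found, $4 = "yes" if ROCm tools were found.
pick_extra() {
  if [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]; then
    echo "apple"
  elif [ "$3" = "yes" ]; then
    echo "cuda"
  elif [ "$4" = "yes" ]; then
    echo "rocm"
  else
    echo "cpu"
  fi
}

# Probe the current machine; a missing tool simply counts as "no".
has_nvidia=no
command -v nvidia-smi >/dev/null 2>&1 && has_nvidia=yes
has_rocm=no
command -v rocminfo >/dev/null 2>&1 && has_rocm=yes

extra="$(pick_extra "$(uname -s)" "$(uname -m)" "$has_nvidia" "$has_rocm")"

# Print the command instead of running it, so it can be reviewed first.
echo "pip install 'simpletuner[$extra]'"
```

For the `rocm` and `cuda13` extras, the `--extra-index-url` shown in the commands above still needs to be appended.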
Git repository method¶
For local development or testing, you can clone the SimpleTuner repository and set up the python venv:
git clone --branch=release https://github.com/bghira/SimpleTuner.git
cd SimpleTuner
# if python --version shows 3.11 or 3.12, you may want to upgrade to 3.13.
python3.13 -m venv .venv
source .venv/bin/activate
ℹ️ You can use your own custom venv path by setting export VENV_PATH=/path/to/.venv in your config/config.env file.
Note: We're currently installing the release branch here; the main branch may contain experimental features that might have better results or lower memory use.
Install SimpleTuner with automatic platform detection:
# Basic installation (auto-detects CUDA/ROCm/Apple)
pip install -e .
# With JPEG XL support
# (quoted so that shells such as zsh do not treat the brackets as a glob)
pip install -e '.[jxl]'
Note: The setup.py automatically detects your platform (CUDA/ROCm/Apple) and installs the appropriate dependencies.
NVIDIA Hopper / Blackwell follow-up steps¶
Optionally, Hopper (or newer) hardware can use FlashAttention3 for improved inference and training performance when using torch.compile.
You'll need to run the following sequence of commands from your SimpleTuner directory, with your venv active:
git clone https://github.com/Dao-AILab/flash-attention
pushd flash-attention
pushd hopper
python setup.py install
popd
popd
⚠️ Managing the flash_attn build is currently poorly supported in SimpleTuner. It can break on updates, requiring you to re-run this build procedure manually from time to time.
AMD ROCm follow-up steps¶
The following must be executed for an AMD MI300X to be usable:
apt install amd-smi-lib
pushd /opt/rocm/share/amd_smi
python3 -m pip install --upgrade pip
python3 -m pip install .
popd
ℹ️ ROCm acceleration defaults: When SimpleTuner detects a HIP-enabled PyTorch build, it automatically exports PYTORCH_TUNABLEOP_ENABLED=1 (unless you already set it) so TunableOp kernels are available. On MI300/gfx94x devices we also set HIPBLASLT_ALLOW_TF32=1 by default, enabling hipBLASLt’s TF32 paths without requiring manual environment tweaks.
All platforms¶
- 2a. Option One (Recommended): Run simpletuner configure
- 2b. Option Two: Copy config/config.json.example to config/config.json and then fill in the details.
⚠️ For users located in countries where Hugging Face Hub is not readily accessible, you should add HF_ENDPOINT=https://hf-mirror.com to your ~/.bashrc or ~/.zshrc depending on which $SHELL your system uses.
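One way to apply the mirror setting is sketched below. The `rc_file_for_shell` helper is hypothetical and only distinguishes zsh from everything else, treating any other shell as bash; adjust it if your system uses a different rc file.

```shell
# Hypothetical helper: choose the rc file for a given shell path.
rc_file_for_shell() {
  case "$(basename "$1")" in
    zsh) echo "$HOME/.zshrc" ;;
    *)   echo "$HOME/.bashrc" ;;
  esac
}

rc_file="$(rc_file_for_shell "${SHELL:-/bin/bash}")"
line='export HF_ENDPOINT=https://hf-mirror.com'

# Append the export only once; grep -qxF matches the whole line literally.
grep -qxF "$line" "$rc_file" 2>/dev/null || echo "$line" >> "$rc_file"

# New shells pick the variable up automatically. For the current shell, run:
#   . "$rc_file"
```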
Multiple GPU training¶
SimpleTuner now includes automatic GPU detection and configuration through the WebUI. Upon first load, you'll be guided through an onboarding step that detects your GPUs and configures Accelerate automatically.
WebUI Auto-Detection (Recommended)¶
When you first launch the WebUI or use simpletuner configure, you'll encounter an "Accelerate GPU Defaults" onboarding step that:
- Automatically detects all available GPUs on your system
- Shows GPU details including name, memory, and device IDs
- Recommends optimal settings for multi-GPU training
- Offers three configuration modes:
  - Auto Mode (Recommended): Uses all detected GPUs with optimal process count
  - Manual Mode: Select specific GPUs or set a custom process count
  - Disabled Mode: Single GPU training only
How it works:
- The system detects your GPU hardware via CUDA/ROCm
- Calculates optimal --num_processes based on available devices
- Sets CUDA_VISIBLE_DEVICES automatically when specific GPUs are selected
- Saves your preferences for future training runs
Manual Configuration¶
If not using the WebUI, you can control GPU visibility directly in your config.json:
This will restrict training to GPUs 0, 1, and 2, launching 3 processes.
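The relationship between visible devices and process count can also be sketched in shell. This is an illustration only: it uses the standard CUDA_VISIBLE_DEVICES variable mentioned earlier rather than any SimpleTuner config key, and `num_processes` here is just a label for the value that would be passed as --num_processes.

```shell
# Expose only GPUs 0, 1 and 2 to processes started from this shell.
export CUDA_VISIBLE_DEVICES=0,1,2

# One training process per visible GPU: count the comma-separated IDs.
num_processes="$(printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)"

echo "visible GPUs: $CUDA_VISIBLE_DEVICES, processes: $num_processes"
```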
- If you are using --report_to='wandb' (the default), run wandb login to enable reporting of your statistics.
Follow the instructions that are printed, to locate your API key and configure it.
Once that is done, any of your training sessions and validation data will be available on Weights & Biases.
ℹ️ If you would like to disable Weights & Biases or Tensorboard reporting entirely, use --report-to=none
- Launch training with simpletuner; logs will be written to debug.log
⚠️ At this point, if you used simpletuner configure, you are done! If not, these commands will work, but further configuration is required. See the tutorial for more information.
Run unit tests¶
To run unit tests to ensure that installation has completed successfully:
Advanced: Multiple configuration environments¶
For users who train multiple models or need to quickly switch between different datasets or settings, two environment variables are inspected at startup.
To use them:
- env will default to default, which points to the typical SimpleTuner/config/ directory that this guide helped you configure
  - Using simpletuner train env=pixart would use the SimpleTuner/config/pixart directory to find config.env
- config_backend will default to env, which uses the typical config.env file this guide helped you configure
  - Supported options: env, json, toml, or cmd if you rely on running train.py manually
  - Using simpletuner train config_backend=json would search for SimpleTuner/config/config.json instead of config.env
  - Similarly, config_backend=toml will use config.toml
You can create config/config.env that contains one or both of these values:
They will be remembered upon subsequent runs. Note that these can be added in addition to the multi-GPU options described above.
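As a rough sketch of such a file: the variable names ENV and CONFIG_BACKEND below are assumptions (upper-case forms of the env and config_backend options), since the exact spelling is not given here. The file is written to a scratch directory so that an existing config/config.env is left untouched.

```shell
# Work in a scratch directory rather than the real SimpleTuner checkout.
workdir="$(mktemp -d)"
mkdir -p "$workdir/config"

# Assumed variable names: upper-case forms of env and config_backend.
cat > "$workdir/config/config.env" <<'EOF'
ENV=pixart
CONFIG_BACKEND=json
EOF

# Source the file to see the values a shell would read from it.
. "$workdir/config/config.env"
echo "env=$ENV config_backend=$CONFIG_BACKEND"
```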
Training Data¶
A public dataset of approximately 10k images, with captions as filenames, is available on Hugging Face Hub and is ready for use with SimpleTuner.
You can organize images in a single folder or neatly organize them into subdirectories.
Image Selection Guidelines¶
Quality Requirements:
- No JPEG artifacts or blurry images - modern models will pick these up
- Avoid grainy CMOS sensor noise (will appear in all generated images)
- No watermarks, badges, or signatures (these will be learned)
- Movie frames generally don't work due to compression (use production stills instead)
Technical Specifications:
- Images optimally divisible by 64 (allows reuse without resizing)
- Mix square and non-square images for balanced capabilities
- Use varied, high-quality datasets for best results
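The divisible-by-64 guideline can be checked with plain shell arithmetic. This is a minimal sketch: the `divisible_by_64` helper and the example dimensions are made up for illustration, and reading the real width and height of an image file is left to whatever tool you already use.

```shell
# Hypothetical helper: succeed when both dimensions are multiples of 64.
# $1 = width in pixels, $2 = height in pixels.
divisible_by_64() {
  [ $(( $1 % 64 )) -eq 0 ] && [ $(( $2 % 64 )) -eq 0 ]
}

# Example dimensions (width height), chosen for illustration only.
for size in "1024 1024" "1920 1080"; do
  set -- $size
  if divisible_by_64 "$1" "$2"; then
    echo "$1x$2: divisible by 64"
  else
    echo "$1x$2: not divisible by 64"
  fi
done
```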
Captioning¶
SimpleTuner provides captioning scripts for mass-renaming files. Caption formats supported:
- Filename as caption (default)
- Text files with --caption_strategy=textfile
- JSONL, CSV, or advanced metadata files
Recommended captioning tools:
- InternVL2: Best quality but slow (small datasets)
- BLIP3: Best lightweight option with good instruction following
- Florence2: Fastest, but some users dislike its outputs
Training Batch Size¶
Your maximum batch size depends on VRAM and resolution:
Key principles:
- Use highest batch size possible without VRAM issues
- Higher resolution = more VRAM = lower batch size
- If batch size 1 at 128x128 doesn't work, hardware is insufficient
Multi-GPU Dataset Requirements¶
When training with multiple GPUs, your dataset must be large enough for the effective batch size:
Example: With 4 GPUs and train_batch_size=4, you need at least 16 samples per aspect bucket.
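The arithmetic behind that example can be sketched as follows. Only the two factors named in the example are used, and `bucket_samples` is a made-up value included to show the comparison.

```shell
# Values from the example above.
num_gpus=4
train_batch_size=4

# Each aspect bucket needs at least one full batch for every GPU.
min_samples_per_bucket=$((num_gpus * train_batch_size))
echo "minimum samples per aspect bucket: $min_samples_per_bucket"

# Hypothetical bucket size, used only to illustrate the check.
bucket_samples=10
if [ "$bucket_samples" -lt "$min_samples_per_bucket" ]; then
  echo "bucket too small: $bucket_samples < $min_samples_per_bucket"
fi
```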
Solutions for small datasets:
- Use --allow_dataset_oversubscription to auto-adjust repeats
- Manually set repeats in your dataloader config
- Reduce batch size or GPU count
See DATALOADER.md for complete details.
Publishing to Hugging Face Hub¶
To automatically push models to Hub upon completion, add to config/config.json:
Log in before training:
Debugging¶
Enable detailed logging by adding to config/config.env:
A debug.log file will be created in the project root with all log entries.