Model Guides

Step-by-step guides for training each supported model architecture.

Image Models

Flow Matching

| Model | Parameters | Guide |
|---|---|---|
| Flux.1 | 12B | Flux.1 Guide |
| Flux.2 | 32B | Flux.2 Guide |
| Flux Kontext | 12B | Kontext Guide |
| Chroma | 8.9B | Chroma Guide |
| Stable Diffusion 3 | 2-8B | SD3 Guide |
| Auraflow | 6.8B | Auraflow Guide |
| Sana | 0.6-4.8B | Sana Guide |
| Lumina2 | 2B | Lumina2 Guide |
| HiDream | 17B MoE | HiDream Guide |
| Z-Image | - | Z-Image Guide |

DiT / Transformer

| Model | Parameters | Guide |
|---|---|---|
| PixArt Sigma | 0.6-0.9B | Sigma Guide |
| Cosmos2 | 2-14B | Cosmos2 Guide |
| OmniGen | 3.8B | OmniGen Guide |
| Qwen Image | 20B | Qwen Guide |
| LongCat Image | 6B | LongCat Guide |
| Kandinsky 5 | - | Kandinsky Guide |

U-Net

| Model | Parameters | Guide |
|---|---|---|
| Stable Diffusion XL | 3.5B | SDXL Guide |
| Kolors | 5B | Kolors Guide |
| Stable Cascade | - | Cascade Guide |

Image Editing

| Model | Guide |
|---|---|
| Qwen Edit | Qwen Edit Guide |
| LongCat Edit | LongCat Edit Guide |

Video Models

| Model | Parameters | Guide |
|---|---|---|
| Wan Video | 1.3-14B | Wan Guide |
| LTX Video | 5B | LTX Guide |
| LTX Video 2 | 19B | LTX Video 2 Guide |
| Hunyuan Video | 8.3B | Hunyuan Guide |
| Sana Video | - | Sana Video Guide |
| Kandinsky 5 Video | - | Kandinsky Video Guide |
| LongCat Video | - | LongCat Video Guide |
| LongCat Video Edit | - | LongCat Video Edit Guide |

Audio Models

| Model | Size / Version | Guide |
|---|---|---|
| ACE-Step | 3.5B / 1.5 | ACE-Step Guide |
| HeartMuLa | 3B | HeartMuLa Guide |

Choosing a Model

For beginners:

  • Start with Flux.1 for high-quality image generation
  • Use LoRA training to reduce memory requirements
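
To illustrate why LoRA lowers memory requirements, the sketch below counts the trainable parameters that a rank-r adapter adds to a single linear layer, compared with training that layer's full weight matrix. The layer size (4096 x 4096) and rank (16) are hypothetical values chosen for the arithmetic, not settings taken from any guide listed here. The frozen base weights still have to be held in memory; the saving comes from keeping gradients and optimizer state only for the small adapter.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, for illustration only.
d_out, d_in, rank = 4096, 4096, 16

full = d_out * d_in                               # 16_777_216 weights in the full matrix
lora = lora_trainable_params(d_out, d_in, rank)   # 131_072 weights in the adapter
fraction = lora / full                            # 0.0078125, i.e. under 1% of the layer

print(f"full: {full:,}  lora: {lora:,}  fraction: {fraction:.4%}")
```

Raising the rank increases adapter capacity and trainable parameter count linearly, so rank is one of the main knobs when trading quality against memory.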

For production:

  • SD3 or SDXL for broad compatibility
  • Flux.2 for maximum quality (requires more VRAM)
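
As a rough guide to why larger models need more VRAM, the sketch below estimates the memory taken by the model weights alone, using parameter counts from the tables above. It is a lower bound under stated assumptions: it ignores activations, gradients, optimizer state, text encoders, and the VAE, and the helper name `weight_memory_gb` is made up for this example. Sizes are in decimal gigabytes (10^9 bytes).

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Approximate size of the weights alone, in decimal GB."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


# Parameter counts as listed in the tables above.
models = {"Stable Diffusion XL": 3.5, "Flux.1": 12, "Flux.2": 32}

for name, size in models.items():
    print(f"{name}: {weight_memory_gb(size):.1f} GB in bf16, "
          f"{weight_memory_gb(size, 'int8'):.1f} GB in int8")
```

At bf16 this gives about 7 GB for SDXL, 24 GB for Flux.1, and 64 GB for Flux.2 before any training overhead, which is why the larger models typically call for LoRA, quantization, or multi-GPU setups.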

For video:

  • Wan Video for the best balance of quality and resource use
  • Hunyuan Video for image-to-video (I2V) with super-resolution

For specific use cases:

  • Flux Kontext for image editing/conditioning
  • ACE-Step for text-to-music LoRA training (v1 and v1.5)
  • HeartMuLa for autoregressive text-to-audio