Model Guides
Step-by-step guides for training each supported model architecture.
Image Models
Flow Matching
| Model | Parameters | Guide |
|---|---|---|
| Flux.1 | 12B | Flux.1 Guide |
| Flux.2 | 32B | Flux.2 Guide |
| Flux Kontext | 12B | Kontext Guide |
| Chroma | 8.9B | Chroma Guide |
| Stable Diffusion 3 | 2-8B | SD3 Guide |
| Auraflow | 6.8B | Auraflow Guide |
| Sana | 0.6-4.8B | Sana Guide |
| Lumina2 | 2B | Lumina2 Guide |
| HiDream | 17B MoE | HiDream Guide |
| Z-Image | - | Z-Image Guide |
DiT / Transformer
| Model | Parameters | Guide |
|---|---|---|
| PixArt Sigma | 0.6-0.9B | Sigma Guide |
| Cosmos2 | 2-14B | Cosmos2 Guide |
| OmniGen | 3.8B | OmniGen Guide |
| Qwen Image | 20B | Qwen Guide |
| LongCat Image | 6B | LongCat Guide |
| Kandinsky 5 | - | Kandinsky Guide |
U-Net
| Model | Parameters | Guide |
|---|---|---|
| Stable Diffusion XL | 3.5B | SDXL Guide |
| Kolors | 5B | Kolors Guide |
| Stable Cascade | - | Cascade Guide |
Image Editing
| Model | Guide |
|---|---|
| Qwen Edit | Qwen Edit Guide |
| LongCat Edit | LongCat Edit Guide |
Video Models
| Model | Parameters | Guide |
|---|---|---|
| Wan Video | 1.3-14B | Wan Guide |
| LTX Video | 5B | LTX Guide |
| LTX Video 2 | 19B | LTX Video 2 Guide |
| Hunyuan Video | 8.3B | Hunyuan Guide |
| Sana Video | - | Sana Video Guide |
| Kandinsky 5 Video | - | Kandinsky Video Guide |
| LongCat Video | - | LongCat Video Guide |
| LongCat Video Edit | - | LongCat Video Edit Guide |
Audio Models
| Model | Size / Version | Guide |
|---|---|---|
| ACE-Step | 3.5B / 1.5 | ACE-Step Guide |
| HeartMuLa | 3B | HeartMuLa Guide |
Choosing a Model
For beginners:
- Start with Flux.1 for high-quality image generation
- Use LoRA training to reduce memory requirements
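The memory saving from LoRA comes mostly from training far fewer parameters than full fine-tuning. As a rough sketch, a LoRA adapter on a single linear layer adds two low-rank matrices whose combined size is `rank * (d_in + d_out)`. The layer width of 3072 and the rank of 16 below are illustrative assumptions, not values taken from any model guide.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter on a d_in -> d_out linear layer.

    The adapter holds A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def full_layer_params(d_in: int, d_out: int) -> int:
    """Weight count of the original linear layer (bias ignored)."""
    return d_in * d_out


# Hypothetical layer width and rank, chosen only for illustration.
d_in = d_out = 3072
rank = 16

lora = lora_trainable_params(d_in, d_out, rank)
full = full_layer_params(d_in, d_out)
fraction = lora / full

print(f"LoRA params: {lora:,}")
print(f"Full params: {full:,}")
print(f"Trainable fraction: {fraction:.2%}")
```

Under these assumptions the adapter trains about 1% of the layer's weights, which shrinks gradient and optimizer-state memory accordingly. The frozen base weights still have to fit in memory.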
For production:
- SD3 or SDXL for broad compatibility
- Flux.2 for maximum quality (requires more VRAM)
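To get a feel for what "more VRAM" means, the parameter counts in the tables above can be turned into a lower bound on weight memory. This is a back-of-the-envelope sketch, assuming weights are stored in a single dtype (bf16 by default). It ignores activations, gradients, optimizer state, text encoders, and VAEs, so real requirements are higher.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


# Parameter counts (in billions) copied from the tables on this page.
models = {"Stable Diffusion XL": 3.5, "Flux.1": 12, "Flux.2": 32}

for name, size in models.items():
    print(f"{name}: ~{weight_memory_gb(size):.0f} GB of weights in bf16")
```

By this estimate Flux.2 needs roughly 64 GB for bf16 weights alone, compared with about 24 GB for Flux.1. This is why quantization or LoRA is often combined with the larger models.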
For video:
- Wan Video for the best balance of quality and resource use
- Hunyuan Video for image-to-video (I2V) with super-resolution
For specific use cases:
- Flux Kontext for image editing/conditioning
- ACE-Step for text-to-music LoRA training (v1 and v1.5)
- HeartMuLa for autoregressive text-to-audio