Hunyuan Video 1.5 Quickstart¶
This guide walks through training a LoRA on Tencent's 8.3B Hunyuan Video 1.5 release (tencent/HunyuanVideo-1.5) using SimpleTuner.
Hardware requirements¶
Hunyuan Video 1.5 is a large model (8.3B parameters).
- Minimum: 24GB-32GB VRAM; this is comfortable for a rank-16 LoRA with full gradient checkpointing at 480p.
- Recommended: A6000 / A100 (48GB-80GB) for 720p training or larger batch sizes.
- System RAM: 64GB+ is recommended to handle model loading.
Memory offloading (optional)¶
Add the following to your config.json:
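A minimal sketch of the offloading keys, matching the names used in the example config later in this guide (verify against your SimpleTuner version):

```json
{
  "enable_group_offload": true,
  "group_offload_type": "block_level",
  "group_offload_use_stream": true
}
```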
- `--group_offload_use_stream`: only works on CUDA devices.
- Do not combine group offloading with `--enable_model_cpu_offload`.
Prerequisites¶
Make sure you have Python installed; SimpleTuner works well with versions 3.10 through 3.13.
You can check this by running:
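For example:

```shell
python3 --version   # or `python --version`, depending on your system
```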
If you don't have python 3.13 installed on Ubuntu, you can try the following:
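One common route (an assumption here, not the only option) is the third-party deadsnakes PPA:

```shell
# Assumes Ubuntu with sudo access; deadsnakes is a widely used
# third-party PPA for newer Python builds.
sudo apt -y install software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt -y update
sudo apt -y install python3.13 python3.13-venv
```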
Container image dependencies¶
For Vast, RunPod, and TensorDock (among others), the following will work on a CUDA 12.2-12.8 image to enable compiling of CUDA extensions:
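A sketch of the build dependencies typically needed (package names assumed; adjust for your base image):

```shell
# Compiler toolchain and headers for building CUDA extensions
apt -y update
apt -y install build-essential git python3-dev ninja-build
pip install --upgrade pip setuptools wheel
```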
AMD ROCm follow-up steps¶
The following must be executed for an AMD MI300X to be usable:
apt install amd-smi-lib
pushd /opt/rocm/share/amd_smi
python3 -m pip install --upgrade pip
python3 -m pip install .
popd
Installation¶
Install SimpleTuner via pip:
pip install 'simpletuner[cuda]'
# CUDA 13 / Blackwell users (NVIDIA B-series GPUs)
pip install 'simpletuner[cuda13]' --extra-index-url https://download.pytorch.org/whl/cu130
For manual installation or development setup, see the installation documentation.
Required checkpoints¶
The main tencent/HunyuanVideo-1.5 repo contains the transformer/vae/scheduler, but the text encoder (text_encoder/llm) and vision encoder (vision_encoder/siglip) live in separate downloads. Point SimpleTuner at your local copies before launching:
export HUNYUANVIDEO_TEXT_ENCODER_PATH=/path/to/text_encoder_root
export HUNYUANVIDEO_VISION_ENCODER_PATH=/path/to/vision_encoder_root
If these are unset, SimpleTuner tries to pull them from the model repo; most mirrors do not bundle them, so set the paths explicitly to avoid startup errors.
Setting up the environment¶
Web interface method¶
The SimpleTuner WebUI makes setup fairly straightforward. To run the server:
This will create a webserver on port 8001 by default, which you can access by visiting http://localhost:8001.
Manual / command-line method¶
To run SimpleTuner via command-line tools, you will need to set up a configuration file, the dataset and model directories, and a dataloader configuration file.
Configuration file¶
An experimental script, configure.py, may allow you to entirely skip this section through an interactive step-by-step configuration.
Note: This doesn't configure your dataloader. You will still have to do that manually, later.
To run it:
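Assuming a git checkout of SimpleTuner (the script lives in the repository root):

```shell
python configure.py
```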
If you prefer to manually configure:
Copy config/config.json.example to config/config.json:
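```shell
# Run from the SimpleTuner directory
cp config/config.json.example config/config.json
```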
Key configuration overrides for HunyuanVideo:
{
"model_type": "lora",
"model_family": "hunyuanvideo",
"pretrained_model_name_or_path": "tencent/HunyuanVideo-1.5",
"model_flavour": "t2v-480p",
"output_dir": "output/hunyuan-video",
"validation_resolution": "854x480",
"validation_num_video_frames": 61,
"validation_guidance": 6.0,
"train_batch_size": 1,
"gradient_accumulation_steps": 1,
"learning_rate": 1e-4,
"mixed_precision": "bf16",
"optimizer": "adamw_bf16",
"lora_rank": 16,
"enable_group_offload": true,
"group_offload_type": "block_level",
"dataset_backend_config": "config/multidatabackend.json"
}
- `model_flavour` options: `t2v-480p` (default), `t2v-720p`, `i2v-480p` (image-to-video), `i2v-720p` (image-to-video).
- `validation_num_video_frames`: must satisfy `(frames - 1) % 4 == 0`, e.g. 61 or 129.
Advanced Experimental Features¶
SimpleTuner includes experimental features that can significantly improve training stability and performance.

- [Scheduled Sampling (Rollout)](../experimental/SCHEDULED_SAMPLING.md): reduces exposure bias and improves output quality by letting the model generate its own inputs during training.

> ⚠️ These features increase the computational overhead of training.

Dataset considerations¶
Create a `--data_backend_config` (`config/multidatabackend.json`) document containing this:

[
{
"id": "my-video-dataset",
"type": "local",
"dataset_type": "video",
"instance_data_dir": "datasets/videos",
"caption_strategy": "textfile",
"resolution": 480,
"video": {
"num_frames": 61,
"min_frames": 61,
"frame_rate": 24,
"bucket_strategy": "aspect_ratio"
},
"repeats": 10
},
{
"id": "text-embeds",
"type": "local",
"dataset_type": "text_embeds",
"default": true,
"cache_dir": "cache/text/hunyuan",
"disabled": false
}
]
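With `caption_strategy` set to `textfile`, each video in `instance_data_dir` needs a same-named `.txt` caption beside it. A sketch of the expected layout (file names are illustrative):

```shell
mkdir -p datasets/videos
# clip_0001.mp4 would sit alongside its caption file:
printf 'a person walking along a beach at sunset' > datasets/videos/clip_0001.txt
```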
Executing the training run¶
From the SimpleTuner directory:
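Assuming a git checkout (a pip install may expose a different entry point), training has historically been launched with the bundled script, which reads `config/config.json`:

```shell
bash train.sh
```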
Notes & troubleshooting tips¶
VRAM Optimization¶
- Group offload: essential for consumer GPUs. Ensure `enable_group_offload` is `true`.
- Resolution: stick to 480p (`854x480` or similar) if you have limited VRAM. 720p (`1280x720`) increases memory usage significantly.
- Quantization: use `base_model_precision` (`bf16` by default); `int8-torchao` works for further savings at the cost of speed.
- VAE patch convolution: for HunyuanVideo VAE OOMs, set `--vae_enable_patch_conv=true` (or toggle it in the UI). This slices 3D conv/attention work to lower peak VRAM; expect a small throughput hit.
Image-to-Video (I2V)¶
- Use `model_flavour="i2v-480p"` or `"i2v-720p"`.
- SimpleTuner automatically uses the first frame of your video dataset samples as the conditioning image during training.
I2V Validation Options¶
For validation with i2v models, you have two options:
1. Auto-extracted first frame: by default, validation uses the first frame from video samples in your dataset.
2. Separate image dataset (simpler setup): use `--validation_using_datasets=true` with `--eval_dataset_id` pointing to an image dataset. This allows you to use any image dataset as the first-frame conditioning input for validation videos, without needing to set up the complex conditioning dataset pairing used during training.
Example config for option 2:
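A minimal sketch for `config/config.json`, assuming an image dataset registered in `multidatabackend.json` under the hypothetical id `my-image-dataset`:

```json
{
  "validation_using_datasets": true,
  "eval_dataset_id": "my-image-dataset"
}
```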
Text Encoders¶
Hunyuan uses a dual text encoder setup (LLM + CLIP). Ensure your system RAM can handle loading these during the caching phase.