Skip to content

ERNIE-Image [base / turbo] Quickstart

In this example, we'll be training an ERNIE-Image LoRA. ERNIE-Image is Baidu's single-stream flow-matching transformer family and uses the same Flux2-style VAE class inside diffusers. SimpleTuner supports both base and turbo flavours.

Hardware requirements

ERNIE is not a small model. Plan around the same general class of hardware you would reserve for other large single-stream transformers:

  • a realistic target is a 24G+ GPU when using int8 quantisation plus bf16 LoRA weights
  • 16G can work with aggressive offload, RamTorch, and slow iteration speed
  • multi-GPU, FSDP2, and additional CPU/RAM offload all help if you want to avoid a single large GPU

Apple GPUs are not recommended for training.

Memory offloading (optional)

RamTorch is already a good default for ERNIE because its text encoder is large. If you still need more VRAM headroom, grouped module offloading is also available:

--enable_group_offload \
--group_offload_type block_level \
--group_offload_blocks_per_group 1 \
--group_offload_use_stream
  • Streams are only effective on CUDA.
  • Do not combine multiple unrelated CPU offload systems unless you know why.
  • Group offload is not compatible with Quanto quantisation.

Prerequisites

SimpleTuner works well with Python 3.10 through 3.13.

python --version

If needed on Ubuntu:

apt -y install python3.13 python3.13-venv

Container image dependencies

For CUDA 12.x images:

apt -y install nvidia-cuda-toolkit-12-8

Installation

Install SimpleTuner via pip:

pip install 'simpletuner[cuda]'

# CUDA 13 / Blackwell users
pip install 'simpletuner[cuda13]' --extra-index-url https://download.pytorch.org/whl/cu130

For manual installation or development setup, see the installation documentation.

AMD ROCm follow-up steps

For AMD MI300X:

apt install amd-smi-lib
pushd /opt/rocm/share/amd_smi
python3 -m pip install --upgrade pip
python3 -m pip install .
popd

Setting up the environment

Web interface method

Run the server:

simpletuner server

Then open http://localhost:8001 and select the ERNIE model family in the training wizard.

Manual / command-line method

You can start from the included example:

  • example config: simpletuner/examples/ernie.peft-lora/config.json
  • runnable local env: config/ernie-example/config.json

If you prefer to wire it manually, copy config/config.json.example to config/config.json and change the important values below.

Configuration file

cp config/config.json.example config/config.json

Recommended settings:

  • model_type: lora
  • model_family: ernie
  • model_flavour: base or turbo
  • pretrained_model_name_or_path:
  • base: baidu/ERNIE-Image
  • turbo: baidu/ERNIE-Image-Turbo
  • output_dir: where checkpoints and validation images should be written
  • train_batch_size: start at 1
  • resolution: start at 512
  • mixed_precision: bf16 on modern hardware, fp16 otherwise
  • gradient_checkpointing: true
  • ramtorch: true
  • ramtorch_text_encoder: true

The included example uses:

  • max_train_steps: 100
  • optimizer: optimi-lion
  • learning_rate: 1e-4
  • validation_guidance: 4.0
  • validation_num_inference_steps: 20

That exact example can be run with:

simpletuner train --env ernie-example

Assistant LoRA (Turbo)

SimpleTuner exposes assistant-LoRA support for ERNIE Turbo, but there is no default adapter path bundled for it yet.

  • supported flavour: turbo
  • default weight filename: pytorch_lora_weights.safetensors
  • required user input: assistant_lora_path

If you have a custom assistant adapter, set:

{
  "assistant_lora_path": "your-org/your-ernie-turbo-assistant-lora",
  "assistant_lora_weight_name": "pytorch_lora_weights.safetensors"
}

If you do not want to use one, disable it explicitly:

{
  "disable_assistant_lora": true
}

Dataset / caption setup

The example env uses a tiny DreamBooth-style Hugging Face dataset:

  • dataset_name: RareConcepts/Domokun
  • caption_strategy: instanceprompt
  • instance_prompt: 🟫

That works as a smoke test, but ERNIE responds better to real text than a one-token trigger. For actual training, prefer richer captions or a more descriptive instance prompt such as a studio photo of <token>.

Validation prompts

The same prompt library workflow used by other models works here:

{
  "nickname": "the prompt goes here",
  "another_nickname": "another prompt goes here"
}

Then point to it from config.json:

{
  "--user_prompt_library": "config/user_prompt_library.json"
}

Experimental features

ERNIE also supports the same advanced transformer-side features used by other single-stream families in SimpleTuner:

  • TREAD
  • LayerSync
  • REPA / CREPA-style hidden state capture
  • assistant LoRA loading for turbo

These features are optional. Get the base training run working first.

Notes

  • ERNIE uses a patched tokenizer/text-encoder loader because the upstream text encoder config needs minor fixes during load.
  • The trainer uses the ERNIE timestep convention expected by the upstream model, so keep custom experimentation aligned with the normal flow-matching schedule unless you are intentionally probing edge cases.
  • Start with the provided 512px example before scaling up dataset size or resolution.