T-LoRA (Timestep-dependent LoRA)¶
Background¶
Standard LoRA fine-tuning applies a fixed low-rank adaptation uniformly across all diffusion timesteps. When training data is limited (especially single-image customisation), this leads to overfitting — the model memorises noise patterns at high-noise timesteps where little semantic information exists.
T-LoRA (Soboleva et al., 2025) solves this by dynamically adjusting the number of active LoRA ranks based on the current diffusion timestep:
- High noise (early denoising, \(t \to T\)): fewer ranks are active, preventing the model from memorising uninformative noise patterns.
- Low noise (late denoising, \(t \to 0\)): more ranks are active, allowing the model to capture fine concept details.
SimpleTuner's T-LoRA support is built on the LyCORIS library and requires a LyCORIS version that includes the lycoris.modules.tlora module.
🟡 Experimental: T-LoRA with video models may produce subpar results because temporal compression blends frames across timestep boundaries.
Quick setup¶
1. Set your training config¶
In your config.json, use LyCORIS with a separate T-LoRA config file:
{
"model_type": "lora",
"lora_type": "lycoris",
"lycoris_config": "config/lycoris_tlora.json",
"validation_lycoris_strength": 1.0
}
2. Create the LyCORIS T-LoRA config¶
Create config/lycoris_tlora.json:
{
"algo": "tlora",
"multiplier": 1.0,
"linear_dim": 64,
"linear_alpha": 32,
"apply_preset": {
"target_module": ["Attention", "FeedForward"]
}
}
That is all you need to start training. The sections below cover optional tuning and inference.
Configuration reference¶
Required fields¶
| Field | Type | Description |
|---|---|---|
algo |
string | Must be "tlora" |
multiplier |
float | LoRA strength multiplier. Keep at 1.0 unless you know what you are doing |
linear_dim |
int | LoRA rank. This becomes max_rank in the masking schedule |
linear_alpha |
int | LoRA scaling factor (separate from tlora_alpha) |
Optional fields¶
| Field | Type | Default | Description |
|---|---|---|---|
tlora_min_rank |
int | 1 |
Minimum active ranks at the highest noise level |
tlora_alpha |
float | 1.0 |
Masking schedule exponent. 1.0 is linear; values above 1.0 shift more capacity toward fine-detail steps |
apply_preset |
object | — | Module targeting via target_module and module_algo_map |
Model-specific module targets¶
For most models the generic ["Attention", "FeedForward"] targets work. For Flux 2 (Klein), use the custom class names:
{
"algo": "tlora",
"multiplier": 1.0,
"linear_dim": 64,
"linear_alpha": 32,
"apply_preset": {
"target_module": [
"Flux2Attention", "Flux2FeedForward", "Flux2ParallelSelfAttention"
]
}
}
See the LyCORIS documentation for the full list of per-model module targets.
Tuning knobs¶
linear_dim (rank)¶
Higher rank = more parameters and expressivity, but more prone to overfitting with limited data. The original T-LoRA paper uses rank 64 for SDXL single-image customisation.
tlora_min_rank¶
Controls the floor for rank activation at the noisiest timestep. Increasing this lets the model learn coarser structure but reduces the overfitting benefit. Start with the default of 1 and raise only if convergence is too slow.
tlora_alpha (schedule exponent)¶
Controls the curve shape of the masking schedule:
1.0— linear interpolation betweenmin_rankandmax_rank> 1.0— more aggressive masking at high noise; most ranks only activate near the end of denoising< 1.0— gentler masking; ranks activate earlier
Schedule visualisation (rank vs. timestep)
With `linear_dim=64`, `tlora_min_rank=1`, for a 1000-step scheduler:alpha=1.0 (linear):
t=0 (clean) → 64 active ranks
t=250 (25%) → 48 active ranks
t=500 (50%) → 32 active ranks
t=750 (75%) → 16 active ranks
t=999 (noise) → 1 active rank
alpha=2.0 (quadratic — biased toward detail):
t=0 (clean) → 64 active ranks
t=250 (25%) → 60 active ranks
t=500 (50%) → 48 active ranks
t=750 (75%) → 20 active ranks
t=999 (noise) → 1 active rank
alpha=0.5 (sqrt — biased toward structure):
t=0 (clean) → 64 active ranks
t=250 (25%) → 55 active ranks
t=500 (50%) → 46 active ranks
t=750 (75%) → 33 active ranks
t=999 (noise) → 1 active rank
Inference with SimpleTuner pipelines¶
SimpleTuner's vendored pipelines have built-in T-LoRA support. During validation, the masking parameters from training are automatically reused at each denoising step — no extra configuration is needed.
For standalone inference outside of training, you can import SimpleTuner's pipeline directly and set the _tlora_config attribute. This ensures the per-step masking matches what the model was trained with.
SDXL example¶
import torch
from lycoris import create_lycoris_from_weights
# Use SimpleTuner's vendored SDXL pipeline (has T-LoRA support built in)
from simpletuner.helpers.models.sdxl.pipeline import StableDiffusionXLPipeline
from diffusers import AutoencoderKL, EulerDiscreteScheduler
from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokenizer
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
dtype = torch.bfloat16
device = "cuda"
# Load pipeline components
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae", torch_dtype=dtype)
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder", torch_dtype=dtype)
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(model_id, subfolder="text_encoder_2", torch_dtype=dtype)
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
tokenizer_2 = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer_2")
from diffusers import UNet2DConditionModel
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet", torch_dtype=dtype)
# Load and apply LyCORIS T-LoRA weights
lora_path = "path/to/pytorch_lora_weights.safetensors"
wrapper, _ = create_lycoris_from_weights(1.0, lora_path, unet)
wrapper.merge_to()
unet.to(device)
pipe = StableDiffusionXLPipeline(
scheduler=scheduler,
vae=vae,
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2,
unet=unet,
)
# Enable T-LoRA inference masking — must match training config
pipe._tlora_config = {
"max_rank": 64, # linear_dim from your lycoris config
"min_rank": 1, # tlora_min_rank (default 1)
"alpha": 1.0, # tlora_alpha (default 1.0)
}
with torch.inference_mode():
image = pipe(
prompt="a sks dog riding a surfboard",
width=1024,
height=1024,
num_inference_steps=25,
guidance_scale=5.0,
).images[0]
image.save("tlora_output.png")
Flux example¶
import torch
from lycoris import create_lycoris_from_weights
# Use SimpleTuner's vendored Flux pipeline (has T-LoRA support built in)
from simpletuner.helpers.models.flux.pipeline import FluxPipeline
from diffusers import FlowMatchEulerDiscreteScheduler, AutoencoderKL
from diffusers.models.transformers.transformer_flux import FluxTransformer2DModel
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast
bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
device = "cuda"
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(bfl_repo, subfolder="scheduler")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=dtype)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
tokenizer_2 = T5TokenizerFast.from_pretrained(bfl_repo, subfolder="tokenizer_2")
vae = AutoencoderKL.from_pretrained(bfl_repo, subfolder="vae", torch_dtype=dtype)
transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=dtype)
# Load and apply LyCORIS T-LoRA weights
lora_path = "path/to/pytorch_lora_weights.safetensors"
wrapper, _ = create_lycoris_from_weights(1.0, lora_path, transformer)
wrapper.merge_to()
transformer.to(device)
pipe = FluxPipeline(
scheduler=scheduler,
text_encoder=text_encoder,
tokenizer=tokenizer,
text_encoder_2=text_encoder_2,
tokenizer_2=tokenizer_2,
vae=vae,
transformer=transformer,
)
# Enable T-LoRA inference masking
pipe._tlora_config = {
"max_rank": 64,
"min_rank": 1,
"alpha": 1.0,
}
with torch.inference_mode():
image = pipe(
prompt="a sks dog riding a surfboard",
width=1024,
height=1024,
num_inference_steps=25,
guidance_scale=3.5,
).images[0]
image.save("tlora_flux_output.png")
Note: You must use SimpleTuner's vendored pipeline (e.g.
simpletuner.helpers.models.flux.pipeline.FluxPipeline), not the stock Diffusers pipeline. Only the vendored pipelines contain the per-step T-LoRA masking logic.
Why not just use merge_to() and skip the masking?¶
merge_to() bakes the LoRA weights into the base model permanently — this is needed so the LoRA parameters are active during the forward pass. However, T-LoRA was trained with timestep-dependent rank masking: certain ranks were zeroed out depending on the noise level. Without reapplying that same masking during inference, all ranks fire at every timestep, producing over-saturated or burnt-looking images.
Setting _tlora_config on the pipeline tells the denoising loop to apply the correct mask before each model forward pass and clear it afterward.
How the masking works internally
At each denoising step, the pipeline calls:from simpletuner.helpers.training.lycoris import apply_tlora_inference_mask, clear_tlora_mask
_tlora_cfg = getattr(self, "_tlora_config", None)
if _tlora_cfg:
apply_tlora_inference_mask(
timestep=int(t),
max_timestep=self.scheduler.config.num_train_timesteps,
max_rank=_tlora_cfg["max_rank"],
min_rank=_tlora_cfg["min_rank"],
alpha=_tlora_cfg["alpha"],
)
try:
noise_pred = self.unet(...) # or self.transformer(...)
finally:
if _tlora_cfg:
clear_tlora_mask()
How SimpleTuner passes config during validation
During training, the T-LoRA config dict (`max_rank`, `min_rank`, `alpha`) is stored on the Accelerator object. When validation runs, `validation.py` copies this config onto the pipeline: This is fully automatic — no user configuration is required for validation images to use the correct masking.Upstream: the T-LoRA paper¶
Paper details and algorithm
**T-LoRA: Single Image Diffusion Model Customization Without Overfitting** Vera Soboleva, Aibek Alanov, Andrey Kuznetsov, Konstantin Sobolev [arXiv:2507.05964](https://arxiv.org/abs/2507.05964) — Accepted to AAAI 2026 The paper introduces two complementary innovations: ### 1. Timestep-dependent rank masking The core insight is that higher diffusion timesteps (noisier inputs) are more prone to overfitting than lower timesteps. At high noise, the latent contains mostly random noise with little semantic signal — training a full-rank adapter on this teaches the model to memorise noise patterns rather than learn the target concept. T-LoRA addresses this with a dynamic masking schedule that restricts the active LoRA rank based on the current timestep. ### 2. Orthogonal weight parametrisation (optional) The paper also proposes initialising LoRA weights via SVD decomposition of the original model weights, enforcing orthogonality through a regularisation loss. This ensures independence between adapter components. SimpleTuner's LyCORIS integration focuses on the timestep masking component, which is the primary driver of the overfitting reduction. The orthogonal initialisation is part of the standalone T-LoRA implementation but is not currently used by the LyCORIS `tlora` algorithm. ### Citation@misc{soboleva2025tlorasingleimagediffusion,
title={T-LoRA: Single Image Diffusion Model Customization Without Overfitting},
author={Vera Soboleva and Aibek Alanov and Andrey Kuznetsov and Konstantin Sobolev},
year={2025},
eprint={2507.05964},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.05964},
}
Common pitfalls¶
- Forgot
_tlora_configduring inference: Images look over-saturated or burnt. All ranks fire at every timestep instead of following the trained masking schedule. - Using stock Diffusers pipeline: The stock pipelines do not contain T-LoRA masking logic. You must use SimpleTuner's vendored pipelines.
linear_dimmismatch: Themax_rankin_tlora_configmust match thelinear_dimused during training, or the mask dimensions will be wrong.- Video models: Temporal compression blends frames across timestep boundaries, which can weaken the timestep-dependent masking signal. Results may be subpar.
- SDXL + FeedForward modules: Training FeedForward modules with LyCORIS on SDXL can cause NaN loss — this is a general LyCORIS issue, not specific to T-LoRA. See the LyCORIS documentation for details.