Self-Flow (internal alignment)
Self-Flow is a CREPA mode that replaces the external vision encoder with an EMA teacher of the same model, run on a cleaner view of the input. It closely follows the approach described in the Black Forest Labs paper: train the student on a mixed tokenwise noise schedule, run the EMA teacher on a cleaner view, and align internal hidden states while keeping the normal generative loss.
Compared with nearby SimpleTuner methods:
| Method | Teacher source | Noise asymmetry | Extra teacher model | Main idea |
|---|---|---|---|---|
| REPA / VIDEO_CREPA | Frozen external encoder, usually DINOv2 | No | Yes | Align model hidden states to external semantic features |
| LayerSync | Deeper layer from the same forward pass | No | No | Align an earlier layer to a stronger later layer |
| TwinFlow | EMA teacher and recursive trajectory targets | No tokenwise cleaner/noisier split | No external model | Few-step trajectory matching, optionally with negative-time sign semantics |
| Self-Flow | EMA teacher from the same model on a cleaner view | Yes | No external model | Learn stronger internal representations through dual-timestep scheduling |
Looking for external-encoder alignment? See IMAGE_REPA.md for REPA / U-REPA and VIDEO_CREPA.md for temporal CREPA.
When to use it
- You want the BFL-style self-supervised regularizer instead of an external encoder.
- You are training a transformer family that already exposes Self-Flow hooks in SimpleTuner.
- You want the same backbone regularizer to help standard generation, editing, and multimodal training without shipping DINO checkpoints.
- You already use EMA or are willing to enable it. Self-Flow requires an EMA teacher.
Supported families currently include:
- Image / edit: `flux`, `flux2`, `sd3`, `pixart`, `sana`, `qwen_image`, `chroma`, `hidream`, `auraflow`, `lumina2`, `z_image`, `z_image_omni`, `kandinsky5_image`, `longcat_image`, `omnigen`, `ace_step`
- Video / multimodal: `wan`, `wan_s2v`, `ltxvideo`, `ltxvideo2`, `sanavideo`, `kandinsky5_video`, `hunyuanvideo`, `longcat_video`, `cosmos`, `anima`
Quick setup (WebUI)
- Open Training → Loss functions.
- Enable CREPA.
- Set CREPA Feature Source to `self_flow`.
- Set CREPA Block Index to an earlier student block. Start with `8` on 24-layer DiTs and `10` on deeper/wider stacks.
- Set CREPA Teacher Block Index to a deeper teacher block. Good starting points are `16` or `20`.
- Keep Weight at `0.5` to start.
- Set Self-Flow Mask Ratio to:
  - `0.25` for image models
  - `0.10` for video models
  - `0.50` for audio-heavy models such as `ace_step`
- Make sure EMA is enabled.
- Do not combine it with TwinFlow.
Logs will include the normal CREPA metrics (`crepa_loss`, `crepa_alignment_score`) plus the standard training loss.
Quick setup (config JSON / CLI)
{
  "use_ema": true,
  "crepa_enabled": true,
  "crepa_feature_source": "self_flow",
  "crepa_block_index": 8,
  "crepa_teacher_block_index": 16,
  "crepa_lambda": 0.5,
  "crepa_self_flow_mask_ratio": 0.25
}
Legacy configs can still use:
Prefer `crepa_feature_source=self_flow` for new configs.
Tuning knobs
- `crepa_block_index`: student block to supervise. Earlier blocks usually work better.
- `crepa_teacher_block_index`: deeper EMA teacher block. Required for Self-Flow.
- `crepa_lambda`: alignment strength. Start at `0.5`; reduce if generations look over-regularized.
- `crepa_self_flow_mask_ratio`: fraction of tokens that receive the alternate timestep. Must stay in `[0.0, 0.5]`.
- `crepa_scheduler`, `crepa_warmup_steps`, `crepa_decay_steps`, `crepa_lambda_end`, `crepa_cutoff_step`: same scheduling controls as CREPA. They work well if you want Self-Flow to decay later in training.
- `crepa_use_backbone_features`: a different mode. Do not combine it with Self-Flow.
- `crepa_feature_source=self_flow`: preferred selector for the mode.
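
To make the role of `crepa_lambda` concrete, here is a minimal sketch of how an alignment term could be combined with the generative loss. It assumes the alignment term is the mean of `1 - cosine similarity` over tokens and that it is added to the generative loss with weight `crepa_lambda`; the function names are hypothetical and this is not SimpleTuner's actual code.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def alignment_loss(student_tokens, teacher_tokens):
    """Mean (1 - cosine similarity) over tokens; 0.0 when perfectly aligned."""
    sims = [cosine_similarity(s, t) for s, t in zip(student_tokens, teacher_tokens)]
    return 1.0 - sum(sims) / len(sims)


def total_loss(generative_loss, student_tokens, teacher_tokens, crepa_lambda=0.5):
    """Assumed combination: generative loss plus lambda-weighted alignment."""
    return generative_loss + crepa_lambda * alignment_loss(student_tokens, teacher_tokens)


# Student features that point the same way as the teacher's add no penalty.
aligned = total_loss(0.8, [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
# Orthogonal features add the full lambda-weighted penalty.
orthogonal = total_loss(0.8, [[1.0, 0.0]], [[0.0, 1.0]], crepa_lambda=0.5)
```

Under these assumptions, lowering `crepa_lambda` shrinks only the alignment contribution, which is why reducing it is the first thing to try when generations look over-regularized.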
Sampling / validation
Self-Flow changes training, not the basic inference algorithm.
- Training uses mixed tokenwise noise on the student and a cleaner EMA teacher view.
- Validation loss still evaluates the requested homogeneous timestep schedule.
- Normal sampling stays unchanged. You do not run dual-timestep masking at inference.
If you want better sampler defaults, tune them after training as you would for any other model. Self-Flow itself does not require a new sampler.
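
The difference between the training-time and validation-time schedules can be sketched as follows. This is an illustration under stated assumptions, not SimpleTuner's implementation: timesteps are taken to lie in `[0, 1]` with larger values meaning more noise, the teacher is assumed to use the smaller of the two sampled timesteps, and both function names are hypothetical.

```python
import random


def build_self_flow_timesteps(num_tokens, mask_ratio, rng):
    """Return (student_timesteps, teacher_timesteps) for one sample.

    Assumed convention: t lies in [0, 1] and a larger t means more noise.
    """
    if not 0.0 <= mask_ratio <= 0.5:
        raise ValueError("mask_ratio must stay in [0.0, 0.5]")

    t_primary, t_alternate = rng.random(), rng.random()

    # Choose which tokens receive the alternate timestep.
    num_masked = round(mask_ratio * num_tokens)
    masked = set(rng.sample(range(num_tokens), num_masked))

    # Student: mixed tokenwise schedule.
    student = [t_alternate if i in masked else t_primary for i in range(num_tokens)]
    # Teacher: one homogeneous, cleaner timestep (assumed to be the smaller one).
    teacher = [min(t_primary, t_alternate)] * num_tokens
    return student, teacher


def build_eval_timesteps(num_tokens, t):
    """Validation uses a single homogeneous timestep for every token."""
    return [t] * num_tokens


student_t, teacher_t = build_self_flow_timesteps(8, 0.25, random.Random(0))
eval_t = build_eval_timesteps(8, 0.5)
```

With 8 tokens and a mask ratio of `0.25`, two tokens carry the alternate timestep in the student view, while the teacher and the validation batch each use one timestep for all tokens.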
How it works (practitioner)
- Sample two timesteps and assign them across tokens with a random mask.
- Build a student view with mixed corruption and a teacher view with the cleaner timestep.
- Run the student normally and the EMA teacher under `no_grad`.
- Align an earlier student layer to a deeper teacher layer with cosine similarity while still training on the normal generative loss.
- On edit / context architectures such as `flux2`, reference tokens stay clean while target image tokens receive the mixed schedule.

Technical (SimpleTuner internals)

- Source selection lives in `simpletuner/helpers/training/crepa.py` via `CrepaFeatureSource.SELF_FLOW`.
- Shared batch builders are in `ModelFoundation._prepare_image_crepa_self_flow_batch` and `_prepare_video_crepa_self_flow_batch`.
- The EMA teacher pass is run from `ImageModelFoundation.auxiliary_loss` / `VideoModelFoundation.auxiliary_loss` through `_run_crepa_teacher_forward`.
- Validation and training-time inference now rebuild homogeneous eval batches when `custom_timesteps` are requested, so eval loss is not polluted by the mixed Self-Flow training batch.
- Families that support Self-Flow implement `supports_crepa_self_flow()` and a model-specific `_prepare_crepa_self_flow_batch()` when they need custom token handling.

Common pitfalls
- EMA disabled: Self-Flow requires `use_ema=true`.
- Teacher block unset: Set `crepa_teacher_block_index`; startup validation will reject missing values.
- TwinFlow enabled: not supported together.
- Wrong family: only model families that implement `supports_crepa_self_flow()` can use this mode.
- Mask ratio too high: stay at or below `0.5`; aggressive values can make training unstable.
- Expecting a special sampler: inference stays standard. Self-Flow is a training regularizer, not a new generation-time schedule.
- Confusing it with backbone mode: `crepa_use_backbone_features=true` is not Self-Flow. Self-Flow requires the cleaner EMA teacher view.
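
Several of these pitfalls can be caught before launching a run. The helper below is a hypothetical pre-flight check, not part of SimpleTuner; it uses only the config keys named on this page. The TwinFlow and model-family checks are left out because this page does not name the config keys they would need.

```python
def check_self_flow_config(config):
    """Return a list of problems found in a Self-Flow config dict (empty if none)."""
    problems = []

    if not config.get("use_ema", False):
        problems.append("Self-Flow requires use_ema=true.")

    if config.get("crepa_teacher_block_index") is None:
        problems.append("crepa_teacher_block_index must be set.")

    ratio = config.get("crepa_self_flow_mask_ratio")
    if ratio is not None and not 0.0 <= ratio <= 0.5:
        problems.append("crepa_self_flow_mask_ratio must stay in [0.0, 0.5].")

    if config.get("crepa_use_backbone_features", False):
        problems.append("crepa_use_backbone_features is a different mode.")

    return problems


good_config = {
    "use_ema": True,
    "crepa_enabled": True,
    "crepa_feature_source": "self_flow",
    "crepa_block_index": 8,
    "crepa_teacher_block_index": 16,
    "crepa_lambda": 0.5,
    "crepa_self_flow_mask_ratio": 0.25,
}
bad_config = dict(good_config, use_ema=False, crepa_self_flow_mask_ratio=0.7)
bad_config.pop("crepa_teacher_block_index")

good_problems = check_self_flow_config(good_config)
bad_problems = check_self_flow_config(bad_config)
```

SimpleTuner's own startup validation remains the authority; a check like this only surfaces the obvious mistakes earlier.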