Dataset configuration presets
Configuration details for several large-scale datasets on the Hugging Face Hub are collected here as a starting point for your own setup.
Tip: For large regularization datasets, you can use max_num_samples to limit the dataset to a deterministic random subset. See DATALOADER.md for details.
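To illustrate what a "deterministic random subset" means in general, here is a minimal sketch. It is not the project's implementation: the function name `deterministic_subset`, the `seed` parameter, and the file names are hypothetical, and the real behavior of max_num_samples is described in DATALOADER.md. The idea is only that a fixed seed makes the random selection repeatable across runs.

```python
import random


def deterministic_subset(items, max_num_samples, seed=42):
    """Return at most max_num_samples items, picked with a fixed seed.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Nothing to trim when no limit is set or the dataset is already small enough.
    if max_num_samples is None or max_num_samples >= len(items):
        return list(items)
    # A private RNG keeps the selection repeatable without touching global state.
    rng = random.Random(seed)
    # Sample indices, then sort them so the subset keeps the original ordering.
    indices = sorted(rng.sample(range(len(items)), max_num_samples))
    return [items[i] for i in indices]


# Placeholder file names standing in for a large regularization dataset.
files = [f"image_{i:05d}.png" for i in range(1000)]
subset_a = deterministic_subset(files, 100)
subset_b = deterministic_subset(files, 100)
```

Because both calls use the same seed, `subset_a` and `subset_b` contain the same 100 entries, which is the property that lets a capped dataset stay consistent between training runs.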
To add a new preset, submit a pull request based on this template.
- DALLE-3 1M
- bghira/photo-concept-bucket
- Midjourney v6 520k
- Nijijourney v6 520k
- Subjects200K, an example of using the datasets library