
Dataset configuration presets

This page collects configuration details for several large-scale datasets on the Hugging Face Hub, to give you a head start on getting them working.

Tip: For large regularization datasets, you can use max_num_samples to limit the dataset to a deterministic random subset. See DATALOADER.md for details.
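To illustrate what a "deterministic random subset" means in practice, here is a minimal Python sketch. It is not the project's actual implementation: the function name `deterministic_subset`, the `seed` parameter, and the sort-then-sample approach are all assumptions made for illustration. Only the `max_num_samples` name comes from the tip above; DATALOADER.md remains the authority on how the option really behaves.

```python
import random


def deterministic_subset(items, max_num_samples, seed=42):
    """Return at most max_num_samples items, chosen identically on every run.

    Hypothetical helper for illustration only; the name and the seed
    parameter are assumptions, not part of any real configuration API.
    """
    # Sort first so the result does not depend on file discovery order.
    ordered = sorted(items)
    if max_num_samples is None or len(ordered) <= max_num_samples:
        return ordered
    # A private, seeded generator keeps the selection reproducible
    # without touching the global random state.
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, max_num_samples))


if __name__ == "__main__":
    files = [f"img_{i:04d}.png" for i in range(1000)]
    subset = deterministic_subset(files, max_num_samples=100)
    print(len(subset))
```

Because the input is sorted and the generator is seeded, repeated runs over the same file list pick the same subset, which is the property that makes a capped regularization dataset reproducible between training sessions.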

To add a new preset, use this template and submit a pull request.