No edit summary |
|||
Line 79: | Line 79: | ||
==Downloading Datasets== | ==Downloading Datasets== | ||
The exact method to download and use a dataset from the Hugging Face hub depends on a number of factors such as format and the type of task for which the data will be used. Regardless of the exact method used, any download must be performed '''on a login node'''. | The exact method to download and use a dataset from the Hugging Face hub depends on a number of factors such as format and the type of task for which the data will be used. Regardless of the exact method used, any download must be performed '''on a login node'''. See [https://huggingface.co/docs/datasets/loading the package's official documentation] for details on how to download different types of dataset. | ||
Once the | Once the dataset has been downloaded, it will be stored locally in a cache directory, which defaults to <tt>$HOME/.cache/huggingface/datasets</tt>. You can change the default cache location by setting the environment variable <tt>HF_DATASETS_CACHE</tt> '''before''' you import anything from the Datasets package in your Python script. | ||
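The ordering requirement described above (set <tt>HF_DATASETS_CACHE</tt> first, import afterwards) can be sketched as follows. This is only an illustration under stated assumptions: the helper name <tt>set_datasets_cache</tt>, the <tt>hf_datasets</tt> subdirectory and the use of a <tt>SCRATCH</tt> environment variable as the base location are hypothetical choices, not something the page prescribes.

```python
import os
import sys
import tempfile
from pathlib import Path


def set_datasets_cache(cache_dir):
    """Point HF_DATASETS_CACHE at cache_dir (hypothetical helper).

    Refuses to run if the Datasets package is already imported, because the
    variable must be set before that import to take effect.
    """
    if "datasets" in sys.modules:
        raise RuntimeError(
            "datasets is already imported; set HF_DATASETS_CACHE earlier"
        )
    path = Path(cache_dir)
    os.environ["HF_DATASETS_CACHE"] = str(path)
    return path


# Assumption: use $SCRATCH if it is defined, otherwise a temporary directory.
base_dir = os.environ.get("SCRATCH", tempfile.gettempdir())
cache_path = set_datasets_cache(Path(base_dir) / "hf_datasets")

# Only now import from the Datasets package, for example:
# from datasets import load_dataset
```

Setting the variable inside the script, rather than relying on the shell environment, keeps the cache location and the import order visible in one place.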
To load a dataset in a job, where there is no internet connection, set the environment variable <tt>HF_DATASETS_OFFLINE=1</tt> and specify the location of the cache directory where the dataset is stored when calling <tt>load_dataset()</tt>: | To load a dataset in a job, where there is no internet connection, set the environment variable <tt>HF_DATASETS_OFFLINE=1</tt> and specify the location of the cache directory where the dataset is stored when calling <tt>load_dataset()</tt>: |