Huggingface
Revision as of 13:48, 23 May 2023
Hugging Face is an organization that builds and maintains several popular open-source software packages widely used in Artificial Intelligence research. In this article, you will find information and tutorials on how to use packages from the Hugging Face ecosystem on our clusters.
Transformers
Transformers is a Python package that provides APIs and tools to easily download and train state-of-the-art pre-trained models for various tasks in multiple domains.
Installing Transformers
Our recommendation is to install it using our provided Python wheel as follows:
1. Load a Python module: module load python
2. Create and start a virtual environment.
3. Install Transformers in the virtual environment with pip install:
(venv) [name@server ~] pip install --no-index transformers
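Taken together, the steps above can be sketched as a single login-node session (a sketch only: the exact module version and the environment path ~/venv are arbitrary examples, not requirements of the clusters):

```shell
# Load a Python module (unversioned here; a specific version may be loaded instead)
module load python

# Create and activate a virtual environment (path is an example)
virtualenv --no-download ~/venv
source ~/venv/bin/activate

# Install Transformers from the cluster's local wheelhouse
pip install --no-index transformers
```

The --no-index flag tells pip to use the locally provided wheels rather than reaching out to PyPI.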
Downloading pre-trained models
To download a pre-trained model from the Hugging Face model hub, choose one of the options below and follow the instructions on the login node of the cluster you are working on.
Using git lfs
Pre-trained models are usually made up of fairly large binary files. Hugging Face makes these files available for download via Git Large File Storage. To download a model, load the git-lfs module and clone your chosen model repository from the model hub:
module load git-lfs/3.3.0
git clone https://huggingface.co/bert-base-uncased
Now that you have a copy of the pre-trained model saved locally on the cluster's filesystem, you can load it with a Python script inside a job, using the local_files_only option to avoid attempts to download it from the web:
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("/path/to/where/you/cloned/the/model", local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained("/path/to/where/you/cloned/the/model", local_files_only=True)
Using python
It is also possible to download pre-trained models using Python instead of Git. The following must be executed on a login node, as an internet connection is required to download the model files:
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
This will store the pre-trained model files in a cache directory, which defaults to $HOME/.cache/huggingface. You can change the cache directory by setting the environment variable TRANSFORMERS_CACHE before you import anything from the transformers package in your Python script. For example, the following will store model files in the current working directory:
import os
os.environ['TRANSFORMERS_CACHE'] = "./"
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
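To illustrate the default-versus-override behaviour the paragraph above describes, here is a minimal stdlib-only sketch. The helper resolve_cache_dir is hypothetical, written only for illustration; the real resolution logic inside transformers is more involved and also consults other variables such as HF_HOME:

```python
import os

def resolve_cache_dir(env=None):
    # Hypothetical helper, for illustration only: TRANSFORMERS_CACHE,
    # if set, overrides the default $HOME/.cache/huggingface.
    env = os.environ if env is None else env
    default = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    return env.get("TRANSFORMERS_CACHE", default)

# With no override, model files land under the default cache directory:
print(resolve_cache_dir({}))
# With the variable set, as in the example above, they go where you point it:
print(resolve_cache_dir({"TRANSFORMERS_CACHE": "./"}))
```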
Whether you change the default cache directory location or not, you can load the pre-trained model from disk in a job by using the local_files_only option:
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("/path/to/where/model/is/saved", local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained("/path/to/where/model/is/saved", local_files_only=True)