<languages />
<translate>
<!--T:1-->
[https://mpi4py.readthedocs.io/en/stable/ MPI for Python] provides Python bindings for the Message Passing Interface (MPI) standard, allowing Python applications to exploit multiple processors on workstations, clusters and supercomputers.
__FORCETOC__
= Available versions = <!--T:2-->
<code>mpi4py</code> is available as a module, not from the [https://docs.alliancecan.ca/wiki/Available_Python_wheels wheelhouse] like typical Python packages.
One can find available versions using:
{{Command|module spider mpi4py}}
<!--T:3-->
and look for more information on a specific version by using:
{{Command|module spider mpi4py/X.Y.Z}}
where <code>X.Y.Z</code> is the exact desired version, for instance <code>4.0.0</code>.
= A simple Hello World = <!--T:4-->
1. Run a short [https://docs.alliancecan.ca/wiki/Running_jobs#Interactive_jobs interactive job]:
{{Command|salloc --account{{=}}<your account> --ntasks{{=}}5}}
<!--T:5-->
2. Load the module:
{{Command|module load mpi4py/4.0.0 python/3.12}}
<!--T:6-->
3. Run a Hello World test:
{{Command
|srun python -m mpi4py.bench helloworld
|result=
Hello, World! I am process 0 of 5 on node1.
Hello, World! I am process 1 of 5 on node1.
Hello, World! I am process 2 of 5 on node1.
Hello, World! I am process 3 of 5 on node3.
Hello, World! I am process 4 of 5 on node3.
}}
In the case above, two nodes (<code>node1</code> and <code>node3</code>) were allocated, and the tasks were distributed across the available resources.
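For reference, the benchmark above is roughly equivalent to the following short script (a minimal sketch; the file name <code>hello.py</code> is arbitrary):
{{File
|name=hello.py
|lang="python"
|contents=
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Each process reports its rank, the total number of processes, and the node it runs on.
print("Hello, World! I am process %d of %d on %s." % (comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))
}}
It can be run the same way, with <code>srun python hello.py</code>.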
= mpi4py as a package dependency = <!--T:7-->
Often <code>mpi4py</code> is a dependency for another package. In order to fulfill this dependency:
<!--T:8-->
1. Deactivate any Python virtual environment:
{{Command|test $VIRTUAL_ENV && deactivate}}
<!--T:9-->
'''Note:''' If you had a virtual environment activated, it is important to deactivate it first, then load the module, before re-activating your virtual environment.
<!--T:10-->
2. Load the module:
{{Command|module load mpi4py/4.0.0 python/3.12}}
<!--T:11-->
3. Check that it is visible to <code>pip</code>:
{{Command
|pip list {{!}} grep mpi4py
|result=
mpi4py            4.0.0
}}
and that it can be imported:
{{Command|python -c 'import mpi4py'}}
If no errors are raised, then everything is OK!
<!--T:12-->
4. [https://docs.alliancecan.ca/wiki/Python#Creating_and_using_a_virtual_environment Create a virtual environment and install your packages]
= Running jobs with mpi4py = <!--T:13-->
You can run MPI jobs distributed across multiple nodes or cores.
For efficient MPI scheduling, please see:
* [[Advanced MPI scheduling]]
== CPU == <!--T:14-->
1. Write your Python code, for instance broadcasting a NumPy array:
{{File
|name=mpi4py-np-bc.py
|lang="python"
|contents=
from mpi4py import MPI
import numpy as np
<!--T:15-->
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
<!--T:16-->
if rank == 0:
    data = np.arange(100, dtype='i')
else:
    data = np.empty(100, dtype='i')
<!--T:17-->
comm.Bcast(data, root=0)
<!--T:18-->
for i in range(100):
    assert data[i] == i
}}
The example above is based on the [https://mpi4py.readthedocs.io/en/stable/tutorial.html#running-python-scripts-with-mpi mpi4py tutorial].
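Note that the uppercase <code>Bcast</code> communicates buffer-like objects such as NumPy arrays directly, while the lowercase <code>bcast</code> (and friends) pickle arbitrary Python objects, which is slower. For comparison, here is a minimal point-to-point sketch in the same buffer-based style (the file name is hypothetical):
{{File
|name=mpi4py-np-sendrecv.py
|lang="python"
|contents=
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = np.arange(100, dtype='i')
    # Uppercase Send transfers the array buffer directly, without pickling.
    comm.Send(data, dest=1, tag=77)
elif rank == 1:
    data = np.empty(100, dtype='i')
    comm.Recv(data, source=0, tag=77)
    assert (data == np.arange(100, dtype='i')).all()
}}
This sketch requires at least two tasks, for instance <code>srun --ntasks=2 python mpi4py-np-sendrecv.py</code>.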
<!--T:19-->
2. Write your submission script:
<tabs>
<tab name="Distributed">
{{File
|name=submit-mpi4py-distributed.sh
|lang="bash"
|contents=
#!/bin/bash
<!--T:20-->
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00 # adjust this to match the walltime of your job
#SBATCH --ntasks=4 # adjust this to match the number of tasks/processes to run
#SBATCH --mem-per-cpu=4G # adjust this according to the memory you need per process
<!--T:21-->
# Run on cores across the system: https://docs.alliancecan.ca/wiki/Advanced_MPI_scheduling#Few_cores,_any_number_of_nodes
<!--T:22-->
# Load module dependencies.
module load StdEnv/2023 gcc mpi4py/4.0.0 python/3.12
<!--T:23-->
# create the virtual environment on each allocated node:
srun --ntasks $SLURM_NNODES --tasks-per-node=1 bash << EOF
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
<!--T:24-->
pip install --no-index --upgrade pip
pip install --no-index numpy==2.1.1
EOF
<!--T:25-->
# activate only on main node
source $SLURM_TMPDIR/env/bin/activate;
<!--T:26-->
# srun exports the current env, which contains $VIRTUAL_ENV and $PATH variables
srun python mpi4py-np-bc.py;
}}
</tab>
<!--T:27-->
<tab name="Whole nodes">
{{File
|name=submit-mpi4py-whole-nodes.sh
|lang="bash"
|contents=
#!/bin/bash
<!--T:28-->
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=01:00:00 # adjust this to match the walltime of your job
#SBATCH --nodes=2 # adjust this to match the number of whole nodes you need
#SBATCH --ntasks-per-node=64 # adjust this to match the number of processes per node
#SBATCH --mem-per-cpu=1G # adjust this according to the memory you need per process
<!--T:29-->
# Run on N whole nodes: https://docs.alliancecan.ca/wiki/Advanced_MPI_scheduling#Whole_nodes
<!--T:30-->
# Load module dependencies.
module load StdEnv/2023 gcc openmpi mpi4py/4.0.0 python/3.12
<!--T:31-->
# create the virtual environment on each allocated node:
srun --ntasks $SLURM_NNODES --tasks-per-node=1 bash << EOF
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
<!--T:32-->
pip install --no-index --upgrade pip
pip install --no-index numpy==2.1.1
EOF
<!--T:33-->
# activate only on main node
source $SLURM_TMPDIR/env/bin/activate;
<!--T:34-->
# srun exports the current env, which contains $VIRTUAL_ENV and $PATH variables
srun python mpi4py-np-bc.py;
}}
</tab>
</tabs>
<!--T:35-->
3. Submit your job to the scheduler.
<!--T:36-->
3.1 Test your script.
<!--T:37-->
Before submitting your job, it is important to test that your submission script will start without errors.
You can do a quick test in an [[Running_jobs#Interactive_jobs|interactive job]].
<!--T:38-->
3.2 Submit your job:
{{Command|sbatch submit-mpi4py-distributed.sh}}
== GPU == <!--T:39-->
1. From a login node, download the demo example:
{{Command
|wget https://raw.githubusercontent.com/mpi4py/mpi4py/master/demo/cuda-aware-mpi/use_cupy.py
}}
The example above, and others, can be found in the [https://github.com/mpi4py/mpi4py/tree/master/demo demo folder].
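As a rough illustration of what such a demo does, here is a minimal sketch (assuming a CUDA-aware MPI and the <code>cupy</code> package; this is not the actual demo file) of an allreduce on GPU arrays:
{{File
|name=allreduce_cupy_sketch.py
|lang="python"
|contents=
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Arrays live in GPU memory; a CUDA-aware MPI can read them directly.
sendbuf = cp.arange(10, dtype='f')
recvbuf = cp.empty_like(sendbuf)
# Make sure GPU work on the buffers has completed before MPI uses them.
cp.cuda.get_current_stream().synchronize()
comm.Allreduce(sendbuf, recvbuf)

# Each element of the result is the sum over all processes.
assert cp.allclose(recvbuf, sendbuf * size)
}}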
<!--T:40-->
2. Write your submission script:
{{File
|name=submit-mpi4py-gpu.sh
|lang="bash"
|contents=
#!/bin/bash
<!--T:41-->
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00 # adjust this to match the walltime of your job
#SBATCH --ntasks=2 # adjust this to match the number of tasks/processes to run
#SBATCH --gpus=1
<!--T:42-->
# Load module dependencies.
module load StdEnv/2023 gcc cuda/12 mpi4py/4.0.0 python/3.11
<!--T:43-->
# create the virtual environment:
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
<!--T:44-->
pip install --no-index --upgrade pip
pip install --no-index cupy numba
<!--T:45-->
srun python use_cupy.py;
}}
<!--T:46-->
3. Submit your job to the scheduler.
<!--T:47-->
3.1 Test your script.
<!--T:48-->
Before submitting your job, it is important to test that your submission script will start without errors.
You can do a quick test in an [[Running_jobs#Interactive_jobs|interactive job]].
<!--T:49-->
3.2 Submit your job:
{{Command|sbatch submit-mpi4py-gpu.sh}}
= Troubleshooting = <!--T:50-->
== ModuleNotFoundError: No module named 'mpi4py' ==
If <code>mpi4py</code> is not accessible, one may get the following error when importing it:
<code>
ModuleNotFoundError: No module named 'mpi4py'
</code>
<!--T:51-->
Possible solutions to fix this error:
* check which Python versions are compatible with <code>module spider mpi4py/X.Y.Z</code>, and that <code>python -c 'import mpi4py'</code> works.
* load the module before activating a virtual environment: please see the section [[MPI4py#mpi4py as a package dependency]] above.
<!--T:52-->
See also [[Python#ModuleNotFoundError:_No_module_named_'X']].
</translate>