<languages />
<translate>
<!--T:1-->
[https://mpi4py.readthedocs.io/en/stable/ MPI for Python] provides Python bindings for the Message Passing Interface (MPI) standard, allowing Python applications to exploit multiple processors on workstations, clusters and supercomputers.
__FORCETOC__

= Available versions = <!--T:2-->
<code>mpi4py</code> is available as a module, and not from the [[Available Python wheels|wheelhouse]] as typical Python packages are.
You can find the available versions with
{{Command|module spider mpi4py}}

<!--T:3-->
and look for more information on a specific version with
{{Command|module spider mpi4py/X.Y.Z}}
where <code>X.Y.Z</code> is the exact desired version, for instance <code>4.0.0</code>.


= Famous first words: Hello World = <!--T:4-->
1. Run a short [[Running jobs#Interactive_jobs|interactive job]].
{{Command|salloc --account{{=}}<your account> --ntasks{{=}}5}}


<!--T:5-->
2. Load the module.
{{Command|module load mpi4py/4.0.0 python/3.12}}


<!--T:6-->
3. Run a Hello World test.
{{Command
|srun python -m mpi4py.bench helloworld
|result=
Hello, World! I am process 0 of 5 on node1.
Hello, World! I am process 1 of 5 on node1.
Hello, World! I am process 2 of 5 on node3.
Hello, World! I am process 3 of 5 on node3.
Hello, World! I am process 4 of 5 on node3.
}}
In the case above, two nodes (<code>node1</code> and <code>node3</code>) were allocated, and the tasks were distributed across the available resources.
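For reference, the benchmark above amounts to only a few lines of mpi4py code. Below is a minimal hand-written sketch of an equivalent script (the file name <code>hello-mpi.py</code> is just an example, not part of the original instructions); you could run it the same way with <code>srun python hello-mpi.py</code>.
{{File
|name="hello-mpi.py"
|lang="python"
|contents=
from mpi4py import MPI

# Each MPI task reports its rank, the total number of tasks,
# and the name of the node it is running on.
comm = MPI.COMM_WORLD
print("Hello, World! I am process %d of %d on %s." % (comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))
}}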


= mpi4py as a package dependency = <!--T:7-->
Often <code>mpi4py</code> is a dependency of another package. To fulfill this dependency:


<!--T:8-->
1. Deactivate any Python virtual environment.
{{Command|test $VIRTUAL_ENV && deactivate}}


<!--T:9-->
<b>Note:</b> If you had a virtual environment activated, it is important to deactivate it first, then load the module, before reactivating your virtual environment.


<!--T:10-->
2. Load the module.
{{Command|module load mpi4py/4.0.0 python/3.12}}


<!--T:11-->
3. Check that it is visible to <code>pip</code>
{{Command
|pip list {{!}} grep mpi4py
|result=
mpi4py            4.0.0
}}
and is accessible with your currently loaded Python module.
{{Command|python -c 'import mpi4py'}}
If no errors are raised, then everything is OK!
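Optionally, you can also confirm which mpi4py release and underlying MPI library you are getting; this extra check is a suggestion, not a required step.
{{Command|python -c 'import mpi4py; print(mpi4py.__version__)'}}
{{Command|python -c 'from mpi4py import MPI; print(MPI.Get_library_version())'}}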


<!--T:12-->
4. [[Python#Creating_and_using_a_virtual_environment|Create a virtual environment and install your packages]].


= Running jobs = <!--T:13-->
You can run MPI jobs distributed across multiple nodes or cores.
For efficient MPI scheduling, please see:
* [[Advanced MPI scheduling]]


== CPU == <!--T:14-->
1. Write your Python code, for instance, broadcasting a NumPy array.
{{File
|name="mpi4py-np-bc.py"
|lang="python"
|contents=
from mpi4py import MPI
import numpy as np


<!--T:15-->
comm = MPI.COMM_WORLD
rank = comm.Get_rank()


<!--T:16-->
if rank == 0:
    data = np.arange(100, dtype='i')
else:
    data = np.empty(100, dtype='i')


<!--T:17-->
comm.Bcast(data, root=0)


<!--T:18-->
for i in range(100):
    assert data[i] == i
}}
The example above is based on the [https://mpi4py.readthedocs.io/en/stable/tutorial.html#running-python-scripts-with-mpi mpi4py tutorial].
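The uppercase <code>Bcast</code> used above operates on buffer-like objects such as NumPy arrays. mpi4py also provides lowercase methods that communicate generic Python objects by pickling them; the short sketch below illustrates the same broadcast with <code>bcast</code> (an illustrative variant, and the file name <code>mpi4py-bcast-object.py</code> is only an example).
{{File
|name="mpi4py-bcast-object.py"
|lang="python"
|contents=
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Lowercase methods (bcast, send, recv, ...) communicate generic Python
# objects by pickling them, at some extra cost compared to Bcast on buffers.
data = None
if rank == 0:
    data = dict(step=1, values=[1.0, 2.5, 3.0])
data = comm.bcast(data, root=0)

assert data['values'][0] == 1.0
}}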


<!--T:19-->
2. Write your submission script.
<tabs>
<tab name="Distributed">
{{File
|name=submit-mpi4py-distributed.sh
|lang="bash"
|contents=
#!/bin/bash


<!--T:20-->
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --ntasks=4                # adjust this to match the number of tasks/processes to run
#SBATCH --mem-per-cpu=4G          # adjust this according to the memory you need per process


<!--T:21-->
# Run on cores across the system : https://docs.alliancecan.ca/wiki/Advanced_MPI_scheduling#Few_cores,_any_number_of_nodes


<!--T:22-->
# Load module dependencies.
module load StdEnv/2023 gcc mpi4py/4.0.0 python/3.12


<!--T:23-->
# create the virtual environment on each allocated node:
srun --ntasks $SLURM_NNODES --tasks-per-node=1 bash << EOF
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate


<!--T:24-->
pip install --no-index --upgrade pip
pip install --no-index numpy==2.1.1
EOF


<!--T:25-->
# activate only on main node
source $SLURM_TMPDIR/env/bin/activate;


<!--T:26-->
# srun exports the current env, which contains $VIRTUAL_ENV and $PATH variables
srun python mpi4py-np-bc.py;
}}
</tab>


<!--T:27-->
<tab name="Whole nodes">
{{File
|name=submit-mpi4py-whole-nodes.sh
|lang="bash"
|contents=
#!/bin/bash


<!--T:28-->
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=01:00:00           # adjust this to match the walltime of your job
#SBATCH --nodes=2                 # adjust this to match the number of whole nodes
#SBATCH --ntasks-per-node=40      # adjust this to match the number of tasks/processes to run per node
#SBATCH --mem-per-cpu=1G          # adjust this according to the memory you need per process


<!--T:29-->
# Run on N whole nodes : https://docs.alliancecan.ca/wiki/Advanced_MPI_scheduling#Whole_nodes


<!--T:30-->
# Load module dependencies.
module load StdEnv/2023 gcc openmpi mpi4py/4.0.0 python/3.12


<!--T:31-->
# create the virtual environment on each allocated node:
srun --ntasks $SLURM_NNODES --tasks-per-node=1 bash << EOF
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate


<!--T:32-->
pip install --no-index --upgrade pip
pip install --no-index numpy==2.1.1
EOF


<!--T:33-->
# activate only on main node
source $SLURM_TMPDIR/env/bin/activate;


<!--T:34-->
# srun exports the current env, which contains $VIRTUAL_ENV and $PATH variables
srun python mpi4py-np-bc.py;
}}
</tab>
</tabs>


<!--T:36-->
3. Test your script.

<!--T:37-->
Before submitting your job, it is important to test that your submission script will start without errors. You can do a quick test in an [[Running_jobs#Interactive_jobs|interactive job]].

<!--T:38-->
4. Submit your job to the scheduler.
{{Command|sbatch submit-mpi4py-distributed.sh}}


== GPU == <!--T:39-->
1. From a login node, download the demo example.
{{Command
|wget https://raw.githubusercontent.com/mpi4py/mpi4py/refs/heads/master/demo/cuda-aware-mpi/use_cupy.py
}}
The example above, and others, can be found in the [https://github.com/mpi4py/mpi4py/tree/master/demo demo folder].
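The downloaded <code>use_cupy.py</code> demo relies on a CUDA-aware MPI library to pass GPU arrays directly between processes. As a rough illustration of that idea only (a simplified sketch, not the demo itself; the file name <code>cupy-allreduce.py</code> is hypothetical), an all-reduce on CuPy buffers looks like this.
{{File
|name="cupy-allreduce.py"
|lang="python"
|contents=
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD

# Each rank fills a GPU array with its own rank number.
sendbuf = cp.full(10, comm.Get_rank(), dtype=cp.float64)
recvbuf = cp.empty_like(sendbuf)
cp.cuda.get_current_stream().synchronize()  # make sure the GPU data is ready

# With a CUDA-aware MPI library, mpi4py can pass the GPU buffers directly.
comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)

# Every element should now hold the sum of all ranks.
assert cp.allclose(recvbuf, sum(range(comm.Get_size())))
}}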


<!--T:40-->
2. Write your submission script.
{{File
|name=submit-mpi4py-gpu.sh
|lang="bash"
|contents=
#!/bin/bash


<!--T:41-->
#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --ntasks=2                # adjust this to match the number of tasks/processes to run
#SBATCH --mem-per-cpu=2G          # adjust this according to the memory you need per process
#SBATCH --gpus=1


<!--T:42-->
# Load module dependencies.
module load StdEnv/2023 gcc cuda/12 mpi4py/4.0.0 python/3.11


<!--T:43-->
# create the virtual environment on each allocated node:
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate


<!--T:44-->
pip install --no-index --upgrade pip
pip install --no-index cupy numba


<!--T:45-->
srun python use_cupy.py;
}}


<!--T:47-->
3. Test your script.

<!--T:48-->
Before submitting your job, it is important to test that your submission script will start without errors.
You can do a quick test in an [[Running_jobs#Interactive_jobs|interactive job]].

<!--T:49-->
4. Submit your job to the scheduler.
{{Command|sbatch submit-mpi4py-gpu.sh}}


= Troubleshooting = <!--T:50-->

== ModuleNotFoundError: No module named 'mpi4py' == <!--T:53-->
If <code>mpi4py</code> is not accessible, you may get the following error when importing it:
<code>
ModuleNotFoundError: No module named 'mpi4py'
</code>


<!--T:51-->
Possible solutions:
* check which Python versions are compatible with your loaded mpi4py module using <code>module spider mpi4py/X.Y.Z</code>. Once a compatible Python module is loaded, check that <code>python -c 'import mpi4py'</code> works (see also the additional check below).
* load the module before activating your virtual environment: please see the [[MPI4py#mpi4py_as_a_package_dependency|mpi4py as a package dependency]] section above.
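Once the import succeeds, a quick extra check (our suggestion, not an official step) is to print where the package is loaded from, to confirm it comes from the loaded <code>mpi4py</code> module rather than a stray copy inside your virtual environment:
{{Command|python -c 'import mpi4py; print(mpi4py.__file__)'}}
If the printed path points inside a virtual environment instead, revisit the ordering described in the [[MPI4py#mpi4py_as_a_package_dependency|mpi4py as a package dependency]] section.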


<!--T:52-->
See also [[Python#ModuleNotFoundError:_No_module_named_'X'|ModuleNotFoundError: No module named 'X']].
</translate>
