MPI4py

Latest revision as of 19:22, 17 October 2024


MPI for Python provides Python bindings for the Message Passing Interface (MPI) standard, allowing Python applications to exploit multiple processors on workstations, clusters and supercomputers.


Available versions

mpi4py is available as a module, and not from the wheelhouse as typical Python packages are. You can find available versions with

[name@server ~]$ module spider mpi4py

and look for more information on a specific version with

[name@server ~]$ module spider mpi4py/X.Y.Z

where X.Y.Z is the exact desired version, for instance 4.0.0.

Famous first words: Hello World

1. Run a short interactive job.

[name@server ~]$ salloc --account=<your account> --ntasks=5

2. Load the module.

[name@server ~]$ module load mpi4py/4.0.0 python/3.12

3. Run a Hello World test.

[name@server ~]$ srun python -m mpi4py.bench helloworld
Hello, World! I am process 0 of 5 on node1.
Hello, World! I am process 1 of 5 on node1.
Hello, World! I am process 2 of 5 on node3.
Hello, World! I am process 3 of 5 on node3.
Hello, World! I am process 4 of 5 on node3.

In the case above, two nodes (node1 and node3) were allocated, and the tasks were distributed across the available resources.
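The rank and size reported by each process are also what you typically use to divide work among tasks. A minimal MPI-free sketch of one common partitioning scheme (local_range is a hypothetical helper, not part of mpi4py; in a real job, rank and size would come from MPI.COMM_WORLD):

```python
def local_range(n_items, size, rank):
    """Return the half-open [start, stop) slice of n_items owned by `rank`
    out of `size` ranks, spreading any remainder over the lowest ranks."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# With 5 ranks and 100 items, each rank owns 20 consecutive items.
slices = [local_range(100, 5, r) for r in range(5)]
print(slices)  # [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
```

Any scheme works as long as every rank computes the same answer from its own rank number alone, without communication.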

mpi4py as a package dependency

Often mpi4py is a dependency of another package. To fulfill this dependency:

1. Deactivate any Python virtual environment.

[name@server ~]$ test $VIRTUAL_ENV && deactivate

Note: If you had a virtual environment activated, it is important to deactivate it first, then load the module, before reactivating your virtual environment.

2. Load the module.

[name@server ~]$ module load mpi4py/4.0.0 python/3.12

3. Check that it is visible to pip

[name@server ~]$ pip list | grep mpi4py
mpi4py            4.0.0

and that it is accessible from your currently loaded Python module.

[name@server ~]$ python -c 'import mpi4py'

If no errors are raised, then everything is OK!

4. Create a virtual environment and install your packages.

Running jobs

You can run MPI jobs distributed across multiple nodes or cores. For efficient MPI scheduling, please see Advanced MPI scheduling.

CPU

1. Write your Python code, for instance, broadcasting a NumPy array.

File : "mpi4py-np-bc.py"

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = np.arange(100, dtype='i')
else:
    data = np.empty(100, dtype='i')

comm.Bcast(data, root=0)

for i in range(100):
    assert data[i] == i


The example above is based on the mpi4py tutorial.
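mpi4py's lowercase methods (e.g. comm.bcast) communicate arbitrary Python objects by pickling them, while the uppercase methods used above (e.g. comm.Bcast) copy buffer-like objects such as NumPy arrays directly, which is why the example pre-allocates a matching buffer on every rank. A stdlib-only analogy of the two serialization paths (an illustrative sketch, not mpi4py code):

```python
import pickle
from array import array

# bcast path: arbitrary object -> pickle bytes on root -> unpickle on receivers
obj = {"step": 1, "note": "hello"}
wire = pickle.dumps(obj)
received = pickle.loads(wire)
assert received == obj

# Bcast path: a raw buffer copied element-for-element, no pickling overhead;
# the destination buffer must already exist with a matching type and length.
src = array('i', range(100))
dst = array('i', [0] * 100)
memoryview(dst)[:] = memoryview(src)
assert dst == src
```

This is why the uppercase variants are preferred for large numeric arrays: they avoid the pickle round-trip entirely.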

2. Write your submission script.

File : submit-mpi4py-distributed.sh

#!/bin/bash

#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --ntasks=4                # adjust this to match the number of tasks/processes to run
#SBATCH --mem-per-cpu=4G          # adjust this according to the memory you need per process

# Run on cores across the system: https://docs.alliancecan.ca/wiki/Advanced_MPI_scheduling#Few_cores,_any_number_of_nodes

# Load modules dependencies.
module load StdEnv/2023 gcc mpi4py/4.0.0 python/3.12

# create the virtual environment on each allocated node: 
srun --ntasks $SLURM_NNODES --tasks-per-node=1 bash << EOF
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate

pip install --no-index --upgrade pip
pip install --no-index numpy==2.1.1
EOF

# activate only on main node
source $SLURM_TMPDIR/env/bin/activate;

# srun exports the current env, which contains $VIRTUAL_ENV and $PATH variables
srun python mpi4py-np-bc.py;


File : submit-mpi4py-whole-nodes.sh

#!/bin/bash

#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=01:00:00           # adjust this to match the walltime of your job
#SBATCH --nodes=2                 # adjust this to match the number of whole nodes
#SBATCH --ntasks-per-node=40      # adjust this to match the number of tasks/processes to run per node
#SBATCH --mem-per-cpu=1G          # adjust this according to the memory you need per process

# Run on N whole nodes: https://docs.alliancecan.ca/wiki/Advanced_MPI_scheduling#Whole_nodes

# Load modules dependencies.
module load StdEnv/2023 gcc openmpi mpi4py/4.0.0 python/3.12

# create the virtual environment on each allocated node: 
srun --ntasks $SLURM_NNODES --tasks-per-node=1 bash << EOF
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate

pip install --no-index --upgrade pip
pip install --no-index numpy==2.1.1
EOF

# activate only on main node
source $SLURM_TMPDIR/env/bin/activate;

# srun exports the current env, which contains $VIRTUAL_ENV and $PATH variables
srun python mpi4py-np-bc.py;


3. Test your script.

Before submitting your job, it is important to test that your submission script will start without errors. You can do a quick test in an interactive job.

4. Submit your job to the scheduler.

[name@server ~]$ sbatch submit-mpi4py-distributed.sh

GPU

1. From a login node, download the demo example.

[name@server ~]$ wget https://raw.githubusercontent.com/mpi4py/mpi4py/refs/heads/master/demo/cuda-aware-mpi/use_cupy.py

The example above, and others, can be found in the demo folder.

2. Write your submission script.

File : submit-mpi4py-gpu.sh

#!/bin/bash

#SBATCH --account=def-someprof    # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00           # adjust this to match the walltime of your job
#SBATCH --ntasks=2                # adjust this to match the number of tasks/processes to run
#SBATCH --mem-per-cpu=2G          # adjust this according to the memory you need per process
#SBATCH --gpus=1

# Load modules dependencies.
module load StdEnv/2023 gcc cuda/12 mpi4py/4.0.0 python/3.11

# create the virtual environment on the allocated node:
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate

pip install --no-index --upgrade pip
pip install --no-index cupy numba

srun python use_cupy.py;


3. Test your script.

Before submitting your job, it is important to test that your submission script will start without errors. You can do a quick test in an interactive job.

4. Submit your job to the scheduler.

[name@server ~]$ sbatch submit-mpi4py-gpu.sh

Troubleshooting

ModuleNotFoundError: No module named 'mpi4py'

If mpi4py is not accessible, you may get the following error when importing it: ModuleNotFoundError: No module named 'mpi4py'

Possible solutions:

  • check which Python versions are compatible with your loaded mpi4py module using module spider mpi4py/X.Y.Z. Once a compatible Python module is loaded, check that python -c 'import mpi4py' works.
  • load the module before activating your virtual environment: please see the mpi4py as a package dependency section above.
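To check for this failure mode from Python itself, you can probe importability without triggering the exception (a small stdlib-only sketch; mpi4py_visible is a hypothetical helper, not part of any module):

```python
import importlib.util

def mpi4py_visible() -> bool:
    """True when mpi4py is importable from the current environment.
    find_spec returns None when the package cannot be found."""
    return importlib.util.find_spec("mpi4py") is not None

# When this prints False, load the mpi4py module *before*
# activating your virtual environment, then try again.
print(mpi4py_visible())
```

The same probe works for any package whose visibility depends on which modules were loaded before the virtual environment was activated.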

See also ModuleNotFoundError: No module named 'X'.