MATLAB: Difference between revisions

From Alliance Doc
(Corrected error in module to load for MCR: 2017b->R2017b)
No edit summary
 
(111 intermediate revisions by 14 users not shown)
Line 2: Line 2:
[[Category:Software]]
[[Category:Software]]
<translate>
<translate>
= Using MATLAB on Compute Canada clusters= <!--T:1-->
<!--T:19-->
There are two main ways of using MATLAB on Compute Canada clusters. The first one involves bringing your own license, typically owned by your institution, faculty, department or lab. The second one involves compiling your code into a binary, which you can then run using the MATLAB Compiler Runtime (MCR) libraries.
There are two ways of using MATLAB on our clusters:


== Using your own license == <!--T:2-->
<!--T:54-->
Compute Canada is a hosting provider for MATLAB . This means that we have MATLAB installed on our clusters, but we do not provide a generic license accessible to everyone. However, many institutions, faculty or department already have licenses that can be used on our cluster. In general, any research license can be used on our clusters, and more precisely, any "Total Academic Headcount" license is allowed to be  used  on our infrastructure. We have received written confirmation from Mathworks that this is allowed by their licenses. This is however not obvious in the wording of the license. If the person managing your license is arguing that it cannot be used outside of your campus, you should encourage them to reach out to their Mathworks representative to confirm with them.  
<b>1) Running MATLAB directly</b>, but that requires a license. You may either
* run MATLAB on [[Béluga/en| Béluga]], [[Cedar]] or [[Narval/en| Narval]], all of which have a license available for any student, professor or academic researcher;
* use an external license, i.e., one owned by your institution, faculty, department, or lab. See <i>[[#Using_an_external_license|Using an external license]]</i> below.
 
<!--T:55-->
<b>2) Compiling your MATLAB code</b> by using the MATLAB Compiler <code>mcc</code> and by running the generated executable file on any cluster. You can use this executable without license considerations.
 
<!--T:56-->
More details about these approaches are provided below.
 
= Using an external license = <!--T:2-->
We are a hosting provider for MATLAB. This means that we have MATLAB installed on our clusters and can allow you to access an external license to run computations on our infrastructure. Arrangements have already been made with several Canadian institutions to make this automatic.  To see if you already have access to a license, carry out the following test:
 
<!--T:21-->
<pre>
[name@cluster ~]$ module load matlab/2023b.2
[name@cluster ~]$ matlab -nojvm -nodisplay -batch license
 
<!--T:30-->
987654
[name@cluster ~]$
</pre>
 
<!--T:22-->
If any license number is printed, you're okay.  Be sure to run this test on each cluster on which you want to use MATLAB, since licenses may not be available everywhere.
 
<!--T:39-->
If you get the message <i>This version is newer than the version of the license.dat file and/or network license manager on the server machine</i>, try an older version of MATLAB in the <code>module load</code> line.
 
<!--T:40-->
If the test fails for any other reason, either your institution does not have a MATLAB license, it does not allow the license to be used in this way, or no arrangements have yet been made.  Find out who administers the MATLAB license at your institution (faculty, department) and contact them or your MathWorks account manager to find out whether you are allowed to use the license in this way.


<!--T:3-->
<!--T:3-->
Once the legal aspects are worked out, there will be remaining technical aspects. Namely, the license server on your end will need to be reachable by our compute nodes. This will require our technical team to get in touch with the technical people managing your license software. In some cases, this has already been done. You should then be able to load the MATLAB module, and it should find its license automatically. If this is not the case, please write to [[Technical support | technical support]], so that we can arrange this for you.
If you are allowed, then some technical configuration will be required. Create a file similar to the following example:
{{File
|name=matlab.lic
|lang="bash"
|contents=
# MATLAB license server specifications
SERVER <ip address> ANY <port>
USE_SERVER
}}
Put this file in the <code>$HOME/.licenses/</code> directory, replacing the IP address and port placeholders with the values for your campus license server. Next, you will need to ensure that the license server on your campus is reachable by our compute nodes. This will require our technical team to get in touch with the technical people managing your license software. Please write to [[Technical support | technical support]] so that we can arrange this for you.
 
<!--T:29-->
For online documentation, see http://www.mathworks.com/support.
For product information, visit http://www.mathworks.com.
 
= Preparing your <code>.matlab</code> folder = <!--T:31-->
Because the /home directory is accessible in read-only mode on some compute nodes, you need to create a <code>.matlab</code> symbolic link that makes sure that the MATLAB profile and job data will be written to the /scratch space instead.
 
<!--T:32-->
<pre>
[name@cluster ~]$ cd $HOME
[name@cluster ~]$ if [ -d ".matlab" ]; then
  mv .matlab scratch/
else
  mkdir -p scratch/.matlab
fi && ln -sn scratch/.matlab .matlab
</pre>
 
= Available toolboxes = <!--T:23-->
To see a list of the MATLAB toolboxes available with the license and cluster you're using, you can use the following command:
<pre>
[name@cluster ~]$  module load matlab
[name@cluster ~]$  matlab -nojvm -batch "ver"
</pre>
 
= Running a serial or parallel MATLAB code = <!--T:57-->


<!--T:4-->
<!--T:4-->
'''Important:''' Like any other intensive job, you must always run MATLAB code within a job that you will have submitted to the scheduler. For instructions on using the scheduler, please see the [[Running jobs]] page.
<b>Important:</b> Any significant MATLAB calculation (one that takes more than about 5 minutes or uses more than about a gigabyte of memory) must be submitted to the scheduler. For instructions on using the scheduler, please see the [[Running jobs]] page.
 
<!--T:27-->
Consider the following example code:
 
<!--T:6-->
{{File
|name=cosplot.m
|lang="Matlab"
|contents=
function cosplot()
% MATLAB file example to approximate a triangle wave
% with a truncated Fourier expansion.
nterms=5;
fourbypi=4.0/pi;
np=100;
y(1:np)=pi/2.0;
x(1:np)=linspace(-2.0*pi,2*pi,np);
 
<!--T:7-->
for k=1:nterms
twokm=2*k-1;
y=y-fourbypi*cos(twokm*x)/twokm^2;
end
 
<!--T:8-->
plot(x,y)
print -dpsc matlab_test_plot.ps
quit
end
}}


<!--T:14-->
<!--T:14-->
A simple SLURM script that you can use to submit a matlab script example, named cosplot.m - see the section below, is as follows:
Here is a simple Slurm script that you can use to run <code>cosplot.m</code>:


<!--T:15-->
<!--T:15-->
Line 23: Line 118:
|contents=
|contents=
#!/bin/bash -l
#!/bin/bash -l
#SBATCH --job-name=Matlab_test
#SBATCH --job-name=matlab_test
#SBATCH --account=def-bmoa #adjust this to match the accounting group you are using to submit jobs
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=0-03:00     #adjust this to match the walltime of your job
#SBATCH --time=0-03:00         # adjust this to match the walltime of your job
#SBATCH --nodes=1       
#SBATCH --nodes=1       
#SBATCH --ntasks=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1 #adjust this if you are using PCT
#SBATCH --cpus-per-task=1     # adjust this if you are using parallel commands
#SBATCH --mem=4000         #adjust this according to your the memory requirement per node you need
#SBATCH --mem=4000             # adjust this to the memory (in MB) you need per node
#SBATCH --mail-user=bmoa@uvic.ca #adjust this to match your email address
#SBATCH --mail-user=you@youruniversity.ca # adjust this to match your email address
#SBATCH --mail-type=ALL
#SBATCH --mail-type=ALL


<!--T:16-->
<!--T:16-->
#Load the appropriate matlab module
# Choose a version of MATLAB by loading a module:
module load matlab/2017a
module load matlab/2023b.2
#Remove -singleCompThread if you are using PCT
# Remove -singleCompThread below if you are using parallel commands:
matlab -nodisplay -nosplash -singleCompThread -r "cosplot"
matlab -singleCompThread -batch "cosplot"
}}
}}


<!--T:17-->
<!--T:17-->
You can then submit the job using the usual <tt>sbatch</tt> command as follows:
Submit the job using <code>sbatch</code>:
{{Command|sbatch matlab_slurm.sl}}
{{Command|sbatch matlab_slurm.sl}}


<!--T:24-->
Do not use the <code>-singleCompThread</code> option if you request
more than one core with <code>--cpus-per-task</code>.
You should also ensure that the size of your MATLAB [https://www.mathworks.com/help/distcomp/parpool.html parpool]
matches the number of cores you are requesting.


<!--T:25-->
Each time you run MATLAB, it will create a file like <code>java.log.12345</code> unless you supply the <code>-nojvm</code> option.
However, using <code>-nojvm</code> may interfere with certain plotting functions. 
For further information on command line options <code>-nodisplay</code>, <code>-singleCompThread</code>,
<code>-nojvm</code>, and <code>-r</code>,
see [https://www.mathworks.com/help/matlab/ref/matlablinux.html MATLAB (Linux)] on the MathWorks website.


== Using the MATLAB Compiler Runtime libraries == <!--T:5-->
= Running multiple parallel MATLAB jobs simultaneously = <!--T:33-->
This requires you to have access to the MATLAB compiler on a Linux platform. Note that Compute Canada's hosting provider license agreement does not allow us to get the compiler. See documentation for the compiler at the [https://www.mathworks.com/help/compiler/index.html Mathworks] website. Here is an example function m-file.
There is a known issue when two (or more) parallel MATLAB jobs initialize their <code>parpool</code> simultaneously: the new MATLAB instances try to read and write the same <code>.dat</code> file in the <code>$HOME/.matlab/local_cluster_jobs/R*</code> folder, which corrupts the local parallel profile used by other MATLAB jobs. To fix the corrupted profile, delete the <code>local_cluster_jobs</code> folder when no job is running.


<!--T:6-->
<!--T:34-->
There are two main ways to prevent the problem permanently:
# Making sure only one MATLAB job at a time will start its <code>parpool</code>. There are many possible technical solutions, but none is perfect:
#* using a lock file (which may remain locked if a previous job has failed),
#* using random delays (which may be equal or almost equal, and still cause the corruption),
#* using ever-increasing delays (which waste compute time),
#* using Slurm options <code>--begin</code> or <code>--dependency=after:JOBID</code> to control the start time (which increases wait time in the queue).
# Making sure each MATLAB job creates a local parallel profile in a unique location of the filesystem.
 
<!--T:35-->
In your MATLAB code:
{{File
{{File
|name=cosplot.m
|name=parallel_main.m
|lang="Matlab"
|lang="Matlab"
|contents=
|contents=
% MATLAB M-file example to approximate a sawtooth
% Create a "local" cluster object
% with a truncated Fourier expansion.
local_cluster = parcluster('local')
nterms=5;
fourbypi=4.0/pi;
np=100;
y(1:np)=pi/2.0;
x(1:np)=linspace(-2.0*pi,2*pi,np);


<!--T:7-->
<!--T:36-->
for k=1:nterms
% Modify the JobStorageLocation to $SLURM_TMPDIR
twokm=2*k-1;
local_cluster.JobStorageLocation = getenv('SLURM_TMPDIR')
y=y-fourbypi*cos(twokm*x)/twokm^2;
end;


<!--T:8-->
<!--T:37-->
%(The following commands for generating graphics output work
% Start the parallel pool
% with MATLAB 7 on glacier but produce empty plots with MATLAB 6
parpool(local_cluster);
% on some other clusters.)
plot(x,y);  
print -dpsc matlab_test_plot.ps;
quit;
end
}}
}}
<!--T:38-->
References:
* FAS Research Computing, [https://www.rc.fas.harvard.edu/resources/documentation/software/matlab-pct-simultaneous-job-problem/ <i>MATLAB Parallel Computing Toolbox simultaneous job problem</i>].
* MathWorks, [https://www.mathworks.com/matlabcentral/answers/97141-why-am-i-unable-to-start-a-local-matlabpool-from-multiple-matlab-sessions-that-use-a-shared-preferen <i>Why am I unable to start a local MATLABPOOL from multiple MATLAB sessions that use a shared preference directory using Parallel Computing Toolbox 4.0 (R2008b)?</i>]
= Using the Compiler and Runtime libraries = <!--T:26-->
<!--T:13-->
<b>Important:</b> Like any other intensive job, you must always run MCR code within a job submitted to the scheduler. For instructions on using the scheduler, please see the [[Running jobs]] page.
<!--T:5-->
You can also compile your code using MATLAB Compiler, which is included among the modules we host. See documentation for the compiler on the [https://www.mathworks.com/help/compiler/index.html MathWorks] website.  At the moment, mcc is provided for versions 2014a, 2018a and later.


<!--T:9-->
<!--T:9-->
To compile it, you would use the command
To compile the <code>cosplot.m</code> example given above, you would use the command
{{Command|prompt=[name@yourserver ~]$|mcc -m -R -nodisplay cosplot.m}}
{{Command|prompt=[name@yourserver ~]$|mcc -m -R -nodisplay cosplot.m}}


<!--T:10-->
<!--T:10-->
This will produce a binary named <tt>cosplot</tt>, as well as a wrapper script. To run the binary on Compute Canada servers, you will only require the binary. The wrapper script, name <tt>run_cosplot.sh</tt>, will not work as is on our servers, because MATLAB assumes that some libraries can be found in specific locations. Instead, we provide a different wrapper script, called <tt>run_mcr_binary.sh</tt> which sets the correct paths.  
This will produce a binary named <code>cosplot</code>, as well as a wrapper script. To run the binary on our servers, you will only need the binary. The wrapper script named <code>run_cosplot.sh</code> will not work as is on our servers because MATLAB assumes that some libraries can be found in specific locations. Instead, we provide a different wrapper script called <code>run_mcr_binary.sh</code> which sets the correct paths.  


<!--T:18-->
<!--T:18-->
On one of our servers, load an MCR [[Utiliser des modules/en|module]] corresponding to the MATLAB version you used to build the executable:
On one of our servers, load an MCR [[Utiliser des modules/en|module]] corresponding to the MATLAB version you used to build the executable:
{{Command|module load mcr/R2017b}}
{{Command|module load mcr/R2018a}}


<!--T:11-->
<!--T:11-->
Run the following command:
Run the following command:
{{Command|setrpaths.sh --path cosplot}}
{{Command|setrpaths.sh --path cosplot}}
then use your binary as so:
 
{{Command|run_mcr_binary.sh cosplot}}
<!--T:28-->
then, in your submission script (<b>not on the login nodes</b>), run your binary like so:
<code>run_mcr_binary.sh cosplot</code>


<!--T:12-->
<!--T:12-->
You will only need to run the <tt>setrpaths.sh</tt> command once for each compiled binary. The <tt>run_mcr_binary.sh</tt> will instruct you to run it if it detects that it has not been done.
You will only need to run the <code>setrpaths.sh</code> command once for each compiled binary. The <code>run_mcr_binary.sh</code> script will instruct you to run <code>setrpaths.sh</code> if it detects that this has not been done.
 
= Using the MATLAB Parallel Server = <!--T:41-->
MATLAB Parallel Server is only worthwhile <b>if you need more workers in your parallel MATLAB job than available CPU cores on a single compute node</b>. While a regular MATLAB installation (see above sections) allows you to run parallel jobs within one node (up to 64 workers per job, depending on which node and cluster), the MATLAB Parallel Server is the licensed MathWorks solution for running a parallel job on more than one node.


<!--T:13-->
<!--T:42-->
'''Important:''' Like any other intensive job, you must always run MCR code within a job that you will have submitted to the scheduler. For instructions on using the scheduler, please see the [[Running jobs]] page.
This solution usually works by submitting MATLAB parallel jobs from a local MATLAB interface on your computer. <b>Since May 2023, some mandatory security improvements have been implemented on all clusters. Because MATLAB uses an SSH mode that is no longer permitted, job submission from a local computer is no longer possible until MATLAB uses a new connection method. There is currently no workaround.</b>
 
== Slurm plugin for MATLAB == <!--T:43-->
<b>The procedure below no longer works because the Slurm plugin is no longer available and because of the SSH issue described above.</b> The configuration steps are kept until a workaround is found:
# Have MATLAB R2022a or newer installed, <b>including the Parallel Computing Toolbox</b>.
# Go to the MathWorks Slurm Plugin page, <b>download and run</b> the <code>*.mlpkginstall</code> file. (i.e., click on the blue <i>Download</i> button on the right side, just above the <i>Overview</i> tab.)
# Enter your MathWorks credentials; if the configuration wizard does not start, run in MATLAB
#:<code>parallel.cluster.generic.runProfileWizard()</code>
# Give these responses to the configuration wizard:
#* Select <b>Unix</b> (which is usually the only choice)
#* Shared location: <b>No</b>
#* Cluster host:
#** For Béluga: <b>beluga.computecanada.ca</b>
#** For Narval: <b>narval.computecanada.ca</b>
#* Username (optional): Enter your Alliance username (the identity file can be set later if needed)
#* Remote job storage: <b>/scratch</b>
#** Keep <i>Use unique subfolders</i> checked
#* Maximum number of workers: <b>960</b>
#* MATLAB installation folder for workers (both local and remote versions must match):
#** For local R2022a: <b>/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/matlab/2022a</b>
#* License type: <b>Network license manager</b>
#* Profile Name: <b>beluga</b> or <b>narval</b>
# Click on <i>Create</i> and <i>Finish</i> to finalize the profile.
 
== Edit the plugin once installed == <!--T:44-->
In MATLAB, go to the <code>nonshared</code> folder (i.e., run the following in the MATLAB Command Window):
cd(fullfile(matlabshared.supportpkg.getSupportPackageRoot, 'parallel', 'slurm', 'nonshared'))
 
<!--T:49-->
Then:
# Open the <b>independentSubmitFcn.m</b> file; around line #117 is the line <p> <code>additionalSubmitArgs = sprintf('--ntasks=1 --cpus-per-task=%d', cluster.NumThreads);</code> </p><p> Replace this line with</p><p> <code>additionalSubmitArgs = ccSBATCH().getSubmitArgs();</code></p>
# Open the <b>communicatingSubmitFcn.m</b> file; around line #126 is the line <p> <code>additionalSubmitArgs = sprintf('--ntasks=%d --cpus-per-task=%d', environmentProperties.NumberOfTasks, cluster.NumThreads);</code> </p><p> Replace this line with</p><p> <code>additionalSubmitArgs = ccSBATCH().getSubmitArgs();</code></p>
# Open the <b>communicatingJobWrapper.sh</b> file; around line #20 (after the copyright statement), add the following command and adjust the module version to your local MATLAB version: <p><code>module load matlab/2022a</code></p>
 
<!--T:50-->
Restart MATLAB and go back to your home directory:
cd(getenv('HOME'))  % or cd(getenv('HOMEPATH')) on Windows
 
== Validation == <!--T:47-->
<b>Do not</b> use the built-in validation tool in the <i>Cluster Profile Manager</i>. Instead, you should try the <code>TestParfor</code> example, along with a proper <code>ccSBATCH.m</code> script file:
# Download and extract code samples on GitHub at https://github.com/ComputeCanada/matlab-parallel-server-samples.
# In MATLAB, go to the newly extracted <code>TestParfor</code> directory.
# Follow instructions in https://github.com/ComputeCanada/matlab-parallel-server-samples/blob/master/README.md.
 
<!--T:48-->
Note: When the <code>ccSBATCH.m</code> is in your current working directory, you may try the <i>Cluster Profile Manager</i> validation tool, but only the first two tests will work. Other tests are not yet supported.
 
= External resources = <!--T:51-->
 
<!--T:52-->
MathWorks provides a variety of documentation and training about MATLAB.
* See [https://www.mathworks.com/help/matlab/ https://www.mathworks.com/help/matlab/] for documentation  (many languages)
* See [https://matlabacademy.mathworks.com/ https://matlabacademy.mathworks.com/] for self-paced online courses (EN, JP, ES, KR, CN)
 
<!--T:53-->
Some universities also provide their own MATLAB documentation:
* More examples with job scripts: [https://rcs.ucalgary.ca/MATLAB https://rcs.ucalgary.ca/MATLAB]


</translate>
</translate>

Latest revision as of 19:01, 10 September 2024


There are two ways of using MATLAB on our clusters:

1) Running MATLAB directly, but that requires a license. You may either

  • run MATLAB on Béluga, Cedar or Narval, all of which have a license available for any student, professor or academic researcher;
  • use an external license, i.e., one owned by your institution, faculty, department, or lab. See Using an external license below.

2) Compiling your MATLAB code by using the MATLAB Compiler mcc and by running the generated executable file on any cluster. You can use this executable without license considerations.

More details about these approaches are provided below.

Using an external license

We are a hosting provider for MATLAB. This means that we have MATLAB installed on our clusters and can allow you to access an external license to run computations on our infrastructure. Arrangements have already been made with several Canadian institutions to make this automatic. To see if you already have access to a license, carry out the following test:

[name@cluster ~]$ module load matlab/2023b.2
[name@cluster ~]$ matlab -nojvm -nodisplay -batch license

987654
[name@cluster ~]$

If any license number is printed, you're okay. Be sure to run this test on each cluster on which you want to use MATLAB, since licenses may not be available everywhere.

If you get the message This version is newer than the version of the license.dat file and/or network license manager on the server machine, try an older version of MATLAB in the module load line.

If the test fails for any other reason, either your institution does not have a MATLAB license, it does not allow the license to be used in this way, or no arrangements have yet been made. Find out who administers the MATLAB license at your institution (faculty, department) and contact them or your MathWorks account manager to find out whether you are allowed to use the license in this way.

If you are allowed, then some technical configuration will be required. Create a file similar to the following example:

File : matlab.lic

# MATLAB license server specifications
SERVER <ip address> ANY <port>
USE_SERVER


Put this file in the $HOME/.licenses/ directory, replacing the IP address and port placeholders with the values for your campus license server. Next, you will need to ensure that the license server on your campus is reachable by our compute nodes. This will require our technical team to get in touch with the technical people managing your license software. Please write to technical support so that we can arrange this for you.
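
As a rough sketch of this step, the license file could be created from the shell as shown here. The function name write_matlab_license is hypothetical, and the address and port in the example call are placeholders only (192.0.2.10 is a documentation-only address); the real values must come from whoever administers your campus license server.

```shell
# Hypothetical helper: write a MATLAB license file for a network license server.
#   $1 = directory to write into (normally "$HOME/.licenses")
#   $2 = IP address of the campus license server
#   $3 = port of the campus license server
write_matlab_license() {
    lic_dir="$1"
    server_ip="$2"
    server_port="$3"

    # Create the directory if it does not exist yet.
    mkdir -p "$lic_dir" || return 1

    # Write the three-line license file described above.
    cat > "$lic_dir/matlab.lic" <<EOF
# MATLAB license server specifications
SERVER ${server_ip} ANY ${server_port}
USE_SERVER
EOF
}

# Example call (placeholder values, not a real server):
# write_matlab_license "$HOME/.licenses" 192.0.2.10 27000
```

Writing the file does not by itself make the license server reachable from the compute nodes; that part still has to be arranged through technical support.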

For online documentation, see http://www.mathworks.com/support. For product information, visit http://www.mathworks.com.

Preparing your .matlab folder

Because the /home directory is accessible in read-only mode on some compute nodes, you need to create a .matlab symbolic link that makes sure that the MATLAB profile and job data will be written to the /scratch space instead.

[name@cluster ~]$ cd $HOME
[name@cluster ~]$ if [ -d ".matlab" ]; then
  mv .matlab scratch/
else
  mkdir -p scratch/.matlab
fi && ln -sn scratch/.matlab .matlab
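
The commands above are meant to be run once. If they might be run again (for example from a setup script), a more defensive variant could look like the sketch below. The function name link_matlab_dir is hypothetical, and the sketch assumes, as the commands above do, that the scratch space is reachable as scratch inside the home directory.

```shell
# Hypothetical helper: make <home>/.matlab a symbolic link to <home>/scratch/.matlab.
# Safe to call more than once.
#   $1 = home directory to operate on (normally "$HOME")
link_matlab_dir() {
    home_dir="$1"
    target="$home_dir/scratch/.matlab"
    link="$home_dir/.matlab"

    # Already a symbolic link: nothing to do.
    if [ -L "$link" ]; then
        return 0
    fi

    mkdir -p "$home_dir/scratch" || return 1

    if [ -d "$link" ]; then
        # An existing real directory is moved, unless that would overwrite data.
        if [ -e "$target" ]; then
            echo "both $link and $target exist; merge them by hand" >&2
            return 1
        fi
        mv "$link" "$target" || return 1
    else
        mkdir -p "$target" || return 1
    fi

    # Relative link, resolved from inside the home directory.
    ln -s scratch/.matlab "$link"
}

# Example call:
# link_matlab_dir "$HOME"
```

The early return on an existing link is the main difference from the one-off commands: a second run leaves the link untouched instead of trying to move it.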

Available toolboxes

To see a list of the MATLAB toolboxes available with the license and cluster you're using, you can use the following command:

[name@cluster ~]$  module load matlab
[name@cluster ~]$  matlab -nojvm -batch "ver"

Running a serial or parallel MATLAB code

Important: Any significant MATLAB calculation (one that takes more than about 5 minutes or uses more than about a gigabyte of memory) must be submitted to the scheduler. For instructions on using the scheduler, please see the Running jobs page.

Consider the following example code:


File : cosplot.m

function cosplot()
% MATLAB file example to approximate a triangle wave
% with a truncated Fourier expansion.
nterms=5;
fourbypi=4.0/pi;
np=100;
y(1:np)=pi/2.0;
x(1:np)=linspace(-2.0*pi,2*pi,np);

for k=1:nterms
 twokm=2*k-1;
 y=y-fourbypi*cos(twokm*x)/twokm^2;
end

plot(x,y)
print -dpsc matlab_test_plot.ps
quit
end


Here is a simple Slurm script that you can use to run cosplot.m:


File : matlab_slurm.sl

#!/bin/bash -l
#SBATCH --job-name=matlab_test
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=0-03:00         # adjust this to match the walltime of your job
#SBATCH --nodes=1      
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1      # adjust this if you are using parallel commands
#SBATCH --mem=4000             # adjust this to the memory (in MB) you need per node
#SBATCH --mail-user=you@youruniversity.ca # adjust this to match your email address
#SBATCH --mail-type=ALL

# Choose a version of MATLAB by loading a module:
module load matlab/2023b.2
# Remove -singleCompThread below if you are using parallel commands:
matlab -singleCompThread -batch "cosplot"


Submit the job using sbatch:

[name@server ~]$ sbatch matlab_slurm.sl

Do not use the -singleCompThread option if you request more than one core with --cpus-per-task. You should also ensure that the size of your MATLAB parpool matches the number of cores you are requesting.
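
One possible way to keep the MATLAB command line consistent with the cores requested is to derive it from SLURM_CPUS_PER_TASK, which Slurm sets in the job environment when --cpus-per-task is given. The helper name matlab_cmd is hypothetical; this is only a sketch of the rule stated above, not part of the cluster software.

```shell
# Hypothetical helper: print the MATLAB command for a given script name,
# adding -singleCompThread only when a single core was requested.
#   $1 = name of the MATLAB function or script to run (e.g. cosplot)
matlab_cmd() {
    script="$1"
    # Fall back to 1 core when the variable is not set.
    cores="${SLURM_CPUS_PER_TASK:-1}"

    if [ "$cores" -gt 1 ]; then
        printf 'matlab -batch "%s"\n' "$script"
    else
        printf 'matlab -singleCompThread -batch "%s"\n' "$script"
    fi
}

# Example use inside a job script:
# eval "$(matlab_cmd cosplot)"
```

Inside MATLAB, the same environment variable can be read with getenv, for instance parpool(str2double(getenv('SLURM_CPUS_PER_TASK'))), so that the pool size follows the Slurm request.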

Each time you run MATLAB, it will create a file like java.log.12345 unless you supply the -nojvm option. However, using -nojvm may interfere with certain plotting functions. For further information on command line options -nodisplay, -singleCompThread, -nojvm, and -r, see MATLAB (Linux) on the MathWorks website.

Running multiple parallel MATLAB jobs simultaneously

There is a known issue when two (or more) parallel MATLAB jobs initialize their parpool simultaneously: the new MATLAB instances try to read and write the same .dat file in the $HOME/.matlab/local_cluster_jobs/R* folder, which corrupts the local parallel profile used by other MATLAB jobs. To fix the corrupted profile, delete the local_cluster_jobs folder when no job is running.

There are two main ways to prevent the problem permanently:

  1. Making sure only one MATLAB job at a time will start its parpool. There are many possible technical solutions, but none is perfect:
    • using a lock file (which may remain locked if a previous job has failed),
    • using random delays (which may be equal or almost equal, and still cause the corruption),
    • using ever-increasing delays (which waste compute time),
    • using Slurm options --begin or --dependency=after:JOBID to control the start time (which increases wait time in the queue).
  2. Making sure each MATLAB job creates a local parallel profile in a unique location of the filesystem.

In your MATLAB code:

File : parallel_main.m

% Create a "local" cluster object
local_cluster = parcluster('local')

% Modify the JobStorageLocation to $SLURM_TMPDIR
local_cluster.JobStorageLocation = getenv('SLURM_TMPDIR')

% Start the parallel pool
parpool(local_cluster);
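
For comparison, the first approach (letting only one job at a time start its pool) can be approximated with the Slurm dependency option mentioned in the list. The sketch below submits job scripts one after another, making each wait for the previous one to start. The function name submit_chain and the script names in the example call are hypothetical.

```shell
# Hypothetical helper: submit several job scripts so that each one may begin
# only after the previously submitted job has started.
#   $@ = job scripts, in the order they should start
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -n "$prev" ]; then
            out=$(sbatch --parsable --dependency="after:$prev" "$script") || return 1
        else
            out=$(sbatch --parsable "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        prev="${out%%;*}"
        echo "$script -> job $prev"
    done
}

# Example call (hypothetical script names):
# submit_chain job_a.sl job_b.sl job_c.sl
```

Note that an after: dependency is released as soon as the earlier job starts, so this staggers the jobs without guaranteeing that the earlier pool has finished initializing. Slurm also accepts a delay in minutes (after:JOBID+minutes), at the cost of more waiting, which is why the per-job storage location shown above is usually the simpler choice.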


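References:

  • FAS Research Computing, MATLAB Parallel Computing Toolbox simultaneous job problem, https://www.rc.fas.harvard.edu/resources/documentation/software/matlab-pct-simultaneous-job-problem/
  • MathWorks, Why am I unable to start a local MATLABPOOL from multiple MATLAB sessions that use a shared preference directory using Parallel Computing Toolbox 4.0 (R2008b)?, https://www.mathworks.com/matlabcentral/answers/97141-why-am-i-unable-to-start-a-local-matlabpool-from-multiple-matlab-sessions-that-use-a-shared-preferen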

Using the Compiler and Runtime libraries

Important: Like any other intensive job, you must always run MCR code within a job submitted to the scheduler. For instructions on using the scheduler, please see the Running jobs page.

You can also compile your code using MATLAB Compiler, which is included among the modules we host. See documentation for the compiler on the MathWorks website. At the moment, mcc is provided for versions 2014a, 2018a and later.

To compile the cosplot.m example given above, you would use the command

[name@yourserver ~]$ mcc -m -R -nodisplay cosplot.m

This will produce a binary named cosplot, as well as a wrapper script. To run the binary on our servers, you will only need the binary. The wrapper script named run_cosplot.sh will not work as is on our servers because MATLAB assumes that some libraries can be found in specific locations. Instead, we provide a different wrapper script called run_mcr_binary.sh which sets the correct paths.

On one of our servers, load an MCR module corresponding to the MATLAB version you used to build the executable:

[name@server ~]$ module load mcr/R2018a

Run the following command:

[name@server ~]$ setrpaths.sh --path cosplot

Then, in your submission script (not on the login nodes), run your binary like so:

run_mcr_binary.sh cosplot

You will only need to run the setrpaths.sh command once for each compiled binary. The run_mcr_binary.sh script will instruct you to run setrpaths.sh if it detects that this has not been done.
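
Putting these steps together, a submission script for a compiled binary might be generated as in the sketch below. The function name write_mcr_job is hypothetical, and the resource values simply mirror the matlab_slurm.sl example earlier on this page; adjust them to your own job.

```shell
# Hypothetical helper: print a Slurm job script that runs a compiled MATLAB binary.
#   $1 = name of the compiled binary (e.g. cosplot)
#   $2 = MCR module matching the MATLAB version used to compile (e.g. mcr/R2018a)
#   $3 = Slurm account to charge (e.g. def-someprof)
write_mcr_job() {
    binary="$1"
    mcr_module="$2"
    account="$3"

    cat <<EOF
#!/bin/bash -l
#SBATCH --job-name=${binary}_mcr
#SBATCH --account=${account}
#SBATCH --time=0-03:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4000

# Load the runtime libraries, then run the binary through the provided wrapper.
module load ${mcr_module}
run_mcr_binary.sh ${binary}
EOF
}

# Example call, followed by submission:
# write_mcr_job cosplot mcr/R2018a def-someprof > cosplot_mcr.sl
# sbatch cosplot_mcr.sl
```

The setrpaths.sh step is deliberately left out of the generated script, since it only needs to be run once per binary before the first job is submitted.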

Using the MATLAB Parallel Server

MATLAB Parallel Server is only worthwhile if you need more workers in your parallel MATLAB job than available CPU cores on a single compute node. While a regular MATLAB installation (see above sections) allows you to run parallel jobs within one node (up to 64 workers per job, depending on which node and cluster), the MATLAB Parallel Server is the licensed MathWorks solution for running a parallel job on more than one node.

This solution usually works by submitting MATLAB parallel jobs from a local MATLAB interface on your computer. Since May 2023, some mandatory security improvements have been implemented on all clusters. Because MATLAB uses an SSH mode that is no longer permitted, job submission from a local computer is no longer possible until MATLAB uses a new connection method. There is currently no workaround.

Slurm plugin for MATLAB

The procedure below no longer works because the Slurm plugin is no longer available and because of the SSH issue described above. The configuration steps are kept until a workaround is found:

  1. Have MATLAB R2022a or newer installed, including the Parallel Computing Toolbox.
  2. Go to the MathWorks Slurm Plugin page, download and run the *.mlpkginstall file. (i.e., click on the blue Download button on the right side, just above the Overview tab.)
  3. Enter your MathWorks credentials; if the configuration wizard does not start, run in MATLAB
    parallel.cluster.generic.runProfileWizard()
  4. Give these responses to the configuration wizard:
    • Select Unix (which is usually the only choice)
    • Shared location: No
    • Cluster host:
      • For Béluga: beluga.computecanada.ca
      • For Narval: narval.computecanada.ca
    • Username (optional): Enter your Alliance username (the identity file can be set later if needed)
    • Remote job storage: /scratch
      • Keep Use unique subfolders checked
    • Maximum number of workers: 960
    • MATLAB installation folder for workers (both local and remote versions must match):
      • For local R2022a: /cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/matlab/2022a
    • License type: Network license manager
    • Profile Name: beluga or narval
  5. Click on Create and Finish to finalize the profile.

Edit the plugin once installed

In MATLAB, go to the nonshared folder (i.e., run the following in the MATLAB Command Window):

cd(fullfile(matlabshared.supportpkg.getSupportPackageRoot, 'parallel', 'slurm', 'nonshared'))

Then:

  1. Open the independentSubmitFcn.m file; around line #117 is the line

    additionalSubmitArgs = sprintf('--ntasks=1 --cpus-per-task=%d', cluster.NumThreads);

    Replace this line with

    additionalSubmitArgs = ccSBATCH().getSubmitArgs();

  2. Open the communicatingSubmitFcn.m file; around line #126 is the line

    additionalSubmitArgs = sprintf('--ntasks=%d --cpus-per-task=%d', environmentProperties.NumberOfTasks, cluster.NumThreads);

    Replace this line with

    additionalSubmitArgs = ccSBATCH().getSubmitArgs();

  3. Open the communicatingJobWrapper.sh file; around line #20 (after the copyright statement), add the following command and adjust the module version to your local MATLAB version:

    module load matlab/2022a

Restart MATLAB and go back to your home directory:

cd(getenv('HOME'))  % or cd(getenv('HOMEPATH')) on Windows

Validation

Do not use the built-in validation tool in the Cluster Profile Manager. Instead, you should try the TestParfor example, along with a proper ccSBATCH.m script file:

  1. Download and extract code samples on GitHub at https://github.com/ComputeCanada/matlab-parallel-server-samples.
  2. In MATLAB, go to the newly extracted TestParfor directory.
  3. Follow instructions in https://github.com/ComputeCanada/matlab-parallel-server-samples/blob/master/README.md.

Note: When the ccSBATCH.m is in your current working directory, you may try the Cluster Profile Manager validation tool, but only the first two tests will work. Other tests are not yet supported.

External resources

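MathWorks provides a variety of documentation and training about MATLAB.

  • See https://www.mathworks.com/help/matlab/ for documentation (many languages)
  • See https://matlabacademy.mathworks.com/ for self-paced online courses (EN, JP, ES, KR, CN)

Some universities also provide their own MATLAB documentation:

  • More examples with job scripts: https://rcs.ucalgary.ca/MATLAB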