Chapel: Difference between revisions

(23 intermediate revisions by 4 users not shown)
Latest revision as of 19:50, 1 May 2024


Chapel is a general-purpose, compiled, high-level parallel programming language with built-in abstractions for shared- and distributed-memory parallelism. There are two styles of parallel programming in Chapel: (1) task parallelism, where parallelism is driven by programmer-specified tasks, and (2) data parallelism, where parallelism is driven by applying the same computation to subsets of data elements, which may sit in the shared memory of a single node or be distributed over multiple nodes.

These high-level abstractions make Chapel well suited to novice HPC users who are learning parallel programming. Chapel aims to combine the ease of use of Python with the performance of traditional compiled languages such as C and Fortran. Parallel blocks that typically take tens of lines of MPI code can be expressed in only a few lines of Chapel code. Chapel is open source and runs on any Unix-like operating system, on hardware ranging from laptops to large HPC systems.

Chapel has a relatively small user base, so many libraries that exist for C, C++, and Fortran have not yet been implemented in Chapel. Hopefully, that will change in coming years if Chapel adoption continues to gain momentum in the HPC community.

For more information, please watch our three-part Chapel webinar.

Single-locale Chapel

Single-locale (single node; shared-memory only) Chapel on our general-purpose clusters is provided by the module chapel-multicore. You can use salloc to test Chapel codes either in serial:

[name@server ~]$ module load gcc/9.3.0 chapel-multicore/1.31.0
[name@server ~]$ salloc --time=0:30:0 --ntasks=1 --mem-per-cpu=3600 --account=def-someprof
[name@server ~]$ chpl test.chpl -o test
[name@server ~]$ ./test

or on multiple cores on the same node:

[name@server ~]$ module load gcc/9.3.0 chapel-multicore/1.31.0
[name@server ~]$ salloc --time=0:30:0 --ntasks=1 --cpus-per-task=3 --mem-per-cpu=3600 --account=def-someprof
[name@server ~]$ chpl test.chpl -o test
[name@server ~]$ ./test

For production jobs, please write a job submission script and submit it with sbatch.
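
As a minimal sketch of such a submission script for the multi-core case, the snippet below reuses the resource requests, module versions and account placeholder from the interactive example. The file name chapel_single.sh is hypothetical, and the script assumes that test has already been compiled with chpl. The snippet only writes the script to disk using a here-document:

```shell
# Write a batch script for a single-locale run (the file name is hypothetical).
cat > chapel_single.sh <<'EOF'
#!/bin/bash
#SBATCH --time=0:30:0
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
#SBATCH --mem-per-cpu=3600
#SBATCH --account=def-someprof
# Load the same modules that were used to compile the code
module load gcc/9.3.0 chapel-multicore/1.31.0
# Run the previously compiled single-locale executable
./test
EOF
```

The resulting file would then be submitted with sbatch chapel_single.sh. Adjust the time, memory and account values to your own job.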

Multi-locale Chapel

Multi-locale (multiple nodes; hybrid shared- and distributed-memory) Chapel is provided by two modules: chapel-ofi (for the OmniPath interconnect on Cedar) and chapel-ucx (for the InfiniBand interconnect on Graham, Béluga and Narval).

The following Chapel code prints basic information about the nodes available inside your job:

File : probeLocales.chpl

use MemDiagnostics;   // provides MemUnits and physicalMemory()
// Visit each locale (node) allocated to the job, one after another
for loc in Locales do
  on loc {   // run this block on that locale; "here" refers to it
    writeln("locale #", here.id, "...");
    writeln("  ...is named: ", here.name);
    writeln("  ...has ", here.numPUs(), " processor cores");
    writeln("  ...has ", here.physicalMemory(unit=MemUnits.GB, retType=real), " GB of memory");
    writeln("  ...has ", here.maxTaskPar, " maximum parallelism");
  }


To run this code on Cedar, you need to load the chapel-ofi module:

[name@server ~]$ module load gcc/9.3.0 chapel-ofi/1.31.0
[name@server ~]$ salloc --time=0:30:0 --nodes=4 --cpus-per-task=3 --mem-per-cpu=3500 --account=def-someprof


Once the interactive job starts, you can compile and run your code from the prompt on the first allocated compute node:

[name@server ~]$ chpl --fast probeLocales.chpl -o probeLocales
[name@server ~]$ ./probeLocales -nl 4

To run the same code on InfiniBand-based clusters (all those except Cedar), please load the chapel-ucx module instead of chapel-ofi.

For production jobs, please write a Slurm submission script and submit your job with sbatch instead.
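
A minimal sketch of such a script for Cedar follows, reusing the node, core, memory and account values from the interactive example. The file name chapel_multi.sh is hypothetical. The script assumes that probeLocales has already been compiled with chpl --fast, and that the executable launches under sbatch the same way it does inside the interactive job. The snippet only writes the script to disk using a here-document:

```shell
# Write a batch script for a four-locale run on Cedar (the file name is hypothetical).
cat > chapel_multi.sh <<'EOF'
#!/bin/bash
#SBATCH --time=0:30:0
#SBATCH --nodes=4
#SBATCH --cpus-per-task=3
#SBATCH --mem-per-cpu=3500
#SBATCH --account=def-someprof
# On InfiniBand-based clusters, load the chapel-ucx module instead of chapel-ofi
module load gcc/9.3.0 chapel-ofi/1.31.0
# The number of locales (-nl) matches the number of nodes requested above
./probeLocales -nl 4
EOF
```

The resulting file would then be submitted with sbatch chapel_multi.sh. Keep the value passed to -nl equal to the value of --nodes.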