Julia: Difference between revisions
No edit summary |
(Marked this version for translation) |
||
(11 intermediate revisions by 5 users not shown) | |||
Line 9: | Line 9: | ||
[https://julialang.org Julia] is a programming language that was designed for performance, ease of use and portability. It is is available as a [[Utiliser_des_modules/en | module]] on the Alliance clusters. | [https://julialang.org Julia] is a programming language that was designed for performance, ease of use and portability. It is is available as a [[Utiliser_des_modules/en | module]] on the Alliance clusters. | ||
= | = Installing packages = <!--T:3--> | ||
<!--T:4--> | <!--T:4--> | ||
The first time you add a package to a Julia project (using <code>Pkg.add</code> or the package mode), the package will be downloaded, installed in <code>~/.julia</code>, and | The first time you add a package to a Julia project (using <code>Pkg.add</code> or the package mode), the package will be downloaded, installed in <code>~/.julia</code>, and precompiled. The same package can be added to different projects, in which case the data in <code>~/.julia</code> will be reused. Different versions of the same package can be added to different projects; the required package versions will coexist in <code>~/.julia</code>. (Compared to Python, Julia projects replace “virtual environments” while avoiding code duplication.) | ||
<!--T:38--> | <!--T:38--> | ||
From Julia 1.6 onwards, Julia packages include their binary dependencies (such as libraries). There is therefore no need to load any software module, and we recommend not to. | <b>From Julia 1.6 onwards,</b> Julia packages include their binary dependencies (such as libraries). There is therefore no need to load any software module, and we recommend not to. | ||
<!--T:39--> | <!--T:39--> | ||
With Julia 1.5 and earlier, you may run into problems if a package depends on system-provided binaries. For instance, [https://github.com/JuliaIO/JLD.jl JLD] depends on a system-provided HDF5 library. On a personal computer, Julia attempts to install such a dependency using [https://en.wikipedia.org/wiki/Yum_(software) yum] or [https://en.wikipedia.org/wiki/APT_(Debian) apt] with [https://en.wikipedia.org/wiki/Sudo sudo]. This will not work on our clusters; instead, some extra information must be provided to allow Julia's package manager (Pkg) to find the HDF5 library. | <b>With Julia 1.5 and earlier,</b> you may run into problems if a package depends on system-provided binaries. For instance, [https://github.com/JuliaIO/JLD.jl JLD] depends on a system-provided HDF5 library. On a personal computer, Julia attempts to install such a dependency using [https://en.wikipedia.org/wiki/Yum_(software) yum] or [https://en.wikipedia.org/wiki/APT_(Debian) apt] with [https://en.wikipedia.org/wiki/Sudo sudo]. This will not work on our clusters; instead, some extra information must be provided to allow Julia's package manager (Pkg) to find the HDF5 library. | ||
<!--T:5--> | <!--T:5--> | ||
Line 33: | Line 33: | ||
<!--T:49--> | <!--T:49--> | ||
Note that JLD | Note that the example package we use here, JLD, has been superseded by [https://juliapackages.com/p/jld2 JLD2] which no longer relies on a system-installed HDF5 library, and is therefore more portable. | ||
= | == The Narval cluster == <!--T:70--> | ||
<!--T:71--> | |||
<div style="color: red; border: 1px dashed #2f6fab"> | |||
<!--T:72--> | |||
[[File:Warning sign 16px.png|frameless|warning]] Warning | |||
<!--T:73--> | |||
</div> | |||
<!--T:74--> | |||
On the Narval cluster, Julia sometimes crashes while installing packages in /home directories, due to a bug in the filesystem software. This happens during the precompilation step and causes Julia to exit with a segmentation fault. | |||
<!--T:75--> | |||
Until this bug is resolved, you should use an alternate location, such as <code>/project</code>, for your Julia “depot” on Narval, as explained in the next section. | |||
== Changing the depot path == <!--T:7--> | |||
<!--T:50--> | <!--T:50--> | ||
Line 41: | Line 58: | ||
<!--T:51--> | <!--T:51--> | ||
To avoid this issue, you can store your personal Julia “depot” (containing packages, registries, | To avoid this issue, you can store your personal Julia “depot” (containing packages, registries, precompiled files, etc.) in a different location, such as your project space. For example, user <code>alice</code>, a member of the <code>def-bob</code> project, could add the following to her <code>~/.bashrc</code> file: | ||
<!--T:52--> | <!--T:52--> | ||
Line 53: | Line 70: | ||
<!--T:56--> | <!--T:56--> | ||
Alternatively, one can create | Alternatively, one can create an [[Apptainer]] image with a chosen version of Julia and a selection of packages, and JULIA_DEPOT_PATH redirected inside the container. This does mean that you lose the advantage of our optimized Julia modules. However, your container now contains the potentially very large set of small files inside 1 container file (.sif), potentially improving IO performance. Reproducibility is also improved, the container will run anywhere as-is. Another use case is if you want to test Julia nightly builds without altering your local Julia installation, or when you need to bundle your own specific dependencies, because the container creation gives you complete control at creation. | ||
= Available versions = <!--T:9--> | = Available versions = <!--T:9--> | ||
Line 210: | Line 227: | ||
Setting the number of threads to the number of processors is a typical choice (although see [[Scalability]] for a discussion). | Setting the number of threads to the number of processors is a typical choice (although see [[Scalability]] for a discussion). | ||
In addition, one can 'pin' threads to cores, by setting | In addition, one can 'pin' threads to cores, by setting | ||
JULIA_EXCLUSIVE to anything non-zero. As per the [https://docs.julialang.org/en/v1/manual/environment-variables/#JULIA_EXCLUSIVE documentation] this takes control of thread scheduling away from the OS, and pins threads to cores (sometimes referred to 'green' threads with affinity). Depending on the computation that threads execute, this can improve performance when one has precise information on cache access patterns or otherwise unwelcome scheduling patterns used by the OS. Setting JULIA_EXCLUSIVE works only if your job has exclusive access to the compute nodes (all available CPU cores were allocated to your job). Since SLURM already pins processes and threads to CPU cores, asking Julia to re-pin threads may not lead to any performance improvement. | JULIA_EXCLUSIVE to anything non-zero. As per the [https://docs.julialang.org/en/v1/manual/environment-variables/#JULIA_EXCLUSIVE documentation], this takes control of thread scheduling away from the OS, and pins threads to cores (sometimes referred to 'green' threads with affinity). Depending on the computation that threads execute, this can improve performance when one has precise information on cache access patterns or otherwise unwelcome scheduling patterns used by the OS. Setting JULIA_EXCLUSIVE works only if your job has exclusive access to the compute nodes (all available CPU cores were allocated to your job). Since SLURM already pins processes and threads to CPU cores, asking Julia to re-pin threads may not lead to any performance improvement. | ||
<!--T:36--> | <!--T:36--> | ||
Line 216: | Line 233: | ||
<!--T:37--> | <!--T:37--> | ||
It goes without saying that configuring these values should only be done when one has accurately profiled any contention issues. Given the high pace at which Julia, and especially its threading | It goes without saying that configuring these values should only be done when one has accurately profiled any contention issues. Given the high pace at which Julia, and especially its threading subsystem Base.Threads, evolves, one should always consult the documentation to ensure changing the default configuration will have only the expected behaviour as a result. | ||
= Using GPUs with Julia = <!--T:61--> | = Using GPUs with Julia = <!--T:61--> | ||
Line 231: | Line 248: | ||
<!--T:64--> | <!--T:64--> | ||
julia> using CUDA | julia> using CUDA | ||
julia> CUDA.set_runtime_version!(v"version_of_cuda", local_toolkit=true) | |||
where version_of_cuda is 12.2 if cuda/12.2 is loaded. | |||
<!--T:76--> | |||
For older versions of cuda, please try | |||
julia> CUDA.set_runtime_version!("local") | julia> CUDA.set_runtime_version!("local") | ||
Line 257: | Line 279: | ||
3 | 3 | ||
4 | 4 | ||
= Videos = <!--T:23--> | = Videos = <!--T:23--> |
Latest revision as of 16:40, 1 October 2024
Julia is a programming language that was designed for performance, ease of use and portability. It is is available as a module on the Alliance clusters.
Installing packages[edit]
The first time you add a package to a Julia project (using Pkg.add
or the package mode), the package will be downloaded, installed in ~/.julia
, and precompiled. The same package can be added to different projects, in which case the data in ~/.julia
will be reused. Different versions of the same package can be added to different projects; the required package versions will coexist in ~/.julia
. (Compared to Python, Julia projects replace “virtual environments” while avoiding code duplication.)
From Julia 1.6 onwards, Julia packages include their binary dependencies (such as libraries). There is therefore no need to load any software module, and we recommend not to.
With Julia 1.5 and earlier, you may run into problems if a package depends on system-provided binaries. For instance, JLD depends on a system-provided HDF5 library. On a personal computer, Julia attempts to install such a dependency using yum or apt with sudo. This will not work on our clusters; instead, some extra information must be provided to allow Julia's package manager (Pkg) to find the HDF5 library.
$ module load gcc/7.3.0 hdf5 julia/1.4.1 $ julia julia> using Libdl julia> push!(Libdl.DL_LOAD_PATH, ENV["HDF5_DIR"] * "/lib") julia> using Pkg julia> Pkg.add("JLD") julia> using JLD
If we were to omit the Libdl.DL_LOAD_PATH
line from the above example, it would happen to work on Graham because Graham has HDF5 installed system-wide. It would fail on Cedar because Cedar does not. The best practice on any of our systems, though, is that shown above: Load the appropriate module first, and use the environment variable defined by the module (HDF5_DIR
in this example) to extend Libdl.DL_LOAD_PATH
. This will work uniformly on all systems.
Note that the example package we use here, JLD, has been superseded by JLD2 which no longer relies on a system-installed HDF5 library, and is therefore more portable.
The Narval cluster[edit]
On the Narval cluster, Julia sometimes crashes while installing packages in /home directories, due to a bug in the filesystem software. This happens during the precompilation step and causes Julia to exit with a segmentation fault.
Until this bug is resolved, you should use an alternate location, such as /project
, for your Julia “depot” on Narval, as explained in the next section.
Changing the depot path[edit]
Installing Julia packages in your home directory will create large numbers of files. For example, starting from an empty ~/.julia
directory (no packages installed), installing just the Gadfly.jl
plotting package will result in around 96M and 37000 files (7% of the total number of files allowed by your home directory quota). If you install a large number of Julia packages, you may exceed your quota.
To avoid this issue, you can store your personal Julia “depot” (containing packages, registries, precompiled files, etc.) in a different location, such as your project space. For example, user alice
, a member of the def-bob
project, could add the following to her ~/.bashrc
file:
export JULIA_DEPOT_PATH="/project/def-bob/alice/julia:$JULIA_DEPOT_PATH"
This will use the /project/def-bob/alice/julia
directory preferentially. Files in ~/.julia
will still be considered, and ~/.julia
will still be used for some files such as your command history. When moving your depot to a different location, it is better to remove your existing ~/.julia
depot first if you have one:
$ rm -rf $HOME/.julia
Alternatively, one can create an Apptainer image with a chosen version of Julia and a selection of packages, and JULIA_DEPOT_PATH redirected inside the container. This does mean that you lose the advantage of our optimized Julia modules. However, your container now contains the potentially very large set of small files inside 1 container file (.sif), potentially improving IO performance. Reproducibility is also improved, the container will run anywhere as-is. Another use case is if you want to test Julia nightly builds without altering your local Julia installation, or when you need to bundle your own specific dependencies, because the container creation gives you complete control at creation.
Available versions[edit]
We have removed earlier versions of Julia (< 1.0) because the old package manager was creating vast numbers of small files which in turn caused performance issues on the parallel file systems. Please start using Julia 1.4, or newer versions.
[name@server ~]$ module spider julia
--------------------------------------------------------
julia: julia/1.4.1
--------------------------------------------------------
[...]
You will need to load all module(s) on any one of the lines below before the "julia/1.4.1" module is available to load.
nixpkgs/16.09 gcc/7.3.0
[...]
[name@server ~]$ module load gcc/7.3.0 julia/1.4.1
Porting code from Julia 0.x to 1.x[edit]
In the summer of 2018 the Julia developers released version 1.0, in which they stabilized the language API and removed deprecated (outdated) functionality. To help updating Julia programs for version 1.0, the developers also released version 0.7.0. Julia 0.7.0 contains all the new functionality of 1.0 as well as the outdated functionalities from 0.x versions, which will give deprecation warnings when used. Code that runs in Julia 0.7 without warnings should be compatible with Julia 1.0.
Using PyCall.jl to call Python from Julia[edit]
Julia can interface with Python code using PyCall.jl. When using PyCall.jl, set the PYTHON
environment variable to the python executable in your virtual Python environment. On our clusters, we recommend using virtual Python environments as described in our Python documentation. After activating a virtual Python environment, you can use it in PyCall.jl:
$ source "$HOME/myenv/bin/activate" (myenv) $ julia [...] julia> using Pkg, PyCall julia> ENV["PYTHON"] = joinpath(ENV["VIRTUAL_ENV"], "bin", "python") julia> Pkg.build("PyCall")
We strongly advise against the default PyCall.jl behaviour, which is to use a Miniconda distribution inside your Julia environment. Anaconda and similar distributions are not suitable on our clusters.
Note that if you do not create a virtual environment as shown above, PyCall will default to the operating system python installation, which is never what you want. It will invoke Conda.jl, but fail to recognize the correct path unless you rebuild with ENV["PYTHON"]=""
. In addition, apart from incompatibilities with the software stack, the Miniconda installer creates a large number of files inside JULIA_DEPOT_PATH
. If that is ~/.julia
, the default, you can run into performance and quota issues.
See the PyCall.jl documentation for details.
Running Julia with multiple processes on clusters[edit]
The following is an example of running a parallel Julia code computing pi using 100 cores across nodes on a cluster.
#!/bin/bash
#SBATCH --ntasks=100
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1024M
#SBATCH --time=0-00:10
srun hostname -s > hostfile
sleep 5
julia --machine-file ./hostfile ./pi_p.jl 1000000000000
In this example, the command
srun hostname -s > hostfile
generates a list of names of the nodes allocated and writes it to the text file hostfile. Then the command
julia --machine-file ./hostfile ./pi_p.jl 1000000000000
starts one main Julia process and 100 worker processes on the nodes specified in the hostfile and runs the program pi_p.jl in parallel.
Running Julia with MPI[edit]
You must make sure Julia's MPI is configured to use our MPI libraries. If you are using Julia MPI 0.19 or earlier, run the following:
module load StdEnv julia export JULIA_MPI_BINARY=system export JULIA_MPI_PATH=$EBROOTOPENMPI export JULIA_MPI_LIBRARY=$EBROOTOPENMPI/lib64/libmpi.so export JULIA_MPI_ABI=OpenMPI export JULIA_MPIEXEC=$EBROOTOPENMPI/bin/mpiexec
Then start Julia and inside it run:
import Pkg;
Pkg.add("MPI")
If you are using Julia MPI 0.20 or later, run the following. Note that this will append an [MPIPreferences]
section to your .julia/environments/vX.Y/LocalPreferences.toml
file.
If that file already exists and has an [MPIPreferences]
section, first edit the file and remove the section.
module load julia mkdir -p .julia/environments/v${EBVERSIONJULIA%.*} cat >> .julia/environments/v${EBVERSIONJULIA%.*}/LocalPreferences.toml << EOF [MPIPreferences] _format = "1.0" abi = "OpenMPI" binary = "system" libmpi = "${EBROOTOPENMPI}/lib64/libmpi.so" mpiexec = "${EBROOTOPENMPI}/bin/mpiexec" EOF
Then start Julia and inside it run:
import Pkg
Pkg.add("MPIPreferences")
Pkg.add("MPI")
To use afterwards, run (with two processes in this example):
module load StdEnv julia mpirun -np 2 julia hello.jl
The hello.jl
code here is:
using MPI MPI.Init() comm = MPI.COMM_WORLD print("Hello world, I am rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))\n") MPI.Barrier(comm)
Configuring Julia's threading behaviour[edit]
You can restrict the number of threads Julia can use by setting JULIA_NUM_THREADS=k, for example a single process on a 12 cpus-per-task job could use k=12. Setting the number of threads to the number of processors is a typical choice (although see Scalability for a discussion). In addition, one can 'pin' threads to cores, by setting JULIA_EXCLUSIVE to anything non-zero. As per the documentation, this takes control of thread scheduling away from the OS, and pins threads to cores (sometimes referred to 'green' threads with affinity). Depending on the computation that threads execute, this can improve performance when one has precise information on cache access patterns or otherwise unwelcome scheduling patterns used by the OS. Setting JULIA_EXCLUSIVE works only if your job has exclusive access to the compute nodes (all available CPU cores were allocated to your job). Since SLURM already pins processes and threads to CPU cores, asking Julia to re-pin threads may not lead to any performance improvement.
Related is the variable JULIA_THREAD_SLEEP_THRESHOLD, controlling the number of nanoseconds after which a spinning thread is scheduled to sleep. A value of infinite (as string) indicates no sleeping on spinning. Changing this variable can be of use if many threads are contending frequently for a shared resource, where it can be preferred to schedule out spinning threads more quickly. Under heavy contention, spinning would only increase CPU load. Conversely, in a situation where a resource is only very infrequently contended, lower latency can result from prohibiting threads to sleep, that is, setting the threshold to infinity.
It goes without saying that configuring these values should only be done when one has accurately profiled any contention issues. Given the high pace at which Julia, and especially its threading subsystem Base.Threads, evolves, one should always consult the documentation to ensure changing the default configuration will have only the expected behaviour as a result.
Using GPUs with Julia[edit]
Julia's primary programming interface for GPUs is the CUDA.jl package. The Julia package manager can be used to install it:
$ module load cuda/11.4 julia/1.8.1 $ julia julia> import Pkg; Pkg.add("CUDA")
It is possible that the CUDA toolkit downloaded during installation will not work with the installed CUDA driver. This problem can be avoided by configuring Julia to use the local CUDA toolkit:
julia> using CUDA julia> CUDA.set_runtime_version!(v"version_of_cuda", local_toolkit=true)
where version_of_cuda is 12.2 if cuda/12.2 is loaded.
For older versions of cuda, please try
julia> CUDA.set_runtime_version!("local")
After restarting Julia you can verify that it is using the correct CUDA version:
julia> CUDA.versioninfo() CUDA runtime 11.4, local installation ...
The following Julia code can be used to test the installation:
julia> a = CuArray([1,2,3]) 3-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}: 1 2 3
julia> a.+=1 3-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}: 2 3 4
Videos[edit]
A series of online seminars produced by SHARCNET:
- Julia: A first perspective (47 minutes)
- Julia: A second perspective (57 minutes)
- Julia: A third perspective - parallel computing explained (65 minutes)
- Julia: Parallel computing revisited (available soon)