Julia

From Alliance Doc
Revision as of 18:33, 7 September 2021 by Mboisson (talk | contribs)

Julia is a programming language that was designed from the beginning for performance, ease of use and portability. It is available as a module on Compute Canada clusters.

Compiling packages

When compiling packages for Julia, files will normally be added to ~/.julia. However, you may run into problems if the package depends on system-provided libraries. For instance, JLD depends on a system-provided HDF5 library. On a personal computer, Julia attempts to install such a dependency using yum or apt with sudo. This will not work on a Compute Canada cluster; instead, some extra information must be provided to allow Julia's package manager (Pkg) to find the HDF5 library.

$ module load gcc/7.3.0 hdf5 julia/1.4.1

$ julia

julia> using Libdl
julia> push!(Libdl.DL_LOAD_PATH, ENV["HDF5_DIR"] * "/lib")
julia> using Pkg
julia> Pkg.add("JLD")
julia> using JLD

If we omitted the Libdl.DL_LOAD_PATH line from the above example, it would happen to work on Graham because Graham has HDF5 installed system-wide, but it would fail on Cedar because Cedar does not. The best practice on any Compute Canada system, though, is the one shown above: load the appropriate module first, and use the environment variable defined by the module (HDF5_DIR in this example) to extend Libdl.DL_LOAD_PATH. This works uniformly on all systems.

From Julia 1.6.0 onwards, the HDF5 Julia package provides its own copy of the underlying C library. Therefore, you should not load the hdf5 module or adjust the library path. (This also applies to JLD and other packages that depend on HDF5.)
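The version rule above can be captured in a small shell check. This is only a sketch: the helper name needs_system_hdf5 is hypothetical, the 1.6.0 threshold is the one stated above, and it assumes a sort that supports -V (version sort, as in GNU coreutils) and a full major.minor.patch version string.

```shell
# Hypothetical helper: exit status 0 when the given Julia version is older
# than 1.6.0, i.e. when the hdf5 module and Libdl.DL_LOAD_PATH are still needed.
# Assumption: the version is given in full major.minor.patch form.
needs_system_hdf5() {
    local version="$1"
    # 1.6.0 itself already relies on the HDF5 package's own library.
    [ "$version" = "1.6.0" ] && return 1
    # sort -V orders version strings; the older one comes first.
    local oldest
    oldest=$(printf '%s\n' "$version" "1.6.0" | sort -V | head -n 1)
    [ "$oldest" = "$version" ]
}

julia_version="1.4.1"   # assumption: the Julia version you plan to load
if needs_system_hdf5 "$julia_version"; then
    echo "julia/$julia_version: load the hdf5 module and extend Libdl.DL_LOAD_PATH"
else
    echo "julia/$julia_version: do not load hdf5; the HDF5 package provides its own library"
fi
```

The script only prints a recommendation and does not call module itself, so it can be tried safely outside a cluster.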

Package files and storage quotas

In the example above, installing just the JLD package creates a ~/.julia tree with 18673 files and directories and using 236M of space, almost 5% of a standard user's quota for /home. It's worth remembering that installing a lot of packages will consume a lot of space.
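To check these numbers for your own depot, a sketch along the following lines may help. The function name depot_usage is hypothetical; it relies only on the standard find, wc, du and cut utilities, and it assumes the depot is in the default location ~/.julia unless another directory is passed.

```shell
# Hypothetical helper: report how many files and directories a Julia depot
# contains and how much space it uses.
depot_usage() {
    local depot="${1:-$HOME/.julia}"
    if [ ! -d "$depot" ]; then
        echo "no depot at $depot"
        return 1
    fi
    # Count every file and directory below the depot (the depot itself excluded).
    local entries
    entries=$(find "$depot" -mindepth 1 | wc -l | tr -d ' ')
    # Total size in human-readable form (first field of du's output).
    local size
    size=$(du -sh "$depot" | cut -f1)
    echo "$depot: $entries entries, $size"
}

# Report on the default depot; a missing depot is not treated as an error here.
depot_usage "$HOME/.julia" || true
```

Running it before and after installing a package gives a rough idea of how much of the /home quota that package costs.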

In addition, because all of Julia's user files are saved in $HOME/.julia, working with non-trivial dependencies can be significantly slower. In Julia's distributed programming mode this overhead increases, because all k Julia processes access $HOME/.julia/compiled. In an interactive (salloc) or batch (sbatch) job, one can work around this as follows:

[name@server ~]$ export JULIA_DEPOT_PATH="$SLURM_TMPDIR:$JULIA_DEPOT_PATH"

With $SLURM_TMPDIR residing on a very fast local disk, installation, updates, precompilation, and registry reads are much faster (up to 100 times). To make Julia use only this temporary path, one can use the following workaround:

File : julia_install.jl

import Pkg
## Set up the environment before you add packages.
## If the current directory already has a (local) environment, uncomment:
# Pkg.activate(".")
## Otherwise, instantiate the default environment:
Pkg.instantiate()

## Add your application(s), julia will install dependencies as per your Project.toml
### A package living in your local scratch
Pkg.add(path="/scratch/you/MYAPP.jl")
### A remote repository (public or private with your ssh key configured)
Pkg.add(url="https://github.com/someuser/MYAPP.jl.git")
### A registered package
Pkg.add("Distributions")
### Verify the installed code passes self tests.
Pkg.test("MYAPP")

# If needed
# Pkg.update() 
println("Done")



File : run_julia_ethereal.sh

#!/bin/bash
#SBATCH --ntasks=100
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1024M
#SBATCH --time=0-00:10

## Make sure your existing installation is not modified
mv ~/.julia ~/.juliaold
## Set package manager path
export JULIA_DEPOT_PATH="$SLURM_TMPDIR:$JULIA_DEPOT_PATH"
# Or
# export JULIA_DEPOT_PATH="$SLURM_TMPDIR"

## Setup dependencies (since your Julia install is now bare of any packages)
## This takes less time than e.g. updating existing installation in ~/.julia
julia julia_install.jl

## Run your code
julia mycode.jl

## Restore the old package directory
mv ~/.juliaold ~/.julia

## Restore the depot variable
export JULIA_DEPOT_PATH="" # Setting to empty will cause Julia to repopulate it, e.g. to ~/.julia



This ensures any permanent registries and packages in $HOME/.julia are unaltered during execution of large compute tasks, while at the same time causing a drastic speedup. For more information see the relevant section of the Julia documentation.
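As a variation on resetting JULIA_DEPOT_PATH at the end of the script, the variable can be set for individual commands only, so that the calling shell is never modified. The sketch below assumes nothing beyond the standard env utility; run_with_depot is a hypothetical helper name, and /tmp merely stands in for $SLURM_TMPDIR when the code is tried outside a job.

```shell
# Hypothetical helper: run a command with JULIA_DEPOT_PATH pointing at the
# given directory, without changing the variable in the calling shell.
run_with_depot() {
    local depot="$1"
    shift
    env JULIA_DEPOT_PATH="$depot" "$@"
}

# Outside a Slurm job SLURM_TMPDIR is unset, so fall back to /tmp for the demo.
depot="${SLURM_TMPDIR:-/tmp}"
run_with_depot "$depot" sh -c 'echo "depot for this command: $JULIA_DEPOT_PATH"'

# In a job script the same helper would wrap the Julia calls, for example:
#   run_with_depot "$SLURM_TMPDIR" julia julia_install.jl
#   run_with_depot "$SLURM_TMPDIR" julia mycode.jl
```

Because the variable lives only in the environment of the wrapped command, there is nothing to restore afterwards. This sketch does not move ~/.julia aside; whether that step is still wanted depends on how the depot path is composed.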

Available versions

We have removed earlier versions of Julia (< 1.0) because the old package manager created vast numbers of small files, which in turn caused performance issues on the parallel file systems. Please use Julia 1.4 or newer.

[name@server ~]$ module spider julia
--------------------------------------------------------
  julia: julia/1.4.1
--------------------------------------------------------
[...]
    You will need to load all module(s) on any one of the lines below before the "julia/1.4.1" module is available to load.

      nixpkgs/16.09  gcc/7.3.0
[...]
[name@server ~]$ module load gcc/7.3.0 julia/1.4.1

Porting code from Julia 0.x to 1.x

In the summer of 2018, the Julia developers released version 1.0, in which they stabilized the language API and removed deprecated (outdated) functionality. To help users update Julia programs for version 1.0, the developers also released version 0.7.0. Julia 0.7.0 contains all the new functionality of 1.0 as well as the outdated functionality from 0.x versions, which gives deprecation warnings when used. Code that runs in Julia 0.7 without warnings should be compatible with Julia 1.0.

Running Julia with multiple processes on clusters

The following is an example of running a parallel Julia program that computes pi using 100 cores across nodes on a cluster:


File : run_julia_pi.sh

#!/bin/bash
#SBATCH --ntasks=100
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1024M
#SBATCH --time=0-00:10

srun hostname -s > hostfile
sleep 5
julia --machine-file ./hostfile ./pi_p.jl 1000000000000


In this example, the command

srun hostname -s > hostfile

generates a list of names of the nodes allocated and writes it to the text file hostfile. Then the command

julia --machine-file ./hostfile ./pi_p.jl 1000000000000

starts one main Julia process and 100 worker processes on the nodes specified in the hostfile and runs the program pi_p.jl in parallel.
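Because srun writes one line per task, a node that hosts several tasks appears several times in hostfile. The following sketch counts the tasks per node, which is a quick way to confirm the allocation before starting Julia. The function name summarize_hostfile is hypothetical and the node names in the demo are made up. The count*host form printed here matches the optional count prefix described for machine files in the Julia manual; verify this for your Julia version before feeding the output back to --machine-file.

```shell
# Hypothetical helper: read host names (one per line) on standard input and
# print "count*host" for each distinct host.
summarize_hostfile() {
    sort | uniq -c | awk '{ print $1 "*" $2 }'
}

# Demo with made-up node names; in a job one would run:
#   summarize_hostfile < hostfile
printf '%s\n' node1 node2 node1 node1 | summarize_hostfile
```

For the demo input this prints 3*node1 followed by 1*node2. In the 100-task job above, the counts over all nodes should add up to 100.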

Running Julia with MPI

You must make sure Julia's MPI is configured to use our MPI libraries. To install correctly, run the following:

module load StdEnv/2020  julia/1.5.2
export JULIA_MPI_BINARY=system
export JULIA_MPI_PATH=$EBROOTOPENMPI
export JULIA_MPI_LIBRARY=$EBROOTOPENMPI/lib64/libmpi.so
export JULIA_MPI_ABI=OpenMPI
export JULIA_MPIEXEC=$EBROOTOPENMPI/bin/mpiexec
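Before starting Julia, it can be worth confirming that all five variables are actually set in the current shell, for instance after logging in again. A minimal sketch: check_mpi_env is a hypothetical helper name, and the variable list is simply the one exported above.

```shell
# Hypothetical helper: report which of the JULIA_MPI* variables exported
# above are missing or empty. Returns 0 only if all of them are set.
check_mpi_env() {
    local name status=0
    for name in JULIA_MPI_BINARY JULIA_MPI_PATH JULIA_MPI_LIBRARY \
                JULIA_MPI_ABI JULIA_MPIEXEC; do
        # printenv prints the value of an exported variable, or nothing.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    return "$status"
}

if check_mpi_env; then
    echo "MPI environment looks complete"
fi
```

The check only tests that the variables are non-empty; it does not verify that the paths they point to exist.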

Then start Julia and inside it run:

import Pkg;
Pkg.add("MPI")
using MPI

To use it afterwards, run (with two processes in this example):

module load StdEnv/2020  julia/1.5.2
mpirun -np 2 julia hello.jl

The hello.jl code here is:

using MPI
MPI.Init()
comm = MPI.COMM_WORLD
print("Hello world, I am rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))\n")
MPI.Barrier(comm)

Videos

A series of online seminars produced by SHARCNET: