User:Ppomorsk
Pawel Pomorski works for SHARCNET at the University of Waterloo.
April 2023
module load python/3.8
module load scipy-stack/2023a gcc/9.3.0 cuda/11.4 arrow/8.0.0
virtualenv $HOME/jupyter_py3.8
source $HOME/jupyter_py3.8/bin/activate
pip install --no-index --upgrade pip
pip install ipykernel
pip install pytz_deprecation_shim
pip install --no-index torch
python -m ipykernel install --user --name test_py38 --display-name "Python 3.8 Kernel"
deactivate
- kernel is installed in $HOME/.local/share/jupyter/kernels/
- may need a newer numpy?
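To confirm the kernel was registered, the installed kernelspecs can be listed (a quick check; this assumes the jupyter command is available inside the virtual environment, e.g. pulled in by ipykernel's dependencies):

source $HOME/jupyter_py3.8/bin/activate
jupyter kernelspec list                     # test_py38 should appear in the list
ls $HOME/.local/share/jupyter/kernels/      # the kernel directory should also be visible here
deactivate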
Working with processors that have non-uniform memory access (NUMA)
This is the NUMA layout on one of the Graham broadwell nodes.
[user@gra245 ~]$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 64030 MB
node 0 free: 61453 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 64508 MB
node 1 free: 61016 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
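To see how the memory of a running process is actually distributed across the two NUMA nodes, the numastat utility can be used (it is part of the numactl package, assuming it is installed on the node), along with numactl --show for the binding of the current shell:

numastat -p <PID>     # per-NUMA-node memory usage of the process with this PID
numactl --show        # CPU and memory binding policy of the current shell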
Below is an example of submitting a job that requests that type of node, with two multi-threaded tasks, each bound to one NUMA node.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --constraint=broadwell
#SBATCH --mem=0
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=16
#SBATCH -t 0:00:05           # time (D-HH:MM)
#SBATCH --account=cc-debug
export OMP_NUM_THREADS=16
numactl --cpunodebind=0 --membind=0 ./test.x &
numactl --cpunodebind=1 --membind=1 ./test.x &
wait
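To verify that the binding works as intended, the same numactl invocations can be used to print the binding each task will inherit, for example by adding the following lines to the script before the real runs:

numactl --cpunodebind=0 --membind=0 numactl --show
numactl --cpunodebind=1 --membind=1 numactl --show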
FEniCS
For installation instructions, see the dedicated page: FEniCS
Instructions for installing VMD version 1.9.4 Alpha
1. Download the 1.9.4 LATEST ALPHA tar file from http://www.ks.uiuc.edu/, selecting the LINUX_64 version (free registration is required).
2. Copy the file to the home directory of the cluster you wish to use.
3. Unpack the file with:
tar xvf vmd-1.9.4*.opengl.tar.gz
4. Enter the created directory by:
cd vmd-1.9.4*
5. Create the installation and library directories:
mkdir ~/vmd_install
mkdir ~/vmd_library
6. Edit the configure file so that it reads:
# Directory where VMD startup script is installed, should be in users' paths.
$install_bin_dir="/home/your_user_name/vmd_install";

# Directory where VMD files and executables are installed
$install_library_dir="/home/your_user_name/vmd_library";
but replace your_user_name with your actual user name.
7. Run configure:
./configure
8. Run make:
cd src
make install
9. Add the resulting executable to your path
export PATH=~/vmd_install:$PATH
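To check that the installation works, VMD can be started in text-only mode (no graphical display needed) using its -dispdev flag:

vmd -dispdev text
# type "quit" at the vmd > prompt to exit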
If you get a blank window when displaying VMD on a Mac (over X11 with XQuartz), try running this on the Mac:
defaults write org.macosforge.xquartz.X11 enable_iglx -bool true
Benchmarking NAMD
This section shows an example of how you should conduct benchmarking of NAMD. Performance of NAMD will be different for different systems you are simulating, depending especially on the number of atoms in the simulation. Therefore, if you plan to spend a significant amount of time simulating a particular system, it would be very useful to conduct the kind of benchmarking shown below. Collecting and providing this kind of data is also very useful if you are applying for a RAC award.
For a good benchmark, please vary the number of steps so that your system runs for a few minutes and timing information is collected at reasonable intervals of at least a few seconds. If your run is too short, you might see large fluctuations in your timing results.
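In a NAMD configuration file, the run length and the timing output interval are controlled by the numsteps and outputTiming parameters; for example (the values here are only illustrative and should be scaled to your system size):

numsteps      10000    ;# run long enough for a few minutes of wall time
outputTiming    200    ;# print timing information every 200 steps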
The numbers below were obtained for the standard NAMD apoa1 benchmark. The benchmarking was conducted on the graham cluster, which has CPU nodes with 32 cores and GPU nodes with 32 cores and 2 GPUs. Benchmarks performed on other clusters will have to take into account the different structure of their nodes.
For the results shown in the first table below we used NAMD from the verbs module. Efficiency is computed as (time with 1 core) / (N * (time with N cores)).
# cores | Wall time (s) per step | Efficiency |
---|---|---|
1 | 0.8313 | 100% |
2 | 0.4151 | 100% |
4 | 0.1945 | 107% |
8 | 0.0987 | 105% |
16 | 0.0501 | 104% |
32 | 0.0257 | 101% |
64 | 0.0133 | 98% |
128 | 0.0074 | 88% |
256 | 0.0036 | 90% |
512 | 0.0021 | 77% |
These results show that for this system it is acceptable to use up to 256 cores. Keep in mind that if you ask for more cores, your jobs will wait in the queue for a longer time, affecting your overall throughput.
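As a sanity check, the efficiency values in the table can be recomputed from the wall-time column; for example, for the 256-core run (using the single-core and 256-core times from the table above):

awk 'BEGIN { printf "%.0f%%\n", 100 * 0.8313 / (256 * 0.0036) }'   # prints 90%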
Now we perform benchmarking with GPUs. The NAMD multicore module is used for simulations that fit within 1 node, and the NAMD verbs-smp module is used for runs spanning multiple nodes.
# cores | #GPUs | Wall time (s) per step | Notes |
---|---|---|---|
4 | 1 | 0.0165 | 1 node, multicore |
8 | 1 | 0.0088 | 1 node, multicore |
16 | 1 | 0.0071 | 1 node, multicore |
32 | 2 | 0.0045 | 1 node, multicore |
64 | 4 | 0.0058 | 2 nodes, verbs-smp |
128 | 8 | 0.0051 | 2 nodes, verbs-smp |
From this table it is clear that there is no point in using more than 1 node for this system, since performance actually becomes worse with 2 or more nodes. Using only 1 node, it is best to use 1 GPU and 16 cores, as that has the greatest efficiency, but it is also acceptable to use 2 GPUs and 32 cores if you need your results quickly. Since on graham GPU nodes your priority is charged the same for any job using up to 16 cores and 1 GPU, there is no benefit from running with only 8 or 4 cores in this case.
Finally, you have to decide whether to run this simulation with or without GPUs. From our numbers we can see that on a full GPU node of graham (32 cores, 2 GPUs) the job runs faster than it would on 4 non-GPU nodes of graham. Since a GPU node of graham costs about twice what a non-GPU node costs, in this case it is more cost effective to run with GPUs. So you should run with GPUs if possible; however, given that there are fewer GPU than CPU nodes, you may need to consider submitting non-GPU jobs if your wait for GPU jobs is too long.
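For reference, a single-node GPU benchmark along the lines of the 16-core/1-GPU row above might be submitted with a script like the following. This is only a sketch: the module name, NAMD version, and input file name are assumptions and should be adjusted to what is actually installed on the cluster.

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:1
#SBATCH --mem=32G
#SBATCH -t 0:30:00
#SBATCH --account=def-someuser
module load namd-multicore        # module name is an assumption; check "module avail namd"
namd2 +p$SLURM_CPUS_PER_TASK +idlepoll apoa1.namd > apoa1_16core_1gpu.log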