R: Difference between revisions

53 bytes removed, 2 years ago
no edit summary
(Marked this version for translation)
No edit summary
Line 4: Line 4:


<!--T:1-->
<!--T:1-->
R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.
R is a system for statistical computation and graphics. It consists of a language plus a runtime environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.


<!--T:2-->
<!--T:2-->
Line 10: Line 10:


<!--T:83-->
<!--T:83-->
Compute Canada user Julie Fortin has written a blog post, [https://medium.com/the-nature-of-food/how-to-run-your-r-script-with-compute-canada-c325c0ab2973 "How to run your R script with Compute Canada"], which you might find useful.
User Julie Fortin has written a blog post, [https://medium.com/the-nature-of-food/how-to-run-your-r-script-with-compute-canada-c325c0ab2973 "How to run your R script with Compute Canada"] which you might find useful.


== The R interpreter == <!--T:3-->
== The R interpreter == <!--T:3-->
Line 72: Line 72:


<!--T:74-->
<!--T:74-->
A simple jobscript looks like this:
A simple job script looks like this:
</translate>
</translate>
{{File
{{File
Line 116: Line 116:


==== Installing for one or many R versions ==== <!--T:79-->
==== Installing for one or many R versions ==== <!--T:79-->
Specify the local installation directory according to currently R module that is loaded.
Specify the local installation directory according to the R module that is currently loaded.
{{Commands
{{Commands
|mkdir -p ~/.local/R/$EBVERSIONR/
|mkdir -p ~/.local/R/$EBVERSIONR/
Line 150: Line 150:


<!--T:77-->
<!--T:77-->
Using the R command <tt>system()</tt> you can execute commands in the ambient environment from inside R. On Compute Canada clusters this can lead to problems because R will give an incorrect value to the environment variable <tt>LD_LIBRARY_PATH</tt>. You can avoid this problem by using the syntax <tt>system("LD_LIBRARY_PATH=$RSNT_LD_LIBRARY_PATH <my system call>")</tt> in your R system calls.
Using the R command <tt>system()</tt> you can execute commands in the ambient environment from inside R. On our clusters, this can lead to problems because R will give an incorrect value to the environment variable <tt>LD_LIBRARY_PATH</tt>. You can avoid this problem by using the syntax <tt>system("LD_LIBRARY_PATH=$RSNT_LD_LIBRARY_PATH <my system call>")</tt> in your R system calls.
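
As a minimal sketch of the workaround described above, the prefix can be wrapped in a small helper so that it is not retyped in every call. The function name <tt>rsnt_system</tt> is hypothetical and not part of R or of the cluster software; only <tt>system()</tt>, <tt>paste0()</tt> and the variable <tt>RSNT_LD_LIBRARY_PATH</tt> mentioned in the text are assumed to exist.

```r
# Hypothetical helper (not provided by R or by the clusters):
# run a shell command with LD_LIBRARY_PATH reset to the value stored
# in RSNT_LD_LIBRARY_PATH, as suggested in the text above.
rsnt_system <- function(command, ...) {
  # The shell, not R, expands $RSNT_LD_LIBRARY_PATH when the command runs.
  system(paste0("LD_LIBRARY_PATH=$RSNT_LD_LIBRARY_PATH ", command), ...)
}

# Example: run a simple command and capture its output as a character vector.
out <- rsnt_system("echo hello", intern = TRUE)
print(out)
```

Extra arguments such as <tt>intern = TRUE</tt> are passed through unchanged to <tt>system()</tt>, so the helper can be used wherever a plain <tt>system()</tt> call would be.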




== Passing arguments to R scripts == <!--T:84-->
== Passing arguments to R scripts == <!--T:84-->
Sometimes it can be useful to pass parameters as arguments to R scripts, to avoid having to either change the R script for every job or having to manage multiple copies of otherwise identical scripts. This can be useful to specify the names for input- or output-files, or maybe numerical parameters.
Sometimes it can be useful to pass parameters as arguments to R scripts, to avoid having to either change the R script for every job or having to manage multiple copies of otherwise identical scripts. This can be useful to specify the names for input or output files, or maybe numerical parameters.
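
One possible way to read such arguments inside a script is sketched below, using the base R function <tt>commandArgs()</tt>. The helper <tt>parse_args</tt>, the argument order and the default values (<tt>input.csv</tt>, <tt>output.csv</tt>, <tt>10</tt>) are assumptions made for illustration only.

```r
# Hypothetical helper: map positional arguments to named parameters,
# falling back to illustrative defaults when an argument is missing.
parse_args <- function(args) {
  list(
    input  = if (length(args) >= 1) args[1] else "input.csv",
    output = if (length(args) >= 2) args[2] else "output.csv",
    n      = if (length(args) >= 3) as.numeric(args[3]) else 10
  )
}

# trailingOnly = TRUE keeps only the arguments given after the script name.
params <- parse_args(commandArgs(trailingOnly = TRUE))
cat("input:", params$input, "output:", params$output, "n:", params$n, "\n")
```

A script written this way could be called as, for example, <tt>Rscript myscript.R data.csv results.csv 100</tt>, where <tt>myscript.R</tt> and the file names are placeholders.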


<!--T:85-->
<!--T:85-->
Line 190: Line 190:


<!--T:71-->
<!--T:71-->
The processors on Compute Canada clusters are quite ordinary.  
The processors on our clusters are quite ordinary.  
What makes these supercomputers ''super'' is that you have access to thousands of CPU cores with a high-performance network.
What makes these supercomputers ''super'' is that you have access to thousands of CPU cores with a high-performance network.
In order to take advantage of this hardware you must run code "in parallel". Note however that prior to investing a lot of time and effort
In order to take advantage of this hardware you must run code "in parallel." Note however that prior to investing a lot of time and effort
in parallelizing your R code, you should first ensure that your serial implementation is as efficient as possible. As an interpreted  
in parallelizing your R code, you should first ensure that your serial implementation is as efficient as possible. As an interpreted  
language, the use of loops in R, and especially nested loops, constitutes a significant performance bottleneck. Whenever possible you  
language, the use of loops in R, and especially nested loops, constitutes a significant performance bottleneck. Whenever possible you  
Line 201: Line 201:
<!--T:72-->
<!--T:72-->
The [https://cran.r-project.org/web/views/HighPerformanceComputing.html CRAN Task View on High-Performance and Parallel Computing with R]
The [https://cran.r-project.org/web/views/HighPerformanceComputing.html CRAN Task View on High-Performance and Parallel Computing with R]
describes a bewildering collection of inter-related R packages for parallel computing.  
describes a bewildering collection of interrelated R packages for parallel computing.  
In the following subsections we present two methods of parallelizing an R code, both of which  
In the following subsections, we present two methods of parallelizing an R code, both of which  
are supported on Compute Canada clusters.
are supported on our clusters.


<!--T:73-->
<!--T:73-->
'''A note on terminology:''' In most Compute Canada documentation the term 'node' refers  
'''A note on terminology:''' In most of our documentation the term 'node' refers  
to an individual machine, also called a 'host', and a collection of such nodes makes up a 'cluster'.   
to an individual machine, also called a 'host', and a collection of such nodes makes up a 'cluster'.   
In a lot of R documentation however, the term 'node' refers to a worker process and a 'cluster' is a
In a lot of R documentation however, the term 'node' refers to a worker process and a 'cluster' is a
rsnt_translations (56,420 edits)
