R: Difference between revisions

no edit summary
(Marked this version for translation)
No edit summary
Line 7: Line 7:


<!--T:2-->
Even though R was not developed for high performance computing (HPC), its popularity with scientists from a variety of disciplines, including engineering, mathematics, statistics, bioinformatics, etc. makes it an essential tool on HPC installations dedicated to academic research. Features such as C extensions, byte-compiled code and parallelisation allow for reasonable performance in single-node jobs. Thanks to R’s modular nature, users can customize the R functions available to them by installing packages from the Comprehensive R Archive Network ([https://cran.r-project.org/ CRAN]) into their home directories.
Even though R was not developed for high-performance computing (HPC), its popularity with scientists from a variety of disciplines, including engineering, mathematics, statistics, bioinformatics, etc. makes it an essential tool on HPC installations dedicated to academic research. Features such as C extensions, byte-compiled code and parallelization allow for reasonable performance in single-node jobs. Thanks to R’s modular nature, users can customize the R functions available to them by installing packages from the Comprehensive R Archive Network ([https://cran.r-project.org/ CRAN]) into their home directories.


<!--T:83-->
Line 206: Line 206:
<!--T:71-->
The processors on our clusters are quite ordinary.  
What makes these supercomputers ''super'' is that you have access to thousands of CPU cores with a high-performance network.
What makes these supercomputers <i>super</i> is that you have access to thousands of CPU cores with a high-performance network.
In order to take advantage of this hardware you must run code "in parallel." Note however that prior to investing a lot of time and effort
In order to take advantage of this hardware, you must run code "in parallel." However, note that prior to investing a lot of time and effort
in parallelizing your R code, you should first ensure that your serial implementation is as efficient as possible. As an interpreted  
language, the use of loops in R, and especially nested loops, constitutes a significant performance bottleneck. Whenever possible you  
Line 225: Line 225:


<!--T:73-->
<!--T:73-->
'''A note on terminology:''' In most of our documentation the term 'node' refers  
<b>A note on terminology:</b> In most of our documentation the term 'node' refers  
to an individual machine, also called a 'host', and a collection of such nodes makes up a 'cluster'.   
In a lot of R documentation however, the term 'node' refers to a worker process and a 'cluster' is a
collection of such processes. As an example, consider the following quote, "Following '''snow''', a pool  
collection of such processes. As an example, consider the following quote, "Following <b>snow</b>, a pool  
of worker processes listening ''via'' sockets for commands from the master is called a 'cluster' of  
nodes."<ref>Core package "parallel" vignette, https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf</ref>.
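
To make the R sense of these terms concrete, here is a minimal sketch using the core <b>parallel</b> package. It assumes a single machine and an arbitrary worker count of 2; the variable names are chosen only for illustration. The object returned by <code>makeCluster()</code> is what R documentation calls a 'cluster', and each of its worker processes is what R documentation calls a 'node', even though all of them run on one host.

```r
library(parallel)

# Start 2 worker processes on the local machine (count chosen for illustration).
# In R terminology, 'cl' is a "cluster" and each worker is a "node".
cl <- makeCluster(2)

# Ask every worker for its process ID and for the name of the host it runs on.
pids  <- unlist(clusterCall(cl, Sys.getpid))
hosts <- unlist(clusterCall(cl, function() Sys.info()[["nodename"]]))

print(pids)   # one distinct process ID per worker
print(hosts)  # the same host name repeated, since all workers share one machine

# Shut the worker processes down when finished.
stopCluster(cl)
```

If run as written, the process IDs should differ while the host names match, which illustrates that an R 'cluster' of 'nodes' can live entirely inside a single node in the sense used in the rest of our documentation.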
rsnt_translations (57,772 edits)
