Java: Difference between revisions

<languages />
<translate>
<translate>
<!--T:1-->
Java is a general-purpose, high-level, object-oriented programming language developed in 1995 by Sun Microsystems (purchased by Oracle in 2010). One of the principal design goals for Java was a high degree of portability across platforms, summarized by the slogan ''write once, run anywhere''. This is realized by compiling Java source code to ''byte code'', which then runs inside a Java virtual machine (JVM), ensuring a very uniform environment across numerous architectures and platforms. This has made Java a popular language choice in some environments, and it is also widely used as a language for teaching programming. While performance was not one of the original design goals for Java, there are ways to help Java code run quickly, and it has enjoyed a certain popularity in some scientific domains such as the life sciences, for example in software like the Broad Institute's [https://software.broadinstitute.org/gatk/ GATK]. This page is not designed to teach the Java programming language but merely to provide some tips and hints for the use of Java in a high-performance computing environment such as Compute Canada.


<!--T:2-->
Compute Canada's systems have several different Java virtual machines installed which are made available to users via the <tt>module</tt> command like other software packages. You should normally only have one Java module loaded at a time. The principal commands associated with such Java modules are <tt>java</tt> to launch the Java virtual machine and <tt>javac</tt> to call the Java compiler for converting a Java source file into byte code.  


<!--T:3-->
Java software is frequently distributed in the form of a JAR file with the extension <tt>jar</tt>. You can use such software by means of the following command,
{{Command|java -jar file.jar}}
assuming the JAR file has been packaged to operate as an autonomous program (i.e. it possesses a <tt>Main-Class</tt> manifest header).
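
If you are unsure whether a given JAR file can be launched with <tt>java -jar</tt>, one way to check is to read its manifest. The following is a minimal sketch using the standard <tt>java.util.jar</tt> classes; the class name <tt>JarMainClass</tt> and the helper <tt>mainClassOf</tt> are illustrative names, not part of any Compute Canada tooling.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class JarMainClass {
    // Returns the Main-Class value from the JAR's manifest, or null if absent.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();   // null when the JAR has no manifest
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarMainClass file.jar");
            return;
        }
        String mainClass = mainClassOf(args[0]);
        System.out.println(mainClass == null
                ? "No Main-Class header found"
                : "Main-Class: " + mainClass);
    }
}
```

A JAR without this header can usually still be used by putting it on the class path and naming the class to run explicitly.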


==Parallelism in Java== <!--T:4-->


===Threading=== <!--T:5-->
Java includes built-in support for threading, obviating the need for separate interfaces and libraries like OpenMP, pthreads and Boost threads used in other languages. The principal Java object for handling concurrency is the <tt>Thread</tt> class, which a programmer can use either by providing a <tt>Runnable</tt> object to the standard <tt>Thread</tt> class or by subclassing the <tt>Thread</tt> class. As an example of this second approach, consider the following toy program:
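
The original listing is not reproduced in this view of the page, so what follows is only a minimal sketch of the subclassing approach. The names <tt>HelloThreads</tt> and <tt>Worker</tt> and the thread count are assumptions chosen for illustration.

```java
public class HelloThreads {
    // Hypothetical worker: subclasses Thread and overrides run().
    static class Worker extends Thread {
        private final int id;

        Worker(int id) {
            this.id = id;
        }

        @Override
        public void run() {
            System.out.println("Hello from worker " + id);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 4;                       // number of threads, chosen arbitrarily
        Worker[] workers = new Worker[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Worker(i);
            workers[i].start();          // start() executes run() in a new thread
        }
        for (Worker w : workers) {
            w.join();                    // wait for every worker to finish
        }
    }
}
```

The order of the printed lines is not deterministic, since the threads run concurrently.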
This second approach is generally the simplest to use but suffers from the drawback that Java does not permit multiple inheritance, so the class which implements multithreading is unable to subclass any other (potentially more useful) class.
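
For comparison, here is a minimal sketch of the first approach, in which a <tt>Runnable</tt> is handed to a standard <tt>Thread</tt>. Because <tt>Runnable</tt> is an interface, the class remains free to extend another class. <tt>Task</tt> and <tt>PrintTask</tt> are hypothetical classes invented for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RunnableDemo {
    // Hypothetical base class standing in for something worth inheriting from.
    static class Task {
        final String name;

        Task(String name) {
            this.name = name;
        }
    }

    // Implements Runnable while still extending Task.
    static class PrintTask extends Task implements Runnable {
        static final AtomicInteger completed = new AtomicInteger();

        PrintTask(String name) {
            super(name);
        }

        @Override
        public void run() {
            System.out.println("Running " + name);
            completed.incrementAndGet();   // atomic, so safe to call from several threads
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[3];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new PrintTask("task-" + i));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();                      // wait for all tasks to finish
        }
        System.out.println("Completed tasks: " + PrintTask.completed.get());
    }
}
```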


===MPI and Java=== <!--T:6-->
One common method for using MPI-style parallelism in a Java program is the [http://mpj-express.org/ MPJ Express] library.


==Pitfalls== <!--T:7-->


===Memory Issues=== <!--T:8-->
Java uses an automatic system called ''garbage collection'' to identify objects which are no longer reachable and reclaim the memory associated with them. This does not, however, stop many Java programs from requiring significant amounts of memory to run correctly. When a Java virtual machine is launched using the <tt>java</tt> command, by default the initial and maximum heap sizes are set to 1/64 and 1/4 of the system's physical memory respectively. These defaults, particularly the maximum heap size, may well be inadequate for your program while leaving a substantial amount of physical memory unused. To correct this problem, you can tell the Java virtual machine the maximum amount of memory to use with the command line argument <tt>-Xmx</tt>, for instance
{{Command|java -Xmx8192m -jar file.jar}}
tells the Java virtual machine that it can use up to 8192 MB (8 GB) of memory. You can set the initial heap size with the argument <tt>-Xms</tt>, and you can see all the command line options the JVM is going to run with by specifying the flag <tt>-XX:+PrintCommandLineFlags</tt>.
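
To confirm from inside a program which heap limits are actually in effect, the standard <tt>Runtime</tt> class can be queried. This is a minimal sketch; the class name <tt>HeapReport</tt> and the helper <tt>toMiB</tt> are illustrative.

```java
public class HeapReport {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the JVM will attempt to use for the heap (governed by -Xmx).
        System.out.println("Max heap (MiB):          " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM (starts near the -Xms value).
        System.out.println("Committed heap (MiB):    " + toMiB(rt.totalMemory()));
        // Unused portion of the committed heap.
        System.out.println("Free in committed (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running it as <tt>java -Xmx8192m HeapReport</tt> should report a maximum close to 8192 MiB; depending on the garbage collector in use, the reported value can be somewhat lower than the value requested.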


<!--T:9-->
Alternatively, you can use the <tt>_JAVA_OPTIONS</tt> environment variable to set the run-time options rather than passing them on the command line. This is especially convenient if you launch multiple Java calls, or call a Java program from another Java program. For example:
{{Command|export _JAVA_OPTIONS{{=}}"-Xms256m -Xmx2g"}}
When your Java program is run, it will produce a diagnostic message like "Picked up _JAVA_OPTIONS", verifying that the options have been picked up.
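
Besides the diagnostic message, a program can report its own settings. The sketch below prints the environment variable as seen by the process and the arguments the running JVM reports having received, using the standard <tt>java.lang.management</tt> API. The class name <tt>JvmOptionsReport</tt> is illustrative, and whether options taken from <tt>_JAVA_OPTIONS</tt> appear in the argument list is an assumption that may depend on the JVM in use.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmOptionsReport {
    public static void main(String[] args) {
        // Value of the environment variable as seen by this process (null if unset).
        System.out.println("_JAVA_OPTIONS = " + System.getenv("_JAVA_OPTIONS"));

        // Arguments the running JVM reports having received.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM input arguments:");
        for (String arg : jvmArgs) {
            System.out.println("  " + arg);
        }
    }
}
```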


<!--T:10-->
Please remember that the Java virtual machine itself creates a memory usage overhead. We recommend specifying the memory limit for your job as 1 to 2 GB more than the value you pass to the Java command line option <tt>-Xmx</tt>.


===Garbage Collection=== <!--T:11-->
By default, the Java VM uses a parallel garbage collector (GC) and sets the number of GC threads based on the number of CPU cores on a given node, whether a Java job is threaded or not. Each GC thread consumes memory. Moreover, the amount of memory each GC thread consumes is proportional to the amount of physical memory. Therefore, we highly recommend matching the number of GC threads to the number of CPU cores you requested from the scheduler in your job submission script, for example with <tt>-XX:ParallelGCThreads=12</tt> for a 12-core job. You can also use the serial garbage collector by specifying the option <tt>-XX:+UseSerialGC</tt>, whether your job is parallel or not.
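
To see what the JVM has detected and which collectors are active for a given set of flags, the standard management API can be queried. This is a minimal sketch; the class name <tt>GcReport</tt> is illustrative, and the collector names printed vary with the JVM version and the GC flags used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    public static void main(String[] args) {
        // Number of processors the JVM believes it can use.
        System.out.println("Processors visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());

        // One bean per active collector (for example young and old generation).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName()
                    + ", collections so far: " + gc.getCollectionCount());
        }
    }
}
```

Comparing the output of <tt>java GcReport</tt> with that of <tt>java -XX:+UseSerialGC GcReport</tt> is a quick way to confirm that a GC option has taken effect.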


===The <tt>volatile</tt> Keyword=== <!--T:12-->
This keyword has a meaning quite different from the one C/C++ programmers are accustomed to. In Java, <tt>volatile</tt> applied to a variable ensures that its value is always read from and written to main memory, which helps to ensure that modifications of this variable are made visible to other threads. That said, there are contexts in which <tt>volatile</tt> alone is not sufficient to avoid race conditions, for example a compound read-modify-write operation such as an increment, and the <tt>synchronized</tt> keyword is required to ensure program consistency.
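
The difference can be illustrated with two counters incremented from several threads. This is a minimal sketch with hypothetical names (<tt>CounterDemo</tt>, <tt>runWorkers</tt>): the <tt>volatile</tt> counter may lose updates because <tt>++</tt> is a separate read, add and write, whereas the counter guarded by <tt>synchronized</tt> always reaches the expected total.

```java
public class CounterDemo {
    static volatile int volatileCount = 0;   // visible to all threads, but ++ is not atomic
    static int syncCount = 0;                // only accessed while holding LOCK
    static final Object LOCK = new Object();

    // Starts the given number of threads, each incrementing both counters.
    static void runWorkers(int threads, int increments) throws InterruptedException {
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < increments; i++) {
                    volatileCount++;          // read-modify-write: updates can be lost
                    synchronized (LOCK) {
                        syncCount++;          // one thread at a time
                    }
                }
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            th.join();                        // join makes the final values visible here
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 4;
        int increments = 100_000;
        runWorkers(threads, increments);
        System.out.println("Expected:     " + (threads * increments));
        System.out.println("volatile:     " + volatileCount + " (may be lower)");
        System.out.println("synchronized: " + syncCount);
    }
}
```

How many updates the <tt>volatile</tt> counter loses, if any, varies from run to run and from machine to machine.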


==Further Reading== <!--T:13-->
Scott Oaks and Henry Wong, ''Java Threads: Understanding and Mastering Concurrent Programming'' (3rd edition) (O'Reilly, 2012)
</translate>
</translate>