Vtune
Latest revision as of 19:54, 12 February 2024
Introduction
VTune is Intel's performance analysis tool for applications and systems. It can analyze both OpenMP and MPI-based applications.
Software module
To load the module on any Alliance cluster, run:
[name@server ~]$ module load vtune
Tool renaming
This page is largely concerned with the legacy version of the tool, named Intel® VTune™ Amplifier. In the latest versions (newer than the vtune module versions presently available on Alliance clusters), Intel's documentation has renamed the tool from Intel® VTune™ Amplifier to Intel® VTune™ Profiler. Likewise, the command-line and graphical commands amplxe-cl and amplxe-gui have been renamed to vtune and vtune-gui, respectively. Further information can be found at https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/launch.html.
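Because the command name depends on the installed version, a script can check which one is on the PATH before profiling. The following is a minimal sketch rather than part of the official tooling: the function name vtune_cli is hypothetical, and it assumes only that one of the two command names mentioned above becomes available after loading the module.

```shell
# Hypothetical helper: print the name of the VTune command-line tool
# that is available in the current environment.
vtune_cli() {
    if command -v vtune >/dev/null 2>&1; then
        # Newer naming (Intel VTune Profiler)
        echo vtune
    elif command -v amplxe-cl >/dev/null 2>&1; then
        # Legacy naming (Intel VTune Amplifier)
        echo amplxe-cl
    else
        echo "neither vtune nor amplxe-cl found; was 'module load vtune' run?" >&2
        return 1
    fi
}
```

A script could then store the result once, for example with cli=$(vtune_cli), and use "$cli" wherever the commands on this page use vtune.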
Analysis types
To collect analysis information, run:
[name@server ~]$ vtune -collect <analysis-type> <target_exe> <exe_arguments>
where <analysis-type> should be replaced by one of the available analyses (e.g. hotspots) and <target_exe> is the path to the executable you would like to analyze. To obtain accurate results, it is recommended to compile your executable with the "-g" option and the same optimization level you normally use. After loading the vtune module, running vtune -help displays a listing of version-specific argument options and several usage examples. Complete downloadable documentation for Parallel Studio XE (including VTune) for all recent versions can be found at https://software.intel.com/content/www/us/en/develop/articles/download-documentation-intel-parallel-studio-xe-current-previous.html. The latest version of the Intel VTune Profiler User Guide is at https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top.html.
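The collection command can be wrapped in a small function that checks the target before starting a run. This is a sketch under stated assumptions, not a recommended interface: the name vtune_collect is hypothetical, and the -- separator between VTune's own options and the target follows the syntax used in Intel's documentation (the form without it, shown above, expresses the same command).

```shell
# Hypothetical wrapper around "vtune -collect".
# Usage: vtune_collect <analysis-type> <target_exe> [exe_arguments...]
vtune_collect() {
    analysis="$1"
    exe="$2"
    if [ -z "$analysis" ] || [ -z "$exe" ]; then
        echo "usage: vtune_collect <analysis-type> <target_exe> [exe_arguments...]" >&2
        return 2
    fi
    # Refuse to start a collection if the target cannot be found or executed.
    if ! command -v "$exe" >/dev/null 2>&1; then
        echo "vtune_collect: cannot find executable: $exe" >&2
        return 1
    fi
    # Drop the analysis type; "$@" now holds the target and its arguments.
    shift
    vtune -collect "$analysis" -- "$@"
}
```

For instance, vtune_collect hotspots ./matrix would run the hotspots analysis on an executable named matrix in the current directory.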
Create reports
To create a report, run this command:
[name@server ~]$ vtune -report <report-type>
where <report-type> is the type of report to generate, e.g. hotspots. See also: https://software.intel.com/en-us/vtune-amplifier-help-generating-command-line-reports
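When several collections have been run in the same directory, it can help to name the result directory explicitly when generating a report. The sketch below assumes that results use VTune's default naming scheme (r000hs, r001hs, and so on) and that the -result-dir option is available in the installed version; the function names latest_vtune_result and vtune_report are hypothetical.

```shell
# Hypothetical helper: print the highest-numbered result directory
# (default naming r000hs, r001hs, ...) found in the given directory.
latest_vtune_result() {
    base="${1:-.}"
    latest=""
    # Glob matches expand in sorted order, so the last match has the highest index.
    for d in "$base"/r[0-9][0-9][0-9]*/; do
        if [ -d "$d" ]; then
            latest="${d%/}"
        fi
    done
    if [ -z "$latest" ]; then
        echo "no VTune result directory found in $base" >&2
        return 1
    fi
    printf '%s\n' "$latest"
}

# Hypothetical wrapper: report on the latest result in a directory.
# Usage: vtune_report <report-type> [directory]
vtune_report() {
    report_type="$1"
    result_dir="$(latest_vtune_result "${2:-.}")" || return 1
    vtune -report "$report_type" -result-dir "$result_dir"
}
```

For example, vtune_report summary would print a summary for the most recent result in the current directory.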
Matrix example
The following commands analyze the Intel Matrix Sample Project from the command line using 4 cores, then generate a summary report:
salloc --time=1:00:00 --cpus-per-task=4 --ntasks=1 --mem=16G --account=def-yours
module load StdEnv/2020 vtune
cp -a $EBROOTVTUNE/vtune/$EBVERSIONVTUNE*/samples/en/C++/matrix .
cd matrix/linux
make icc
vtune -collect hotspots ../matrix
vtune -report summary
The latest version of matrix_multiply, which uses cmake to build, can be found at https://github.com/oneapi-src/oneAPI-samples/tree/master/Tools/VTuneProfiler.
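Graphical mode
The Intel Matrix Sample Project can also be run using VTune in GUI mode, as explored at https://software.intel.com/content/www/us/en/develop/documentation/vtune-hotspots-tutorial-linux-c/top/run-hotspots-analysis.html. To run VTune over VNC, follow the directions below for the system you wish to use. Running VTune graphically can also be useful for generating command-line configurations, as discussed at https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance/control-data-collection/generating-command-line-configuration-from-gui.html.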
Cluster nodes
- Connect to a cluster compute or login node with TigerVNC (https://docs.alliancecan.ca/wiki/VNC#Connect)
- module load StdEnv/2020 vtune
- vtune-gui
VDI nodes
- Connect to gra-vdi.alliancecan.ca with TigerVNC (https://docs.alliancecan.ca/wiki/VNC#VDI_Nodes)
- module load CcEnv StdEnv/2020 vtune
- vtune-gui
MPI example
First, load the latest VTune module.
module load StdEnv/2020
module load vtune
Then compile your MPI program as you usually would and run it, either inside a job or in an interactive session started with salloc, using:
srun aps your_mpi_program.x
After the program finishes, the profiling data will be stored in a directory called aps_result_YYYYMMDD where YYYYMMDD is the current date.
There is a lot of information you can extract from that data. To get the basic summary report of your program's performance, run:
aps-report -D aps_result_YYYYMMDD
where YYYYMMDD is replaced to match the directory that was actually created. This command creates an HTML file, which can be copied to your own computer and viewed in a browser. The report highlights performance issues that affect your code.
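Since the directory name depends on the date of the run, the report step can be scripted. The following is a minimal sketch assuming the aps_result_YYYYMMDD naming described above; the function names aps_result_dir and aps_summary are hypothetical, and a run that started on an earlier date needs its directory passed explicitly.

```shell
# Hypothetical helper: print the result directory name for a run made today,
# following the aps_result_YYYYMMDD pattern.
aps_result_dir() {
    printf 'aps_result_%s\n' "$(date +%Y%m%d)"
}

# Hypothetical wrapper: generate the summary report for a result directory.
# Usage: aps_summary [result_directory]   (defaults to today's directory)
aps_summary() {
    dir="${1:-$(aps_result_dir)}"
    if [ ! -d "$dir" ]; then
        echo "aps_summary: no such result directory: $dir" >&2
        return 1
    fi
    aps-report -D "$dir"
}
```

Calling aps_summary with no argument after a run finishes would report on today's results; a specific directory can be given instead, e.g. aps_summary aps_result_20240212.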