Translations:OpenACC Tutorial - Optimizing loops/19/en
As instructed in the third section of this tutorial, open the NVIDIA Visual Profiler and start a new session with the latest executable we have built. Then follow these steps (see the screenshots alongside for each step):
- Go to the "Analysis" tab and click on "Examine GPU Usage". Once the analysis has run, the profiler gives you a series of warnings. These indicate what could be improved.
- Then click on "Examine Individual Kernels". This will show you a list of kernels.
- Select the top one, and click on "Perform Kernel Analysis". The profiler will show you a more detailed analysis of this specific kernel, highlighting the most likely bottleneck. In this case, the performance is limited by memory latency.
- Click on "Perform Latency Analysis".
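
To give a feel for why the kernel analysis in the steps above can report memory latency as the limiting factor, here is a minimal sketch of a loop nest with that kind of access pattern. It is an illustration only: the function name `spmv_csr`, its parameters, and the CSR storage layout are assumptions made for this example, not necessarily the code profiled in this tutorial.

```c
#include <assert.h>

/* Hypothetical example (names and layout are assumptions):
 * sparse matrix-vector product y = A * x, with A stored in CSR format.
 *   row_offsets[i] .. row_offsets[i+1]-1 index the nonzeros of row i,
 *   cols[j] is the column of nonzero j, vals[j] is its value. */
void spmv_csr(int num_rows, int num_cols,
              const int *row_offsets, const int *cols,
              const double *vals, const double *x, double *y)
{
    assert(num_rows >= 0 && num_cols >= 0);
    int nnz = row_offsets[num_rows];  /* total number of stored nonzeros */

    /* Without an OpenACC compiler the pragmas are ignored and the
     * loops simply run sequentially on the host. */
    #pragma acc parallel loop \
        copyin(row_offsets[0:num_rows+1], cols[0:nnz], vals[0:nnz], x[0:num_cols]) \
        copyout(y[0:num_rows])
    for (int i = 0; i < num_rows; i++) {
        double sum = 0.0;
        int row_start = row_offsets[i];
        int row_end = row_offsets[i + 1];

        #pragma acc loop reduction(+:sum)
        for (int j = row_start; j < row_end; j++) {
            /* x[cols[j]] is an indirect load: its address is only known
             * after cols[j] has been read, and the inner loop is short,
             * so there is little arithmetic to hide the memory wait. */
            sum += vals[j] * x[cols[j]];
        }
        y[i] = sum;
    }
}
```

In a loop like this, each multiply-add waits on two dependent memory reads, and rows with only a few nonzeros give each thread very little work. That is the sort of situation in which a profiler is likely to flag latency rather than bandwidth or compute as the bottleneck.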