Nvprof: Difference between revisions

This is the default operating mode for Nvprof. It outputs a single result line for each kernel function and each type of CUDA memory copy/set performed by the application. For each of them, Nvprof reports the total time of all instances, the number of calls, and the average, minimum, and maximum time.
In this example, the application is <code>a.out</code> and we run Nvprof to profile it:
{{Command|nvprof  ./a.out|Result
[Matrix Multiply Using CUDA] - Starting...
==27694== NVPROF is profiling process 27694, command: matrixMul
GPU Device 0: "GeForce GT 640M LE" with compute capability 3.0
 
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 35.35 GFlop/s, Time= 3.708 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: OK
 
Note: For peak performance, please refer to the matrixMulCUBLAS example.
==27694== Profiling application: matrixMul
==27694== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
 99.94%  1.11524s    301  3.7051ms  3.6928ms  3.7174ms  void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
  0.04%  406.30us      2  203.15us  136.13us  270.18us  [CUDA memcpy HtoD]
  0.02%  248.29us      1  248.29us  248.29us  248.29us  [CUDA memcpy DtoH]
}}
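To make the relationship between the summary columns concrete, the following is a minimal Python sketch that splits the result rows shown above into their columns and checks that <code>Avg</code> matches <code>Time</code> divided by <code>Calls</code>. The helper names <code>to_seconds</code> and <code>parse_summary_line</code> are hypothetical and are not part of Nvprof. The column layout is assumed from this sample output only and may differ between Nvprof versions.

```python
import re

# Unit suffixes seen in the sample output, converted to seconds.
# "ns" is included as an assumption; it does not appear in the sample.
UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def to_seconds(text):
    """Convert a value such as '3.7051ms' to seconds."""
    match = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_summary_line(line):
    """Split one summary row into its columns (hypothetical helper)."""
    # The name column may contain spaces, so split at most six times.
    percent, total, calls, avg, low, high, name = line.split(None, 6)
    return {
        "percent": float(percent.rstrip("%")),
        "total": to_seconds(total),
        "calls": int(calls),
        "avg": to_seconds(avg),
        "min": to_seconds(low),
        "max": to_seconds(high),
        "name": name.strip(),
    }


# Rows copied from the profiling result above.
rows = [
    "99.94%  1.11524s      301  3.7051ms  3.6928ms  3.7174ms  "
    "void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",
    "0.04%  406.30us        2  203.15us  136.13us  270.18us  [CUDA memcpy HtoD]",
    "0.02%  248.29us        1  248.29us  248.29us  248.29us  [CUDA memcpy DtoH]",
]

for row in map(parse_summary_line, rows):
    # Avg should equal Time / Calls, up to rounding in the printed values.
    computed_avg = row["total"] / row["calls"]
    print(f'{row["name"]}: computed avg {computed_avg:.6e} s, reported avg {row["avg"]:.6e} s')
```

For the kernel row, 1.11524 s over 301 calls is about 3.705 ms per call, which agrees with the reported <code>Avg</code> value.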