This is the default operating mode for Nvprof. It outputs a single result line for each kernel function and each type of CUDA memory copy/set performed by the application. For each kernel function or type of memory copy, Nvprof reports the total time of all instances as well as the average, minimum, and maximum time.
In this example, the application is <code>a.out</code> and we run Nvprof to obtain the profile:
{{Command|nvprof ./a.out|Result
[Matrix Multiply Using CUDA] - Starting...
==27694== NVPROF is profiling process 27694, command: matrixMul
GPU Device 0: "GeForce GT 640M LE" with compute capability 3.0
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 35.35 GFlop/s, Time= 3.708 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: OK
Note: For peak performance, please refer to the matrixMulCUBLAS example.
==27694== Profiling application: matrixMul
==27694== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
 99.94%  1.11524s    301  3.7051ms  3.6928ms  3.7174ms  void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
  0.04%  406.30us      2  203.15us  136.13us  270.18us  [CUDA memcpy HtoD]
  0.02%  248.29us      1  248.29us  248.29us  248.29us  [CUDA memcpy DtoH]
}}
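To see how the columns of the profiling result relate to each other, the following minimal Python sketch recomputes them from the three rows shown above. It is not part of Nvprof: the values are copied by hand from the example output and converted to milliseconds, and the names <code>rows</code> and <code>summarize</code> are hypothetical. It assumes that <code>Time</code> is approximately <code>Calls</code> times <code>Avg</code>, and that <code>Time(%)</code> is each row's share of the sum of the <code>Time</code> column.

```python
# Values copied by hand from the example profiling result, in milliseconds.
# Each tuple is (name, total_ms, calls, avg_ms). These names are illustrative only.
rows = [
    ("matrixMulCUDA",      1115.24,  301, 3.7051),
    ("[CUDA memcpy HtoD]", 0.40630,  2,   0.20315),
    ("[CUDA memcpy DtoH]", 0.24829,  1,   0.24829),
]


def summarize(rows):
    """Return (name, share_percent, calls_times_avg_ms) for each row.

    share_percent is the row's total time as a percentage of the sum of
    all total times, rounded to two decimals like the Time(%) column.
    """
    grand_total_ms = sum(total_ms for _, total_ms, _, _ in rows)
    result = []
    for name, total_ms, calls, avg_ms in rows:
        share = round(100.0 * total_ms / grand_total_ms, 2)
        result.append((name, share, calls * avg_ms))
    return result


if __name__ == "__main__":
    for name, share, recomputed_ms in summarize(rows):
        print(f"{name:20s} {share:6.2f}%  calls*avg = {recomputed_ms:.4f} ms")
```

Under these assumptions, the recomputed shares should agree with the <code>Time(%)</code> column, and <code>calls*avg</code> should agree with the <code>Time</code> column up to the rounding Nvprof applies when printing.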