cc_staff
(Making this section a subsection) |
(Updated example with Minfo) |
||
Line 127: | Line 127: | ||
From the above output, the <code>matvec()</code> function is responsible for 83.5% of the execution time, and this function call can be found in the <code>main()</code> function. | From the above output, the <code>matvec()</code> function is responsible for 83.5% of the execution time, and this function call can be found in the <code>main()</code> function. | ||
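The intensity expression reported further down for <code>matrix_functions.h</code> refers to <code>num_rows</code>, <code>row_start</code> and <code>row_end</code>. That suggests the hot spot is a compressed sparse row (CSR) matrix-vector product. The sketch below is a minimal CSR <code>y = A*x</code> under that assumption. The <code>CsrMatrix</code> struct and its field names are hypothetical and are not taken from the tutorial's <code>matrix.h</code>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR layout (not the tutorial's matrix type).
struct CsrMatrix {
    std::size_t num_rows;
    std::vector<std::size_t> row_offsets; // size num_rows + 1
    std::vector<std::size_t> cols;        // column index of each nonzero
    std::vector<double> coefs;            // value of each nonzero
};

// y = A * x for a CSR matrix: the outer loop walks the rows, and the inner
// loop walks the nonzeros of one row, accumulating a dot product.
void matvec(const CsrMatrix& A, const std::vector<double>& x,
            std::vector<double>& y) {
    assert(A.row_offsets.size() == A.num_rows + 1);
    assert(y.size() == A.num_rows);
    for (std::size_t i = 0; i < A.num_rows; ++i) {
        double sum = 0.0;
        const std::size_t row_start = A.row_offsets[i];
        const std::size_t row_end = A.row_offsets[i + 1];
        for (std::size_t j = row_start; j < row_end; ++j) {
            sum += A.coefs[j] * x[A.cols[j]]; // indirect load of x
        }
        y[i] = sum;
    }
}
```

The trip count of the inner loop changes from row to row, so the compiler cannot know it at compile time. This is one reason the feedback for this routine is less specific than for the simple vector loops.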
== Compiler Feedback | == Compiler Feedback == <!--T:16--> | ||
Before working on the routine, we need to understand what the compiler is actually doing by asking ourselves the following questions: | Before working on the routine, we need to understand what the compiler is actually doing by asking ourselves the following questions: | ||
* What optimizations were applied? | * What optimizations were applied automatically by the compiler? | ||
* What prevented further optimizations? | * What prevented further optimizations? | ||
* Can very minor modifications of the code affect performance? | * Can very minor modifications of the code affect performance? | ||
<!--T:17--> | <!--T:17--> | ||
The | The NVIDIA compiler offers a <code>-Minfo</code> flag with the following options: | ||
* accel | * <code>all</code> - Print almost all types of compilation information, including: | ||
* | ** <code>accel</code> - Print compiler operations related to the accelerator | ||
* intensity | ** <code>inline</code> - Print information about functions extracted and inlined | ||
* | ** <code>loop,mp,par,stdpar,vect</code> - Print various information about loop optimization and vectorization | ||
* <code>intensity</code> - Print loop intensity information | |||
* (none) - If <code>-Minfo</code> is used without any option, it is equivalent to the <code>all</code> option, except that <code>inline</code> information is not printed | |||
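As a rough mental model of the <code>intensity</code> option, we can assume that the reported value is the number of arithmetic operations in a loop body divided by the number of memory loads and stores per iteration. This is our reading, not an NVIDIA-specified formula. The hypothetical helper below counts both by hand for two loop shapes that appear in the output below: a dot-product reduction and a <code>waxpby</code>-style update <code>w[i] = alpha*x[i] + beta*y[i]</code>.

```cpp
#include <cassert>

// Hand-counted cost of one loop iteration (hypothetical helper type).
struct LoopBody {
    int arithmetic_ops; // floating-point adds and multiplies
    int loads;          // array elements read
    int stores;         // array elements written
};

// Assumed model: intensity = arithmetic ops / memory accesses.
double intensity(const LoopBody& body) {
    const int mem = body.loads + body.stores;
    assert(mem > 0);
    return static_cast<double>(body.arithmetic_ops) / mem;
}

// sum += x[i] * y[i]: 1 multiply + 1 add, 2 loads, sum kept in a register.
constexpr LoopBody kDot{2, 2, 0};
// w[i] = alpha*x[i] + beta*y[i]: 2 multiplies + 1 add, 2 loads, 1 store.
constexpr LoopBody kWaxpby{3, 2, 1};
// Hypothetical variant with alpha == 1 folded away: w[i] = x[i] + beta*y[i].
constexpr LoopBody kWaxpbyAlphaOne{2, 2, 1};
```

Under this model the dot loop gives 1.00 and the hypothetical <code>alpha == 1</code> variant gives about 0.67. Both values appear in the compiler output. Treat the agreement as suggestive rather than as a description of how <code>nvc++</code> actually counts.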
=== How to Enable Compiler Feedback === <!--T:18--> | === How to Enable Compiler Feedback === <!--T:18--> | ||
* Edit the Makefile | * Edit the <code>Makefile</code>: | ||
CXX=nvc++ | CXX=nvc++ | ||
CXXFLAGS=-fast -Minfo=all,intensity | CXXFLAGS=-fast -Minfo=all,intensity | ||
LDFLAGS=${CXXFLAGS} | LDFLAGS=${CXXFLAGS} | ||
* Rebuild | * Rebuild | ||
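To experiment with the feedback on something smaller than the full application, one could compile a standalone file with the same flags, for example <code>nvc++ -fast -Minfo=all,intensity -c kernels.cpp</code>. The file name is hypothetical. The sketch contains the two loop shapes behind the most common messages in the output:
* a reduction, which can lead to "Generated vector simd code for the loop containing reductions";
* a multiply-add update, which is a candidate for FMA instructions.

The exact messages and line numbers depend on the compiler version and target, so none are reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduction loop: every iteration accumulates into the same scalar.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Multiply-add update w = alpha*x + beta*y. The name follows waxpby in the
// output, but this signature is a simplified, hypothetical version.
void waxpby(double alpha, const std::vector<double>& x, double beta,
            const std::vector<double>& y, std::vector<double>& w) {
    assert(x.size() == y.size() && w.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        w[i] = alpha * x[i] + beta * y[i];
    }
}
```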
</translate> | </translate> | ||
Line 150: | Line 153: | ||
|make clean; make | |make clean; make | ||
|result= | |result= | ||
nvc++ -fast -Minfo=all,intensity | ... | ||
nvc++ -fast -Minfo=all,intensity -c -o main.o main.cpp | |||
initialize_vector(vector &, double): | initialize_vector(vector &, double): | ||
20, include "vector.h" | 20, include "vector.h" | ||
Line 159: | Line 163: | ||
27, Intensity = 1.00 | 27, Intensity = 1.00 | ||
Generated vector simd code for the loop containing reductions | Generated vector simd code for the loop containing reductions | ||
28, FMA (fused multiply-add) instruction(s) generated | |||
waxpby(double, const vector &, double, const vector &, const vector &): | waxpby(double, const vector &, double, const vector &, const vector &): | ||
21, include "vector_functions.h" | 21, include "vector_functions.h" | ||
Line 167: | Line 171: | ||
Loop unrolled 2 times | Loop unrolled 2 times | ||
FMA (fused multiply-add) instruction(s) generated | FMA (fused multiply-add) instruction(s) generated | ||
40, FMA (fused multiply-add) instruction(s) generated | |||
allocate_3d_poisson_matrix(matrix &, int): | allocate_3d_poisson_matrix(matrix &, int): | ||
22, include "matrix.h" | 22, include "matrix.h" | ||
Line 174: | Line 179: | ||
Loop not vectorized/parallelized: loop count too small | Loop not vectorized/parallelized: loop count too small | ||
45, Intensity = 0.0 | 45, Intensity = 0.0 | ||
Loop unrolled 3 times (completely unrolled) | |||
57, Intensity = 0.0 | 57, Intensity = 0.0 | ||
59, Intensity = 0.0 | 59, Intensity = 0.0 | ||
Line 180: | Line 186: | ||
23, include "matrix_functions.h" | 23, include "matrix_functions.h" | ||
29, Intensity = (num_rows*((row_end-row_start)* 2))/(num_rows+(num_rows+(num_rows+((row_end-row_start)+(row_end-row_start))))) | 29, Intensity = (num_rows*((row_end-row_start)* 2))/(num_rows+(num_rows+(num_rows+((row_end-row_start)+(row_end-row_start))))) | ||
33, Intensity = 1.00 | 33, Intensity = 1.00 | ||
Generated vector simd code for the loop containing reductions | |||
37, FMA (fused multiply-add) instruction(s) generated | |||
main: | main: | ||
38, allocate_3d_poisson_matrix(matrix &, int) inlined, size=41 (inline) file main.cpp (29) | 38, allocate_3d_poisson_matrix(matrix &, int) inlined, size=41 (inline) file main.cpp (29) | ||
Line 193: | Line 196: | ||
Loop not vectorized/parallelized: loop count too small | Loop not vectorized/parallelized: loop count too small | ||
45, Intensity = 0.0 | 45, Intensity = 0.0 | ||
Loop unrolled 3 times (completely unrolled) | |||
57, Intensity = 0.0 | 57, Intensity = 0.0 | ||
Loop not fused: function call before adjacent loop | Loop not fused: function call before adjacent loop | ||
Line 204: | Line 208: | ||
48, initialize_vector(vector &, double) inlined, size=5 (inline) file main.cpp (34) | 48, initialize_vector(vector &, double) inlined, size=5 (inline) file main.cpp (34) | ||
36, Intensity = 0.0 | 36, Intensity = 0.0 | ||
Memory set idiom, loop replaced by call to __c_mset8 | |||
49, initialize_vector(vector &, double) inlined, size=5 (inline) file main.cpp (34) | 49, initialize_vector(vector &, double) inlined, size=5 (inline) file main.cpp (34) | ||
36, Intensity = 0.0 | 36, Intensity = 0.0 | ||
Memory set idiom, loop replaced by call to __c_mset8 | |||
52, waxpby(double, const vector &, double, const vector &, const vector &) inlined, size=10 (inline) file main.cpp (33) | 52, waxpby(double, const vector &, double, const vector &, const vector &) inlined, size=10 (inline) file main.cpp (33) | ||
39, Intensity = 0.0 | 39, Intensity = 0.0 | ||
Line 215: | Line 219: | ||
Loop not fused: different loop trip count | Loop not fused: different loop trip count | ||
33, Intensity = 1.00 | 33, Intensity = 1.00 | ||
Generated vector simd code for the loop containing reductions | |||
54, waxpby(double, const vector &, double, const vector &, const vector &) inlined, size=10 (inline) file main.cpp (33) | 54, waxpby(double, const vector &, double, const vector &, const vector &) inlined, size=10 (inline) file main.cpp (33) | ||
27, FMA (fused multiply-add) instruction(s) generated | 27, FMA (fused multiply-add) instruction(s) generated | ||
36, FMA (fused multiply-add) instruction(s) generated | |||
39, Intensity = 0.67 | 39, Intensity = 0.67 | ||
Loop not fused: different loop trip count | Loop not fused: different loop trip count | ||
Line 238: | Line 239: | ||
65, dot(const vector &, const vector &) inlined, size=9 (inline) file main.cpp (21) | 65, dot(const vector &, const vector &) inlined, size=9 (inline) file main.cpp (21) | ||
27, Intensity = 1.00 | 27, Intensity = 1.00 | ||
Loop not fused: different | Loop not fused: different controlling conditions | ||
Generated vector simd code for the loop containing reductions | Generated vector simd code for the loop containing reductions | ||
67, waxpby(double, const vector &, double, const vector &, const vector &) inlined, size=10 (inline) file main.cpp (33) | 67, waxpby(double, const vector &, double, const vector &, const vector &) inlined, size=10 (inline) file main.cpp (33) | ||
Line 250: | Line 251: | ||
Loop not fused: different loop trip count | Loop not fused: different loop trip count | ||
33, Intensity = 1.00 | 33, Intensity = 1.00 | ||
Generated vector simd code for the loop containing reductions | |||
73, dot(const vector &, const vector &) inlined, size=9 (inline) file main.cpp (21) | 73, dot(const vector &, const vector &) inlined, size=9 (inline) file main.cpp (21) | ||
27, Intensity = 1.00 | 27, Intensity = 1.00 | ||
Line 274: | Line 273: | ||
91, free_vector(vector &) inlined, size=2 (inline) file main.cpp (29) | 91, free_vector(vector &) inlined, size=2 (inline) file main.cpp (29) | ||
92, free_matrix(matrix &) inlined, size=5 (inline) file main.cpp (73) | 92, free_matrix(matrix &) inlined, size=5 (inline) file main.cpp (73) | ||
}} | }} | ||
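The intensity reported for line 29 of <code>matrix_functions.h</code> is symbolic because the trip counts are only known at run time. Transcribing the printed expression into a function lets us plug in sample values. In the sketch, <code>k</code> stands for <code>row_end-row_start</code>, the number of nonzeros in a row, and is assumed to be the same for every row. The function name is hypothetical.

```cpp
#include <cassert>

// Transcription of the expression printed by -Minfo=intensity for the
// outer matvec loop, with k = row_end - row_start held constant.
double matvec_intensity(double num_rows, double k) {
    return (num_rows * (k * 2)) /
           (num_rows + (num_rows + (num_rows + (k + k))));
}
```

As <code>num_rows</code> increases, the expression approaches <code>2k/3</code>. Whether that reflects the real memory traffic of the indirect access to <code>x</code> is a separate question, so the number is best read as a hint from the compiler rather than a measurement.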
<translate> | <translate> |