[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
2.5 A Full Example
Let us consider the allocation problem of a Gaussian elimination, i.e. we want to distribute the various statement instances of the compute kernel onto different processors. The original code is the following:
for (i=1;j<=N-1;i++) { for (j=i+1;j<=N;j++) { c[i][j] = a[j][i]/a[i][i] ; /* S1 */ for (k=i+1;k<=N;k++) { a[j][k] -= c[i][j]*a[i][k] ; /* S2 */ } } }
The best affine allocation functions can be found by any good automatic parallelizer like LooPo (see Gri04):
T_S1(i,j)^T = (i) T_S2(i,j,k)^T = (k)
To ensure that on each processor, the set of statement instances is executed according to the original ordering, we add as minor scattering dimensions the original scheduling (see section Defining a Scanning Order: Scattering Functions):
T_S1(i,j)^T = (i,0,i,0,j,0)^T T_S2(i,j,k)^T = (k,0,i,0,j,1,k,0)^T
To ensure that the scattering functions have the same dimensionality, we complete the first function with zeroes (this is a CLooG 0.18.0-UNKNOWN and previous versions requirement, it should be removed in a future version, don’t worry it’s absolutely legal !):
T_S1(i,j)^T = (i,0,i,0,j,0,0,0)^T T_S2(i,j,k)^T = (k,0,i,0,j,1,k,0)^T
The input file corresponding to this code generation problem
could be (this file is provided in the CLooG distribution
as test/manual_gauss.cloog
:
# ---------------------- CONTEXT ---------------------- c # language is C # Context (no constraints on one parameter) 1 3 # 1 line and 3 columns # eq/in n 1 1 0 0 # 0 >= 0, always true 1 # We want to set manually the parameter name n # parameter name # --------------------- STATEMENTS -------------------- 2 # Number of statements 1 # First statement: one domain 4 5 # 4 lines and 3 columns # eq/in i j n 1 1 1 0 0 -1 # i >= 1 1 -1 0 1 -1 # i <= n-1 1 -1 1 0 -1 # j >= i+1 1 0 -1 1 0 # j <= n 0 0 0 # for future options 1 # Second statement: one domain 6 6 # 6 lines and 3 columns # eq/in i j k n 1 1 1 0 0 0 -1 # i >= 1 1 -1 0 0 1 -1 # i <= n-1 1 -1 1 0 0 -1 # j >= i+1 1 0 -1 0 1 0 # j <= n 1 -1 0 1 0 -1 # k >= i+1 1 0 0 -1 1 0 # k <= n 0 0 0 # for future options 0 # We let CLooG set the iterator names # --------------------- SCATTERING -------------------- 2 # Scattering functions # First function 8 13 # 3 lines and 3 columns # eq/in p1 p2 p3 p4 p5 p6 p7 p8 i j n 1 0 1 0 0 0 0 0 0 0 -1 0 0 0 # p1 = i 0 0 1 0 0 0 0 0 0 0 0 0 0 # p2 = 0 0 0 0 1 0 0 0 0 0 -1 0 0 0 # p3 = i 0 0 0 0 1 0 0 0 0 0 0 0 0 # p4 = 0 0 0 0 0 0 1 0 0 0 0 -1 0 0 # p5 = j 0 0 0 0 0 0 1 0 0 0 0 0 0 # p6 = 0 0 0 0 0 0 0 0 1 0 0 0 0 0 # p7 = 0 0 0 0 0 0 0 0 0 1 0 0 0 0 # p8 = 0 # Second function 8 14 # 3 lines and 3 columns # eq/in p1 p2 p3 p4 p5 p6 p7 p8 i j k n 1 0 1 0 0 0 0 0 0 0 0 0 -1 0 0 # p1 = k 0 0 1 0 0 0 0 0 0 0 0 0 0 0 # p2 = 0 0 0 0 1 0 0 0 0 0 -1 0 0 0 0 # p3 = i 0 0 0 0 1 0 0 0 0 0 0 0 0 0 # p4 = 0 0 0 0 0 0 1 0 0 0 0 -1 0 0 0 # p5 = j 0 0 0 0 0 0 1 0 0 0 0 0 0 -1 # p6 = 1 0 0 0 0 0 0 0 1 0 0 0 -1 0 0 # p7 = k 0 0 0 0 0 0 0 0 1 0 0 0 0 0 # p8 = 0 1 # We want to set manually the scattering dimension names p1 p2 p3 p4 p5 p6 p7 p8 # scattering dimension names
Calling CLooG, with for instance the command line
cloog -fsp 2 gauss.cloog
for a better view
of the allocation (the processor number is given by p1
),
will result on the following target code that actually implements
the transformation. A minor processing on the dimension p1
to implement, e.g., MPI calls, which is not shown here may
result in dramatic speedups !
if (n >= 2) { p1 = 1 ; for (p5=2;p5<=n;p5++) { S1(i = 1,j = p5) ; } } for (p1=2;p1<=n-1;p1++) { for (p3=1;p3<=p1-1;p3++) { for (p5=p3+1;p5<=n;p5++) { S2(i = p3,j = p5,k = p1) ; } } for (p5=p1+1;p5<=n;p5++) { S1(i = p1,j = p5) ; } } if (n >= 2) { p1 = n ; for (p3=1;p3<=n-1;p3++) { for (p5=p3+1;p5<=n;p5++) { S2(i = p3,j = p5,k = n) ; } } }
[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated on August 20, 2013 using texi2html 5.0.